DPDK patches and discussions
* Re: [dpdk-dev] [PATCH v11 08/18] lib: add symbol versioning to distributor
  2017-03-20 10:08  2%                   ` [dpdk-dev] [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-27 13:02  3%                     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-03-27 13:02 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

2017-03-20 10:08, David Hunt:
> Also bumped up the ABI version number in the Makefile

It would be good to explain the intent of versioning here.

> Signed-off-by: David Hunt <david.hunt@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_v20.c       | 10 +++
>  lib/librte_distributor/rte_distributor_version.map | 14 ++++
>  5 files changed, 162 insertions(+), 10 deletions(-)
>  create mode 100644 lib/librte_distributor/rte_distributor_v1705.h
> 
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 2b28eff..2f05cf3 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
>  
>  EXPORT_MAP := rte_distributor_version.map
>  
> -LIBABIVER := 1
> +LIBABIVER := 2

Why keep ABI compatibility if you bump LIBABIVER?

I guess you do not really want to bump it now.
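
For reference, the usual pattern with the rte_compat.h macros is to keep the
old implementation exported at the old ABI node and bind the new one as the
default. A minimal sketch, reusing the _v20/_v1705 naming from this patch
(bodies elided and signatures abbreviated, so treat it as illustrative only):

	/* old copy, kept so binaries linked against DPDK 2.0 keep working */
	void
	rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
			unsigned worker_id, struct rte_mbuf *oldpkt)
	{
		/* ... 2.0 implementation ... */
	}
	VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);

	/* new copy, bound as the default for newly built applications */
	void
	rte_distributor_request_pkt_v1705(struct rte_distributor *d,
			unsigned worker_id, struct rte_mbuf *oldpkt)
	{
		/* ... 17.05 implementation ... */
	}
	BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);

with rte_distributor_version.map gaining a DPDK_17.05 node that inherits
from DPDK_2.0. Doing that and bumping LIBABIVER at the same time is what
looks contradictory.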


* Re: [dpdk-dev] [PATCH v3 03/14] ring: eliminate duplication of size and mask fields
  @ 2017-03-27 10:13  3%         ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-27 10:13 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: olivier.matz, dev, jerin.jacob

On Mon, Mar 27, 2017 at 11:52:58AM +0200, Thomas Monjalon wrote:
> 2017-03-24 17:09, Bruce Richardson:
> > The size and mask fields are duplicated in both the producer and
> > consumer data structures. Move them out of that into the top level
> > structure so they are not duplicated.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Sorry Bruce, I encountered this error:
> 
> fatal error: no member named 'size' in 'struct rte_ring_headtail'
>                 if (r->prod.size >= ring_size) {
>                     ~~~~~~~ ^
>
Hi Thomas,

Again, I need more information here. I've just re-validated these first
three patches, doing 7 builds with each one (gcc, clang, debug, shared
library, old ABI, default machine and 32-bit), as well as compiling the
apps with gcc and clang, and I see no errors.
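
One thing worth checking on your side: since patch 3 moves those fields to
the top-level structure, any code still referencing the old location has to
be updated, e.g. (a sketch, assuming the relocated field is plain r->size):

	-	if (r->prod.size >= ring_size) {
	+	if (r->size >= ring_size) {

If the file that fails is not touched by these three patches, that would
explain the error.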

/Bruce


* Re: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline personalization profile processing
  2017-03-25 21:03  3%         ` Chilikin, Andrey
@ 2017-03-27  2:09  0%           ` Xing, Beilei
  0 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2017-03-27  2:09 UTC (permalink / raw)
  To: Chilikin, Andrey, Wu, Jingjing; +Cc: Zhang, Helin, dev

Hi Andrey,

> -----Original Message-----
> From: Chilikin, Andrey
> Sent: Sunday, March 26, 2017 5:04 AM
> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline
> personalization profile processing
> 
> Hi Beilei
> 
> > -----Original Message-----
> > From: Xing, Beilei
> > Sent: Saturday, March 25, 2017 4:04 AM
> > To: Chilikin, Andrey <andrey.chilikin@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>
> > Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline
> > personalization profile processing
> >
> > Hi Andrey,
> >
> > > -----Original Message-----
> > > From: Chilikin, Andrey
> > > Sent: Friday, March 24, 2017 10:53 PM
> > > To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>
> > > Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline
> > > personalization profile processing
> > >
> > > Hi Beilei,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Beilei Xing
> > > > Sent: Friday, March 24, 2017 10:19 AM
> > > > To: Wu, Jingjing <jingjing.wu@intel.com>
> > > > Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline
> > > > personalization profile processing
> > > >
> > > > Add support for adding a pipeline personalization profile package.
> > > >
> > > > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> > > > ---
> > > >  app/test-pmd/cmdline.c                    |   1 +
> > > >  drivers/net/i40e/i40e_ethdev.c            | 198
> > > > ++++++++++++++++++++++++++++++
> > > >  drivers/net/i40e/rte_pmd_i40e.h           |  51 ++++++++
> > > >  drivers/net/i40e/rte_pmd_i40e_version.map |   6 +
> > > >  4 files changed, 256 insertions(+)
> > > >
> > > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > > > 47f935d..6e0625d 100644
> > > > --- a/app/test-pmd/cmdline.c
> > > > +++ b/app/test-pmd/cmdline.c
> > > > @@ -37,6 +37,7 @@
> > > >  #include <stdio.h>
> > > >  #include <stdint.h>
> > > >  #include <stdarg.h>
> > > > +#include <stdbool.h>
> > > >  #include <string.h>
> > > >  #include <termios.h>
> > > >  #include <unistd.h>
> > > > diff --git a/drivers/net/i40e/i40e_ethdev.c
> > > > b/drivers/net/i40e/i40e_ethdev.c index 3702214..bea593f 100644
> > > > --- a/drivers/net/i40e/i40e_ethdev.c
> > > > +++ b/drivers/net/i40e/i40e_ethdev.c
> > > > @@ -11259,3 +11259,201 @@ rte_pmd_i40e_reset_vf_stats(uint8_t
> > > port,
> > > >
> > > >  	return 0;
> > > >  }
> > > > +
> > > > +static void
> > > > +i40e_generate_profile_info_sec(char *name, struct
> > > > +i40e_ppp_version
> > > > *version,
> > > > +			       uint32_t track_id, uint8_t *profile_info_sec,
> > > > +			       bool add)
> > > > +{
> > > > +	struct i40e_profile_section_header *sec = NULL;
> > > > +	struct i40e_profile_info *pinfo;
> > > > +
> > > > +	sec = (struct i40e_profile_section_header *)profile_info_sec;
> > > > +	sec->tbl_size = 1;
> > > > +	sec->data_end = sizeof(struct i40e_profile_section_header) +
> > > > +		sizeof(struct i40e_profile_info);
> > > > +	sec->section.type = SECTION_TYPE_INFO;
> > > > +	sec->section.offset = sizeof(struct i40e_profile_section_header);
> > > > +	sec->section.size = sizeof(struct i40e_profile_info);
> > > > +	pinfo = (struct i40e_profile_info *)(profile_info_sec +
> > > > +					     sec->section.offset);
> > > > +	pinfo->track_id = track_id;
> > > > +	memcpy(pinfo->name, name, I40E_PPP_NAME_SIZE);
> > > > +	memcpy(&pinfo->version, version, sizeof(struct i40e_ppp_version));
> > > > +	if (add)
> > > > +		pinfo->op = I40E_PPP_ADD_TRACKID;
> > > > +	else
> > > > +		pinfo->op = I40E_PPP_REMOVE_TRACKID; }
> > > > +
> > > > +static enum i40e_status_code
> > > > +i40e_add_rm_profile_info(struct i40e_hw *hw, uint8_t
> > > > +*profile_info_sec) {
> > > > +	enum i40e_status_code status = I40E_SUCCESS;
> > > > +	struct i40e_profile_section_header *sec;
> > > > +	uint32_t track_id;
> > > > +	uint32_t offset = 0, info = 0;
> > > > +
> > > > +	sec = (struct i40e_profile_section_header *)profile_info_sec;
> > > > +	track_id = ((struct i40e_profile_info *)(profile_info_sec +
> > > > +					 sec->section.offset))->track_id;
> > > > +
> > > > +	status = i40e_aq_write_ppp(hw, (void *)sec, sec->data_end,
> > > > +				   track_id, &offset, &info, NULL);
> > > > +	if (status)
> > > > +		PMD_DRV_LOG(ERR, "Failed to add/remove profile info: "
> > > > +			    "offset %d, info %d",
> > > > +			    offset, info);
> > > > +
> > > > +	return status;
> > > > +}
> > > > +
> > > > +#define I40E_PROFILE_INFO_SIZE 48
> > > > +#define I40E_MAX_PROFILE_NUM 16
> > > > +
> > > > +/* Check if the profile info exists */ static int
> > > > +i40e_check_profile_info(uint8_t port, uint8_t *profile_info_sec) {
> > > > +	struct rte_eth_dev *dev = &rte_eth_devices[port];
> > > > +	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> > > > >dev_private);
> > > > +	uint8_t *buff;
> > > > +	struct rte_pmd_i40e_profile_list *p_list;
> > > > +	struct rte_pmd_i40e_profile_info *pinfo, *p;
> > > > +	uint32_t i;
> > > > +	int ret;
> > > > +
> > > > +	buff = rte_zmalloc("pinfo_list",
> > > > +			   (I40E_PROFILE_INFO_SIZE *
> > > > I40E_MAX_PROFILE_NUM + 4),
> > > > +			   0);
> > > > +	if (!buff) {
> > > > +		PMD_DRV_LOG(ERR, "failed to allocate memory");
> > > > +		return -1;
> > > > +	}
> > > > +
> > > > +	ret = i40e_aq_get_ppp_list(hw, (void *)buff,
> > > > +		      (I40E_PROFILE_INFO_SIZE * I40E_MAX_PROFILE_NUM +
> > > > 4),
> > > > +		      0, NULL);
> > > > +	if (ret) {
> > > > +		PMD_DRV_LOG(ERR, "Failed to get profile info list.");
> > > > +		rte_free(buff);
> > > > +		return -1;
> > > > +	}
> > > > +	p_list = (struct rte_pmd_i40e_profile_list *)buff;
> > > > +	pinfo = (struct rte_pmd_i40e_profile_info *)(profile_info_sec +
> > > > +			     sizeof(struct i40e_profile_section_header));
> > > > +	for (i = 0; i < p_list->p_count; i++) {
> > > > +		p = &p_list->p_info[i];
> > > > +		if ((pinfo->track_id == p->track_id) &&
> > > > +		    !memcmp(&pinfo->version, &p->version,
> > > > +			    sizeof(struct i40e_ppp_version)) &&
> > > > +		    !memcmp(&pinfo->name, &p->name,
> > > > +			    I40E_PPP_NAME_SIZE)) {
> > > > +			PMD_DRV_LOG(INFO, "Profile exists.");
> > > > +			rte_free(buff);
> > > > +			return 1;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	rte_free(buff);
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +int
> > > > +rte_pmd_i40e_process_ppp_package(uint8_t port, uint8_t *buff,
> > > > +				 uint32_t size, bool add)
> > >
> > > To make this function future-proof it is better not to use 'bool add',
> > > as there are at least three possible processing actions:
> > >
> > > 1. Process the package and add it to the list of applied profiles
> > >    (applying a new personalization)
> > > 2. Process the package and remove it from the list
> > >    (restoring the original configuration)
> > > 3. Process the package and do not update the list
> > >    (updating an already applied personalization)
> >
> > Thanks for your comments. I considered your suggestion before, but as
> > this is a private API, I think there are only "add" and "remove" from
> > the user's point of view; the user should not need to know whether the
> > profile info list has to be updated. So I think the first and the third
> > items above should be distinguished by the driver instead of the
> > application. The driver can distinguish them when it parses the package,
> > by checking whether track_id is 0.
> > What do you think?
> >
> > In my opinion, the first item means adding a profile, the second item
> > means removing a profile, and the third means adding a read-only profile.
> > Please correct me if I'm wrong. In fact we only support the first in this
> > release; the second and third items will need to be supported after this
> > release.
> 
> You already have the "Action not supported temporarily" message now, so it
> will not break anything, but it will save users from updating their apps'
> ABI in the future.
> In addition to read-only profiles, option 3 can be used for profiles which
> call generic admin commands, where only the first call needs to be
> registered. In this case TrackId will not be 0 and the driver cannot use it
> directly.
> This is a very minor change, and I do not see why we should not declare the
> parameters properly from the first release if support for all three options
> is needed anyway.

OK, I misunderstood option 3; I thought the only difference between option 1 and option 3 was the track_id.
Will update in the next version, thanks for the explanation.
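
For illustration, the signature could then take an operation enum instead of
a bool, along these lines (a sketch only; the enum name and values are
illustrative, not taken from the patch):

	enum rte_pmd_i40e_package_op {
		RTE_PMD_I40E_PKG_OP_WR_ADD,  /* write profile, add to info list */
		RTE_PMD_I40E_PKG_OP_WR_DEL,  /* write profile, remove from list */
		RTE_PMD_I40E_PKG_OP_WR_ONLY, /* write profile, leave list as is */
	};

	int rte_pmd_i40e_process_ppp_package(uint8_t port, uint8_t *buff,
					     uint32_t size,
					     enum rte_pmd_i40e_package_op op);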

> >
> > Beilei
> >
> > >
> > > > +{
> > > > +	struct rte_eth_dev *dev;
> > > > +	struct i40e_hw *hw;
> > > > +	struct i40e_package_header *pkg_hdr;
> > > > +	struct i40e_generic_seg_header *profile_seg_hdr;
> > > > +	struct i40e_generic_seg_header *metadata_seg_hdr;
> > > > +	uint32_t track_id;
> > > > +	uint8_t *profile_info_sec;
> > > > +	int is_exist;
> > > > +	enum i40e_status_code status = I40E_SUCCESS;
> > > > +
> > > > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> > > > +
> > > > +	dev = &rte_eth_devices[port];
> > > > +
> > > > +	if (!is_device_supported(dev, &rte_i40e_pmd))
> > > > +		return -ENOTSUP;
> > > > +
> > > > +	hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > > +
> > > > +	if (size < (sizeof(struct i40e_package_header) +
> > > > +		    sizeof(struct i40e_metadata_segment) +
> > > > +		    sizeof(uint32_t) * 2)) {
> > > > +		PMD_DRV_LOG(ERR, "Buff is invalid.");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	pkg_hdr = (struct i40e_package_header *)buff;
> > > > +
> > > > +	if (!pkg_hdr) {
> > > > +		PMD_DRV_LOG(ERR, "Failed to fill the package structure");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	if (pkg_hdr->segment_count < 2) {
> > > > +		PMD_DRV_LOG(ERR, "Segment_count should be 2 at least.");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	/* Find metadata segment */
> > > > +	metadata_seg_hdr =
> > > > i40e_find_segment_in_package(SEGMENT_TYPE_METADATA,
> > > > +							pkg_hdr);
> > > > +	if (!metadata_seg_hdr) {
> > > > +		PMD_DRV_LOG(ERR, "Failed to find metadata segment
> > > > header");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +	track_id = ((struct i40e_metadata_segment
> > > > +*)metadata_seg_hdr)->track_id;
> > > > +
> > > > +	/* Find profile segment */
> > > > +	profile_seg_hdr =
> > > > i40e_find_segment_in_package(SEGMENT_TYPE_I40E,
> > > > +						       pkg_hdr);
> > > > +	if (!profile_seg_hdr) {
> > > > +		PMD_DRV_LOG(ERR, "Failed to find profile segment
> header");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	profile_info_sec = rte_zmalloc("i40e_profile_info",
> > > > +			       sizeof(struct i40e_profile_section_header) +
> > > > +			       sizeof(struct i40e_profile_info),
> > > > +			       0);
> > > > +	if (!profile_info_sec) {
> > > > +		PMD_DRV_LOG(ERR, "Failed to allocate memory");
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	if (add) {
> > > > +		/* Check if the profile exists */
> > > > +		i40e_generate_profile_info_sec(
> > > > +		     ((struct i40e_profile_segment *)profile_seg_hdr)->name,
> > > > +		     &((struct i40e_profile_segment *)profile_seg_hdr)-
> > > > >version,
> > > > +		     track_id, profile_info_sec, 1);
> > > > +		is_exist = i40e_check_profile_info(port, profile_info_sec);
> > > > +		if (is_exist) {
> > > > +			PMD_DRV_LOG(ERR, "Profile already exists.");
> > > > +			rte_free(profile_info_sec);
> > > > +			return 1;
> > > > +		}
> > > > +
> > > > +		/* Write profile to HW */
> > > > +		status = i40e_write_profile(hw,
> > > > +				 (struct i40e_profile_segment
> > > > *)profile_seg_hdr,
> > > > +				 track_id);
> > > > +		if (status)
> > > > +			PMD_DRV_LOG(ERR, "Failed to write profile.");
> > > > +
> > > > +		/* Add profile info to info list */
> > > > +		status = i40e_add_rm_profile_info(hw, profile_info_sec);
> > > > +		if (status)
> > > > +			PMD_DRV_LOG(ERR, "Failed to add profile info.");
> > > > +	} else
> > > > +		PMD_DRV_LOG(ERR, "Action not supported temporarily.");
> > > > +
> > > > +	rte_free(profile_info_sec);
> > > > +	return status;
> > > > +}
> > > > diff --git a/drivers/net/i40e/rte_pmd_i40e.h
> > > > b/drivers/net/i40e/rte_pmd_i40e.h index a0ad88c..4f6cdb5 100644
> > > > --- a/drivers/net/i40e/rte_pmd_i40e.h
> > > > +++ b/drivers/net/i40e/rte_pmd_i40e.h
> > > > @@ -65,6 +65,36 @@ struct rte_pmd_i40e_mb_event_param {
> > > >  	uint16_t msglen;   /**< length of the message */
> > > >  };
> > > >
> > > > +#define RTE_PMD_I40E_PPP_NAME_SIZE 32
> > > > +
> > > > +/**
> > > > + * Pipeline personalization profile version  */ struct
> > > > +rte_pmd_i40e_ppp_version {
> > > > +	uint8_t major;
> > > > +	uint8_t minor;
> > > > +	uint8_t update;
> > > > +	uint8_t draft;
> > > > +};
> > > > +
> > > > +/**
> > > > + * Structure of profile information  */ struct
> > > > +rte_pmd_i40e_profile_info {
> > > > +	uint32_t track_id;
> > > > +	struct rte_pmd_i40e_ppp_version version;
> > > > +	uint8_t reserved[8];
> > >
> > > Instead of uint8_t reserved[8] it should be
> > >     uint8_t owner;
> > >     uint8_t reserved[7];
> >
> > Actually I defined this for getting profile info, not for updating it,
> > so I thought there would be no "owner = REMOVE" in the info list. Please
> > correct me if I'm wrong. In the base driver, there is a
> > "struct i40e_profile_info" used for updating profile info.
> > All the structures defined in "rte_pmd_i40e.h" are used by the
> > application for getting info.
> > Do you think the "owner" member is useful to the user? If yes, I will
> > update it. If no, I think the user will be confused by "owner".
> 
> After calling "get profile info list", 'owner' contains the PF function
> which was used to execute a profile, not an 'operation'.

OK, will update it.
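
That is, something like the following (a sketch of the struct after the
suggested change, not the final layout):

	struct rte_pmd_i40e_profile_info {
		uint32_t track_id;
		struct rte_pmd_i40e_ppp_version version;
		uint8_t owner;      /* PF function that loaded the profile */
		uint8_t reserved[7];
		uint8_t name[RTE_PMD_I40E_PPP_NAME_SIZE];
	};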

> 
> >
> > >
> > > > +	uint8_t name[RTE_PMD_I40E_PPP_NAME_SIZE]; };
> > > > +
> > > > +/**
> > > > + * Structure of profile information list  */ struct
> > > > +rte_pmd_i40e_profile_list {
> > > > +	uint32_t p_count;
> > > > +	struct rte_pmd_i40e_profile_info p_info[1]; };
> > > > +
> > > >  /**
> > > >   * Notify VF when PF link status changes.
> > > >   *
> > > > @@ -332,4 +362,25 @@ int rte_pmd_i40e_get_vf_stats(uint8_t port,
> > > int
> > > > rte_pmd_i40e_reset_vf_stats(uint8_t port,
> > > >  				uint16_t vf_id);
> > > >
> > > > +/**
> > > > + * Load/Unload a ppp package
> > > > + *
> > > > + * @param port
> > > > + *    The port identifier of the Ethernet device.
> > > > + * @param buff
> > > > + *    buffer of package.
> > > > + * @param size
> > > > + *    size of buffer.
> > > > + * @param add
> > > > + *   - (1) write profile.
> > > > + *   - (0) remove profile.
> > > > + * @return
> > > > + *   - (0) if successful.
> > > > + *   - (-ENODEV) if *port* invalid.
> > > > + *   - (-EINVAL) if bad parameter.
> > > > + *   - (1) if profile exists.
> > > > + */
> > > > +int rte_pmd_i40e_process_ppp_package(uint8_t port, uint8_t *buff,
> > > > +				     uint32_t size, bool add);
> > > > +
> > > >  #endif /* _PMD_I40E_H_ */
> > > > diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > index 7a5d211..01c4a90 100644
> > > > --- a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > +++ b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > > @@ -22,3 +22,9 @@ DPDK_17.02 {
> > > >  	rte_pmd_i40e_set_vf_vlan_tag;
> > > >
> > > >  } DPDK_2.0;
> > > > +
> > > > +DPDK_17.05 {
> > > > +	global:
> > > > +
> > > > +	rte_pmd_i40e_process_ppp_package;
> > > > +};
> > > > --
> > > > 2.5.5
> > >
> > > Regards,
> > > Andrey


* Re: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline personalization profile processing
  @ 2017-03-25 21:03  3%         ` Chilikin, Andrey
  2017-03-27  2:09  0%           ` Xing, Beilei
  0 siblings, 1 reply; 200+ results
From: Chilikin, Andrey @ 2017-03-25 21:03 UTC (permalink / raw)
  To: Xing, Beilei, Wu, Jingjing; +Cc: Zhang, Helin, dev

Hi Beilei

> -----Original Message-----
> From: Xing, Beilei
> Sent: Saturday, March 25, 2017 4:04 AM
> To: Chilikin, Andrey <andrey.chilikin@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline personalization
> profile processing
> 
> Hi Andrey,
> 
> > -----Original Message-----
> > From: Chilikin, Andrey
> > Sent: Friday, March 24, 2017 10:53 PM
> > To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>
> > Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline
> > personalization profile processing
> >
> > Hi Beilei,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Beilei Xing
> > > Sent: Friday, March 24, 2017 10:19 AM
> > > To: Wu, Jingjing <jingjing.wu@intel.com>
> > > Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline
> > > personalization profile processing
> > >
> > > Add support for adding a pipeline personalization profile package.
> > >
> > > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> > > ---
> > >  app/test-pmd/cmdline.c                    |   1 +
> > >  drivers/net/i40e/i40e_ethdev.c            | 198
> > > ++++++++++++++++++++++++++++++
> > >  drivers/net/i40e/rte_pmd_i40e.h           |  51 ++++++++
> > >  drivers/net/i40e/rte_pmd_i40e_version.map |   6 +
> > >  4 files changed, 256 insertions(+)
> > >
> > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > > 47f935d..6e0625d 100644
> > > --- a/app/test-pmd/cmdline.c
> > > +++ b/app/test-pmd/cmdline.c
> > > @@ -37,6 +37,7 @@
> > >  #include <stdio.h>
> > >  #include <stdint.h>
> > >  #include <stdarg.h>
> > > +#include <stdbool.h>
> > >  #include <string.h>
> > >  #include <termios.h>
> > >  #include <unistd.h>
> > > diff --git a/drivers/net/i40e/i40e_ethdev.c
> > > b/drivers/net/i40e/i40e_ethdev.c index 3702214..bea593f 100644
> > > --- a/drivers/net/i40e/i40e_ethdev.c
> > > +++ b/drivers/net/i40e/i40e_ethdev.c
> > > @@ -11259,3 +11259,201 @@ rte_pmd_i40e_reset_vf_stats(uint8_t
> > port,
> > >
> > >  	return 0;
> > >  }
> > > +
> > > +static void
> > > +i40e_generate_profile_info_sec(char *name, struct i40e_ppp_version
> > > *version,
> > > +			       uint32_t track_id, uint8_t *profile_info_sec,
> > > +			       bool add)
> > > +{
> > > +	struct i40e_profile_section_header *sec = NULL;
> > > +	struct i40e_profile_info *pinfo;
> > > +
> > > +	sec = (struct i40e_profile_section_header *)profile_info_sec;
> > > +	sec->tbl_size = 1;
> > > +	sec->data_end = sizeof(struct i40e_profile_section_header) +
> > > +		sizeof(struct i40e_profile_info);
> > > +	sec->section.type = SECTION_TYPE_INFO;
> > > +	sec->section.offset = sizeof(struct i40e_profile_section_header);
> > > +	sec->section.size = sizeof(struct i40e_profile_info);
> > > +	pinfo = (struct i40e_profile_info *)(profile_info_sec +
> > > +					     sec->section.offset);
> > > +	pinfo->track_id = track_id;
> > > +	memcpy(pinfo->name, name, I40E_PPP_NAME_SIZE);
> > > +	memcpy(&pinfo->version, version, sizeof(struct i40e_ppp_version));
> > > +	if (add)
> > > +		pinfo->op = I40E_PPP_ADD_TRACKID;
> > > +	else
> > > +		pinfo->op = I40E_PPP_REMOVE_TRACKID; }
> > > +
> > > +static enum i40e_status_code
> > > +i40e_add_rm_profile_info(struct i40e_hw *hw, uint8_t
> > > +*profile_info_sec) {
> > > +	enum i40e_status_code status = I40E_SUCCESS;
> > > +	struct i40e_profile_section_header *sec;
> > > +	uint32_t track_id;
> > > +	uint32_t offset = 0, info = 0;
> > > +
> > > +	sec = (struct i40e_profile_section_header *)profile_info_sec;
> > > +	track_id = ((struct i40e_profile_info *)(profile_info_sec +
> > > +					 sec->section.offset))->track_id;
> > > +
> > > +	status = i40e_aq_write_ppp(hw, (void *)sec, sec->data_end,
> > > +				   track_id, &offset, &info, NULL);
> > > +	if (status)
> > > +		PMD_DRV_LOG(ERR, "Failed to add/remove profile info: "
> > > +			    "offset %d, info %d",
> > > +			    offset, info);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +#define I40E_PROFILE_INFO_SIZE 48
> > > +#define I40E_MAX_PROFILE_NUM 16
> > > +
> > > +/* Check if the profile info exists */ static int
> > > +i40e_check_profile_info(uint8_t port, uint8_t *profile_info_sec) {
> > > +	struct rte_eth_dev *dev = &rte_eth_devices[port];
> > > +	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data-
> > > >dev_private);
> > > +	uint8_t *buff;
> > > +	struct rte_pmd_i40e_profile_list *p_list;
> > > +	struct rte_pmd_i40e_profile_info *pinfo, *p;
> > > +	uint32_t i;
> > > +	int ret;
> > > +
> > > +	buff = rte_zmalloc("pinfo_list",
> > > +			   (I40E_PROFILE_INFO_SIZE *
> > > I40E_MAX_PROFILE_NUM + 4),
> > > +			   0);
> > > +	if (!buff) {
> > > +		PMD_DRV_LOG(ERR, "failed to allocate memory");
> > > +		return -1;
> > > +	}
> > > +
> > > +	ret = i40e_aq_get_ppp_list(hw, (void *)buff,
> > > +		      (I40E_PROFILE_INFO_SIZE * I40E_MAX_PROFILE_NUM +
> > > 4),
> > > +		      0, NULL);
> > > +	if (ret) {
> > > +		PMD_DRV_LOG(ERR, "Failed to get profile info list.");
> > > +		rte_free(buff);
> > > +		return -1;
> > > +	}
> > > +	p_list = (struct rte_pmd_i40e_profile_list *)buff;
> > > +	pinfo = (struct rte_pmd_i40e_profile_info *)(profile_info_sec +
> > > +			     sizeof(struct i40e_profile_section_header));
> > > +	for (i = 0; i < p_list->p_count; i++) {
> > > +		p = &p_list->p_info[i];
> > > +		if ((pinfo->track_id == p->track_id) &&
> > > +		    !memcmp(&pinfo->version, &p->version,
> > > +			    sizeof(struct i40e_ppp_version)) &&
> > > +		    !memcmp(&pinfo->name, &p->name,
> > > +			    I40E_PPP_NAME_SIZE)) {
> > > +			PMD_DRV_LOG(INFO, "Profile exists.");
> > > +			rte_free(buff);
> > > +			return 1;
> > > +		}
> > > +	}
> > > +
> > > +	rte_free(buff);
> > > +	return 0;
> > > +}
> > > +
> > > +int
> > > +rte_pmd_i40e_process_ppp_package(uint8_t port, uint8_t *buff,
> > > +				 uint32_t size, bool add)
> >
> > To make this function future-proof it is better not to use 'bool add',
> > as there are at least three possible processing actions:
> >
> > 1. Process the package and add it to the list of applied profiles
> >    (applying a new personalization)
> > 2. Process the package and remove it from the list
> >    (restoring the original configuration)
> > 3. Process the package and do not update the list
> >    (updating an already applied personalization)
> 
> Thanks for your comments. I considered your suggestion before, but as this
> is a private API, I think there are only "add" and "remove" from the user's
> point of view; the user should not need to know whether the profile info
> list has to be updated. So I think the first and the third items above
> should be distinguished by the driver instead of the application. The
> driver can distinguish them when it parses the package, by checking whether
> track_id is 0.
> What do you think?
>
> In my opinion, the first item means adding a profile, the second item means
> removing a profile, and the third means adding a read-only profile. Please
> correct me if I'm wrong. In fact we only support the first in this release;
> the second and third items will need to be supported after this release.

You already have the "Action not supported temporarily" message now, so it will not break anything, but it will save users from updating their apps' ABI in the future.
In addition to read-only profiles, option 3 can be used for profiles which call generic admin commands, where only the first call needs to be registered. In this case TrackId will not be 0 and the driver cannot use it directly.
This is a very minor change, and I do not see why we should not declare the parameters properly from the first release if support for all three options is needed anyway.
> 
> Beilei
> 
> >
> > > +{
> > > +	struct rte_eth_dev *dev;
> > > +	struct i40e_hw *hw;
> > > +	struct i40e_package_header *pkg_hdr;
> > > +	struct i40e_generic_seg_header *profile_seg_hdr;
> > > +	struct i40e_generic_seg_header *metadata_seg_hdr;
> > > +	uint32_t track_id;
> > > +	uint8_t *profile_info_sec;
> > > +	int is_exist;
> > > +	enum i40e_status_code status = I40E_SUCCESS;
> > > +
> > > +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> > > +
> > > +	dev = &rte_eth_devices[port];
> > > +
> > > +	if (!is_device_supported(dev, &rte_i40e_pmd))
> > > +		return -ENOTSUP;
> > > +
> > > +	hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > +
> > > +	if (size < (sizeof(struct i40e_package_header) +
> > > +		    sizeof(struct i40e_metadata_segment) +
> > > +		    sizeof(uint32_t) * 2)) {
> > > +		PMD_DRV_LOG(ERR, "Buff is invalid.");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	pkg_hdr = (struct i40e_package_header *)buff;
> > > +
> > > +	if (!pkg_hdr) {
> > > +		PMD_DRV_LOG(ERR, "Failed to fill the package structure");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (pkg_hdr->segment_count < 2) {
> > > +		PMD_DRV_LOG(ERR, "Segment_count should be 2 at least.");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	/* Find metadata segment */
> > > +	metadata_seg_hdr =
> > > i40e_find_segment_in_package(SEGMENT_TYPE_METADATA,
> > > +							pkg_hdr);
> > > +	if (!metadata_seg_hdr) {
> > > +		PMD_DRV_LOG(ERR, "Failed to find metadata segment
> > > header");
> > > +		return -EINVAL;
> > > +	}
> > > +	track_id = ((struct i40e_metadata_segment
> > > +*)metadata_seg_hdr)->track_id;
> > > +
> > > +	/* Find profile segment */
> > > +	profile_seg_hdr =
> > > i40e_find_segment_in_package(SEGMENT_TYPE_I40E,
> > > +						       pkg_hdr);
> > > +	if (!profile_seg_hdr) {
> > > +		PMD_DRV_LOG(ERR, "Failed to find profile segment header");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	profile_info_sec = rte_zmalloc("i40e_profile_info",
> > > +			       sizeof(struct i40e_profile_section_header) +
> > > +			       sizeof(struct i40e_profile_info),
> > > +			       0);
> > > +	if (!profile_info_sec) {
> > > +		PMD_DRV_LOG(ERR, "Failed to allocate memory");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (add) {
> > > +		/* Check if the profile exists */
> > > +		i40e_generate_profile_info_sec(
> > > +		     ((struct i40e_profile_segment *)profile_seg_hdr)->name,
> > > +		     &((struct i40e_profile_segment *)profile_seg_hdr)-
> > > >version,
> > > +		     track_id, profile_info_sec, 1);
> > > +		is_exist = i40e_check_profile_info(port, profile_info_sec);
> > > +		if (is_exist) {
> > > +			PMD_DRV_LOG(ERR, "Profile already exists.");
> > > +			rte_free(profile_info_sec);
> > > +			return 1;
> > > +		}
> > > +
> > > +		/* Write profile to HW */
> > > +		status = i40e_write_profile(hw,
> > > +				 (struct i40e_profile_segment
> > > *)profile_seg_hdr,
> > > +				 track_id);
> > > +		if (status)
> > > +			PMD_DRV_LOG(ERR, "Failed to write profile.");
> > > +
> > > +		/* Add profile info to info list */
> > > +		status = i40e_add_rm_profile_info(hw, profile_info_sec);
> > > +		if (status)
> > > +			PMD_DRV_LOG(ERR, "Failed to add profile info.");
> > > +	} else
> > > +		PMD_DRV_LOG(ERR, "Action not supported temporarily.");
> > > +
> > > +	rte_free(profile_info_sec);
> > > +	return status;
> > > +}
> > > diff --git a/drivers/net/i40e/rte_pmd_i40e.h
> > > b/drivers/net/i40e/rte_pmd_i40e.h index a0ad88c..4f6cdb5 100644
> > > --- a/drivers/net/i40e/rte_pmd_i40e.h
> > > +++ b/drivers/net/i40e/rte_pmd_i40e.h
> > > @@ -65,6 +65,36 @@ struct rte_pmd_i40e_mb_event_param {
> > >  	uint16_t msglen;   /**< length of the message */
> > >  };
> > >
> > > +#define RTE_PMD_I40E_PPP_NAME_SIZE 32
> > > +
> > > +/**
> > > + * Pipeline personalization profile version  */ struct
> > > +rte_pmd_i40e_ppp_version {
> > > +	uint8_t major;
> > > +	uint8_t minor;
> > > +	uint8_t update;
> > > +	uint8_t draft;
> > > +};
> > > +
> > > +/**
> > > + * Structure of profile information  */ struct
> > > +rte_pmd_i40e_profile_info {
> > > +	uint32_t track_id;
> > > +	struct rte_pmd_i40e_ppp_version version;
> > > +	uint8_t reserved[8];
> >
> > Instead of uint8_t reserved[8] it should be
> >     uint8_t owner;
> >     uint8_t reserved[7];
> 
> Actually I defined this for getting profile info, not for updating it, so I
> thought there would be no "owner = REMOVE" in the info list. Please correct
> me if I'm wrong. In the base driver, there is a "struct i40e_profile_info"
> used for updating profile info.
> All the structures defined in "rte_pmd_i40e.h" are used by the application
> for getting info.
> Do you think the "owner" member is useful to the user? If yes, I will
> update it. If no, I think the user will be confused by "owner".

After calling "get profile info list", 'owner' contains the PF function which was used to execute a profile, not an 'operation'.

> 
> >
> > > +	uint8_t name[RTE_PMD_I40E_PPP_NAME_SIZE]; };
> > > +
> > > +/**
> > > + * Structure of profile information list  */ struct
> > > +rte_pmd_i40e_profile_list {
> > > +	uint32_t p_count;
> > > +	struct rte_pmd_i40e_profile_info p_info[1]; };
> > > +
> > >  /**
> > >   * Notify VF when PF link status changes.
> > >   *
> > > @@ -332,4 +362,25 @@ int rte_pmd_i40e_get_vf_stats(uint8_t port,
> > int
> > > rte_pmd_i40e_reset_vf_stats(uint8_t port,
> > >  				uint16_t vf_id);
> > >
> > > +/**
> > > + * Load/Unload a ppp package
> > > + *
> > > + * @param port
> > > + *    The port identifier of the Ethernet device.
> > > + * @param buff
> > > + *    buffer of package.
> > > + * @param size
> > > + *    size of buffer.
> > > + * @param add
> > > + *   - (1) write profile.
> > > + *   - (0) remove profile.
> > > + * @return
> > > + *   - (0) if successful.
> > > + *   - (-ENODEV) if *port* invalid.
> > > + *   - (-EINVAL) if bad parameter.
> > > + *   - (1) if profile exists.
> > > + */
> > > +int rte_pmd_i40e_process_ppp_package(uint8_t port, uint8_t *buff,
> > > +				     uint32_t size, bool add);
> > > +
> > >  #endif /* _PMD_I40E_H_ */
> > > diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > index 7a5d211..01c4a90 100644
> > > --- a/drivers/net/i40e/rte_pmd_i40e_version.map
> > > +++ b/drivers/net/i40e/rte_pmd_i40e_version.map
> > > @@ -22,3 +22,9 @@ DPDK_17.02 {
> > >  	rte_pmd_i40e_set_vf_vlan_tag;
> > >
> > >  } DPDK_2.0;
> > > +
> > > +DPDK_17.05 {
> > > +	global:
> > > +
> > > +	rte_pmd_i40e_process_ppp_package;
> > > +};
> > > --
> > > 2.5.5
> >
> > Regards,
> > Andrey


* [dpdk-dev] [PATCH v3 09/14] ring: allow dequeue fns to return remaining entry count
                         ` (5 preceding siblings ...)
  2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
@ 2017-03-24 17:10  2%     ` Bruce Richardson
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-24 17:10 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, jerin.jacob, thomas.monjalon, Bruce Richardson

Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the number of objects remaining in
the ring. This information can be used by applications in a number of
ways; for instance, with single-consumer queues, it provides a maximum
dequeue size which is guaranteed to succeed.
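
For example, a caller can then do the following (an illustrative sketch;
flush_batch() stands in for application logic and is not part of this
patch):

	unsigned int remaining;
	void *objs[BURST_SIZE];

	/* dequeue up to BURST_SIZE objects; 'remaining' reports how many
	 * entries are left in the ring afterwards (pass NULL to ignore it) */
	unsigned int nb = rte_ring_dequeue_burst(r, objs, BURST_SIZE,
			&remaining);
	if (nb > 0 && remaining == 0)
		flush_batch(); /* ring drained, flush any pending work */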

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 app/pdump/main.c                                   |  2 +-
 doc/guides/rel_notes/release_17_05.rst             |  8 ++
 drivers/crypto/null/null_crypto_pmd.c              |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |  3 +-
 drivers/net/ring/rte_eth_ring.c                    |  2 +-
 examples/distributor/main.c                        |  2 +-
 examples/load_balancer/runtime.c                   |  6 +-
 .../client_server_mp/mp_client/client.c            |  3 +-
 examples/packet_ordering/main.c                    |  6 +-
 examples/qos_sched/app_thread.c                    |  6 +-
 examples/quota_watermark/qw/main.c                 |  5 +-
 examples/server_node_efd/node/node.c               |  2 +-
 lib/librte_hash/rte_cuckoo_hash.c                  |  3 +-
 lib/librte_mempool/rte_mempool_ring.c              |  4 +-
 lib/librte_port/rte_port_frag.c                    |  3 +-
 lib/librte_port/rte_port_ring.c                    |  6 +-
 lib/librte_ring/rte_ring.h                         | 90 +++++++++++-----------
 test/test-pipeline/runtime.c                       |  6 +-
 test/test/test_link_bonding_mode4.c                |  3 +-
 test/test/test_pmd_ring_perf.c                     |  7 +-
 test/test/test_ring.c                              | 54 ++++++-------
 test/test/test_ring_perf.c                         | 20 +++--
 test/test/test_table_acl.c                         |  2 +-
 test/test/test_table_pipeline.c                    |  2 +-
 test/test/test_table_ports.c                       |  8 +-
 test/test/virtual_pmd.c                            |  4 +-
 26 files changed, 145 insertions(+), 114 deletions(-)

diff --git a/app/pdump/main.c b/app/pdump/main.c
index b88090d..3b13753 100644
--- a/app/pdump/main.c
+++ b/app/pdump/main.c
@@ -496,7 +496,7 @@ pdump_rxtx(struct rte_ring *ring, uint8_t vdev_id, struct pdump_stats *stats)
 
 	/* first dequeue packets from ring of primary process */
 	const uint16_t nb_in_deq = rte_ring_dequeue_burst(ring,
-			(void *)rxtx_bufs, BURST_SIZE);
+			(void *)rxtx_bufs, BURST_SIZE, NULL);
 	stats->dequeue_pkts += nb_in_deq;
 
 	if (nb_in_deq) {
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index dc1749b..f0eeac2 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -133,6 +133,8 @@ API Changes
   * added an extra parameter to the burst/bulk enqueue functions to
     return the number of free spaces in the ring after enqueue. This can
     be used by an application to implement its own watermark functionality.
+  * added an extra parameter to the burst/bulk dequeue functions to return
+    the number of elements remaining in the ring after dequeue.
   * changed the return value of the enqueue and dequeue bulk functions to
     match that of the burst equivalents. In all cases, ring functions which
     operate on multiple packets now return the number of elements enqueued
@@ -145,6 +147,12 @@ API Changes
     - ``rte_ring_sc_dequeue_bulk``
     - ``rte_ring_dequeue_bulk``
 
+    NOTE: the above functions all have different parameters as well as
+    different return values, due to the other listed changes above. This
+    means that all instances of the functions in existing code will be
+    flagged by the compiler. The return value usage should be checked
+    while fixing the compiler error due to the extra parameter.
+
 ABI Changes
 -----------
 
diff --git a/drivers/crypto/null/null_crypto_pmd.c b/drivers/crypto/null/null_crypto_pmd.c
index ed5a9fc..f68ec8d 100644
--- a/drivers/crypto/null/null_crypto_pmd.c
+++ b/drivers/crypto/null/null_crypto_pmd.c
@@ -155,7 +155,7 @@ null_crypto_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	unsigned nb_dequeued;
 
 	nb_dequeued = rte_ring_dequeue_burst(qp->processed_pkts,
-			(void **)ops, nb_ops);
+			(void **)ops, nb_ops, NULL);
 	qp->qp_stats.dequeued_count += nb_dequeued;
 
 	return nb_dequeued;
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index f3ac9e2..96638af 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1008,7 +1008,8 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs,
 		struct port *port = &mode_8023ad_ports[slaves[i]];
 
 		slave_slow_nb_pkts[i] = rte_ring_dequeue_burst(port->tx_ring,
-				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS);
+				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS,
+				NULL);
 		slave_nb_pkts[i] = slave_slow_nb_pkts[i];
 
 		for (j = 0; j < slave_slow_nb_pkts[i]; j++)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index adbf478..77ef3a1 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -88,7 +88,7 @@ eth_ring_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	void **ptrs = (void *)&bufs[0];
 	struct ring_queue *r = q;
 	const uint16_t nb_rx = (uint16_t)rte_ring_dequeue_burst(r->rng,
-			ptrs, nb_bufs);
+			ptrs, nb_bufs, NULL);
 	if (r->rng->flags & RING_F_SC_DEQ)
 		r->rx_pkts.cnt += nb_rx;
 	else
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index bb84f13..90c9613 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -330,7 +330,7 @@ lcore_tx(struct rte_ring *in_r)
 
 			struct rte_mbuf *bufs[BURST_SIZE];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE, NULL);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 1645994..8192c08 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -349,7 +349,8 @@ app_lcore_io_tx(
 			ret = rte_ring_sc_dequeue_bulk(
 				ring,
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
-				bsz_rd);
+				bsz_rd,
+				NULL);
 
 			if (unlikely(ret == 0))
 				continue;
@@ -504,7 +505,8 @@ app_lcore_worker(
 		ret = rte_ring_sc_dequeue_bulk(
 			ring_in,
 			(void **) lp->mbuf_in.array,
-			bsz_rd);
+			bsz_rd,
+			NULL);
 
 		if (unlikely(ret == 0))
 			continue;
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index dca9eb9..01b535c 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -279,7 +279,8 @@ main(int argc, char *argv[])
 		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts,
+				PKT_READ_SIZE, NULL);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 569b6da..49ae35b 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -462,7 +462,7 @@ worker_thread(void *args_ptr)
 
 		/* dequeue the mbufs from rx_to_workers ring */
 		burst_size = rte_ring_dequeue_burst(ring_in,
-				(void *)burst_buffer, MAX_PKTS_BURST);
+				(void *)burst_buffer, MAX_PKTS_BURST, NULL);
 		if (unlikely(burst_size == 0))
 			continue;
 
@@ -510,7 +510,7 @@ send_thread(struct send_thread_args *args)
 
 		/* deque the mbufs from workers_to_tx ring */
 		nb_dq_mbufs = rte_ring_dequeue_burst(args->ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(nb_dq_mbufs == 0))
 			continue;
@@ -595,7 +595,7 @@ tx_thread(struct rte_ring *ring_in)
 
 		/* deque the mbufs from workers_to_tx ring */
 		dqnum = rte_ring_dequeue_burst(ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(dqnum == 0))
 			continue;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 0c81a15..15f117f 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -179,7 +179,7 @@ app_tx_thread(struct thread_conf **confs)
 
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
-					burst_conf.qos_dequeue);
+					burst_conf.qos_dequeue, NULL);
 		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
@@ -218,7 +218,7 @@ app_worker_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
@@ -254,7 +254,7 @@ app_mixed_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
diff --git a/examples/quota_watermark/qw/main.c b/examples/quota_watermark/qw/main.c
index 57df8ef..2dcddea 100644
--- a/examples/quota_watermark/qw/main.c
+++ b/examples/quota_watermark/qw/main.c
@@ -247,7 +247,8 @@ pipeline_stage(__attribute__((unused)) void *args)
 			}
 
 			/* Dequeue up to quota mbuf from rx */
-			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota);
+			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts,
+					*quota, NULL);
 			if (unlikely(nb_dq_pkts < 0))
 				continue;
 
@@ -305,7 +306,7 @@ send_stage(__attribute__((unused)) void *args)
 
 			/* Dequeue packets from tx and send them */
 			nb_dq_pkts = (uint16_t) rte_ring_dequeue_burst(tx,
-					(void *) tx_pkts, *quota);
+					(void *) tx_pkts, *quota, NULL);
 			rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
 
 			/* TODO: Check if nb_dq_pkts == nb_tx_pkts? */
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index 9ec6a05..f780b92 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) == 0))
+					rx_pkts, NULL) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 6552199..645c0cf 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -536,7 +536,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
 			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
-					cached_free_slots->objs, LCORE_CACHE_SIZE);
+					cached_free_slots->objs,
+					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0)
 				return -ENOSPC;
 
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index 9b8fd2b..5c132bf 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -58,14 +58,14 @@ static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_mc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_sc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_port/rte_port_frag.c b/lib/librte_port/rte_port_frag.c
index 0fcace9..320407e 100644
--- a/lib/librte_port/rte_port_frag.c
+++ b/lib/librte_port/rte_port_frag.c
@@ -186,7 +186,8 @@ rte_port_ring_reader_frag_rx(void *port,
 		/* If "pkts" buffer is empty, read packet burst from ring */
 		if (p->n_pkts == 0) {
 			p->n_pkts = rte_ring_sc_dequeue_burst(p->ring,
-				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX);
+				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX,
+				NULL);
 			RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(p, p->n_pkts);
 			if (p->n_pkts == 0)
 				return n_pkts_out;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index c5dbe07..85fad44 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -111,7 +111,8 @@ rte_port_ring_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
@@ -124,7 +125,8 @@ rte_port_ring_multi_reader_rx(void *port, struct rte_mbuf **pkts,
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 61a4dc8..b05fecb 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -488,7 +488,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -497,11 +498,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	unsigned int i;
 	uint32_t mask = r->mask;
 
-	/* Avoid the unnecessary cmpset operation below, which is also
-	 * potentially harmful when n equals 0. */
-	if (n == 0)
-		return 0;
-
 	/* move cons.head atomically */
 	do {
 		/* Restore n as it may change every loop */
@@ -516,15 +512,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		entries = (prod_tail - cons_head);
 
 		/* Set the actual entries for dequeue */
-		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED)
-				return 0;
-			else {
-				if (unlikely(entries == 0))
-					return 0;
-				n = entries;
-			}
-		}
+		if (n > entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+		if (unlikely(n == 0))
+			goto end;
 
 		cons_next = cons_head + n;
 		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
@@ -543,7 +535,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		rte_pause();
 
 	r->cons.tail = cons_next;
-
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -567,7 +561,8 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  */
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -582,15 +577,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * and size(ring)-1. */
 	entries = prod_tail - cons_head;
 
-	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED)
-			return 0;
-		else {
-			if (unlikely(entries == 0))
-				return 0;
-			n = entries;
-		}
-	}
+	if (n > entries)
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+	if (unlikely(entries == 0))
+		goto end;
 
 	cons_next = cons_head + n;
 	r->cons.head = cons_next;
@@ -600,6 +591,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -746,9 +740,11 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -765,9 +761,11 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -787,12 +785,13 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
+		unsigned int *available)
 {
 	if (r->cons.single)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
 }
 
 /**
@@ -813,7 +812,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -831,7 +830,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -853,7 +852,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -1043,9 +1042,11 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1063,9 +1064,11 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1085,12 +1088,13 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - Number of objects dequeued
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
 	if (r->cons.single)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
 }
 
 #ifdef __cplusplus
diff --git a/test/test-pipeline/runtime.c b/test/test-pipeline/runtime.c
index c06ff54..8970e1c 100644
--- a/test/test-pipeline/runtime.c
+++ b/test/test-pipeline/runtime.c
@@ -121,7 +121,8 @@ app_main_loop_worker(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_rx[i],
 			(void **) worker_mbuf->array,
-			app.burst_size_worker_read);
+			app.burst_size_worker_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
@@ -151,7 +152,8 @@ app_main_loop_tx(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_tx[i],
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
-			app.burst_size_tx_read);
+			app.burst_size_tx_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
diff --git a/test/test/test_link_bonding_mode4.c b/test/test/test_link_bonding_mode4.c
index 8df28b4..15091b1 100644
--- a/test/test/test_link_bonding_mode4.c
+++ b/test/test/test_link_bonding_mode4.c
@@ -193,7 +193,8 @@ static uint8_t lacpdu_rx_count[RTE_MAX_ETHPORTS] = {0, };
 static int
 slave_get_pkts(struct slave_conf *slave, struct rte_mbuf **buf, uint16_t size)
 {
-	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf, size);
+	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf,
+			size, NULL);
 }
 
 /*
diff --git a/test/test/test_pmd_ring_perf.c b/test/test/test_pmd_ring_perf.c
index 045a7f2..004882a 100644
--- a/test/test/test_pmd_ring_perf.c
+++ b/test/test/test_pmd_ring_perf.c
@@ -67,7 +67,7 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t eth_start = rte_rdtsc();
@@ -99,7 +99,7 @@ test_single_enqueue_dequeue(void)
 	rte_compiler_barrier();
 	for (i = 0; i < iterations; i++) {
 		rte_ring_enqueue_bulk(r, &burst, 1, NULL);
-		rte_ring_dequeue_bulk(r, &burst, 1);
+		rte_ring_dequeue_bulk(r, &burst, 1, NULL);
 	}
 	const uint64_t sc_end = rte_rdtsc_precise();
 	rte_compiler_barrier();
@@ -133,7 +133,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, (void *)burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, (void *)burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, (void *)burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index b0ca88b..858ebc1 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -119,7 +119,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		    __func__, i, rand);
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
 				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
+				NULL) == rand);
 
 		/* fill the ring */
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
@@ -129,7 +130,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
+				NULL) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -186,19 +188,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -232,19 +234,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -265,7 +267,7 @@ test_ring_basic(void)
 		cur_src += MAX_BULK;
 		if (ret == 0)
 			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if (ret == 0)
 			goto fail;
@@ -303,13 +305,13 @@ test_ring_basic(void)
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue2\n");
@@ -390,19 +392,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1) ;
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -451,19 +453,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -505,19 +507,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -539,7 +541,7 @@ test_ring_burst_basic(void)
 		cur_src += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
@@ -578,19 +580,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -613,7 +615,7 @@ test_ring_burst_basic(void)
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret != 2)
 		goto fail;
@@ -753,7 +755,7 @@ test_ring_basic_ex(void)
 		goto fail_test;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2);
+	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
 	if (ret != 2) {
 		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
 		goto fail_test;
diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index f95a8e9..ed89896 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -152,12 +152,12 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t mc_end = rte_rdtsc();
 
 	printf("SC empty dequeue: %.2F\n",
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
@@ -325,7 +325,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -333,7 +334,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
@@ -361,7 +363,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -369,7 +372,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
diff --git a/test/test/test_table_acl.c b/test/test/test_table_acl.c
index b3bfda4..4d43be7 100644
--- a/test/test/test_table_acl.c
+++ b/test/test/test_table_acl.c
@@ -713,7 +713,7 @@ test_pipeline_single_filter(int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0) {
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/test/test/test_table_pipeline.c b/test/test/test_table_pipeline.c
index 36bfeda..b58aa5d 100644
--- a/test/test/test_table_pipeline.c
+++ b/test/test/test_table_pipeline.c
@@ -494,7 +494,7 @@ test_pipeline_single_filter(int test_type, int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0)
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/test/test/test_table_ports.c b/test/test/test_table_ports.c
index 395f4f3..39592ce 100644
--- a/test/test/test_table_ports.c
+++ b/test/test/test_table_ports.c
@@ -163,7 +163,7 @@ test_port_ring_writer(void)
 	rte_port_ring_writer_ops.f_flush(port);
 	expected_pkts = 1;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -7;
@@ -178,7 +178,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -193,7 +193,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -208,7 +208,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -9;
diff --git a/test/test/virtual_pmd.c b/test/test/virtual_pmd.c
index 39e070c..b209355 100644
--- a/test/test/virtual_pmd.c
+++ b/test/test/virtual_pmd.c
@@ -342,7 +342,7 @@ virtual_ethdev_rx_burst_success(void *queue __rte_unused,
 	dev_private = vrtl_eth_dev->data->dev_private;
 
 	rx_count = rte_ring_dequeue_burst(dev_private->rx_queue, (void **) bufs,
-			nb_pkts);
+			nb_pkts, NULL);
 
 	/* increments ipackets count */
 	dev_private->eth_stats.ipackets += rx_count;
@@ -508,7 +508,7 @@ virtual_ethdev_get_mbufs_from_tx_queue(uint8_t port_id,
 
 	dev_private = vrtl_eth_dev->data->dev_private;
 	return rte_ring_dequeue_burst(dev_private->tx_queue, (void **)pkt_burst,
-		burst_length);
+		burst_length, NULL);
 }
 
 static uint8_t
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 07/14] ring: make bulk and burst fn return vals consistent
                         ` (4 preceding siblings ...)
  2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 06/14] ring: remove watermark support Bruce Richardson
@ 2017-03-24 17:10  2%     ` Bruce Richardson
  2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-24 17:10 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, jerin.jacob, thomas.monjalon, Bruce Richardson

The bulk functions for rings return 0 when all elements are enqueued
and a negative value when there is no space. Change that to make them
consistent with the burst functions by returning the number of
elements enqueued/dequeued, i.e. 0 or N. This change also allows the
return value from enq/deq to be used directly, without a branch for
error checking.
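
To illustrate, a minimal usage sketch of the new convention (the ring
"r", the "objs" array, BURST, enq_stats and the handle_full() recovery
path are made-up names; the three-argument prototype is the one in
place after this patch):

	unsigned int n;

	/* with fixed-size semantics n is either 0 or BURST, so the
	 * return value can be consumed directly, no errno branch */
	n = rte_ring_enqueue_bulk(r, (void **)objs, BURST);
	enq_stats += n;
	if (n == 0)
		handle_full(r);	/* nothing was enqueued: ring is full */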

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/release_17_05.rst             |  11 +++
 doc/guides/sample_app_ug/server_node_efd.rst       |   2 +-
 examples/load_balancer/runtime.c                   |  16 ++-
 .../client_server_mp/mp_client/client.c            |   8 +-
 .../client_server_mp/mp_server/main.c              |   2 +-
 examples/qos_sched/app_thread.c                    |   8 +-
 examples/server_node_efd/node/node.c               |   2 +-
 examples/server_node_efd/server/main.c             |   2 +-
 lib/librte_mempool/rte_mempool_ring.c              |  12 ++-
 lib/librte_ring/rte_ring.h                         | 109 +++++++--------------
 test/test-pipeline/pipeline_hash.c                 |   2 +-
 test/test-pipeline/runtime.c                       |   8 +-
 test/test/test_ring.c                              |  46 +++++----
 test/test/test_ring_perf.c                         |   8 +-
 14 files changed, 106 insertions(+), 130 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index af907b8..a465c69 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -130,6 +130,17 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
   * removed the function ``rte_ring_set_water_mark`` as part of a general
     removal of watermarks support in the library.
+  * changed the return value of the enqueue and dequeue bulk functions to
+    match that of the burst equivalents. In all cases, ring functions which
+    operate on multiple packets now return the number of elements enqueued
+    or dequeued, as appropriate. The updated functions are:
+
+    - ``rte_ring_mp_enqueue_bulk``
+    - ``rte_ring_sp_enqueue_bulk``
+    - ``rte_ring_enqueue_bulk``
+    - ``rte_ring_mc_dequeue_bulk``
+    - ``rte_ring_sc_dequeue_bulk``
+    - ``rte_ring_dequeue_bulk``
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/server_node_efd.rst b/doc/guides/sample_app_ug/server_node_efd.rst
index 9b69cfe..e3a63c8 100644
--- a/doc/guides/sample_app_ug/server_node_efd.rst
+++ b/doc/guides/sample_app_ug/server_node_efd.rst
@@ -286,7 +286,7 @@ repeated infinitely.
 
         cl = &nodes[node];
         if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-                cl_rx_buf[node].count) != 0){
+                cl_rx_buf[node].count) != cl_rx_buf[node].count){
             for (j = 0; j < cl_rx_buf[node].count; j++)
                 rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
             cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 6944325..82b10bc 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -146,7 +146,7 @@ app_lcore_io_rx_buffer_to_send (
 		(void **) lp->rx.mbuf_out[worker].array,
 		bsz);
 
-	if (unlikely(ret == -ENOBUFS)) {
+	if (unlikely(ret == 0)) {
 		uint32_t k;
 		for (k = 0; k < bsz; k ++) {
 			struct rte_mbuf *m = lp->rx.mbuf_out[worker].array[k];
@@ -312,7 +312,7 @@ app_lcore_io_rx_flush(struct app_lcore_params_io *lp, uint32_t n_workers)
 			(void **) lp->rx.mbuf_out[worker].array,
 			lp->rx.mbuf_out[worker].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->rx.mbuf_out[worker].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->rx.mbuf_out[worker].array[k];
@@ -349,9 +349,8 @@ app_lcore_io_tx(
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
 				bsz_rd);
 
-			if (unlikely(ret == -ENOENT)) {
+			if (unlikely(ret == 0))
 				continue;
-			}
 
 			n_mbufs += bsz_rd;
 
@@ -505,9 +504,8 @@ app_lcore_worker(
 			(void **) lp->mbuf_in.array,
 			bsz_rd);
 
-		if (unlikely(ret == -ENOENT)) {
+		if (unlikely(ret == 0))
 			continue;
-		}
 
 #if APP_WORKER_DROP_ALL_PACKETS
 		for (j = 0; j < bsz_rd; j ++) {
@@ -559,7 +557,7 @@ app_lcore_worker(
 
 #if APP_STATS
 			lp->rings_out_iters[port] ++;
-			if (ret == 0) {
+			if (ret > 0) {
 				lp->rings_out_count[port] += 1;
 			}
 			if (lp->rings_out_iters[port] == APP_STATS){
@@ -572,7 +570,7 @@ app_lcore_worker(
 			}
 #endif
 
-			if (unlikely(ret == -ENOBUFS)) {
+			if (unlikely(ret == 0)) {
 				uint32_t k;
 				for (k = 0; k < bsz_wr; k ++) {
 					struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
@@ -609,7 +607,7 @@ app_lcore_worker_flush(struct app_lcore_params_worker *lp)
 			(void **) lp->mbuf_out[port].array,
 			lp->mbuf_out[port].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->mbuf_out[port].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index d4f9ca3..dca9eb9 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -276,14 +276,10 @@ main(int argc, char *argv[])
 	printf("[Press Ctrl-C to quit ...]\n");
 
 	for (;;) {
-		uint16_t i, rx_pkts = PKT_READ_SIZE;
+		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		/* try dequeuing max possible packets first, if that fails, get the
-		 * most we can. Loop body should only execute once, maximum */
-		while (rx_pkts > 0 &&
-				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts, rx_pkts) != 0))
-			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring), PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/multi_process/client_server_mp/mp_server/main.c b/examples/multi_process/client_server_mp/mp_server/main.c
index a6dc12d..19c95b2 100644
--- a/examples/multi_process/client_server_mp/mp_server/main.c
+++ b/examples/multi_process/client_server_mp/mp_server/main.c
@@ -227,7 +227,7 @@ flush_rx_queue(uint16_t client)
 
 	cl = &clients[client];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[client].buffer,
-			cl_rx_buf[client].count) != 0){
+			cl_rx_buf[client].count) == 0){
 		for (j = 0; j < cl_rx_buf[client].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[client].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[client].count;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 70fdcdb..dab4594 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -107,7 +107,7 @@ app_rx_thread(struct thread_conf **confs)
 			}
 
 			if (unlikely(rte_ring_sp_enqueue_bulk(conf->rx_ring,
-								(void **)rx_mbufs, nb_rx) != 0)) {
+					(void **)rx_mbufs, nb_rx) == 0)) {
 				for(i = 0; i < nb_rx; i++) {
 					rte_pktmbuf_free(rx_mbufs[i]);
 
@@ -180,7 +180,7 @@ app_tx_thread(struct thread_conf **confs)
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
 					burst_conf.qos_dequeue);
-		if (likely(retval == 0)) {
+		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
 			conf->counter = 0; /* reset empty read loop counter */
@@ -230,7 +230,9 @@ app_worker_thread(struct thread_conf **confs)
 		nb_pkt = rte_sched_port_dequeue(conf->sched_port, mbufs,
 					burst_conf.qos_dequeue);
 		if (likely(nb_pkt > 0))
-			while (rte_ring_sp_enqueue_bulk(conf->tx_ring, (void **)mbufs, nb_pkt) != 0);
+			while (rte_ring_sp_enqueue_bulk(conf->tx_ring,
+					(void **)mbufs, nb_pkt) == 0)
+				; /* empty body */
 
 		conf_idx++;
 		if (confs[conf_idx] == NULL)
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index a6c0c70..9ec6a05 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) != 0))
+					rx_pkts) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/examples/server_node_efd/server/main.c b/examples/server_node_efd/server/main.c
index 1a54d1b..3eb7fac 100644
--- a/examples/server_node_efd/server/main.c
+++ b/examples/server_node_efd/server/main.c
@@ -247,7 +247,7 @@ flush_rx_queue(uint16_t node)
 
 	cl = &nodes[node];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-			cl_rx_buf[node].count) != 0){
+			cl_rx_buf[node].count) != cl_rx_buf[node].count){
 		for (j = 0; j < cl_rx_buf[node].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index b9aa64d..409b860 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -42,26 +42,30 @@ static int
 common_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_mp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_sp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_mc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_sc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 906e8ae..34b438c 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -349,14 +349,10 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
 *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -388,7 +384,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOBUFS;
+				return 0;
 			else {
 				/* No free entry available */
 				if (unlikely(free_entries == 0))
@@ -414,7 +410,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -430,14 +426,10 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
 *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -457,7 +449,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOBUFS;
+			return 0;
 		else {
 			/* No free entry available */
 			if (unlikely(free_entries == 0))
@@ -474,7 +466,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -495,16 +487,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
 *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
 
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -536,7 +523,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOENT;
+				return 0;
 			else {
 				if (unlikely(entries == 0))
 					return 0;
@@ -562,7 +549,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	r->cons.tail = cons_next;
 
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -580,15 +567,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
 *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -607,7 +589,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	if (n > entries) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOENT;
+			return 0;
 		else {
 			if (unlikely(entries == 0))
 				return 0;
@@ -623,7 +605,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -639,10 +621,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -659,10 +640,9 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -683,10 +663,9 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned n)
 {
@@ -713,7 +692,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static inline int __attribute__((always_inline))
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -730,7 +709,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -751,10 +730,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-	if (r->prod.single)
-		return rte_ring_sp_enqueue(r, obj);
-	else
-		return rte_ring_mp_enqueue(r, obj);
+	return rte_ring_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -770,11 +746,9 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -791,11 +765,9 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects to dequeue from the ring to the obj_table,
  *   must be strictly positive.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -815,11 +787,9 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	if (r->cons.single)
@@ -846,7 +816,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -864,7 +834,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -886,10 +856,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	if (r->cons.single)
-		return rte_ring_sc_dequeue(r, obj_p);
-	else
-		return rte_ring_mc_dequeue(r, obj_p);
+	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
diff --git a/test/test-pipeline/pipeline_hash.c b/test/test-pipeline/pipeline_hash.c
index 10d2869..1ac0aa8 100644
--- a/test/test-pipeline/pipeline_hash.c
+++ b/test/test-pipeline/pipeline_hash.c
@@ -547,6 +547,6 @@ app_main_loop_rx_metadata(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
diff --git a/test/test-pipeline/runtime.c b/test/test-pipeline/runtime.c
index 42a6142..4e20669 100644
--- a/test/test-pipeline/runtime.c
+++ b/test/test-pipeline/runtime.c
@@ -98,7 +98,7 @@ app_main_loop_rx(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -123,7 +123,7 @@ app_main_loop_worker(void) {
 			(void **) worker_mbuf->array,
 			app.burst_size_worker_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		do {
@@ -131,7 +131,7 @@ app_main_loop_worker(void) {
 				app.rings_tx[i ^ 1],
 				(void **) worker_mbuf->array,
 				app.burst_size_worker_write);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -152,7 +152,7 @@ app_main_loop_tx(void) {
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
 			app.burst_size_tx_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		n_mbufs += app.burst_size_tx_read;
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 666a451..112433b 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -117,20 +117,18 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
 		printf("%s: iteration %u, random shift: %u;\n",
 		    __func__, i, rand);
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rand));
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rand));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand) != 0);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
 
 		/* fill the ring */
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rsz));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz) != 0);
 		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
 		TEST_RING_VERIFY(rsz == rte_ring_count(r));
 		TEST_RING_VERIFY(rte_ring_full(r));
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rsz));
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -171,37 +169,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -217,37 +215,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -264,11 +262,11 @@ test_ring_basic(void)
 	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
 		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 		cur_src += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 		cur_dst += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 	}
 
@@ -294,25 +292,25 @@ test_ring_basic(void)
 
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue2\n");
 		goto fail;
 	}
diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index 320c20c..8ccbdef 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -195,13 +195,13 @@ enqueue_bulk(void *p)
 
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_sp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_mp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mp_end = rte_rdtsc();
 
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 06/14] ring: remove watermark support
                         ` (3 preceding siblings ...)
  2017-03-24 17:09  4%     ` [dpdk-dev] [PATCH v3 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
@ 2017-03-24 17:10  2%     ` Bruce Richardson
  2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
  2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-24 17:10 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, jerin.jacob, thomas.monjalon, Bruce Richardson

Remove watermark support. A future commit will add support for having
the enqueue functions return the amount of free space in the ring,
which will allow applications to implement their own watermark checks
while also being more useful to the app.
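
As a hedged sketch of the replacement pattern (the free-space output
parameter is only added by a later patch in this series, and
"wm_threshold" and pause_traffic() are made-up application-side
names):

	unsigned int free_space;

	rte_ring_enqueue_burst(r, (void **)objs, n, &free_space);
	/* application-level equivalent of the old high water mark */
	if (free_space < wm_threshold)
		pause_traffic();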

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
V2: fix missed references to watermarks in v1
---
 doc/guides/prog_guide/ring_lib.rst     |   8 --
 doc/guides/rel_notes/release_17_05.rst |   2 +
 examples/Makefile                      |   2 +-
 lib/librte_ring/rte_ring.c             |  23 -----
 lib/librte_ring/rte_ring.h             |  58 +------------
 test/test/autotest_test_funcs.py       |   7 --
 test/test/commands.c                   |  52 ------------
 test/test/test_ring.c                  | 149 +--------------------------------
 8 files changed, 8 insertions(+), 293 deletions(-)

diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index d4ab502..b31ab7a 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -102,14 +102,6 @@ Name
 A ring is identified by a unique name.
 It is not possible to create two rings with the same name (rte_ring_create() returns NULL if this is attempted).
 
-Water Marking
-~~~~~~~~~~~~~
-
-The ring can have a high water mark (threshold).
-Once an enqueue operation reaches the high water mark, the producer is notified, if the water mark is configured.
-
-This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 556869f..af907b8 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -128,6 +128,8 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
+  * removed the function ``rte_ring_set_water_mark`` as part of a general
+    removal of watermarks support in the library.
 
 ABI Changes
 -----------
diff --git a/examples/Makefile b/examples/Makefile
index da2bfdd..19cd5ad 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -81,7 +81,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += packet_ordering
 DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched
-DIRS-y += quota_watermark
+#DIRS-y += quota_watermark
 DIRS-$(CONFIG_RTE_ETHDEV_RXTX_CALLBACKS) += rxtx_callbacks
 DIRS-y += skeleton
 ifeq ($(CONFIG_RTE_LIBRTE_HASH),y)
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 934ce87..25f64f0 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -138,7 +138,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->watermark = count;
 	r->prod.single = !!(flags & RING_F_SP_ENQ);
 	r->cons.single = !!(flags & RING_F_SC_DEQ);
 	r->size = count;
@@ -256,24 +255,6 @@ rte_ring_free(struct rte_ring *r)
 	rte_free(te);
 }
 
-/*
- * change the high water mark. If *count* is 0, water marking is
- * disabled
- */
-int
-rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
-{
-	if (count >= r->size)
-		return -EINVAL;
-
-	/* if count is 0, disable the watermarking */
-	if (count == 0)
-		count = r->size;
-
-	r->watermark = count;
-	return 0;
-}
-
 /* dump the status of the ring on the console */
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
@@ -287,10 +268,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->watermark == r->size)
-		fprintf(f, "  watermark=0\n");
-	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f8ac7f5..906e8ae 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -153,7 +153,6 @@ struct rte_ring {
 			/**< Memzone, if any, containing the rte_ring */
 	uint32_t size;           /**< Size of ring. */
 	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_headtail prod __rte_aligned(PROD_ALIGN);
@@ -168,7 +167,6 @@ struct rte_ring {
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
 #define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
@@ -274,26 +272,6 @@ struct rte_ring *rte_ring_create(const char *name, unsigned count,
 void rte_ring_free(struct rte_ring *r);
 
 /**
- * Change the high water mark.
- *
- * If *count* is 0, water marking is disabled. Otherwise, it is set to the
- * *count* value. The *count* value must be greater than 0 and less
- * than the ring size.
- *
- * This function can be called at any time (not necessarily at
- * initialization).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param count
- *   The new water mark value.
- * @return
- *   - 0: Success; water mark changed.
- *   - -EINVAL: Invalid water mark value.
- */
-int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
-
-/**
  * Dump the status of the ring to a file.
  *
  * @param f
@@ -374,8 +352,6 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -390,7 +366,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	int success;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -431,13 +406,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-				(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	/*
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
@@ -446,7 +414,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -465,8 +433,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -479,7 +445,6 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_next, free_entries;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	prod_head = r->prod.head;
 	cons_tail = r->cons.tail;
@@ -508,15 +473,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-			(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -682,8 +640,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -704,8 +660,6 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -730,8 +684,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -756,8 +708,6 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -775,8 +725,6 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -798,8 +746,6 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
diff --git a/test/test/autotest_test_funcs.py b/test/test/autotest_test_funcs.py
index 1c5f390..8da8fcd 100644
--- a/test/test/autotest_test_funcs.py
+++ b/test/test/autotest_test_funcs.py
@@ -292,11 +292,4 @@ def ring_autotest(child, test_name):
     elif index == 2:
         return -1, "Fail [Timeout]"
 
-    child.sendline("set_watermark test 100")
-    child.sendline("dump_ring test")
-    index = child.expect(["  watermark=100",
-                          pexpect.TIMEOUT], timeout=1)
-    if index != 0:
-        return -1, "Fail [Bad watermark]"
-
     return 0, "Success"
diff --git a/test/test/commands.c b/test/test/commands.c
index 2df46b0..551c81d 100644
--- a/test/test/commands.c
+++ b/test/test/commands.c
@@ -228,57 +228,6 @@ cmdline_parse_inst_t cmd_dump_one = {
 
 /****************/
 
-struct cmd_set_ring_result {
-	cmdline_fixed_string_t set;
-	cmdline_fixed_string_t name;
-	uint32_t value;
-};
-
-static void cmd_set_ring_parsed(void *parsed_result, struct cmdline *cl,
-				__attribute__((unused)) void *data)
-{
-	struct cmd_set_ring_result *res = parsed_result;
-	struct rte_ring *r;
-	int ret;
-
-	r = rte_ring_lookup(res->name);
-	if (r == NULL) {
-		cmdline_printf(cl, "Cannot find ring\n");
-		return;
-	}
-
-	if (!strcmp(res->set, "set_watermark")) {
-		ret = rte_ring_set_water_mark(r, res->value);
-		if (ret != 0)
-			cmdline_printf(cl, "Cannot set water mark\n");
-	}
-}
-
-cmdline_parse_token_string_t cmd_set_ring_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, set,
-				 "set_watermark");
-
-cmdline_parse_token_string_t cmd_set_ring_name =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, name, NULL);
-
-cmdline_parse_token_num_t cmd_set_ring_value =
-	TOKEN_NUM_INITIALIZER(struct cmd_set_ring_result, value, UINT32);
-
-cmdline_parse_inst_t cmd_set_ring = {
-	.f = cmd_set_ring_parsed,  /* function to call */
-	.data = NULL,      /* 2nd arg of func */
-	.help_str = "set watermark: "
-			"set_watermark <ring_name> <value>",
-	.tokens = {        /* token list, NULL terminated */
-		(void *)&cmd_set_ring_set,
-		(void *)&cmd_set_ring_name,
-		(void *)&cmd_set_ring_value,
-		NULL,
-	},
-};
-
-/****************/
-
 struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
@@ -419,7 +368,6 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_autotest,
 	(cmdline_parse_inst_t *)&cmd_dump,
 	(cmdline_parse_inst_t *)&cmd_dump_one,
-	(cmdline_parse_inst_t *)&cmd_set_ring,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx_anchor,
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 3891f5d..666a451 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -78,21 +78,6 @@
  *      - Dequeue one object, two objects, MAX_BULK objects
  *      - Check that dequeued pointers are correct
  *
- *    - Test watermark and default bulk enqueue/dequeue:
- *
- *      - Set watermark
- *      - Set default bulk value
- *      - Enqueue objects, check that -EDQUOT is returned when
- *        watermark is exceeded
- *      - Check that dequeued pointers are correct
- *
- * #. Check live watermark change
- *
- *    - Start a loop on another lcore that will enqueue and dequeue
- *      objects in a ring. It will monitor the value of watermark.
- *    - At the same time, change the watermark on the master lcore.
- *    - The slave lcore will check that watermark changes from 16 to 32.
- *
  * #. Performance tests.
  *
  * Tests done in test_ring_perf.c
@@ -115,123 +100,6 @@ static struct rte_ring *r;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-static int
-check_live_watermark_change(__attribute__((unused)) void *dummy)
-{
-	uint64_t hz = rte_get_timer_hz();
-	void *obj_table[MAX_BULK];
-	unsigned watermark, watermark_old = 16;
-	uint64_t cur_time, end_time;
-	int64_t diff = 0;
-	int i, ret;
-	unsigned count = 4;
-
-	/* init the object table */
-	memset(obj_table, 0, sizeof(obj_table));
-	end_time = rte_get_timer_cycles() + (hz / 4);
-
-	/* check that bulk and watermark are 4 and 32 (respectively) */
-	while (diff >= 0) {
-
-		/* add in ring until we reach watermark */
-		ret = 0;
-		for (i = 0; i < 16; i ++) {
-			if (ret != 0)
-				break;
-			ret = rte_ring_enqueue_bulk(r, obj_table, count);
-		}
-
-		if (ret != -EDQUOT) {
-			printf("Cannot enqueue objects, or watermark not "
-			       "reached (ret=%d)\n", ret);
-			return -1;
-		}
-
-		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->watermark;
-		if (watermark != watermark_old &&
-		    (watermark_old != 16 || watermark != 32)) {
-			printf("Bad watermark change %u -> %u\n", watermark_old,
-			       watermark);
-			return -1;
-		}
-		watermark_old = watermark;
-
-		/* dequeue objects from ring */
-		while (i--) {
-			ret = rte_ring_dequeue_bulk(r, obj_table, count);
-			if (ret != 0) {
-				printf("Cannot dequeue (ret=%d)\n", ret);
-				return -1;
-			}
-		}
-
-		cur_time = rte_get_timer_cycles();
-		diff = end_time - cur_time;
-	}
-
-	if (watermark_old != 32 ) {
-		printf(" watermark was not updated (wm=%u)\n",
-		       watermark_old);
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-test_live_watermark_change(void)
-{
-	unsigned lcore_id = rte_lcore_id();
-	unsigned lcore_id2 = rte_get_next_lcore(lcore_id, 0, 1);
-
-	printf("Test watermark live modification\n");
-	rte_ring_set_water_mark(r, 16);
-
-	/* launch a thread that will enqueue and dequeue, checking
-	 * watermark and quota */
-	rte_eal_remote_launch(check_live_watermark_change, NULL, lcore_id2);
-
-	rte_delay_ms(100);
-	rte_ring_set_water_mark(r, 32);
-	rte_delay_ms(100);
-
-	if (rte_eal_wait_lcore(lcore_id2) < 0)
-		return -1;
-
-	return 0;
-}
-
-/* Test for catch on invalid watermark values */
-static int
-test_set_watermark( void ){
-	unsigned count;
-	int setwm;
-
-	struct rte_ring *r = rte_ring_lookup("test_ring_basic_ex");
-	if(r == NULL){
-		printf( " ring lookup failed\n" );
-		goto error;
-	}
-	count = r->size * 2;
-	setwm = rte_ring_set_water_mark(r, count);
-	if (setwm != -EINVAL){
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-
-	count = 0;
-	rte_ring_set_water_mark(r, count);
-	if (r->watermark != r->size) {
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-	return 0;
-
-error:
-	return -1;
-}
-
 /*
  * helper routine for test_ring_basic
  */
@@ -418,8 +286,7 @@ test_ring_basic(void)
 	cur_src = src;
 	cur_dst = dst;
 
-	printf("test watermark and default bulk enqueue / dequeue\n");
-	rte_ring_set_water_mark(r, 20);
+	printf("test default bulk enqueue / dequeue\n");
 	num_elems = 16;
 
 	cur_src = src;
@@ -433,8 +300,8 @@ test_ring_basic(void)
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != -EDQUOT) {
-		printf("Watermark not exceeded\n");
+	if (ret != 0) {
+		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
@@ -930,16 +797,6 @@ test_ring(void)
 		return -1;
 
 	/* basic operations */
-	if (test_live_watermark_change() < 0)
-		return -1;
-
-	if ( test_set_watermark() < 0){
-		printf ("Test failed to detect invalid parameter\n");
-		return -1;
-	}
-	else
-		printf ( "Test detected forced bad watermark values\n");
-
 	if ( test_create_count_odd() < 0){
 			printf ("Test failed to detect odd count\n");
 			return -1;
-- 
2.9.3
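
For applications that relied on the removed watermark, the same signal
can be rebuilt from the public fill-level API; a minimal sketch, where
the threshold variable and wrapper name are hypothetical, not part of
the ring API:

#include <rte_ring.h>

static unsigned int app_watermark = 32; /* application-chosen threshold */

static inline int
app_ring_enqueue(struct rte_ring *r, void *obj)
{
	int ret = rte_ring_enqueue(r, obj);

	/* report back-pressure once the fill level crosses the mark */
	if (ret == 0 && rte_ring_count(r) > app_watermark)
		return 1; /* enqueued, but above the watermark */
	return ret;
}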

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 05/14] ring: remove the yield when waiting for tail update
                         ` (2 preceding siblings ...)
  2017-03-24 17:09  2%     ` [dpdk-dev] [PATCH v3 04/14] ring: remove debug setting Bruce Richardson
@ 2017-03-24 17:09  4%     ` Bruce Richardson
  2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 06/14] ring: remove watermark support Bruce Richardson
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-24 17:09 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, jerin.jacob, thomas.monjalon, Bruce Richardson

There was a compile-time setting to enable a ring to yield when
it entered a loop in mp or mc rings waiting for the tail pointer update.
Build-time settings are not recommended for enabling/disabling features,
and since this was off by default, remove it completely. If needed, a
runtime-enabled equivalent can be used.
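
Such a runtime-enabled equivalent could look like the sketch below; the
pause_rep_count variable and wait_for_tail() helper are hypothetical,
not part of the ring API:

#include <sched.h>
#include <rte_branch_prediction.h>
#include <rte_cycles.h>	/* rte_pause() lives here in this release */

static unsigned int pause_rep_count;	/* 0 = never yield, as before */

static inline void
wait_for_tail(volatile const uint32_t *tail, uint32_t expected)
{
	unsigned int rep = 0;

	while (unlikely(*tail != expected)) {
		rte_pause();
		/* give a pre-empted thread holding the ring a chance to run */
		if (pause_rep_count && ++rep == pause_rep_count) {
			rep = 0;
			sched_yield();
		}
	}
}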

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 config/common_base                              |  1 -
 doc/guides/prog_guide/env_abstraction_layer.rst |  5 ----
 doc/guides/rel_notes/release_17_05.rst          |  1 +
 lib/librte_ring/rte_ring.h                      | 35 +++++--------------------
 4 files changed, 7 insertions(+), 35 deletions(-)

diff --git a/config/common_base b/config/common_base
index 69e91ae..2d54ddf 100644
--- a/config/common_base
+++ b/config/common_base
@@ -452,7 +452,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
 # Compile librte_mempool
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 10a10a8..7c39cd2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -352,11 +352,6 @@ Known Issues
 
   3. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
 
-  ``RTE_RING_PAUSE_REP_COUNT`` is defined for rte_ring to reduce contention. It's mainly for case 2, a yield is issued after number of times pause repeat.
-
-  It adds a sched_yield() syscall if the thread spins for too long while waiting on the other thread to finish its operations on the ring.
-  This gives the preempted thread a chance to proceed and finish with the ring enqueue/dequeue operation.
-
 + rte_timer
 
   Running  ``rte_timer_manager()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 742ad6c..556869f 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -127,6 +127,7 @@ API Changes
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
+  * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2777b41..f8ac7f5 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -114,11 +114,6 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-#ifndef RTE_RING_PAUSE_REP_COUNT
-#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-                                    *   if RTE_RING_PAUSE_REP not defined. */
-#endif
-
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
 #if RTE_CACHE_LINE_SIZE < 128
@@ -393,7 +388,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t cons_tail, free_entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -447,18 +442,9 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->prod.tail != prod_head)) {
+	while (unlikely(r->prod.tail != prod_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->prod.tail = prod_next;
 	return ret;
 }
@@ -491,7 +477,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 {
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -568,7 +554,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_next, entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -613,18 +599,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * If there are other dequeues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->cons.tail != cons_head)) {
+	while (unlikely(r->cons.tail != cons_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -659,7 +636,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 04/14] ring: remove debug setting
    2017-03-24 17:09  5%     ` [dpdk-dev] [PATCH v3 01/14] ring: remove split cacheline build setting Bruce Richardson
  2017-03-24 17:09  3%     ` [dpdk-dev] [PATCH v3 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
@ 2017-03-24 17:09  2%     ` Bruce Richardson
  2017-03-24 17:09  4%     ` [dpdk-dev] [PATCH v3 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-24 17:09 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, jerin.jacob, thomas.monjalon, Bruce Richardson

The debug option only provided statistics to the user, most of
which could be tracked by the application itself. Remove it, both as a
compile-time option and as a feature, simplifying the code.
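
For reference, most of these counters can be kept by the application
itself using only the public API; a minimal sketch, where the struct
and wrapper names are illustrative and any non-zero return is counted
as a failed bulk operation for brevity:

#include <rte_lcore.h>
#include <rte_ring.h>

struct app_ring_stats {
	uint64_t enq_success_bulk; /* successful bulk enqueues */
	uint64_t enq_success_objs; /* objects successfully enqueued */
	uint64_t enq_fail_bulk;    /* failed bulk enqueues */
	uint64_t enq_fail_objs;    /* objects that failed to be enqueued */
} __rte_cache_aligned;

static struct app_ring_stats app_stats[RTE_MAX_LCORE];

static inline int
app_enqueue_bulk(struct rte_ring *r, void * const *obj_table, unsigned int n)
{
	unsigned int lcore = rte_lcore_id();
	int ret = rte_ring_enqueue_bulk(r, obj_table, n);

	if (lcore < RTE_MAX_LCORE) { /* skip non-EAL threads */
		if (ret == 0) {
			app_stats[lcore].enq_success_bulk++;
			app_stats[lcore].enq_success_objs += n;
		} else {
			app_stats[lcore].enq_fail_bulk++;
			app_stats[lcore].enq_fail_objs += n;
		}
	}
	return ret;
}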

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 config/common_base                     |   1 -
 doc/guides/prog_guide/ring_lib.rst     |   7 -
 doc/guides/rel_notes/release_17_05.rst |   1 +
 lib/librte_ring/rte_ring.c             |  41 ----
 lib/librte_ring/rte_ring.h             |  97 +-------
 test/test/test_ring.c                  | 410 ---------------------------------
 6 files changed, 13 insertions(+), 544 deletions(-)

diff --git a/config/common_base b/config/common_base
index c394651..69e91ae 100644
--- a/config/common_base
+++ b/config/common_base
@@ -452,7 +452,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_LIBRTE_RING_DEBUG=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 9f69753..d4ab502 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -110,13 +110,6 @@ Once an enqueue operation reaches the high water mark, the producer is notified,
 
 This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
 
-Debug
-~~~~~
-
-When debug is enabled (CONFIG_RTE_LIBRTE_RING_DEBUG is set),
-the library stores some per-ring statistic counters about the number of enqueues/dequeues.
-These statistics are per-core to avoid concurrent accesses or atomic operations.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 57ae8bf..742ad6c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -126,6 +126,7 @@ API Changes
   have been made to it:
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
+  * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 93485d4..934ce87 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -131,12 +131,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 			  RTE_CACHE_LINE_MASK) != 0);
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_LIBRTE_RING_DEBUG
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring_debug_stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
@@ -284,11 +278,6 @@ rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
 {
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats sum;
-	unsigned lcore_id;
-#endif
-
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
@@ -302,36 +291,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		fprintf(f, "  watermark=0\n");
 	else
 		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
-
-	/* sum and dump statistics */
-#ifdef RTE_LIBRTE_RING_DEBUG
-	memset(&sum, 0, sizeof(sum));
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		sum.enq_success_bulk += r->stats[lcore_id].enq_success_bulk;
-		sum.enq_success_objs += r->stats[lcore_id].enq_success_objs;
-		sum.enq_quota_bulk += r->stats[lcore_id].enq_quota_bulk;
-		sum.enq_quota_objs += r->stats[lcore_id].enq_quota_objs;
-		sum.enq_fail_bulk += r->stats[lcore_id].enq_fail_bulk;
-		sum.enq_fail_objs += r->stats[lcore_id].enq_fail_objs;
-		sum.deq_success_bulk += r->stats[lcore_id].deq_success_bulk;
-		sum.deq_success_objs += r->stats[lcore_id].deq_success_objs;
-		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
-		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
-	}
-	fprintf(f, "  size=%"PRIu32"\n", r->size);
-	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
-	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
-	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
-	fprintf(f, "  enq_quota_objs=%"PRIu64"\n", sum.enq_quota_objs);
-	fprintf(f, "  enq_fail_bulk=%"PRIu64"\n", sum.enq_fail_bulk);
-	fprintf(f, "  enq_fail_objs=%"PRIu64"\n", sum.enq_fail_objs);
-	fprintf(f, "  deq_success_bulk=%"PRIu64"\n", sum.deq_success_bulk);
-	fprintf(f, "  deq_success_objs=%"PRIu64"\n", sum.deq_success_objs);
-	fprintf(f, "  deq_fail_bulk=%"PRIu64"\n", sum.deq_fail_bulk);
-	fprintf(f, "  deq_fail_objs=%"PRIu64"\n", sum.deq_fail_objs);
-#else
-	fprintf(f, "  no statistics available\n");
-#endif
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index d650215..2777b41 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -109,24 +109,6 @@ enum rte_ring_queue_behavior {
 	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
 };
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-/**
- * A structure that stores the ring statistics (per-lcore).
- */
-struct rte_ring_debug_stats {
-	uint64_t enq_success_bulk; /**< Successful enqueues number. */
-	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
-	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
-	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
-	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
-	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
-	uint64_t deq_success_bulk; /**< Successful dequeues number. */
-	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
-	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
-	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
-#endif
-
 #define RTE_RING_MZ_PREFIX "RG_"
 /**< The maximum length of a ring name. */
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
@@ -184,10 +166,6 @@ struct rte_ring {
 	/** Ring consumer status. */
 	struct rte_ring_headtail cons __rte_aligned(CONS_ALIGN);
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
-
 	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
 	                                     * not volatile so need to be careful
 	                                     * about compiler re-ordering */
@@ -199,27 +177,6 @@ struct rte_ring {
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
- * @internal When debug is enabled, store ring statistics.
- * @param r
- *   A pointer to the ring.
- * @param name
- *   The name of the statistics field to increment in the ring.
- * @param n
- *   The number to add to the object-oriented statistics.
- */
-#ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {                        \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			r->stats[__lcore_id].name##_objs += n;  \
-			r->stats[__lcore_id].name##_bulk += 1;  \
-		}                                               \
-	} while(0)
-#else
-#define __RING_STAT_ADD(r, name, n) do {} while(0)
-#endif
-
-/**
  * Calculate the memory size needed for a ring
  *
  * This function returns the number of bytes needed for a ring, given
@@ -460,17 +417,12 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOBUFS;
-			}
 			else {
 				/* No free entry available */
-				if (unlikely(free_entries == 0)) {
-					__RING_STAT_ADD(r, enq_fail, n);
+				if (unlikely(free_entries == 0))
 					return 0;
-				}
-
 				n = free_entries;
 			}
 		}
@@ -485,15 +437,11 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	/*
 	 * If there are other enqueues in progress that preceded us,
@@ -557,17 +505,12 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, enq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOBUFS;
-		}
 		else {
 			/* No free entry available */
-			if (unlikely(free_entries == 0)) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (unlikely(free_entries == 0))
 				return 0;
-			}
-
 			n = free_entries;
 		}
 	}
@@ -580,15 +523,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	r->prod.tail = prod_next;
 	return ret;
@@ -652,16 +591,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOENT;
-			}
 			else {
-				if (unlikely(entries == 0)){
-					__RING_STAT_ADD(r, deq_fail, n);
+				if (unlikely(entries == 0))
 					return 0;
-				}
-
 				n = entries;
 			}
 		}
@@ -691,7 +625,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 			sched_yield();
 		}
 	}
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -738,16 +671,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	entries = prod_tail - cons_head;
 
 	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, deq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOENT;
-		}
 		else {
-			if (unlikely(entries == 0)){
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (unlikely(entries == 0))
 				return 0;
-			}
-
 			n = entries;
 		}
 	}
@@ -759,7 +687,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	DEQUEUE_PTRS();
 	rte_smp_rmb();
 
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
 }
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 5f09097..3891f5d 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -763,412 +763,6 @@ test_ring_burst_basic(void)
 	return -1;
 }
 
-static int
-test_ring_stats(void)
-{
-
-#ifndef RTE_LIBRTE_RING_DEBUG
-	printf("Enable RTE_LIBRTE_RING_DEBUG to test ring stats.\n");
-	return 0;
-#else
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i;
-	unsigned num_items            = 0;
-	unsigned failed_enqueue_ops   = 0;
-	unsigned failed_enqueue_items = 0;
-	unsigned failed_dequeue_ops   = 0;
-	unsigned failed_dequeue_items = 0;
-	unsigned last_enqueue_ops     = 0;
-	unsigned last_enqueue_items   = 0;
-	unsigned last_quota_ops       = 0;
-	unsigned last_quota_items     = 0;
-	unsigned lcore_id = rte_lcore_id();
-	struct rte_ring_debug_stats *ring_stats = &r->stats[lcore_id];
-
-	printf("Test the ring stats.\n");
-
-	/* Reset the watermark in case it was set in another test. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Allocate some dummy object pointers. */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-
-	/* Allocate some memory for copied objects. */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-
-	/* Set the head and tail pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	/* Do Enqueue tests. */
-	printf("Test the dequeue stats.\n");
-
-	/* Fill the ring up to RING_SIZE -1. */
-	printf("Fill the ring.\n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK); i++) {
-		rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK);
-		cur_src += MAX_BULK;
-	}
-
-	/* Adjust for final enqueue = MAX_BULK -1. */
-	cur_src--;
-
-	printf("Verify that the ring is full.\n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-
-	printf("Verify the enqueue success stats.\n");
-	/* Stats should match above enqueue operations to fill the ring. */
-	if (ring_stats->enq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Current max objects is RING_SIZE -1. */
-	if (ring_stats->enq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any failures yet. */
-	if (ring_stats->enq_fail_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_fail_objs != 0)
-		goto fail;
-
-
-	printf("Test stats for SP burst enqueue to a full ring.\n");
-	num_items = 2;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for SP bulk enqueue to a full ring.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP burst enqueue to a full ring.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP bulk enqueue to a full ring.\n");
-	num_items = 16;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	/* Do Dequeue tests. */
-	printf("Test the dequeue stats.\n");
-
-	printf("Empty the ring.\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* There was only RING_SIZE -1 objects to dequeue. */
-	cur_dst++;
-
-	printf("Verify ring is empty.\n");
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	printf("Verify the dequeue success stats.\n");
-	/* Stats should match above dequeue operations. */
-	if (ring_stats->deq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Objects dequeued is RING_SIZE -1. */
-	if (ring_stats->deq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any dequeue failure stats yet. */
-	if (ring_stats->deq_fail_bulk != 0)
-		goto fail;
-
-	printf("Test stats for SC burst dequeue with an empty ring.\n");
-	num_items = 2;
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for SC bulk dequeue with an empty ring.\n");
-	num_items = 4;
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC burst dequeue with an empty ring.\n");
-	num_items = 8;
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC bulk dequeue with an empty ring.\n");
-	num_items = 16;
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test total enqueue/dequeue stats.\n");
-	/* At this point the enqueue and dequeue stats should be the same. */
-	if (ring_stats->enq_success_bulk != ring_stats->deq_success_bulk)
-		goto fail;
-	if (ring_stats->enq_success_objs != ring_stats->deq_success_objs)
-		goto fail;
-	if (ring_stats->enq_fail_bulk    != ring_stats->deq_fail_bulk)
-		goto fail;
-	if (ring_stats->enq_fail_objs    != ring_stats->deq_fail_objs)
-		goto fail;
-
-
-	/* Watermark Tests. */
-	printf("Test the watermark/quota stats.\n");
-
-	printf("Verify the initial watermark stats.\n");
-	/* Watermark stats should be 0 since there is no watermark. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Set a watermark. */
-	rte_ring_set_water_mark(r, 16);
-
-	/* Reset pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue below watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should still be 0. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Success stats should have increased. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops + 1)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items + num_items)
-		goto fail;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue at watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != 1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP burst enqueue above watermark.\n");
-	num_items = 1;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP burst enqueue above watermark.\n");
-	num_items = 2;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP bulk enqueue above watermark.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP bulk enqueue above watermark.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	printf("Test watermark success stats.\n");
-	/* Success stats should be same as last non-watermarked enqueue. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items)
-		goto fail;
-
-
-	/* Cleanup. */
-
-	/* Empty the ring. */
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* Reset the watermark. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
-	return 0;
-
-fail:
-	free(src);
-	free(dst);
-	return -1;
-#endif
-}
-
 /*
  * it will always fail to create ring with a wrong ring size number in this function
  */
@@ -1335,10 +929,6 @@ test_ring(void)
 	if (test_ring_basic() < 0)
 		return -1;
 
-	/* ring stats */
-	if (test_ring_stats() < 0)
-		return -1;
-
 	/* basic operations */
 	if (test_live_watermark_change() < 0)
 		return -1;
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 03/14] ring: eliminate duplication of size and mask fields
    2017-03-24 17:09  5%     ` [dpdk-dev] [PATCH v3 01/14] ring: remove split cacheline build setting Bruce Richardson
@ 2017-03-24 17:09  3%     ` Bruce Richardson
    2017-03-24 17:09  2%     ` [dpdk-dev] [PATCH v3 04/14] ring: remove debug setting Bruce Richardson
                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-03-24 17:09 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, jerin.jacob, thomas.monjalon, Bruce Richardson

The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of those structures and into
the top-level structure so they are not duplicated.
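
For illustration, a single mask serves both sides because ring sizes
are powers of two, so wrapping a free-running index is a plain AND;
the helper name below is hypothetical:

/* slot for any free-running head/tail index; same as idx % r->size */
static inline uint32_t
ring_slot(const struct rte_ring *r, uint32_t idx)
{
	return idx & r->mask;
}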

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_ring/rte_ring.c | 20 ++++++++++----------
 lib/librte_ring/rte_ring.h | 32 ++++++++++++++++----------------
 test/test/test_ring.c      |  6 +++---
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 93a8692..93485d4 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -144,11 +144,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.watermark = count;
+	r->watermark = count;
 	r->prod.single = !!(flags & RING_F_SP_ENQ);
 	r->cons.single = !!(flags & RING_F_SC_DEQ);
-	r->prod.size = r->cons.size = count;
-	r->prod.mask = r->cons.mask = count-1;
+	r->size = count;
+	r->mask = count - 1;
 	r->prod.head = r->cons.head = 0;
 	r->prod.tail = r->cons.tail = 0;
 
@@ -269,14 +269,14 @@ rte_ring_free(struct rte_ring *r)
 int
 rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
-	if (count >= r->prod.size)
+	if (count >= r->size)
 		return -EINVAL;
 
 	/* if count is 0, disable the watermarking */
 	if (count == 0)
-		count = r->prod.size;
+		count = r->size;
 
-	r->prod.watermark = count;
+	r->watermark = count;
 	return 0;
 }
 
@@ -291,17 +291,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->prod.watermark == r->prod.size)
+	if (r->watermark == r->size)
 		fprintf(f, "  watermark=0\n");
 	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->prod.watermark);
+		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_RING_DEBUG
@@ -318,7 +318,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
 		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
 	}
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
 	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
 	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 331c94f..d650215 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -151,10 +151,7 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 struct rte_ring_headtail {
 	volatile uint32_t head;  /**< Prod/consumer head. */
 	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
 	uint32_t single;         /**< True if single prod/cons */
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 };
 
 /**
@@ -174,9 +171,12 @@ struct rte_ring {
 	 * next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
+	int flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_headtail prod __rte_aligned(PROD_ALIGN);
@@ -355,7 +355,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * Placed here since identical code needed in both
  * single and multi producer enqueue functions */
 #define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
+	const uint32_t size = r->size; \
 	uint32_t idx = prod_head & mask; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
@@ -382,7 +382,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * single and multi consumer dequeue functions */
 #define DEQUEUE_PTRS() do { \
 	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
+	const uint32_t size = r->size; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
 			obj_table[i] = r->ring[idx]; \
@@ -437,7 +437,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -485,7 +485,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -544,7 +544,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	prod_head = r->prod.head;
@@ -580,7 +580,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -630,7 +630,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -727,7 +727,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
 	prod_tail = r->prod.tail;
@@ -1056,7 +1056,7 @@ rte_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
 }
 
 /**
@@ -1089,7 +1089,7 @@ rte_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
+	return (prod_tail - cons_tail) & r->mask;
 }
 
 /**
@@ -1105,7 +1105,7 @@ rte_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
+	return (cons_tail - prod_tail - 1) & r->mask;
 }
 
 /**
@@ -1119,7 +1119,7 @@ rte_ring_free_count(const struct rte_ring *r)
 static inline unsigned int
 rte_ring_get_size(const struct rte_ring *r)
 {
-	return r->prod.size;
+	return r->size;
 }
 
 /**
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index ebcb896..5f09097 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -148,7 +148,7 @@ check_live_watermark_change(__attribute__((unused)) void *dummy)
 		}
 
 		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->prod.watermark;
+		watermark = r->watermark;
 		if (watermark != watermark_old &&
 		    (watermark_old != 16 || watermark != 32)) {
 			printf("Bad watermark change %u -> %u\n", watermark_old,
@@ -213,7 +213,7 @@ test_set_watermark( void ){
 		printf( " ring lookup failed\n" );
 		goto error;
 	}
-	count = r->prod.size*2;
+	count = r->size * 2;
 	setwm = rte_ring_set_water_mark(r, count);
 	if (setwm != -EINVAL){
 		printf("Test failed to detect invalid watermark count value\n");
@@ -222,7 +222,7 @@ test_set_watermark( void ){
 
 	count = 0;
 	rte_ring_set_water_mark(r, count);
-	if (r->prod.watermark != r->prod.size) {
+	if (r->watermark != r->size) {
 		printf("Test failed to detect invalid watermark count value\n");
 		goto error;
 	}
-- 
2.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 01/14] ring: remove split cacheline build setting
  @ 2017-03-24 17:09  5%     ` Bruce Richardson
  2017-03-24 17:09  3%     ` [dpdk-dev] [PATCH v3 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-24 17:09 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, jerin.jacob, thomas.monjalon, Bruce Richardson

Users compiling DPDK should not need to know or care about the arrangement
of cachelines in the rte_ring structure.  Therefore just remove the build
option and set the structures to always be split. On platforms with 64B
cachelines, for improved performance use 128B rather than 64B alignment,
since it stops the producer and consumer data from being on adjacent
cachelines.
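
As a build-time sanity check, the new layout can be verified the same
way rte_ring_init() does it; a minimal sketch (the function name is
illustrative):

#include <stddef.h>
#include <rte_common.h>
#include <rte_ring.h>

static inline void
ring_layout_check(void)
{
	/* prod and cons must each start on their own alignment boundary */
	RTE_BUILD_BUG_ON(offsetof(struct rte_ring, prod) % PROD_ALIGN != 0);
	RTE_BUILD_BUG_ON(offsetof(struct rte_ring, cons) % CONS_ALIGN != 0);
}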

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
---
V2: Limit the cacheline * 2 alignment to platforms with < 128B line size
---
 config/common_base                     |  1 -
 doc/guides/rel_notes/release_17_05.rst |  7 +++++++
 lib/librte_ring/rte_ring.c             |  2 --
 lib/librte_ring/rte_ring.h             | 16 ++++++++++------
 4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/config/common_base b/config/common_base
index 37aa1e1..c394651 100644
--- a/config/common_base
+++ b/config/common_base
@@ -453,7 +453,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 #
 CONFIG_RTE_LIBRTE_RING=y
 CONFIG_RTE_LIBRTE_RING_DEBUG=n
-CONFIG_RTE_RING_SPLIT_PROD_CONS=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 918f483..57ae8bf 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -120,6 +120,13 @@ API Changes
 * The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
   while keeping ABI compatibility.
 
+* **Reworked rte_ring library**
+
+  The rte_ring library has been reworked and updated. The following changes
+  have been made to it:
+
+  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
+
 ABI Changes
 -----------
 
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index ca0a108..4bc6da1 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	/* compilation-time checks */
 	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_RING_SPLIT_PROD_CONS
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_LIBRTE_RING_DEBUG
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 72ccca5..399ae3b 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -139,6 +139,14 @@ struct rte_ring_debug_stats {
 
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
+#if RTE_CACHE_LINE_SIZE < 128
+#define PROD_ALIGN (RTE_CACHE_LINE_SIZE * 2)
+#define CONS_ALIGN (RTE_CACHE_LINE_SIZE * 2)
+#else
+#define PROD_ALIGN RTE_CACHE_LINE_SIZE
+#define CONS_ALIGN RTE_CACHE_LINE_SIZE
+#endif
+
 /**
  * An RTE ring structure.
  *
@@ -168,7 +176,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Producer head. */
 		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
+	} prod __rte_aligned(PROD_ALIGN);
 
 	/** Ring consumer status. */
 	struct cons {
@@ -177,11 +185,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Consumer head. */
 		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
+	} cons __rte_aligned(CONS_ALIGN);
 
 #ifdef RTE_LIBRTE_RING_DEBUG
 	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-- 
2.9.3

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v5 1/7] net/ark: PMD for Atomic Rules Arkville driver stub
  2017-03-23  1:03  3% [dpdk-dev] [PATCH v4 " Ed Czeck
@ 2017-03-23 22:59  3% ` Ed Czeck
  0 siblings, 0 replies; 200+ results
From: Ed Czeck @ 2017-03-23 22:59 UTC (permalink / raw)
  To: dev; +Cc: john.miller, shepard.siegel, ferruh.yigit, stephen, Ed Czeck

Enable Arkville on supported configurations
Add overview documentation
Minimum driver support for valid compile
Arkville PMD is not supported on ARM or PowerPC at this time

v5:
* Address comments from Ferruh Yigit <ferruh.yigit@intel.com>
* Added documentation on driver args
* Makefile fixes
* Safe argument processing
* vdev args to dev args

v4:
* Address issues report from review
* Add internal comments on driver arg
* provide a bare-bones dev init to avoid compiler warnings

v3:
* Split large patch into several smaller ones

Signed-off-by: Ed Czeck <ed.czeck@atomicrules.com>
Signed-off-by: John Miller <john.miller@atomicrules.com>
---
 MAINTAINERS                                 |   8 +
 config/common_base                          |  10 +
 config/defconfig_arm-armv7a-linuxapp-gcc    |   1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |   1 +
 doc/guides/nics/ark.rst                     | 310 ++++++++++++++++++++++++++++
 doc/guides/nics/index.rst                   |   1 +
 drivers/net/Makefile                        |   1 +
 drivers/net/ark/Makefile                    |  62 ++++++
 drivers/net/ark/ark_debug.h                 |  71 +++++++
 drivers/net/ark/ark_ethdev.c                | 294 ++++++++++++++++++++++++++
 drivers/net/ark/ark_ethdev.h                |  39 ++++
 drivers/net/ark/ark_global.h                | 116 +++++++++++
 drivers/net/ark/rte_pmd_ark_version.map     |   4 +
 mk/rte.app.mk                               |   1 +
 14 files changed, 919 insertions(+)
 create mode 100644 doc/guides/nics/ark.rst
 create mode 100644 drivers/net/ark/Makefile
 create mode 100644 drivers/net/ark/ark_debug.h
 create mode 100644 drivers/net/ark/ark_ethdev.c
 create mode 100644 drivers/net/ark/ark_ethdev.h
 create mode 100644 drivers/net/ark/ark_global.h
 create mode 100644 drivers/net/ark/rte_pmd_ark_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 0c78b58..19ee27f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -278,6 +278,14 @@ M: Evgeny Schemeilin <evgenys@amazon.com>
 F: drivers/net/ena/
 F: doc/guides/nics/ena.rst
 
+Atomic Rules ARK
+M: Shepard Siegel <shepard.siegel@atomicrules.com>
+M: Ed Czeck       <ed.czeck@atomicrules.com>
+M: John Miller    <john.miller@atomicrules.com>
+F: drivers/net/ark/
+F: doc/guides/nics/ark.rst
+F: doc/guides/nics/features/ark.ini
+
 Broadcom bnxt
 M: Stephen Hurd <stephen.hurd@broadcom.com>
 M: Ajit Khaparde <ajit.khaparde@broadcom.com>
diff --git a/config/common_base b/config/common_base
index 37aa1e1..4feb5e4 100644
--- a/config/common_base
+++ b/config/common_base
@@ -353,6 +353,16 @@ CONFIG_RTE_LIBRTE_QEDE_FW=""
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile ARK PMD
+#
+CONFIG_RTE_LIBRTE_ARK_PMD=y
+CONFIG_RTE_LIBRTE_ARK_PAD_TX=y
+CONFIG_RTE_LIBRTE_ARK_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=n
+
+#
 # Compile the TAP PMD
 # It is enabled by default for Linux only.
 #
diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index d9bd2a8..6d2b5e0 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -61,6 +61,7 @@ CONFIG_RTE_SCHED_VECTOR=n
 
 # cannot use those on ARM
 CONFIG_RTE_KNI_KMOD=n
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_EM_PMD=n
 CONFIG_RTE_LIBRTE_IGB_PMD=n
 CONFIG_RTE_LIBRTE_CXGBE_PMD=n
diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc
index 35f7fb6..89bc396 100644
--- a/config/defconfig_ppc_64-power8-linuxapp-gcc
+++ b/config/defconfig_ppc_64-power8-linuxapp-gcc
@@ -48,6 +48,7 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
 
 # Note: Initially, all of the PMD drivers compilation are turned off on Power
 # Will turn on them only after the successful testing on Power
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_IXGBE_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/nics/ark.rst b/doc/guides/nics/ark.rst
new file mode 100644
index 0000000..7df07ce
--- /dev/null
+++ b/doc/guides/nics/ark.rst
@@ -0,0 +1,310 @@
+.. BSD LICENSE
+
+    Copyright (c) 2015-2017 Atomic Rules LLC
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Atomic Rules LLC nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ARK Poll Mode Driver
+====================
+
+The ARK PMD is a DPDK poll-mode driver for the Atomic Rules Arkville
+(ARK) family of devices.
+
+More information can be found at the `Atomic Rules website
+<http://atomicrules.com>`_.
+
+Overview
+--------
+
+The Atomic Rules Arkville product is a DPDK and AXI compliant product
+that marshals packets across a PCIe conduit between host DPDK mbufs and
+FPGA AXI streams.
+
+The approach of the ARK PMD, and the spirit of the overall Arkville
+product, has been to take the DPDK API/ABI as a fixed specification;
+then implement much of the business logic in FPGA RTL circuits.
+The approach of *working backwards* from the DPDK API/ABI and having
+the GPP host software *dictate*, while the FPGA hardware *copes*,
+results in significant performance gains over a naive implementation.
+
+While this document describes the ARK PMD software, it is helpful to
+understand what the FPGA hardware is and is not. The Arkville RTL
+component provides a single PCIe Physical Function (PF) supporting
+some number of RX/Ingress and TX/Egress Queues. The ARK PMD controls
+the Arkville core through a dedicated opaque Core BAR (CBAR).
+To allow users full freedom for their own FPGA application IP,
+an independent FPGA Application BAR (ABAR) is provided.
+
+One popular way to imagine Arkville's FPGA hardware aspect is as the
+FPGA PCIe-facing side of a so-called Smart NIC. The Arkville core does
+not contain any MACs, and is link-speed independent, as well as
+agnostic to the number of physical ports the application chooses to
+use. The ARK driver exposes the familiar PMD interface to allow packet
+movement to and from mbufs across multiple queues.
+
+However FPGA RTL applications could contain a universe of added
+functionality that an Arkville RTL core does not provide or can
+not anticipate. To allow for this expectation of user-defined
+innovation, the ARK PMD provides a dynamic mechanism of adding
+capabilities without having to modify the ARK PMD.
+
+The ARK PMD is intended to support all instances of the Arkville
+RTL Core, regardless of configuration, FPGA vendor, or target
+board. While specific capabilities such as the number of physical
+hardware queue-pairs are negotiated, the driver is designed to
+remain constant over a broad and extendable feature set.
+
+Intentionally, Arkville by itself DOES NOT provide common NIC
+capabilities such as offload or receive-side scaling (RSS).
+These capabilities would be viewed as a gate-level "tax" on
+green-box FPGA applications that do not require such functionality.
+Instead, they can be added as needed with essentially no
+overhead to the FPGA application.
+
+The ARK PMD also supports optional user extensions, through dynamic linking.
+The ARK PMD user extensions are a feature of Arkville’s DPDK
+net/ark poll mode driver, allowing users to add their
+own code to extend the net/ark functionality without
+having to make source code changes to the driver. One motivation for
+this capability is that while DPDK provides a rich set of functions
+to interact with NIC-like capabilities (e.g. MAC addresses and statistics),
+the Arkville RTL IP does not include a MAC.  Users can supply their
+own MAC or custom FPGA applications, which may require control from
+the PMD.  The user extension is the means of providing this control
+between the user's FPGA application and the existing DPDK features via
+the PMD.
+
+Device Parameters
+-------------------
+
+The ARK PMD supports a series of parameters that are used for packet routing
+and for internal packet generation and packet checking.  This section describes
+the supported parameters.  These features are primarily used for
+diagnostics, testing, and performance verification.  The nominal use
+of Arkville does not require any configuration using these parameters.
+
+"Pkt_dir"
+
+The Packet Director controls connectivity between the Packet Generator,
+Packet Checker, UDM, DDM, and external ingress and egress interfaces for
+diagnostic purposes. It includes an internal loopback path from the DDM to the UDM.
+
+NOTE: Packets from the packet generator to the UDM are all directed to UDM RX
+queue 0. Packets looped back from the DDM to the UDM are directed to the same
+queue number they originated from.
+
+bit 24: enRxChk (default 0)
+bit 20: enDDMChk (default 1)
+bit 16: enPGIngress (default 1)
+bit 12: enPGEgress (default 0)
+bit 8:  enExtIngress (default 1)
+bit 4:  enDDMEgress (default 1)
+bit 0:  enIntLpbk (default 0)
+
+Power-on state: 0x00110110
+
+These bits control which diagnostic paths are enabled. Refer to the PktDirector block
+diagram in the Arkville documentation.
+
+Format:
+Pkt_dir=0x00110110
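+
+As an illustrative worked example, enabling the internal loopback path
+(bit 0, enIntLpbk) on top of the power-on defaults means setting bit 0
+as well, giving Pkt_dir=0x00110111.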
+
+"Pkt_gen"
+
+The packet generator parameter takes a file as its argument.  The file contains
+configuration parameters that are used internally for regression testing and are
+not intended to be published at this level.
+
+Format:
+Pkt_gen=./config/pg.conf
+
+"Pkt_chkr"
+
+The packet checker parameter takes a file as its argument.  The file contains
+configuration parameters that are used internally for regression testing and are
+not intended to be published at this level.
+
+Format:
+Pkt_chkr=./config/pc.conf
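+
+These parameters are passed to the PMD as PCI device arguments.  As an
+illustrative example, assuming an ARK device at PCI address 0000:01:00.0
+and the standard EAL white-list syntax, testpmd could be started with:
+
+.. code-block:: console
+
+   ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 \
+      -w 0000:01:00.0,Pkt_dir=0x00110110,Pkt_gen=./config/pg.conf -- -i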
+
+
+Data Path Interface
+-------------------
+
+Ingress RX and Egress TX operation is via the nominal DPDK API.
+The driver supports single-port, multi-queue for both RX and TX.
+
+Refer to ``ark_ethdev.h`` for the list of supported methods to
+act upon RX and TX Queues.
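+
+As a minimal sketch (port and queue numbers are illustrative), moving
+packets on an ARK port uses the standard ethdev burst calls:
+
+.. code-block:: c
+
+   struct rte_mbuf *pkts[32];
+   uint16_t nb_rx, nb_tx;
+
+   /* Receive up to 32 packets from queue 0 of port 0. */
+   nb_rx = rte_eth_rx_burst(0, 0, pkts, 32);
+
+   /* Transmit the received packets on queue 0 of port 0. */
+   nb_tx = rte_eth_tx_burst(0, 0, pkts, nb_rx);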
+
+Configuration Information
+-------------------------
+
+**DPDK Configuration Parameters**
+
+  The following configuration options are available for the ARK PMD:
+
+   * **CONFIG_RTE_LIBRTE_ARK_PMD** (default y): Enables or disables inclusion
+     of the ARK PMD driver in the DPDK compilation.
+
+   * **CONFIG_RTE_LIBRTE_ARK_PAD_TX** (default y):  When enabled, TX
+     packets are padded to 60 bytes to support downstream MACs.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_RX** (default n): Enables or disables debug
+     logging and internal checking of RX ingress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TX** (default n): Enables or disables debug
+     logging and internal checking of TX egress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS** (default n): Enables or disables debug
+     logging of detailed packet and performance statistics gathered in
+     the PMD and FPGA.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE** (default n): Enables or disables debug
+     logging of detailed PMD events and status.
+
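+   These options live in the DPDK configuration templates.  As an
+   illustrative example, enabling the driver together with its trace
+   output could look like this in ``config/common_base``:
+
+   .. code-block:: console
+
+      CONFIG_RTE_LIBRTE_ARK_PMD=y
+      CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=y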
+
+Building DPDK
+-------------
+
+See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
+instructions on how to build DPDK.
+
+By default the ARK PMD library will be built into the DPDK library.
+
+For configuring and using the UIO and VFIO frameworks, please also refer to
+:ref:`the documentation that comes with the DPDK suite <linux_gsg>`.
+
+Supported ARK RTL PCIe Instances
+--------------------------------
+
+The ARK PMD supports the following Arkville RTL PCIe instances:
+
+* ``1d6c:100d`` - AR-ARKA-FX0 [Arkville 32B DPDK Data Mover]
+* ``1d6c:100e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover]
+
+Supported Operating Systems
+---------------------------
+
+Any Linux distribution fulfilling the conditions described in the ``System
+Requirements`` section of :ref:`the DPDK documentation <linux_gsg>` or in the
+*DPDK Release Notes*.  ARM and PowerPC architectures are not supported at this time.
+
+
+Supported Features
+------------------
+
+* Dynamic ARK PMD extensions
+* Multiple receive and transmit queues
+* Jumbo frames up to 9K
+* Hardware Statistics
+
+Unsupported Features
+--------------------
+
+Features that may be part of, or become part of, the Arkville RTL IP that are
+not currently supported or exposed by the ARK PMD include:
+
+* PCIe SR-IOV Virtual Functions (VFs)
+* Arkville's Packet Generator Control and Status
+* Arkville's Packet Director Control and Status
+* Arkville's Packet Checker Control and Status
+* Arkville's Timebase Management
+
+Pre-Requisites
+--------------
+
+#. Prepare the system as recommended by the DPDK suite.  This includes environment
+   variables, hugepages configuration, tool-chains and configuration.
+
+#. Insert the igb_uio kernel module using the command 'modprobe igb_uio'.
+
+#. Bind the intended ARK device to the igb_uio module.
+
+At this point the system should be ready to run DPDK applications. Once the
+application runs to completion, the ARK PMD can be detached from igb_uio if necessary.
+
+Usage Example
+-------------
+
+This section demonstrates how to launch **testpmd** with Atomic Rules ARK
+devices managed by librte_pmd_ark.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe uio
+      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
+
+   .. note::
+
+      The ARK PMD driver depends upon the igb_uio user space I/O kernel module.
+
+#. Mount and request huge pages:
+
+   .. code-block:: console
+
+      mount -t hugetlbfs nodev /mnt/huge
+      echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Bind UIO driver to ARK device at 0000:01:00.0 (using dpdk-devbind.py):
+
+   .. code-block:: console
+
+      ./usertools/dpdk-devbind.py --bind=igb_uio 0000:01:00.0
+
+   .. note::
+
+      The last argument to dpdk-devbind.py is the 4-tuple that identifies a specific PCIe
+      device. You can use lspci -d 1d6c: to identify all Atomic Rules devices in the system,
+      and thus determine the correct 4-tuple argument to dpdk-devbind.py.
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+      EAL: PCI device 0000:01:00.0 on NUMA socket -1
+      EAL:   probe driver: 1d6c:100e rte_ark_pmd
+      EAL:   PCI memory mapped at 0x7f9b6c400000
+      PMD: eth_ark_dev_init(): Initializing 0:2:0.1
+      ARKP PMD CommitID: 378f3a67
+      Configuring Port 0 (socket 0)
+      Port 0: DC:3C:F6:00:00:01
+      Checking link statuses...
+      Port 0 Link Up - speed 100000 Mbps - full-duplex
+      Done
+      testpmd>
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 87f9334..381d82c 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
     :numbered:
 
     overview
+    ark
     bnx2x
     bnxt
     cxgbe
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a16f25e..ea9868b 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -32,6 +32,7 @@
 include $(RTE_SDK)/mk/rte.vars.mk
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += bnx2x
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
 DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
diff --git a/drivers/net/ark/Makefile b/drivers/net/ark/Makefile
new file mode 100644
index 0000000..afe69c4
--- /dev/null
+++ b/drivers/net/ark/Makefile
@@ -0,0 +1,62 @@
+# BSD LICENSE
+#
+# Copyright (c) 2015-2017 Atomic Rules LLC
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of copyright holder nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_ark.a
+
+CFLAGS += -O3 -I./
+CFLAGS += $(WERROR_FLAGS) -Werror
+
+EXPORT_MAP := rte_pmd_ark_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev.c
+
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mempool
+
+LDLIBS += -lpthread
+LDLIBS += -ldl
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/ark/ark_debug.h b/drivers/net/ark/ark_debug.h
new file mode 100644
index 0000000..62f7462
--- /dev/null
+++ b/drivers/net/ark/ark_debug.h
@@ -0,0 +1,71 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_DEBUG_H_
+#define _ARK_DEBUG_H_
+
+#include <inttypes.h>
+#include <rte_log.h>
+
+/* Format specifiers for string data pairs */
+#define ARK_SU32  "\n\t%-20s    %'20" PRIu32
+#define ARK_SU64  "\n\t%-20s    %'20" PRIu64
+#define ARK_SU64X "\n\t%-20s    %#20" PRIx64
+#define ARK_SPTR  "\n\t%-20s    %20p"
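+
+/*
+ * Illustrative usage (the stat names are hypothetical):
+ *
+ *   ARK_DEBUG_STATS("ARK UDM stats" ARK_SU64 ARK_SU64,
+ *                   "rx_packets", rx_packets,
+ *                   "rx_bytes", rx_bytes);
+ *
+ * Each pair prints as a left-justified name followed by its value.
+ */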
+
+#define ARK_TRACE_ON(fmt, ...) \
+	PMD_DRV_LOG(DEBUG, fmt, ##__VA_ARGS__)
+
+#define ARK_TRACE_OFF(fmt, ...) \
+	do {if (0) PMD_DRV_LOG(DEBUG, fmt, ##__VA_ARGS__); } while (0)
+
+/* Debug macro for reporting Packet stats */
+#ifdef RTE_LIBRTE_ARK_DEBUG_STATS
+#define ARK_DEBUG_STATS(fmt, ...) ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_STATS(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+/* Debug macro for tracing full behavior */
+#ifdef RTE_LIBRTE_ARK_DEBUG_TRACE
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+/* tracing including the function name */
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt, __func__, ## args)
+
+
+#endif
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
new file mode 100644
index 0000000..0a47543
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.c
@@ -0,0 +1,294 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dlfcn.h>
+
+#include <rte_kvargs.h>
+
+#include "ark_global.h"
+#include "ark_debug.h"
+#include "ark_ethdev.h"
+
+/*  Internal prototypes */
+static int eth_ark_check_args(struct ark_adapter *ark, const char *params);
+static int eth_ark_dev_init(struct rte_eth_dev *dev);
+static int eth_ark_dev_uninit(struct rte_eth_dev *eth_dev);
+static int eth_ark_dev_configure(struct rte_eth_dev *dev);
+static void eth_ark_dev_info_get(struct rte_eth_dev *dev,
+				 struct rte_eth_dev_info *dev_info);
+
+#define ARK_DEV_TO_PCI(eth_dev)			\
+	RTE_DEV_TO_PCI((eth_dev)->device)
+
+/*
+ * The packet generator is a functional block used to generate egress packet
+ * patterns.
+ */
+#define ARK_PKTGEN_ARG "Pkt_gen"
+
+/*
+ * The packet checker is a functional block used to test ingress packet
+ * patterns.
+ */
+#define ARK_PKTCHKR_ARG "Pkt_chkr"
+
+/*
+ * The packet director is used to select the internal ingress and egress
+ * packet paths.
+ */
+#define ARK_PKTDIR_ARG "Pkt_dir"
+
+/* Devinfo configurations */
+#define ARK_RX_MAX_QUEUE (4096 * 4)
+#define ARK_RX_MIN_QUEUE (512)
+#define ARK_RX_MAX_PKT_LEN ((16 * 1024) - 128)
+#define ARK_RX_MIN_BUFSIZE (1024)
+
+#define ARK_TX_MAX_QUEUE (4096 * 4)
+#define ARK_TX_MIN_QUEUE (256)
+
+static const char * const valid_arguments[] = {
+	ARK_PKTGEN_ARG,
+	ARK_PKTCHKR_ARG,
+	ARK_PKTDIR_ARG,
+	NULL
+};
+
+static const struct rte_pci_id pci_id_ark_map[] = {
+	{RTE_PCI_DEVICE(0x1d6c, 0x100d)},
+	{RTE_PCI_DEVICE(0x1d6c, 0x100e)},
+	{.vendor_id = 0, /* sentinel */ },
+};
+
+static struct eth_driver rte_ark_pmd = {
+	.pci_drv = {
+		.probe = rte_eth_dev_pci_probe,
+		.remove = rte_eth_dev_pci_remove,
+		.id_table = pci_id_ark_map,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC
+	},
+	.eth_dev_init = eth_ark_dev_init,
+	.eth_dev_uninit = eth_ark_dev_uninit,
+	.dev_private_size = sizeof(struct ark_adapter),
+};
+
+static const struct eth_dev_ops ark_eth_dev_ops = {
+	.dev_configure = eth_ark_dev_configure,
+	.dev_infos_get = eth_ark_dev_info_get,
+};
+
+static int
+eth_ark_dev_init(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+
+	ark->eth_dev = dev;
+
+	ARK_DEBUG_TRACE("eth_ark_dev_init(struct rte_eth_dev *dev)\n");
+
+	pci_dev = ARK_DEV_TO_PCI(dev);
+	rte_eth_copy_pci_info(dev, pci_dev);
+
+	ark->bar0 = (uint8_t *)pci_dev->mem_resource[0].addr;
+	ark->a_bar = (uint8_t *)pci_dev->mem_resource[2].addr;
+
+	dev->dev_ops = &ark_eth_dev_ops;
+
+	/*  We process our args last as they require everything to be setup */
+	if (pci_dev->device.devargs)
+		eth_ark_check_args(ark, pci_dev->device.devargs->args);
+	else
+		PMD_DRV_LOG(INFO, "No Device args found\n");
+
+	return 0;
+}
+
+static int
+eth_ark_dev_uninit(struct rte_eth_dev *dev)
+{
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	dev->dev_ops = NULL;
+	dev->rx_pkt_burst = NULL;
+	dev->tx_pkt_burst = NULL;
+	return 0;
+}
+
+static int
+eth_ark_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	ARK_DEBUG_TRACE("ARKP: In %s\n", __func__);
+	return 0;
+}
+
+static void
+eth_ark_dev_info_get(struct rte_eth_dev *dev,
+		     struct rte_eth_dev_info *dev_info)
+{
+	dev_info->max_rx_pktlen = ARK_RX_MAX_PKT_LEN;
+	dev_info->min_rx_bufsize = ARK_RX_MIN_BUFSIZE;
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = ARK_RX_MAX_QUEUE,
+		.nb_min = ARK_RX_MIN_QUEUE,
+		.nb_align = ARK_RX_MIN_QUEUE}; /* power of 2 */
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = ARK_TX_MAX_QUEUE,
+		.nb_min = ARK_TX_MIN_QUEUE,
+		.nb_align = ARK_TX_MIN_QUEUE}; /* power of 2 */
+
+	/* ARK PMD supports all line rates, how do we indicate that here ?? */
+	dev_info->speed_capa = (ETH_LINK_SPEED_1G |
+				ETH_LINK_SPEED_10G |
+				ETH_LINK_SPEED_25G |
+				ETH_LINK_SPEED_40G |
+				ETH_LINK_SPEED_50G |
+				ETH_LINK_SPEED_100G);
+	dev_info->pci_dev = ARK_DEV_TO_PCI(dev);
+}
+
+static inline int
+process_pktdir_arg(const char *key, const char *value,
+		   void *extra_args)
+{
+	ARK_DEBUG_TRACE("In process_pktdir_arg, key = %s, value = %s\n",
+			key, value);
+	struct ark_adapter *ark =
+		(struct ark_adapter *)extra_args;
+
+	ark->pkt_dir_v = strtol(value, NULL, 16);
+	ARK_DEBUG_TRACE("pkt_dir_v = 0x%x\n", ark->pkt_dir_v);
+	return 0;
+}
+
+static inline int
+process_file_args(const char *key, const char *value, void *extra_args)
+{
+	ARK_DEBUG_TRACE("**** IN process_pktgen_arg, key = %s, value = %s\n",
+			key, value);
+	char *args = (char *)extra_args;
+
+	/* Open the configuration file */
+	FILE *file = fopen(value, "r");
+	char line[ARK_MAX_ARG_LEN];
+	int  size = 0;
+	int first = 1;
+
+	if (file == NULL) {
+		PMD_DRV_LOG(ERR, "Unable to open config file %s\n", value);
+		return -1;
+	}
+
+	while (fgets(line, sizeof(line), file)) {
+		size += strlen(line);
+		if (size >= ARK_MAX_ARG_LEN) {
+			PMD_DRV_LOG(ERR, "Unable to parse file %s args, "
+				    "parameter list is too long\n", value);
+			fclose(file);
+			return -1;
+		}
+		if (first) {
+			strncpy(args, line, ARK_MAX_ARG_LEN);
+			first = 0;
+		} else {
+			strncat(args, line, ARK_MAX_ARG_LEN);
+		}
+	}
+	ARK_DEBUG_TRACE("file = %s\n", args);
+	fclose(file);
+	return 0;
+}
+
+static int
+eth_ark_check_args(struct ark_adapter *ark, const char *params)
+{
+	struct rte_kvargs *kvlist;
+	unsigned int k_idx;
+	struct rte_kvargs_pair *pair = NULL;
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return 0;
+
+	ark->pkt_gen_args[0] = 0;
+	ark->pkt_chkr_args[0] = 0;
+
+	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
+		pair = &kvlist->pairs[k_idx];
+		ARK_DEBUG_TRACE("**** Arg passed to PMD = %s:%s\n", pair->key,
+				pair->value);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTDIR_ARG,
+			       &process_pktdir_arg,
+			       ark) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTDIR_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTGEN_ARG,
+			       &process_file_args,
+			       ark->pkt_gen_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTGEN_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTCHKR_ARG,
+			       &process_file_args,
+			       ark->pkt_chkr_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTCHKR_ARG);
+	}
+
+	ARK_DEBUG_TRACE("INFO: packet director set to 0x%x\n", ark->pkt_dir_v);
+
+	return 1;
+}
+
+RTE_PMD_REGISTER_PCI(net_ark, rte_ark_pmd.pci_drv);
+RTE_PMD_REGISTER_KMOD_DEP(net_ark, "* igb_uio | uio_pci_generic");
+RTE_PMD_REGISTER_PCI_TABLE(net_ark, pci_id_ark_map);
+RTE_PMD_REGISTER_PARAM_STRING(net_ark,
+			      ARK_PKTGEN_ARG "=<filename> "
+			      ARK_PKTCHKR_ARG "=<filename> "
+			      ARK_PKTDIR_ARG "=<bitmap>");
diff --git a/drivers/net/ark/ark_ethdev.h b/drivers/net/ark/ark_ethdev.h
new file mode 100644
index 0000000..08d7fb1
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.h
@@ -0,0 +1,39 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_ETHDEV_H_
+#define _ARK_ETHDEV_H_
+
+/* STUB */
+
+#endif
diff --git a/drivers/net/ark/ark_global.h b/drivers/net/ark/ark_global.h
new file mode 100644
index 0000000..033ac87
--- /dev/null
+++ b/drivers/net/ark/ark_global.h
@@ -0,0 +1,116 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_GLOBAL_H_
+#define _ARK_GLOBAL_H_
+
+#include <time.h>
+#include <assert.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_cycles.h>
+#include <rte_kvargs.h>
+#include <rte_dev.h>
+#include <rte_version.h>
+
+#define ETH_ARK_ARG_MAXLEN	64
+#define ARK_SYSCTRL_BASE  0x0
+#define ARK_PKTGEN_BASE   0x10000
+#define ARK_MPU_RX_BASE   0x20000
+#define ARK_UDM_BASE      0x30000
+#define ARK_MPU_TX_BASE   0x40000
+#define ARK_DDM_BASE      0x60000
+#define ARK_CMAC_BASE     0x80000
+#define ARK_PKTDIR_BASE   0xa0000
+#define ARK_PKTCHKR_BASE  0x90000
+#define ARK_RCPACING_BASE 0xb0000
+#define ARK_EXTERNAL_BASE 0x100000
+#define ARK_MPU_QOFFSET   0x00100
+#define ARK_MAX_PORTS     8
+
+#define offset8(n)     n
+#define offset16(n)   ((n) / 2)
+#define offset32(n)   ((n) / 4)
+#define offset64(n)   ((n) / 8)
+
+/* Maximum length of arg list in bytes */
+#define ARK_MAX_ARG_LEN 256
+
+/*
+ * Structure to store private data for each PF/VF instance.
+ */
+#define def_ptr(type, name) \
+	union type {		   \
+		uint64_t *t64;	   \
+		uint32_t *t32;	   \
+		uint16_t *t16;	   \
+		uint8_t  *t8;	   \
+		void     *v;	   \
+	} name
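+
+/*
+ * Illustrative use (the field name is hypothetical):
+ *
+ *   def_ptr(addr_t, ptr);
+ *
+ * declares a field "ptr" of union type "addr_t" whose single address can
+ * be viewed as 64-, 32-, 16- or 8-bit quantities.
+ */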
+
+struct ark_port {
+	struct rte_eth_dev *eth_dev;
+	int id;
+};
+
+struct ark_adapter {
+	/* User extension private data */
+	void *user_data;
+
+	struct ark_port port[ARK_MAX_PORTS];
+	int num_ports;
+
+	/* Packet generator/checker args */
+	char pkt_gen_args[ARK_MAX_ARG_LEN];
+	char pkt_chkr_args[ARK_MAX_ARG_LEN];
+	uint32_t pkt_dir_v;
+
+	/* eth device */
+	struct rte_eth_dev *eth_dev;
+
+	void *d_handle;
+
+	/* Our Bar 0 */
+	uint8_t *bar0;
+
+	/* Application Bar */
+	uint8_t *a_bar;
+};
+
+typedef uint32_t *ark_t;
+
+#endif
diff --git a/drivers/net/ark/rte_pmd_ark_version.map b/drivers/net/ark/rte_pmd_ark_version.map
new file mode 100644
index 0000000..1062e04
--- /dev/null
+++ b/drivers/net/ark/rte_pmd_ark_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+	 local: *;
+
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0e0b600..da23898 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -104,6 +104,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)      += -lrte_pmd_bnx2x -lz
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNXT_PMD)       += -lrte_pmd_bnxt
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
1.9.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 19/22] vhost: rename header file
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
                     ` (6 preceding siblings ...)
  2017-03-23  7:10  3%   ` [dpdk-dev] [PATCH v2 18/22] vhost: introduce API to start a specific driver Yuanhan Liu
@ 2017-03-23  7:10  5%   ` Yuanhan Liu
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/deprecation.rst   |   9 -
 doc/guides/rel_notes/release_17_05.rst |   3 +
 drivers/net/vhost/rte_eth_vhost.c      |   2 +-
 drivers/net/vhost/rte_eth_vhost.h      |   2 +-
 examples/tep_termination/main.c        |   2 +-
 examples/tep_termination/vxlan_setup.c |   2 +-
 examples/vhost/main.c                  |   2 +-
 lib/librte_vhost/Makefile              |   2 +-
 lib/librte_vhost/rte_vhost.h           | 421 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/rte_virtio_net.h      | 421 ---------------------------------
 lib/librte_vhost/vhost.c               |   2 +-
 lib/librte_vhost/vhost.h               |   2 +-
 lib/librte_vhost/vhost_user.h          |   2 +-
 lib/librte_vhost/virtio_net.c          |   2 +-
 14 files changed, 434 insertions(+), 440 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d6544ed..9708b39 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -95,15 +95,6 @@ Deprecation Notices
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
 
-* vhost: API/ABI changes are planned for 17.05, for making DPDK vhost library
-  generic enough so that applications can build different vhost-user drivers
-  (instead of vhost-user net only) on top of that.
-  Specifically, ``virtio_net_device_ops`` will be renamed to ``vhost_device_ops``.
-  Correspondingly, some API's parameter need be changed. Few more functions also
-  need be reworked to let it be device aware. For example, different virtio device
-  has different feature set, meaning functions like ``rte_vhost_feature_disable``
-  need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
-
 * ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
   A pointer to a rte_cryptodev_config structure will be added to the
   function prototype ``cryptodev_configure_t``, as a new parameter.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 8f06fc4..c053fff 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -165,6 +165,9 @@ API Changes
    * The vhost API ``rte_vhost_driver_session_start`` is removed. Instead,
      ``rte_vhost_driver_start`` should be used.
 
+   * The vhost public header file ``rte_virtio_net.h`` is renamed to
+     ``rte_vhost.h``.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index e6c0758..32e774b 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -40,7 +40,7 @@
 #include <rte_memcpy.h>
 #include <rte_vdev.h>
 #include <rte_kvargs.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_spinlock.h>
 
 #include "rte_eth_vhost.h"
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index ea4bce4..39ca771 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -41,7 +41,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 /*
  * Event description.
diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
index 24c62cd..cd6e3f1 100644
--- a/examples/tep_termination/main.c
+++ b/examples/tep_termination/main.c
@@ -49,7 +49,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 #include "main.h"
 #include "vxlan.h"
diff --git a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c
index 8f1f15b..87de74d 100644
--- a/examples/tep_termination/vxlan_setup.c
+++ b/examples/tep_termination/vxlan_setup.c
@@ -49,7 +49,7 @@
 #include <rte_tcp.h>
 
 #include "main.h"
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 #include "vxlan.h"
 #include "vxlan_setup.h"
 
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 64b3eea..08b82f6 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -49,7 +49,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_ip.h>
 #include <rte_tcp.h>
 
diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 5cf4e93..4847069 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -51,7 +51,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c socket.c vhost.c vhost_user.c \
 				   virtio_net.c
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
 
 # dependencies
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
new file mode 100644
index 0000000..d4ee210
--- /dev/null
+++ b/lib/librte_vhost/rte_vhost.h
@@ -0,0 +1,421 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_VHOST_H_
+#define _RTE_VHOST_H_
+
+/**
+ * @file
+ * Interface to vhost-user
+ */
+
+#include <stdint.h>
+#include <linux/vhost.h>
+#include <linux/virtio_ring.h>
+#include <sys/eventfd.h>
+
+#include <rte_memory.h>
+#include <rte_mempool.h>
+
+#define RTE_VHOST_USER_CLIENT		(1ULL << 0)
+#define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
+#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
+
+/**
+ * Information relating to memory regions including offsets to
+ * addresses in QEMUs memory file.
+ */
+struct rte_vhost_mem_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
+};
+
+/**
+ * Memory structure includes region and mapping information.
+ */
+struct rte_vhost_memory {
+	uint32_t nregions;
+	struct rte_vhost_mem_region regions[0];
+};
+
+struct rte_vhost_vring {
+	struct vring_desc	*desc;
+	struct vring_avail	*avail;
+	struct vring_used	*used;
+	uint64_t		log_guest_addr;
+
+	int			callfd;
+	int			kickfd;
+	uint16_t		size;
+};
+
+/**
+ * Device and vring operations.
+ */
+struct vhost_device_ops {
+	int (*new_device)(int vid);		/**< Add device. */
+	void (*destroy_device)(int vid);	/**< Remove device. */
+
+	int (*vring_state_changed)(int vid, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
+
+	/**
+	 * Features could be changed after the feature negotiation.
+	 * For example, VHOST_F_LOG_ALL will be set/cleared at the
+	 * start/end of live migration, respectively. This callback
+	 * is used to inform the application on such change.
+	 */
+	int (*features_changed)(int vid, uint64_t features);
+
+	void *reserved[4]; /**< Reserved for future extension */
+};
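+
+/*
+ * A minimal sketch (callback names are hypothetical) of supplying these
+ * ops to the vhost-user driver, using the APIs declared later in this
+ * file:
+ *
+ *     static const struct vhost_device_ops ops = {
+ *         .new_device = my_new_device,
+ *         .destroy_device = my_destroy_device,
+ *     };
+ *
+ *     rte_vhost_driver_register("/tmp/sock0", 0);
+ *     rte_vhost_driver_callback_register("/tmp/sock0", &ops);
+ *     rte_vhost_driver_start("/tmp/sock0");
+ */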
+
+/**
+ * Convert guest physical address to host virtual address
+ *
+ * @param mem
+ *  the guest memory regions
+ * @param gpa
+ *  the guest physical address for querying
+ * @return
+ *  the host virtual address on success, 0 on failure
+ */
+static inline uint64_t __attribute__((always_inline))
+rte_vhost_gpa_to_vva(struct rte_vhost_memory *mem, uint64_t gpa)
+{
+	struct rte_vhost_mem_region *reg;
+	uint32_t i;
+
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		if (gpa >= reg->guest_phys_addr &&
+		    gpa <  reg->guest_phys_addr + reg->size) {
+			return gpa - reg->guest_phys_addr +
+			       reg->host_user_addr;
+		}
+	}
+
+	return 0;
+}
+
+#define RTE_VHOST_NEED_LOG(features)	((features) & (1ULL << VHOST_F_LOG_ALL))
+
+/**
+ * Log the memory write start with given address.
+ *
+ * This function only needs to be invoked when live migration starts.
+ * Therefore, it does not need to be called at all most of the time. To
+ * minimize the performance impact, it is suggested to do a check before
+ * calling it:
+ *
+ *        if (unlikely(RTE_VHOST_NEED_LOG(features)))
+ *                rte_vhost_log_write(vid, addr, len);
+ *
+ * @param vid
+ *  vhost device ID
+ * @param addr
+ *  the starting address for write
+ * @param len
+ *  the length to write
+ */
+void rte_vhost_log_write(int vid, uint64_t addr, uint64_t len);
+
+/**
+ * Log the used ring update start at given offset.
+ *
+ * Same as rte_vhost_log_write, it's suggested to do a check before
+ * calling it:
+ *
+ *        if (unlikely(RTE_VHOST_NEED_LOG(features)))
+ *                rte_vhost_log_used_vring(vid, vring_idx, offset, len);
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  the vring index
+ * @param offset
+ *  the offset inside the used ring
+ * @param len
+ *  the length to write
+ */
+void rte_vhost_log_used_vring(int vid, uint16_t vring_idx,
+			      uint64_t offset, uint64_t len);
+
+int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable);
+
+/**
+ * Register vhost driver. path could be different for multiple
+ * instance support.
+ */
+int rte_vhost_driver_register(const char *path, uint64_t flags);
+
+/* Unregister vhost driver. This is only meaningful to vhost user. */
+int rte_vhost_driver_unregister(const char *path);
+
+/**
+ * Set the feature bits the vhost-user driver supports.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_set_features(const char *path, uint64_t features);
+
+/**
+ * Enable vhost-user driver features.
+ *
+ * Note that
+ * - the param @features should be a subset of the feature bits provided
+ *   by rte_vhost_driver_set_features().
+ * - it must be invoked before vhost-user negotiation starts.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param features
+ *  Features to enable
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_enable_features(const char *path, uint64_t features);
+
+/**
+ * Disable vhost-user driver features.
+ *
+ * The two notes at rte_vhost_driver_enable_features() also apply here.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @param features
+ *  Features to disable
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_disable_features(const char *path, uint64_t features);
+
+/**
+ * Get the final feature bits for feature negotiation.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @return
+ *  Feature bits on success, 0 on failure
+ */
+uint64_t rte_vhost_driver_get_features(const char *path);
+
+/**
+ * Get the feature bits after negotiation
+ *
+ * @param vid
+ *  Vhost device ID
+ * @return
+ *  Negotiated feature bits on success, 0 on failure
+ */
+uint64_t rte_vhost_get_negotiated_features(int vid);
+
+/* Register callbacks. */
+int rte_vhost_driver_callback_register(const char *path,
+	struct vhost_device_ops const * const ops);
+
+/**
+ *
+ * Start the vhost-user driver.
+ *
+ * This function triggers the vhost-user negotiation.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_start(const char *path);
+
+/**
+ * Get the MTU value of the device if set in QEMU.
+ *
+ * @param vid
+ *  virtio-net device ID
+ * @param mtu
+ *  The variable to store the MTU value
+ *
+ * @return
+ *  0: success
+ *  -EAGAIN: device not yet started
+ *  -ENOTSUP: device does not support MTU feature
+ */
+int rte_vhost_get_mtu(int vid, uint16_t *mtu);
+
+/**
+ * Get the numa node from which the virtio net device's memory
+ * is allocated.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The numa node, -1 on failure
+ */
+int rte_vhost_get_numa_node(int vid);
+
+/**
+ * @deprecated
+ * Get the number of queues the device supports.
+ *
+ * Note this function is deprecated, as it returns a queue pair number,
+ * which is vhost specific. Instead, rte_vhost_get_vring_num should
+ * be used.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The number of queues, 0 on failure
+ */
+__rte_deprecated
+uint32_t rte_vhost_get_queue_num(int vid);
+
+/**
+ * Get the number of vrings the device supports.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The number of vrings, 0 on failure
+ */
+uint16_t rte_vhost_get_vring_num(int vid);
+
+/**
+ * Get the virtio net device's ifname, which is the vhost-user socket
+ * file path.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param buf
+ *  The buffer in which to store the queried ifname
+ * @param len
+ *  The length of buf
+ *
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_get_ifname(int vid, char *buf, size_t len);
+
+/**
+ * Get how many avail entries are left in the queue
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index
+ *
+ * @return
+ *  number of available entries left
+ */
+uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id);
+
+/**
+ * This function adds buffers to the virtio device's RX virtqueue. Buffers can
+ * be received from the physical port or from another virtual device. A packet
+ * count is returned to indicate the number of packets that were successfully
+ * added to the RX queue.
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param pkts
+ *  array to contain packets to be enqueued
+ * @param count
+ *  number of packets to be enqueued
+ * @return
+ *  number of packets enqueued
+ */
+uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
+	struct rte_mbuf **pkts, uint16_t count);
+
+/**
+ * This function gets guest buffers from the virtio device TX virtqueue,
+ * constructs host mbufs, copies guest buffer content to the mbufs and
+ * stores them in pkts to be processed.
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param mbuf_pool
+ *  mbuf_pool where host mbuf is allocated.
+ * @param pkts
+ *  array to contain packets to be dequeued
+ * @param count
+ *  number of packets to be dequeued
+ * @return
+ *  number of packets dequeued
+ */
+uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
+
+/**
+ * Get guest mem table: a list of memory regions.
+ *
+ * An rte_vhost_memory object will be allocated internally, to hold the
+ * guest memory regions. The application should free it at the
+ * destroy_device() callback.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param mem
+ *  To store the returned mem regions
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem);
+
+/**
+ * Get guest vring info, including the vring address, vring size, etc.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  vring index
+ * @param vring
+ *  the structure to hold the requested vring info
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
+			      struct rte_vhost_vring *vring);
+
+#endif /* _RTE_VHOST_H_ */
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
deleted file mode 100644
index 627708d..0000000
--- a/lib/librte_vhost/rte_virtio_net.h
+++ /dev/null
@@ -1,421 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_NET_H_
-#define _VIRTIO_NET_H_
-
-/**
- * @file
- * Interface to vhost net
- */
-
-#include <stdint.h>
-#include <linux/vhost.h>
-#include <linux/virtio_ring.h>
-#include <sys/eventfd.h>
-
-#include <rte_memory.h>
-#include <rte_mempool.h>
-
-#define RTE_VHOST_USER_CLIENT		(1ULL << 0)
-#define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
-#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
-
-/**
- * Information relating to memory regions including offsets to
- * addresses in QEMUs memory file.
- */
-struct rte_vhost_mem_region {
-	uint64_t guest_phys_addr;
-	uint64_t guest_user_addr;
-	uint64_t host_user_addr;
-	uint64_t size;
-	void	 *mmap_addr;
-	uint64_t mmap_size;
-	int fd;
-};
-
-/**
- * Memory structure includes region and mapping information.
- */
-struct rte_vhost_memory {
-	uint32_t nregions;
-	struct rte_vhost_mem_region regions[0];
-};
-
-struct rte_vhost_vring {
-	struct vring_desc	*desc;
-	struct vring_avail	*avail;
-	struct vring_used	*used;
-	uint64_t		log_guest_addr;
-
-	int			callfd;
-	int			kickfd;
-	uint16_t		size;
-};
-
-/**
- * Device and vring operations.
- */
-struct vhost_device_ops {
-	int (*new_device)(int vid);		/**< Add device. */
-	void (*destroy_device)(int vid);	/**< Remove device. */
-
-	int (*vring_state_changed)(int vid, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
-
-	/**
-	 * Features could be changed after the feature negotiation.
-	 * For example, VHOST_F_LOG_ALL will be set/cleared at the
-	 * start/end of live migration, respectively. This callback
-	 * is used to inform the application on such change.
-	 */
-	int (*features_changed)(int vid, uint64_t features);
-
-	void *reserved[4]; /**< Reserved for future extension */
-};
-
-/**
- * Convert guest physical address to host virtual address
- *
- * @param mem
- *  the guest memory regions
- * @param gpa
- *  the guest physical address for querying
- * @return
- *  the host virtual address on success, 0 on failure
- */
-static inline uint64_t __attribute__((always_inline))
-rte_vhost_gpa_to_vva(struct rte_vhost_memory *mem, uint64_t gpa)
-{
-	struct rte_vhost_mem_region *reg;
-	uint32_t i;
-
-	for (i = 0; i < mem->nregions; i++) {
-		reg = &mem->regions[i];
-		if (gpa >= reg->guest_phys_addr &&
-		    gpa <  reg->guest_phys_addr + reg->size) {
-			return gpa - reg->guest_phys_addr +
-			       reg->host_user_addr;
-		}
-	}
-
-	return 0;
-}
-
-#define RTE_VHOST_NEED_LOG(features)	((features) & (1ULL << VHOST_F_LOG_ALL))
-
-/**
- * Log the memory write start with given address.
- *
- * This function only need be invoked when the live migration starts.
- * Therefore, we won't need call it at all in the most of time. For
- * making the performance impact be minimum, it's suggested to do a
- * check before calling it:
- *
- *        if (unlikely(RTE_VHOST_NEED_LOG(features)))
- *                rte_vhost_log_write(vid, addr, len);
- *
- * @param vid
- *  vhost device ID
- * @param addr
- *  the starting address for write
- * @param len
- *  the length to write
- */
-void rte_vhost_log_write(int vid, uint64_t addr, uint64_t len);
-
-/**
- * Log the used ring update start at given offset.
- *
- * Same as rte_vhost_log_write, it's suggested to do a check before
- * calling it:
- *
- *        if (unlikely(RTE_VHOST_NEED_LOG(features)))
- *                rte_vhost_log_used_vring(vid, vring_idx, offset, len);
- *
- * @param vid
- *  vhost device ID
- * @param vring_idx
- *  the vring index
- * @param offset
- *  the offset inside the used ring
- * @param len
- *  the length to write
- */
-void rte_vhost_log_used_vring(int vid, uint16_t vring_idx,
-			      uint64_t offset, uint64_t len);
-
-int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable);
-
-/**
- * Register vhost driver. path could be different for multiple
- * instance support.
- */
-int rte_vhost_driver_register(const char *path, uint64_t flags);
-
-/* Unregister vhost driver. This is only meaningful to vhost user. */
-int rte_vhost_driver_unregister(const char *path);
-
-/**
- * Set the feature bits the vhost-user driver supports.
- *
- * @param path
- *  The vhost-user socket file path
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_driver_set_features(const char *path, uint64_t features);
-
-/**
- * Enable vhost-user driver features.
- *
- * Note that
- * - the param @features should be a subset of the feature bits provided
- *   by rte_vhost_driver_set_features().
- * - it must be invoked before vhost-user negotiation starts.
- *
- * @param path
- *  The vhost-user socket file path
- * @param features
- *  Features to enable
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_driver_enable_features(const char *path, uint64_t features);
-
-/**
- * Disable vhost-user driver features.
- *
- * The two notes at rte_vhost_driver_enable_features() also apply here.
- *
- * @param path
- *  The vhost-user socket file path
- * @param features
- *  Features to disable
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_driver_disable_features(const char *path, uint64_t features);
-
-/**
- * Get the final feature bits for feature negotiation.
- *
- * @param path
- *  The vhost-user socket file path
- * @return
- *  Feature bits on success, 0 on failure
- */
-uint64_t rte_vhost_driver_get_features(const char *path);
-
-/**
- * Get the feature bits after negotiation
- *
- * @param vid
- *  Vhost device ID
- * @return
- *  Negotiated feature bits on success, 0 on failure
- */
-uint64_t rte_vhost_get_negotiated_features(int vid);
-
-/* Register callbacks. */
-int rte_vhost_driver_callback_register(const char *path,
-	struct vhost_device_ops const * const ops);
-
-/**
- *
- * Start the vhost-user driver.
- *
- * This function triggers the vhost-user negotiation.
- *
- * @param path
- *  The vhost-user socket file path
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_driver_start(const char *path);
-
-/**
- * Get the MTU value of the device if set in QEMU.
- *
- * @param vid
- *  virtio-net device ID
- * @param mtu
- *  The variable to store the MTU value
- *
- * @return
- *  0: success
- *  -EAGAIN: device not yet started
- *  -ENOTSUP: device does not support MTU feature
- */
-int rte_vhost_get_mtu(int vid, uint16_t *mtu);
-
-/**
- * Get the numa node from which the virtio net device's memory
- * is allocated.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The numa node, -1 on failure
- */
-int rte_vhost_get_numa_node(int vid);
-
-/**
- * @deprecated
- * Get the number of queues the device supports.
- *
- * Note this function is deprecated, as it returns a queue pair number,
- * which is vhost specific. Instead, rte_vhost_get_vring_num should
- * be used.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The number of queues, 0 on failure
- */
-__rte_deprecated
-uint32_t rte_vhost_get_queue_num(int vid);
-
-/**
- * Get the number of vrings the device supports.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The number of vrings, 0 on failure
- */
-uint16_t rte_vhost_get_vring_num(int vid);
-
-/**
- * Get the virtio net device's ifname, which is the vhost-user socket
- * file path.
- *
- * @param vid
- *  vhost device ID
- * @param buf
- *  The buffer to stored the queried ifname
- * @param len
- *  The length of buf
- *
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_get_ifname(int vid, char *buf, size_t len);
-
-/**
- * Get how many avail entries are left in the queue
- *
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index
- *
- * @return
- *  num of avail entires left
- */
-uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id);
-
-/**
- * This function adds buffers to the virtio devices RX virtqueue. Buffers can
- * be received from the physical port or from another virtual device. A packet
- * count is returned to indicate the number of packets that were succesfully
- * added to the RX queue.
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index in mq case
- * @param pkts
- *  array to contain packets to be enqueued
- * @param count
- *  packets num to be enqueued
- * @return
- *  num of packets enqueued
- */
-uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
-	struct rte_mbuf **pkts, uint16_t count);
-
-/**
- * This function gets guest buffers from the virtio device TX virtqueue,
- * construct host mbufs, copies guest buffer content to host mbufs and
- * store them in pkts to be processed.
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index in mq case
- * @param mbuf_pool
- *  mbuf_pool where host mbuf is allocated.
- * @param pkts
- *  array to contain packets to be dequeued
- * @param count
- *  packets num to be dequeued
- * @return
- *  num of packets dequeued
- */
-uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
-	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
-
-/**
- * Get guest mem table: a list of memory regions.
- *
- * An rte_vhost_vhost_memory object will be allocated internaly, to hold the
- * guest memory regions. Application should free it at destroy_device()
- * callback.
- *
- * @param vid
- *  vhost device ID
- * @param mem
- *  To store the returned mem regions
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem);
-
-/**
- * Get guest vring info, including the vring address, vring size, etc.
- *
- * @param vid
- *  vhost device ID
- * @param vring_idx
- *  vring index
- * @param vring
- *  the structure to hold the requested vring info
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
-			      struct rte_vhost_vring *vring);
-
-#endif /* _VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 8be5b6a..3105a47 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -45,7 +45,7 @@
 #include <rte_string_fns.h>
 #include <rte_memory.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 #include "vhost.h"
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index a199ee6..ddd8a9c 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -46,7 +46,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 
 /* Used to indicate that the device is running on a data core */
 #define VIRTIO_DEV_RUNNING 1
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 838dec8..2ba22db 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -37,7 +37,7 @@
 #include <stdint.h>
 #include <linux/vhost.h>
 
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 
 /* refer to hw/virtio/vhost-user.c */
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 7ae7904..1004ae6 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -39,7 +39,7 @@
 #include <rte_memcpy.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_tcp.h>
 #include <rte_udp.h>
 #include <rte_sctp.h>
-- 
1.9.0

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v2 18/22] vhost: introduce API to start a specific driver
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
                     ` (5 preceding siblings ...)
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 14/22] vhost: rename device ops struct Yuanhan Liu
@ 2017-03-23  7:10  3%   ` Yuanhan Liu
  2017-03-23  7:10  5%   ` [dpdk-dev] [PATCH v2 19/22] vhost: rename header file Yuanhan Liu
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

We used to use rte_vhost_driver_session_start() to trigger the vhost-user
session. It takes no argument, thus it's a global trigger, and that could
be problematic.

The issue is that, currently, rte_vhost_driver_register(path, flags) actually
tries to put the socket into the session loop (by fdset_add). However, it
takes a set of APIs to set up a vhost-user driver properly:
  * rte_vhost_driver_register(path, flags);
  * rte_vhost_driver_set_features(path, features);
  * rte_vhost_driver_callback_register(path, vhost_device_ops);

If a new vhost-user driver is registered after the trigger (think of OVS-DPDK,
which could add a port dynamically from the cmdline), the current code will
effectively start the session for the new driver just after the first
API, rte_vhost_driver_register(), is invoked, leaving the later calls with
no effect at all.

To handle the case properly, this patch introduces a new API,
rte_vhost_driver_start(path), to trigger a specific vhost-user driver.
To do that, rte_vhost_driver_register(path, flags) is simplified to
only create the socket, and rte_vhost_driver_start(path) then actually
puts it into the session loop.

Meanwhile, rte_vhost_driver_session_start is removed: we can hide the
session thread internally (creating the thread if it has not been
created yet). This also simplifies the application.

NOTE: the API order in the prog guide is slightly adjusted to show the
correct invocation order.
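
For illustration, the resulting per-socket setup sequence looks like this
(a sketch; the socket path and "ops" variable are placeholders, and error
checks are omitted):

    rte_vhost_driver_register("/tmp/sock0", 0);
    rte_vhost_driver_set_features("/tmp/sock0", features);
    rte_vhost_driver_callback_register("/tmp/sock0", &ops);
    /* only now is the socket put into the session loop */
    rte_vhost_driver_start("/tmp/sock0");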

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 doc/guides/prog_guide/vhost_lib.rst    | 24 +++++------
 doc/guides/rel_notes/release_17_05.rst |  8 ++++
 drivers/net/vhost/rte_eth_vhost.c      | 50 ++-------------------
 examples/tep_termination/main.c        |  8 +++-
 examples/vhost/main.c                  |  9 +++-
 lib/librte_vhost/fd_man.c              |  9 ++--
 lib/librte_vhost/fd_man.h              |  2 +-
 lib/librte_vhost/rte_vhost_version.map |  2 +-
 lib/librte_vhost/rte_virtio_net.h      | 15 ++++++-
 lib/librte_vhost/socket.c              | 79 +++++++++++++++++++---------------
 10 files changed, 104 insertions(+), 102 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index a4fb1f1..5979290 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -116,12 +116,6 @@ The following is an overview of some key Vhost API functions:
   vhost-user driver could be vhost-user net, yet it could be something else,
   say, vhost-user SCSI.
 
-* ``rte_vhost_driver_session_start()``
-
-  This function starts the vhost session loop to handle vhost messages. It
-  starts an infinite loop, therefore it should be called in a dedicated
-  thread.
-
 * ``rte_vhost_driver_callback_register(path, vhost_device_ops)``
 
   This function registers a set of callbacks, to let DPDK applications take
@@ -149,6 +143,17 @@ The following is an overview of some key Vhost API functions:
     ``VHOST_F_LOG_ALL`` will be set/cleared at the start/end of live
     migration, respectively.
 
+* ``rte_vhost_driver_disable/enable_features(path, features))``
+
+  This function disables/enables some features. For example, it can be used to
+  disable mergeable buffers and TSO features, which both are enabled by
+  default.
+
+* ``rte_vhost_driver_start(path)``
+
+  This function triggers the vhost-user negotiation. It should be invoked at
+  the end of initializing a vhost-user driver.
+
 * ``rte_vhost_enqueue_burst(vid, queue_id, pkts, count)``
 
   Transmits (enqueues) ``count`` packets from host to guest.
@@ -157,13 +162,6 @@ The following is an overview of some key Vhost API functions:
 
   Receives (dequeues) ``count`` packets from guest, and stored them at ``pkts``.
 
-* ``rte_vhost_driver_disable/enable_features(path, features))``
-
-  This function disables/enables some features. For example, it can be used to
-  disable mergeable buffers and TSO features, which both are enabled by
-  default.
-
-
 Vhost-user Implementations
 --------------------------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 2efe292..8f06fc4 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -57,6 +57,11 @@ New Features
   * Enable Vhost PMD's MTU get feature.
   * Get max MTU value from host in Virtio PMD
 
+* **Made the vhost lib be a generic vhost-user lib.**
+
+  Now it could be used to implement any other vhost-user drivers, such
+  as, vhost-user SCSI.
+
 
 Resolved Issues
 ---------------
@@ -157,6 +162,9 @@ API Changes
    * The vhost struct ``virtio_net_device_ops`` is renamed to
      ``vhost_device_ops``
 
+   * The vhost API ``rte_vhost_driver_session_start`` is removed. Instead,
+     ``rte_vhost_driver_start`` should be used.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 97a765f..e6c0758 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -127,9 +127,6 @@ struct internal_list {
 
 static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
 
-static rte_atomic16_t nb_started_ports;
-static pthread_t session_th;
-
 static struct rte_eth_link pmd_link = {
 		.link_speed = 10000,
 		.link_duplex = ETH_LINK_FULL_DUPLEX,
@@ -743,42 +740,6 @@ struct vhost_xstats_name_off {
 	return vid;
 }
 
-static void *
-vhost_driver_session(void *param __rte_unused)
-{
-	/* start event handling */
-	rte_vhost_driver_session_start();
-
-	return NULL;
-}
-
-static int
-vhost_driver_session_start(void)
-{
-	int ret;
-
-	ret = pthread_create(&session_th,
-			NULL, vhost_driver_session, NULL);
-	if (ret)
-		RTE_LOG(ERR, PMD, "Can't create a thread\n");
-
-	return ret;
-}
-
-static void
-vhost_driver_session_stop(void)
-{
-	int ret;
-
-	ret = pthread_cancel(session_th);
-	if (ret)
-		RTE_LOG(ERR, PMD, "Can't cancel the thread\n");
-
-	ret = pthread_join(session_th, NULL);
-	if (ret)
-		RTE_LOG(ERR, PMD, "Can't join the thread\n");
-}
-
 static int
 eth_dev_start(struct rte_eth_dev *dev)
 {
@@ -1083,10 +1044,10 @@ struct vhost_xstats_name_off {
 		goto error;
 	}
 
-	/* We need only one message handling thread */
-	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) {
-		if (vhost_driver_session_start())
-			goto error;
+	if (rte_vhost_driver_start(iface_name) < 0) {
+		RTE_LOG(ERR, PMD, "Failed to start driver for %s\n",
+			iface_name);
+		goto error;
 	}
 
 	return data->port_id;
@@ -1213,9 +1174,6 @@ struct vhost_xstats_name_off {
 
 	eth_dev_close(eth_dev);
 
-	if (rte_atomic16_sub_return(&nb_started_ports, 1) == 0)
-		vhost_driver_session_stop();
-
 	rte_free(vring_states[eth_dev->data->port_id]);
 	vring_states[eth_dev->data->port_id] = NULL;
 
diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
index 738f2d2..24c62cd 100644
--- a/examples/tep_termination/main.c
+++ b/examples/tep_termination/main.c
@@ -1263,7 +1263,13 @@ static inline void __attribute__((always_inline))
 			"failed to register vhost driver callbacks.\n");
 	}
 
-	rte_vhost_driver_session_start();
+	if (rte_vhost_driver_start(dev_basename) < 0) {
+		rte_exit(EXIT_FAILURE,
+			"failed to start vhost driver.\n");
+	}
+
+	RTE_LCORE_FOREACH_SLAVE(lcore_id)
+		rte_eal_wait_lcore(lcore_id);
 
 	return 0;
 }
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 4395306..64b3eea 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1545,9 +1545,16 @@ static inline void __attribute__((always_inline))
 			rte_exit(EXIT_FAILURE,
 				"failed to register vhost driver callbacks.\n");
 		}
+
+		if (rte_vhost_driver_start(file) < 0) {
+			rte_exit(EXIT_FAILURE,
+				"failed to start vhost driver.\n");
+		}
 	}
 
-	rte_vhost_driver_session_start();
+	RTE_LCORE_FOREACH_SLAVE(lcore_id)
+		rte_eal_wait_lcore(lcore_id);
+
 	return 0;
 
 }
diff --git a/lib/librte_vhost/fd_man.c b/lib/librte_vhost/fd_man.c
index c7a4490..2ceacc9 100644
--- a/lib/librte_vhost/fd_man.c
+++ b/lib/librte_vhost/fd_man.c
@@ -210,8 +210,8 @@
  * will wait until the flag is reset to zero(which indicates the callback is
  * finished), then it could free the context after fdset_del.
  */
-void
-fdset_event_dispatch(struct fdset *pfdset)
+void *
+fdset_event_dispatch(void *arg)
 {
 	int i;
 	struct pollfd *pfd;
@@ -221,9 +221,10 @@
 	int fd, numfds;
 	int remove1, remove2;
 	int need_shrink;
+	struct fdset *pfdset = arg;
 
 	if (pfdset == NULL)
-		return;
+		return NULL;
 
 	while (1) {
 
@@ -294,4 +295,6 @@
 		if (need_shrink)
 			fdset_shrink(pfdset);
 	}
+
+	return NULL;
 }
diff --git a/lib/librte_vhost/fd_man.h b/lib/librte_vhost/fd_man.h
index d319cac..90d34db 100644
--- a/lib/librte_vhost/fd_man.h
+++ b/lib/librte_vhost/fd_man.h
@@ -64,6 +64,6 @@ int fdset_add(struct fdset *pfdset, int fd,
 
 void *fdset_del(struct fdset *pfdset, int fd);
 
-void fdset_event_dispatch(struct fdset *pfdset);
+void *fdset_event_dispatch(void *arg);
 
 #endif
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 70c28f7..4395fa5 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -4,7 +4,6 @@ DPDK_2.0 {
 	rte_vhost_dequeue_burst;
 	rte_vhost_driver_callback_register;
 	rte_vhost_driver_register;
-	rte_vhost_driver_session_start;
 	rte_vhost_enable_guest_notification;
 	rte_vhost_enqueue_burst;
 
@@ -35,6 +34,7 @@ DPDK_17.05 {
 	rte_vhost_driver_enable_features;
 	rte_vhost_driver_get_features;
 	rte_vhost_driver_set_features;
+	rte_vhost_driver_start;
 	rte_vhost_get_mem_table;
 	rte_vhost_get_mtu;
 	rte_vhost_get_negotiated_features
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 11b204d..627708d 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -250,8 +250,19 @@ void rte_vhost_log_used_vring(int vid, uint16_t vring_idx,
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(const char *path,
 	struct vhost_device_ops const * const ops);
-/* Start vhost driver session blocking loop. */
-int rte_vhost_driver_session_start(void);
+
+/**
+ *
+ * Start the vhost-user driver.
+ *
+ * This function triggers the vhost-user negotiation.
+ *
+ * @param path
+ *  The vhost-user socket file path
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_driver_start(const char *path);
 
 /**
  * Get the MTU value of the device if set in QEMU.
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 31b868d..b056a17 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -58,8 +58,9 @@
  */
 struct vhost_user_socket {
 	char *path;
-	int listenfd;
 	int connfd;
+	struct sockaddr_un un;
+	int socket_fd;
 	bool is_server;
 	bool reconnect;
 	bool dequeue_zero_copy;
@@ -94,7 +95,7 @@ struct vhost_user {
 
 static void vhost_user_server_new_connection(int fd, void *data, int *remove);
 static void vhost_user_read_cb(int fd, void *dat, int *remove);
-static int vhost_user_create_client(struct vhost_user_socket *vsocket);
+static int vhost_user_start_client(struct vhost_user_socket *vsocket);
 
 static struct vhost_user vhost_user = {
 	.fdset = {
@@ -266,22 +267,23 @@ struct vhost_user {
 		free(conn);
 
 		if (vsocket->reconnect)
-			vhost_user_create_client(vsocket);
+			vhost_user_start_client(vsocket);
 	}
 }
 
 static int
-create_unix_socket(const char *path, struct sockaddr_un *un, bool is_server)
+create_unix_socket(struct vhost_user_socket *vsocket)
 {
 	int fd;
+	struct sockaddr_un *un = &vsocket->un;
 
 	fd = socket(AF_UNIX, SOCK_STREAM, 0);
 	if (fd < 0)
 		return -1;
 	RTE_LOG(INFO, VHOST_CONFIG, "vhost-user %s: socket created, fd: %d\n",
-		is_server ? "server" : "client", fd);
+		vsocket->is_server ? "server" : "client", fd);
 
-	if (!is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) {
+	if (!vsocket->is_server && fcntl(fd, F_SETFL, O_NONBLOCK)) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"vhost-user: can't set nonblocking mode for socket, fd: "
 			"%d (%s)\n", fd, strerror(errno));
@@ -291,25 +293,21 @@ struct vhost_user {
 
 	memset(un, 0, sizeof(*un));
 	un->sun_family = AF_UNIX;
-	strncpy(un->sun_path, path, sizeof(un->sun_path));
+	strncpy(un->sun_path, vsocket->path, sizeof(un->sun_path));
 	un->sun_path[sizeof(un->sun_path) - 1] = '\0';
 
-	return fd;
+	vsocket->socket_fd = fd;
+	return 0;
 }
 
 static int
-vhost_user_create_server(struct vhost_user_socket *vsocket)
+vhost_user_start_server(struct vhost_user_socket *vsocket)
 {
-	int fd;
 	int ret;
-	struct sockaddr_un un;
+	int fd = vsocket->socket_fd;
 	const char *path = vsocket->path;
 
-	fd = create_unix_socket(path, &un, vsocket->is_server);
-	if (fd < 0)
-		return -1;
-
-	ret = bind(fd, (struct sockaddr *)&un, sizeof(un));
+	ret = bind(fd, (struct sockaddr *)&vsocket->un, sizeof(vsocket->un));
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"failed to bind to %s: %s; remove it and try again\n",
@@ -322,7 +320,6 @@ struct vhost_user {
 	if (ret < 0)
 		goto err;
 
-	vsocket->listenfd = fd;
 	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
 		  NULL, vsocket);
 	if (ret < 0) {
@@ -441,20 +438,15 @@ struct vhost_user_reconnect_list {
 }
 
 static int
-vhost_user_create_client(struct vhost_user_socket *vsocket)
+vhost_user_start_client(struct vhost_user_socket *vsocket)
 {
-	int fd;
 	int ret;
-	struct sockaddr_un un;
+	int fd = vsocket->socket_fd;
 	const char *path = vsocket->path;
 	struct vhost_user_reconnect *reconn;
 
-	fd = create_unix_socket(path, &un, vsocket->is_server);
-	if (fd < 0)
-		return -1;
-
-	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&un,
-					  sizeof(un));
+	ret = vhost_user_connect_nonblock(fd, (struct sockaddr *)&vsocket->un,
+					  sizeof(vsocket->un));
 	if (ret == 0) {
 		vhost_user_add_connection(fd, vsocket);
 		return 0;
@@ -477,7 +469,7 @@ struct vhost_user_reconnect_list {
 		close(fd);
 		return -1;
 	}
-	reconn->un = un;
+	reconn->un = vsocket->un;
 	reconn->fd = fd;
 	reconn->vsocket = vsocket;
 	pthread_mutex_lock(&reconn_list.mutex);
@@ -627,11 +619,10 @@ struct vhost_user_reconnect_list {
 				goto out;
 			}
 		}
-		ret = vhost_user_create_client(vsocket);
 	} else {
 		vsocket->is_server = true;
-		ret = vhost_user_create_server(vsocket);
 	}
+	ret = create_unix_socket(vsocket);
 	if (ret < 0) {
 		free(vsocket->path);
 		free(vsocket);
@@ -687,8 +678,8 @@ struct vhost_user_reconnect_list {
 
 		if (!strcmp(vsocket->path, path)) {
 			if (vsocket->is_server) {
-				fdset_del(&vhost_user.fdset, vsocket->listenfd);
-				close(vsocket->listenfd);
+				fdset_del(&vhost_user.fdset, vsocket->socket_fd);
+				close(vsocket->socket_fd);
 				unlink(path);
 			} else if (vsocket->reconnect) {
 				vhost_user_remove_reconnect(vsocket);
@@ -751,8 +742,28 @@ struct vhost_device_ops const *
 }
 
 int
-rte_vhost_driver_session_start(void)
+rte_vhost_driver_start(const char *path)
 {
-	fdset_event_dispatch(&vhost_user.fdset);
-	return 0;
+	struct vhost_user_socket *vsocket;
+	static pthread_t fdset_tid;
+
+	pthread_mutex_lock(&vhost_user.mutex);
+	vsocket = find_vhost_user_socket(path);
+	pthread_mutex_unlock(&vhost_user.mutex);
+
+	if (!vsocket)
+		return -1;
+
+	if (fdset_tid == 0) {
+		int ret = pthread_create(&fdset_tid, NULL, fdset_event_dispatch,
+				     &vhost_user.fdset);
+		if (ret < 0)
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to create fdset handling thread");
+	}
+
+	if (vsocket->is_server)
+		return vhost_user_start_server(vsocket);
+	else
+		return vhost_user_start_client(vsocket);
 }
-- 
1.9.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 14/22] vhost: rename device ops struct
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
                     ` (4 preceding siblings ...)
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 13/22] vhost: do not include net specific headers Yuanhan Liu
@ 2017-03-23  7:10  4%   ` Yuanhan Liu
  2017-03-23  7:10  3%   ` [dpdk-dev] [PATCH v2 18/22] vhost: introduce API to start a specific driver Yuanhan Liu
  2017-03-23  7:10  5%   ` [dpdk-dev] [PATCH v2 19/22] vhost: rename header file Yuanhan Liu
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

rename "virtio_net_device_ops" to "vhost_device_ops", to not let it
be virtio-net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/prog_guide/vhost_lib.rst    | 2 +-
 doc/guides/rel_notes/release_17_05.rst | 3 +++
 drivers/net/vhost/rte_eth_vhost.c      | 2 +-
 examples/tep_termination/main.c        | 2 +-
 examples/vhost/main.c                  | 2 +-
 lib/librte_vhost/Makefile              | 2 +-
 lib/librte_vhost/rte_virtio_net.h      | 4 ++--
 lib/librte_vhost/socket.c              | 6 +++---
 lib/librte_vhost/vhost.h               | 4 ++--
 9 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 40f3b3b..e6e34f3 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -122,7 +122,7 @@ The following is an overview of some key Vhost API functions:
   starts an infinite loop, therefore it should be called in a dedicated
   thread.
 
-* ``rte_vhost_driver_callback_register(path, virtio_net_device_ops)``
+* ``rte_vhost_driver_callback_register(path, vhost_device_ops)``
 
   This function registers a set of callbacks, to let DPDK applications take
   the appropriate action when some events happen. The following events are
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 2b56e80..2efe292 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -154,6 +154,9 @@ API Changes
      * ``linux/if.h``
      * ``rte_ether.h``
 
+   * The vhost struct ``virtio_net_device_ops`` is renamed to
+     ``vhost_device_ops``
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 891ee70..97a765f 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -671,7 +671,7 @@ struct vhost_xstats_name_off {
 	return 0;
 }
 
-static struct virtio_net_device_ops vhost_ops = {
+static struct vhost_device_ops vhost_ops = {
 	.new_device          = new_device,
 	.destroy_device      = destroy_device,
 	.vring_state_changed = vring_state_changed,
diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
index 18b977e..738f2d2 100644
--- a/examples/tep_termination/main.c
+++ b/examples/tep_termination/main.c
@@ -1081,7 +1081,7 @@ static inline void __attribute__((always_inline))
  * These callback allow devices to be added to the data core when configuration
  * has been fully complete.
  */
-static const struct virtio_net_device_ops virtio_net_device_ops = {
+static const struct vhost_device_ops virtio_net_device_ops = {
 	.new_device =  new_device,
 	.destroy_device = destroy_device,
 };
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 72a9d69..4395306 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1270,7 +1270,7 @@ static inline void __attribute__((always_inline))
  * These callback allow devices to be added to the data core when configuration
  * has been fully complete.
  */
-static const struct virtio_net_device_ops virtio_net_device_ops =
+static const struct vhost_device_ops virtio_net_device_ops =
 {
 	.new_device =  new_device,
 	.destroy_device = destroy_device,
diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 415ffc6..5cf4e93 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -36,7 +36,7 @@ LIB = librte_vhost.a
 
 EXPORT_MAP := rte_vhost_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -D_FILE_OFFSET_BITS=64
 CFLAGS += -I vhost_user
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 0063949..26ac35f 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -87,7 +87,7 @@ struct rte_vhost_vring {
 /**
  * Device and vring operations.
  */
-struct virtio_net_device_ops {
+struct vhost_device_ops {
 	int (*new_device)(int vid);		/**< Add device. */
 	void (*destroy_device)(int vid);	/**< Remove device. */
 
@@ -198,7 +198,7 @@ static inline uint64_t __attribute__((always_inline))
 
 /* Register callbacks. */
 int rte_vhost_driver_callback_register(const char *path,
-	struct virtio_net_device_ops const * const ops);
+	struct vhost_device_ops const * const ops);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index 8431511..31b868d 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -74,7 +74,7 @@ struct vhost_user_socket {
 	uint64_t supported_features;
 	uint64_t features;
 
-	struct virtio_net_device_ops const *notify_ops;
+	struct vhost_device_ops const *notify_ops;
 };
 
 struct vhost_user_connection {
@@ -725,7 +725,7 @@ struct vhost_user_reconnect_list {
  */
 int
 rte_vhost_driver_callback_register(const char *path,
-	struct virtio_net_device_ops const * const ops)
+	struct vhost_device_ops const * const ops)
 {
 	struct vhost_user_socket *vsocket;
 
@@ -738,7 +738,7 @@ struct vhost_user_reconnect_list {
 	return vsocket ? 0 : -1;
 }
 
-struct virtio_net_device_ops const *
+struct vhost_device_ops const *
 vhost_driver_callback_get(const char *path)
 {
 	struct vhost_user_socket *vsocket;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 672098b..225ff2e 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -191,7 +191,7 @@ struct virtio_net {
 	struct ether_addr	mac;
 	uint16_t		mtu;
 
-	struct virtio_net_device_ops const *notify_ops;
+	struct vhost_device_ops const *notify_ops;
 
 	uint32_t		nr_guest_pages;
 	uint32_t		max_guest_pages;
@@ -265,7 +265,7 @@ static inline phys_addr_t __attribute__((always_inline))
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
 void vhost_enable_dequeue_zero_copy(int vid);
 
-struct virtio_net_device_ops const *vhost_driver_callback_get(const char *path);
+struct vhost_device_ops const *vhost_driver_callback_get(const char *path);
 
 /*
  * Backend-specific cleanup.
-- 
1.9.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 13/22] vhost: do not include net specific headers
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
                     ` (3 preceding siblings ...)
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 12/22] vhost: drop the Rx and Tx queue macro Yuanhan Liu
@ 2017-03-23  7:10  4%   ` Yuanhan Liu
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 14/22] vhost: rename device ops struct Yuanhan Liu
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

Include them internally, in vhost.h.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---

v2: - update release note
---
 doc/guides/rel_notes/release_17_05.rst | 7 +++++++
 examples/vhost/main.h                  | 2 ++
 lib/librte_vhost/rte_virtio_net.h      | 4 ----
 lib/librte_vhost/vhost.h               | 4 ++++
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 55bf136..2b56e80 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -147,6 +147,13 @@ API Changes
      * ``VIRTIO_TXQ``
      * ``VIRTIO_QNUM``
 
+   * Few net specific header files are removed in ``rte_virtio_net.h``
+
+     * ``linux/virtio_net.h``
+     * ``sys/socket.h``
+     * ``linux/if.h``
+     * ``rte_ether.h``
+
 
 ABI Changes
 -----------
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index 7a3d251..ddcd858 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -36,6 +36,8 @@
 
 #include <sys/queue.h>
 
+#include <rte_ether.h>
+
 /* Macros for printing using RTE_LOG */
 #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER2
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 1ae1920..0063949 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -42,14 +42,10 @@
 #include <stdint.h>
 #include <linux/vhost.h>
 #include <linux/virtio_ring.h>
-#include <linux/virtio_net.h>
 #include <sys/eventfd.h>
-#include <sys/socket.h>
-#include <linux/if.h>
 
 #include <rte_memory.h>
 #include <rte_mempool.h>
-#include <rte_ether.h>
 
 #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 84e379a..672098b 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -39,8 +39,12 @@
 #include <sys/queue.h>
 #include <unistd.h>
 #include <linux/vhost.h>
+#include <linux/virtio_net.h>
+#include <sys/socket.h>
+#include <linux/if.h>
 
 #include <rte_log.h>
+#include <rte_ether.h>
 
 #include "rte_virtio_net.h"
 
-- 
1.9.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 12/22] vhost: drop the Rx and Tx queue macro
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
                     ` (2 preceding siblings ...)
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 10/22] vhost: export the number of vrings Yuanhan Liu
@ 2017-03-23  7:10  4%   ` Yuanhan Liu
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 13/22] vhost: do not include net specific headers Yuanhan Liu
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

They are virtio-net specific and should be defined inside the virtio-net
driver.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---

v2: - update release note
---
 doc/guides/rel_notes/release_17_05.rst | 6 ++++++
 drivers/net/vhost/rte_eth_vhost.c      | 2 ++
 examples/tep_termination/main.h        | 2 ++
 examples/vhost/main.h                  | 2 ++
 lib/librte_vhost/rte_virtio_net.h      | 3 ---
 5 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index eca9451..55bf136 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -141,6 +141,12 @@ API Changes
    * The vhost API ``rte_vhost_get_queue_num`` is deprecated, instead,
      ``rte_vhost_get_vring_num`` should be used.
 
+   * Few macros are removed in ``rte_virtio_net.h``
+
+     * ``VIRTIO_RXQ``
+     * ``VIRTIO_TXQ``
+     * ``VIRTIO_QNUM``
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index dc583e4..891ee70 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -45,6 +45,8 @@
 
 #include "rte_eth_vhost.h"
 
+enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
+
 #define ETH_VHOST_IFACE_ARG		"iface"
 #define ETH_VHOST_QUEUES_ARG		"queues"
 #define ETH_VHOST_CLIENT_ARG		"client"
diff --git a/examples/tep_termination/main.h b/examples/tep_termination/main.h
index c0ea766..8ed817d 100644
--- a/examples/tep_termination/main.h
+++ b/examples/tep_termination/main.h
@@ -54,6 +54,8 @@
 /* Max number of devices. Limited by the application. */
 #define MAX_DEVICES 64
 
+enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
+
 /* Per-device statistics struct */
 struct device_statistics {
 	uint64_t tx_total;
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index 6bb42e8..7a3d251 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -41,6 +41,8 @@
 #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER2
 #define RTE_LOGTYPE_VHOST_PORT   RTE_LOGTYPE_USER3
 
+enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
+
 struct device_statistics {
 	uint64_t	tx;
 	uint64_t	tx_total;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index f700d2f..1ae1920 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -55,9 +55,6 @@
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
 #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
 
-/* Enum for virtqueue management. */
-enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
-
 /**
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
-- 
1.9.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 10/22] vhost: export the number of vrings
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
  2017-03-23  7:10  5%   ` [dpdk-dev] [PATCH v2 02/22] net/vhost: remove feature related APIs Yuanhan Liu
  2017-03-23  7:10  3%   ` [dpdk-dev] [PATCH v2 04/22] vhost: make notify ops per vhost driver Yuanhan Liu
@ 2017-03-23  7:10  4%   ` Yuanhan Liu
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 12/22] vhost: drop the Rx and Tx queue macro Yuanhan Liu
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

We used to use rte_vhost_get_queue_num() to tell how many vrings there
are. However, the return value is the number of "queue pairs", which is
very virtio-net specific. To make it generic, we should return the
number of vrings instead, and let the driver do the proper translation.
Say, the virtio-net driver could turn it into the number of queue pairs
by dividing by 2.

Meanwhile, mark rte_vhost_get_queue_num as deprecated.
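
For example, a virtio-net driver built on top of this would do the
translation itself, along these lines (a sketch; the variable names are
mine):

    uint16_t nr_vrings = rte_vhost_get_vring_num(vid);
    uint32_t nr_queue_pairs = nr_vrings / 2; /* one Rx + one Tx vring per pair */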

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
v2: - update release note
---
 doc/guides/rel_notes/release_17_05.rst |  3 +++
 drivers/net/vhost/rte_eth_vhost.c      |  2 +-
 lib/librte_vhost/rte_vhost_version.map |  1 +
 lib/librte_vhost/rte_virtio_net.h      | 17 +++++++++++++++++
 lib/librte_vhost/vhost.c               | 11 +++++++++++
 5 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index dfa636d..eca9451 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -138,6 +138,9 @@ API Changes
    * The vhost API ``rte_vhost_driver_callback_register(ops)`` takes one
      more argument: ``rte_vhost_driver_callback_register(path, ops)``.
 
+   * The vhost API ``rte_vhost_get_queue_num`` is deprecated, instead,
+     ``rte_vhost_get_vring_num`` should be used.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index f6ad616..dc583e4 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -569,7 +569,7 @@ struct vhost_xstats_name_off {
 		vq->port = eth_dev->data->port_id;
 	}
 
-	for (i = 0; i < rte_vhost_get_queue_num(vid) * VIRTIO_QNUM; i++)
+	for (i = 0; i < rte_vhost_get_vring_num(vid); i++)
 		rte_vhost_enable_guest_notification(vid, i, 0);
 
 	rte_vhost_get_mtu(vid, &eth_dev->data->mtu);
diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 7df7af6..ff62c39 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -40,6 +40,7 @@ DPDK_17.05 {
 	rte_vhost_get_negotiated_features
 	rte_vhost_get_vhost_memory;
 	rte_vhost_get_vhost_vring;
+	rte_vhost_get_vring_num;
 	rte_vhost_gpa_to_vva;
 
 } DPDK_16.07;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 36674bb..f700d2f 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -237,17 +237,34 @@ int rte_vhost_driver_callback_register(const char *path,
 int rte_vhost_get_numa_node(int vid);
 
 /**
+ * @deprecated
  * Get the number of queues the device supports.
  *
+ * Note this function is deprecated, as it returns a queue pair number,
+ * which is virtio-net specific. Instead, rte_vhost_get_vring_num should
+ * be used.
+ *
  * @param vid
  *  virtio-net device ID
  *
  * @return
  *  The number of queues, 0 on failure
  */
+__rte_deprecated
 uint32_t rte_vhost_get_queue_num(int vid);
 
 /**
+ * Get the number of vrings the device supports.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The number of vrings, 0 on failure
+ */
+uint16_t rte_vhost_get_vring_num(int vid);
+
+/**
  * Get the virtio net device's ifname, which is the vhost-user socket
  * file path.
  *
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 70477c6..74ae3b2 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -317,6 +317,17 @@ struct virtio_net *
 	return dev->nr_vring / 2;
 }
 
+uint16_t
+rte_vhost_get_vring_num(int vid)
+{
+	struct virtio_net *dev = get_device(vid);
+
+	if (dev == NULL)
+		return 0;
+
+	return dev->nr_vring;
+}
+
 int
 rte_vhost_get_ifname(int vid, char *buf, size_t len)
 {
-- 
1.9.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 04/22] vhost: make notify ops per vhost driver
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
  2017-03-23  7:10  5%   ` [dpdk-dev] [PATCH v2 02/22] net/vhost: remove feature related APIs Yuanhan Liu
@ 2017-03-23  7:10  3%   ` Yuanhan Liu
  2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 10/22] vhost: export the number of vrings Yuanhan Liu
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

Assume there is an application that supports both vhost-user net and
vhost-user SCSI; the callbacks should then be different. Making the
notify ops per vhost driver allows the application to define a different
set of callbacks for each driver.
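
For illustration, an application serving both could then register a
different ops set per socket (the paths and ops names below are
hypothetical):

    rte_vhost_driver_callback_register("/tmp/vhost-net.sock", &net_ops);
    rte_vhost_driver_callback_register("/tmp/vhost-scsi.sock", &scsi_ops);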

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---

v2: - check the return value of callback_register and callback_get
    - update release note
---
 doc/guides/prog_guide/vhost_lib.rst    |  2 +-
 doc/guides/rel_notes/release_17_05.rst |  3 +++
 drivers/net/vhost/rte_eth_vhost.c      | 20 +++++++++++---------
 examples/tep_termination/main.c        |  7 ++++++-
 examples/vhost/main.c                  |  9 +++++++--
 lib/librte_vhost/rte_virtio_net.h      |  3 ++-
 lib/librte_vhost/socket.c              | 32 ++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.c               | 16 +---------------
 lib/librte_vhost/vhost.h               |  5 ++++-
 lib/librte_vhost/vhost_user.c          | 22 ++++++++++++++++------
 10 files changed, 83 insertions(+), 36 deletions(-)

diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst
index 6a4d206..40f3b3b 100644
--- a/doc/guides/prog_guide/vhost_lib.rst
+++ b/doc/guides/prog_guide/vhost_lib.rst
@@ -122,7 +122,7 @@ The following is an overview of some key Vhost API functions:
   starts an infinite loop, therefore it should be called in a dedicated
   thread.
 
-* ``rte_vhost_driver_callback_register(virtio_net_device_ops)``
+* ``rte_vhost_driver_callback_register(path, virtio_net_device_ops)``
 
   This function registers a set of callbacks, to let DPDK applications take
   the appropriate action when some events happen. The following events are
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4e405b1..dfa636d 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -135,6 +135,9 @@ API Changes
      * ``rte_eth_vhost_feature_enable``
      * ``rte_eth_vhost_feature_get``
 
+   * The vhost API ``rte_vhost_driver_callback_register(ops)`` takes one
+     more argument: ``rte_vhost_driver_callback_register(path, ops)``.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index 83063c2..f6ad616 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -669,6 +669,12 @@ struct vhost_xstats_name_off {
 	return 0;
 }
 
+static struct virtio_net_device_ops vhost_ops = {
+	.new_device          = new_device,
+	.destroy_device      = destroy_device,
+	.vring_state_changed = vring_state_changed,
+};
+
 int
 rte_eth_vhost_get_queue_event(uint8_t port_id,
 		struct rte_eth_vhost_queue_event *event)
@@ -738,15 +744,6 @@ struct vhost_xstats_name_off {
 static void *
 vhost_driver_session(void *param __rte_unused)
 {
-	static struct virtio_net_device_ops vhost_ops;
-
-	/* set vhost arguments */
-	vhost_ops.new_device = new_device;
-	vhost_ops.destroy_device = destroy_device;
-	vhost_ops.vring_state_changed = vring_state_changed;
-	if (rte_vhost_driver_callback_register(&vhost_ops) < 0)
-		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
-
 	/* start event handling */
 	rte_vhost_driver_session_start();
 
@@ -1079,6 +1076,11 @@ struct vhost_xstats_name_off {
 	if (rte_vhost_driver_register(iface_name, flags))
 		goto error;
 
+	if (rte_vhost_driver_callback_register(iface_name, &vhost_ops) < 0) {
+		RTE_LOG(ERR, PMD, "Can't register callbacks\n");
+		goto error;
+	}
+
 	/* We need only one message handling thread */
 	if (rte_atomic16_add_return(&nb_started_ports, 1) == 1) {
 		if (vhost_driver_session_start())
diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
index 8097dcd..18b977e 100644
--- a/examples/tep_termination/main.c
+++ b/examples/tep_termination/main.c
@@ -1256,7 +1256,12 @@ static inline void __attribute__((always_inline))
 	rte_vhost_driver_disable_features(dev_basename,
 		1ULL << VIRTIO_NET_F_MRG_RXBUF);
 
-	rte_vhost_driver_callback_register(&virtio_net_device_ops);
+	ret = rte_vhost_driver_callback_register(dev_basename,
+		&virtio_net_device_ops);
+	if (ret != 0) {
+		rte_exit(EXIT_FAILURE,
+			"failed to register vhost driver callbacks.\n");
+	}
 
 	rte_vhost_driver_session_start();
 
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 972a6a8..72a9d69 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1538,9 +1538,14 @@ static inline void __attribute__((always_inline))
 			rte_vhost_driver_enable_features(file,
 				1ULL << VIRTIO_NET_F_CTRL_RX);
 		}
-	}
 
-	rte_vhost_driver_callback_register(&virtio_net_device_ops);
+		ret = rte_vhost_driver_callback_register(file,
+			&virtio_net_device_ops);
+		if (ret != 0) {
+			rte_exit(EXIT_FAILURE,
+				"failed to register vhost driver callbacks.\n");
+		}
+	}
 
 	rte_vhost_driver_session_start();
 	return 0;
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5dadd3d..67bd125 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -133,7 +133,8 @@ struct virtio_net_device_ops {
 uint64_t rte_vhost_driver_get_features(const char *path);
 
 /* Register callbacks. */
-int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const);
+int rte_vhost_driver_callback_register(const char *path,
+	struct virtio_net_device_ops const * const ops);
 /* Start vhost driver session blocking loop. */
 int rte_vhost_driver_session_start(void);
 
diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index bbb4112..8431511 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -73,6 +73,8 @@ struct vhost_user_socket {
 	 */
 	uint64_t supported_features;
 	uint64_t features;
+
+	struct virtio_net_device_ops const *notify_ops;
 };
 
 struct vhost_user_connection {
@@ -718,6 +720,36 @@ struct vhost_user_reconnect_list {
 	return -1;
 }
 
+/*
+ * Register ops so that we can add/remove device to data core.
+ */
+int
+rte_vhost_driver_callback_register(const char *path,
+	struct virtio_net_device_ops const * const ops)
+{
+	struct vhost_user_socket *vsocket;
+
+	pthread_mutex_lock(&vhost_user.mutex);
+	vsocket = find_vhost_user_socket(path);
+	if (vsocket)
+		vsocket->notify_ops = ops;
+	pthread_mutex_unlock(&vhost_user.mutex);
+
+	return vsocket ? 0 : -1;
+}
+
+struct virtio_net_device_ops const *
+vhost_driver_callback_get(const char *path)
+{
+	struct vhost_user_socket *vsocket;
+
+	pthread_mutex_lock(&vhost_user.mutex);
+	vsocket = find_vhost_user_socket(path);
+	pthread_mutex_unlock(&vhost_user.mutex);
+
+	return vsocket ? vsocket->notify_ops : NULL;
+}
+
 int
 rte_vhost_driver_session_start(void)
 {
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 7b40a92..7d7bb3c 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -51,9 +51,6 @@
 
 struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
 
-/* device ops to add/remove device to/from data core. */
-struct virtio_net_device_ops const *notify_ops;
-
 struct virtio_net *
 get_device(int vid)
 {
@@ -253,7 +250,7 @@ struct virtio_net *
 
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
-		notify_ops->destroy_device(vid);
+		dev->notify_ops->destroy_device(vid);
 	}
 
 	cleanup_device(dev, 1);
@@ -396,14 +393,3 @@ struct virtio_net *
 	dev->virtqueue[queue_id]->used->flags = VRING_USED_F_NO_NOTIFY;
 	return 0;
 }
-
-/*
- * Register ops so that we can add/remove device to data core.
- */
-int
-rte_vhost_driver_callback_register(struct virtio_net_device_ops const * const ops)
-{
-	notify_ops = ops;
-
-	return 0;
-}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 692691b..6186216 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -185,6 +185,8 @@ struct virtio_net {
 	struct ether_addr	mac;
 	uint16_t		mtu;
 
+	struct virtio_net_device_ops const *notify_ops;
+
 	uint32_t		nr_guest_pages;
 	uint32_t		max_guest_pages;
 	struct guest_page       *guest_pages;
@@ -288,7 +290,6 @@ static inline phys_addr_t __attribute__((always_inline))
 	return 0;
 }
 
-struct virtio_net_device_ops const *notify_ops;
 struct virtio_net *get_device(int vid);
 
 int vhost_new_device(void);
@@ -301,6 +302,8 @@ static inline phys_addr_t __attribute__((always_inline))
 void vhost_set_ifname(int, const char *if_name, unsigned int if_len);
 void vhost_enable_dequeue_zero_copy(int vid);
 
+struct virtio_net_device_ops const *vhost_driver_callback_get(const char *path);
+
 /*
  * Backend-specific cleanup.
  *
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index d630098..0cadd79 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -135,7 +135,7 @@
 {
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
-		notify_ops->destroy_device(dev->vid);
+		dev->notify_ops->destroy_device(dev->vid);
 	}
 
 	cleanup_device(dev, 0);
@@ -503,7 +503,7 @@
 	/* Remove from the data plane. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
-		notify_ops->destroy_device(dev->vid);
+		dev->notify_ops->destroy_device(dev->vid);
 	}
 
 	if (dev->mem) {
@@ -687,7 +687,7 @@
 						"dequeue zero copy is enabled\n");
 			}
 
-			if (notify_ops->new_device(dev->vid) == 0)
+			if (dev->notify_ops->new_device(dev->vid) == 0)
 				dev->flags |= VIRTIO_DEV_RUNNING;
 		}
 	}
@@ -721,7 +721,7 @@
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
-		notify_ops->destroy_device(dev->vid);
+		dev->notify_ops->destroy_device(dev->vid);
 	}
 
 	dev->flags &= ~VIRTIO_DEV_READY;
@@ -763,8 +763,8 @@
 		"set queue enable: %d to qp idx: %d\n",
 		enable, state->index);
 
-	if (notify_ops->vring_state_changed)
-		notify_ops->vring_state_changed(dev->vid, state->index, enable);
+	if (dev->notify_ops->vring_state_changed)
+		dev->notify_ops->vring_state_changed(dev->vid, state->index, enable);
 
 	dev->virtqueue[state->index]->enabled = enable;
 
@@ -978,6 +978,16 @@
 	if (dev == NULL)
 		return -1;
 
+	if (!dev->notify_ops) {
+		dev->notify_ops = vhost_driver_callback_get(dev->ifname);
+		if (!dev->notify_ops) {
+			RTE_LOG(ERR, VHOST_CONFIG,
+				"failed to get callback ops for driver %s\n",
+				dev->ifname);
+			return -1;
+		}
+	}
+
 	ret = read_vhost_message(fd, &msg);
 	if (ret <= 0 || msg.request >= VHOST_USER_MAX) {
 		if (ret < 0)
-- 
1.9.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 02/22] net/vhost: remove feature related APIs
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
@ 2017-03-23  7:10  5%   ` Yuanhan Liu
  2017-03-23  7:10  3%   ` [dpdk-dev] [PATCH v2 04/22] vhost: make notify ops per vhost driver Yuanhan Liu
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

The rte_eth_vhost_feature_disable/enable/get APIs are just wrappers of
rte_vhost_feature_disable/enable/get. However, the latter are going to
be refactored; they are going to take an extra parameter (the socket_file
path), to make them per-device.

Instead of changing those vhost-pmd APIs to adapt to the new vhost APIs,
we could simply remove them, and let vdev options serve this purpose.
After all, vdev options are better for disabling/enabling some features.
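
As a purely hypothetical illustration (only the iface, queues and client
options exist at this point; the "tso" knob below is made up), feature
tuning via vdev options could look like:

    --vdev 'eth_vhost0,iface=/tmp/sock0,queues=1,tso=0'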

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---

v2: - write more informative commit log on why they are removed.
    - update release note
---
 doc/guides/rel_notes/release_17_05.rst      |  7 +++++++
 drivers/net/vhost/rte_eth_vhost.c           | 25 ------------------------
 drivers/net/vhost/rte_eth_vhost.h           | 30 -----------------------------
 drivers/net/vhost/rte_pmd_vhost_version.map |  3 ---
 4 files changed, 7 insertions(+), 58 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index bb64428..4e405b1 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -129,6 +129,13 @@ API Changes
 * The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
   while keeping ABI compatibility.
 
+   * The following vhost-pmd APIs are removed
+
+     * ``rte_eth_vhost_feature_disable``
+     * ``rte_eth_vhost_feature_enable``
+     * ``rte_eth_vhost_feature_get``
+
+
 ABI Changes
 -----------
 
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index a4435da..83063c2 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -965,31 +965,6 @@ struct vhost_xstats_name_off {
 	return 0;
 }
 
-/**
- * Disable features in feature_mask. Returns 0 on success.
- */
-int
-rte_eth_vhost_feature_disable(uint64_t feature_mask)
-{
-	return rte_vhost_feature_disable(feature_mask);
-}
-
-/**
- * Enable features in feature_mask. Returns 0 on success.
- */
-int
-rte_eth_vhost_feature_enable(uint64_t feature_mask)
-{
-	return rte_vhost_feature_enable(feature_mask);
-}
-
-/* Returns currently supported vhost features */
-uint64_t
-rte_eth_vhost_feature_get(void)
-{
-	return rte_vhost_feature_get();
-}
-
 static const struct eth_dev_ops ops = {
 	.dev_start = eth_dev_start,
 	.dev_stop = eth_dev_stop,
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index 7c98b1a..ea4bce4 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -43,36 +43,6 @@
 
 #include <rte_virtio_net.h>
 
-/**
- * Disable features in feature_mask.
- *
- * @param feature_mask
- *  Vhost features defined in "linux/virtio_net.h".
- * @return
- *  - On success, zero.
- *  - On failure, a negative value.
- */
-int rte_eth_vhost_feature_disable(uint64_t feature_mask);
-
-/**
- * Enable features in feature_mask.
- *
- * @param feature_mask
- *  Vhost features defined in "linux/virtio_net.h".
- * @return
- *  - On success, zero.
- *  - On failure, a negative value.
- */
-int rte_eth_vhost_feature_enable(uint64_t feature_mask);
-
-/**
- * Returns currently supported vhost features.
- *
- * @return
- *  Vhost features defined in "linux/virtio_net.h".
- */
-uint64_t rte_eth_vhost_feature_get(void);
-
 /*
  * Event description.
  */
diff --git a/drivers/net/vhost/rte_pmd_vhost_version.map b/drivers/net/vhost/rte_pmd_vhost_version.map
index 3d44083..695db85 100644
--- a/drivers/net/vhost/rte_pmd_vhost_version.map
+++ b/drivers/net/vhost/rte_pmd_vhost_version.map
@@ -1,9 +1,6 @@
 DPDK_16.04 {
 	global:
 
-	rte_eth_vhost_feature_disable;
-	rte_eth_vhost_feature_enable;
-	rte_eth_vhost_feature_get;
 	rte_eth_vhost_get_queue_event;
 
 	local: *;
-- 
1.9.0

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API
  2017-03-03  9:51  4% [dpdk-dev] [PATCH 00/17] vhost: generic vhost API Yuanhan Liu
  2017-03-03  9:51  3% ` [dpdk-dev] [PATCH 16/17] vhost: rename header file Yuanhan Liu
@ 2017-03-23  7:10  4% ` Yuanhan Liu
  2017-03-23  7:10  5%   ` [dpdk-dev] [PATCH v2 02/22] net/vhost: remove feature related APIs Yuanhan Liu
                     ` (7 more replies)
  1 sibling, 8 replies; 200+ results
From: Yuanhan Liu @ 2017-03-23  7:10 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

This patchset makes the DPDK vhost library generic enough that we could
build other vhost-user drivers on top of it. For example, SPDK (Storage
Performance Development Kit) is trying to enable vhost-user SCSI.

The basic idea is to let DPDK vhost be a vhost-user agent. It stores all
the info about the virtio device (i.e. vring addresses, negotiated
features, etc.) and lets the specific vhost-user driver fetch it (via the
APIs provided by the DPDK vhost lib). With that info provided, the
vhost-user driver can then get/put vring entries and thus exchange data
between the guest and host.
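
For illustration, here is a minimal sketch (mine, not part of the series;
error handling simplified) of a driver fetching that info from its
new_device() callback:

    static int
    new_device(int vid)
    {
        struct rte_vhost_memory *mem;
        struct rte_vhost_vring vring;
        uint16_t i;

        if (rte_vhost_get_mem_table(vid, &mem) < 0)
            return -1;
        /* keep "mem" around; free it at destroy_device() time */

        for (i = 0; i < rte_vhost_get_vring_num(vid); i++) {
            if (rte_vhost_get_vhost_vring(vid, i, &vring) < 0)
                return -1;
            /* vring.desc/avail/used can now be accessed directly */
        }
        return 0;
    }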

The last patch demonstrates how to use these new APIs to implement a
very simple vhost-user net driver, without any fancy features enabled.


Major API/ABI Changes summary
=============================

- some renames
  * "struct virtio_net_device_ops" ==> "struct vhost_device_ops"
  * "rte_virtio_net.h"  ==> "rte_vhost.h"

- driver related APIs are bound to the socket file
  * rte_vhost_driver_set_features(socket_file, features);
  * rte_vhost_driver_get_features(socket_file, features);
  * rte_vhost_driver_enable_features(socket_file, features)
  * rte_vhost_driver_disable_features(socket_file, features)
  * rte_vhost_driver_callback_register(socket_file, notify_ops);
  * rte_vhost_driver_start(socket_file);
    This function replaces rte_vhost_driver_session_start(). Check patch
    18 for more information.

- new APIs to fetch guest and vring info
  * rte_vhost_get_mem_table(vid, mem);
  * rte_vhost_get_negotiated_features(vid);
  * rte_vhost_get_vhost_vring(vid, vring_idx, vring);

- new exported structures 
  * struct rte_vhost_vring
  * struct rte_vhost_mem_region
  * struct rte_vhost_memory

- a new device ops callback: features_changed().


Change log
==========

v2: - rebase
    - updated release note
    - updated API comments
    - renamed rte_vhost_get_vhost_memory to rte_vhost_get_mem_table

    - added a new device callback: features_changed(), basically for live
      migration support
    - introduced rte_vhost_driver_start() to start a specific driver
    - misc fixes


Some design choices
===================

While making this patchset, I met quite a few design choices, and here are
two of them, with the issue and the reason for each choice provided.
Please let me know if you have any comments (or better ideas).

Export public structures or not
-------------------------------

I made an ABI refactor last time (v16.07): moving all the structures
internally and letting applications use a "vid" to reference the internal
struct. With that, I hoped we would never have to worry about the annoying
ABI issues again.

It has worked great (and as expected) since then, as long as we only
support virtio-net and as long as we can handle all the descs inside the
vhost lib. It becomes problematic when a user wants to implement a
vhost-user driver elsewhere. For example, it needs to do the GPA to VVA
translation. Without any structs exported, functions like gpa_to_vva()
can't be inlined. Calling such a function would be costly, especially as
it's one we have to invoke for processing each vring desc.

For that reason, the guest memory regions are exported. With that,
gpa_to_vva() can be inlined.
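
A sketch of what that inlined translation looks like over the exported
regions (assuming the region fields exported by this series:
guest_phys_addr, host_user_addr and size):

    static inline uint64_t
    gpa_to_vva(struct rte_vhost_memory *mem, uint64_t gpa)
    {
        uint32_t i;

        for (i = 0; i < mem->nregions; i++) {
            struct rte_vhost_mem_region *reg = &mem->regions[i];

            if (gpa >= reg->guest_phys_addr &&
                gpa <  reg->guest_phys_addr + reg->size)
                return gpa - reg->guest_phys_addr +
                       reg->host_user_addr;
        }
        return 0; /* no mapping found */
    }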

  
Add helper functions to fetch/update descs or not
-------------------------------------------------

I intended to do it this way: introduce one function to get @count
descs from a specific vring and another one to update the used descs.
It's something like:
    rte_vhost_vring_get_descs(vid, vring_idx, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vid, vring_idx, count, offset, descs);

With that, the vhost-user driver programmer's task would be easier, as
he/she wouldn't have to parse the descs any more (e.g., to handle an
indirect desc).

But judging that virtio 1.1 has just emerged and proposes a completely
new ring layout, and, most importantly, that the vring desc structure is
also changed, I'd like to hold off introducing these two functions.
Otherwise, it's very likely both will become invalid when virtio 1.1 is
out. Though I think it could be addressed with a careful design,
something like making the IOV generic enough:

	struct rte_vhost_iov {
		uint64_t	gpa;
		uint64_t	vva;
		uint64_t	len;
	};

Instead, I went the other way: introduce a few APIs to export all the
vring info (vring size, vring addr, callfd, etc), and let the vhost-user
driver read and update the descs. That info could be passed to the
vhost-user driver by introducing one API for each field, but to save a
few APIs and reduce the number of calls the programmer has to make, I
packed a few key fields into a new structure, so that it can be fetched
with one call:
        struct rte_vhost_vring {
                struct vring_desc       *desc;
                struct vring_avail      *avail;
                struct vring_used       *used;
                uint64_t                log_guest_addr;
       
                int                     callfd;
                int                     kickfd;
                uint16_t                size;
        };
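
For reference, a vhost-user driver would then fetch and use a vring
roughly like below (a minimal sketch; vid, vring_idx and the
driver-local last_avail_idx are assumed context, not part of this
series):

    struct rte_vhost_vring vring;
    uint16_t avail, count;

    if (rte_vhost_get_vhost_vring(vid, vring_idx, &vring) != 0)
        return;

    avail = vring.avail->idx;        /* index written by the guest */
    count = avail - last_avail_idx;  /* descs ready to be processed */

    /* walk vring.desc for those entries, fill vring.used with the
     * processed descs, then kick the guest through vring.callfd */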

When virtio 1.1 comes out, likely a simple change like the following
would just work:
        struct rte_vhost_vring {
		union {
			struct {
                		struct vring_desc       *desc;
                		struct vring_avail      *avail;
                		struct vring_used       *used;
                		uint64_t                log_guest_addr;
			};
			struct desc	*desc_1_1;	/* vring addr for virtio 1.1 */
		};
       
                int                     callfd;
                int                     kickfd;
                uint16_t                size;
        };

AFAIK, it's not an ABI breakage. Even if it were, we could introduce a
new API to get the virtio 1.1 ring address.

Those fields are the minimum set I came up with for a specific vring,
chosen so that future extensions bring the minimum chance of breaking
the ABI. If we need more info, we can introduce a new API.

OTOH, for getting the best performance, the two functions also have to
be inlined (with the "vid + vring_idx" combo replaced by "vring"):
    rte_vhost_vring_get_descs(vring, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vring, count, offset, descs);

That said, one way or another, we have to export the rte_vhost_vring
struct. For this reason, I didn't rush into introducing the two APIs.

	--yliu

---
Yuanhan Liu (22):
  vhost: introduce driver features related APIs
  net/vhost: remove feature related APIs
  vhost: use new APIs to handle features
  vhost: make notify ops per vhost driver
  vhost: export guest memory regions
  vhost: introduce API to fetch negotiated features
  vhost: export vhost vring info
  vhost: export API to translate gpa to vva
  vhost: turn queue pair to vring
  vhost: export the number of vrings
  vhost: move the device ready check at proper place
  vhost: drop the Rx and Tx queue macro
  vhost: do not include net specific headers
  vhost: rename device ops struct
  vhost: rename virtio-net to vhost
  vhost: add features changed callback
  vhost: export APIs for live migration support
  vhost: introduce API to start a specific driver
  vhost: rename header file
  vhost: workaround the build dependency on mbuf header
  vhost: do not destroy device on repeat mem table message
  examples/vhost: demonstrate the new generic vhost APIs

 doc/guides/prog_guide/vhost_lib.rst         |  42 +--
 doc/guides/rel_notes/deprecation.rst        |   9 -
 doc/guides/rel_notes/release_17_05.rst      |  40 +++
 drivers/net/vhost/rte_eth_vhost.c           | 101 ++-----
 drivers/net/vhost/rte_eth_vhost.h           |  32 +--
 drivers/net/vhost/rte_pmd_vhost_version.map |   3 -
 examples/tep_termination/main.c             |  23 +-
 examples/tep_termination/main.h             |   2 +
 examples/tep_termination/vxlan_setup.c      |   2 +-
 examples/vhost/Makefile                     |   2 +-
 examples/vhost/main.c                       | 100 +++++--
 examples/vhost/main.h                       |  33 ++-
 examples/vhost/virtio_net.c                 | 405 ++++++++++++++++++++++++++
 lib/librte_vhost/Makefile                   |   4 +-
 lib/librte_vhost/fd_man.c                   |   9 +-
 lib/librte_vhost/fd_man.h                   |   2 +-
 lib/librte_vhost/rte_vhost.h                | 423 ++++++++++++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map      |  17 +-
 lib/librte_vhost/rte_virtio_net.h           | 208 --------------
 lib/librte_vhost/socket.c                   | 222 ++++++++++++---
 lib/librte_vhost/vhost.c                    | 229 ++++++++-------
 lib/librte_vhost/vhost.h                    | 113 +++++---
 lib/librte_vhost/vhost_user.c               | 115 ++++----
 lib/librte_vhost/vhost_user.h               |   2 +-
 lib/librte_vhost/virtio_net.c               |  71 ++---
 25 files changed, 1523 insertions(+), 686 deletions(-)
 create mode 100644 examples/vhost/virtio_net.c
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h

-- 
1.9.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v4 1/7] net/ark: PMD for Atomic Rules Arkville driver stub
@ 2017-03-23  1:03  3% Ed Czeck
  2017-03-23 22:59  3% ` [dpdk-dev] [PATCH v5 " Ed Czeck
  0 siblings, 1 reply; 200+ results
From: Ed Czeck @ 2017-03-23  1:03 UTC (permalink / raw)
  To: dev; +Cc: Ed Czeck

Enable Arkville on supported configurations
Add overview documentation
Minimum driver support for valid compile
Arkville PMD is not supported on ARM or PowerPC at this time

v4:
* Address issues reported in review
* Add internal comments on driver arg
* provide a bare-bones dev init to avoid compiler warnings

v3:
* Split large patch into several smaller ones

Signed-off-by: Ed Czeck <ed.czeck@atomicrules.com>
---
 MAINTAINERS                                 |   8 +
 config/common_base                          |  10 +
 config/defconfig_arm-armv7a-linuxapp-gcc    |   1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |   1 +
 doc/guides/nics/ark.rst                     | 242 +++++++++++++++++++++
 doc/guides/nics/features/ark.ini            |  15 ++
 doc/guides/nics/index.rst                   |   1 +
 drivers/net/Makefile                        |   1 +
 drivers/net/ark/Makefile                    |  62 ++++++
 drivers/net/ark/ark_debug.h                 |  71 +++++++
 drivers/net/ark/ark_ethdev.c                | 316 ++++++++++++++++++++++++++++
 drivers/net/ark/ark_ethdev.h                |  39 ++++
 drivers/net/ark/ark_global.h                | 108 ++++++++++
 drivers/net/ark/rte_pmd_ark_version.map     |   4 +
 mk/rte.app.mk                               |   1 +
 15 files changed, 880 insertions(+)
 create mode 100644 doc/guides/nics/ark.rst
 create mode 100644 doc/guides/nics/features/ark.ini
 create mode 100644 drivers/net/ark/Makefile
 create mode 100644 drivers/net/ark/ark_debug.h
 create mode 100644 drivers/net/ark/ark_ethdev.c
 create mode 100644 drivers/net/ark/ark_ethdev.h
 create mode 100644 drivers/net/ark/ark_global.h
 create mode 100644 drivers/net/ark/rte_pmd_ark_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 0c78b58..19ee27f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -278,6 +278,14 @@ M: Evgeny Schemeilin <evgenys@amazon.com>
 F: drivers/net/ena/
 F: doc/guides/nics/ena.rst
 
+Atomic Rules ARK
+M: Shepard Siegel <shepard.siegel@atomicrules.com>
+M: Ed Czeck       <ed.czeck@atomicrules.com>
+M: John Miller    <john.miller@atomicrules.com>
+F: drivers/net/ark/
+F: doc/guides/nics/ark.rst
+F: doc/guides/nics/features/ark.ini
+
 Broadcom bnxt
 M: Stephen Hurd <stephen.hurd@broadcom.com>
 M: Ajit Khaparde <ajit.khaparde@broadcom.com>
diff --git a/config/common_base b/config/common_base
index 37aa1e1..4feb5e4 100644
--- a/config/common_base
+++ b/config/common_base
@@ -353,6 +353,16 @@ CONFIG_RTE_LIBRTE_QEDE_FW=""
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile ARK PMD
+#
+CONFIG_RTE_LIBRTE_ARK_PMD=y
+CONFIG_RTE_LIBRTE_ARK_PAD_TX=y
+CONFIG_RTE_LIBRTE_ARK_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=n
+
+#
 # Compile the TAP PMD
 # It is enabled by default for Linux only.
 #
diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index d9bd2a8..6d2b5e0 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -61,6 +61,7 @@ CONFIG_RTE_SCHED_VECTOR=n
 
 # cannot use those on ARM
 CONFIG_RTE_KNI_KMOD=n
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_EM_PMD=n
 CONFIG_RTE_LIBRTE_IGB_PMD=n
 CONFIG_RTE_LIBRTE_CXGBE_PMD=n
diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc
index 35f7fb6..89bc396 100644
--- a/config/defconfig_ppc_64-power8-linuxapp-gcc
+++ b/config/defconfig_ppc_64-power8-linuxapp-gcc
@@ -48,6 +48,7 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
 
 # Note: Initially, all of the PMD drivers compilation are turned off on Power
 # Will turn on them only after the successful testing on Power
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_IXGBE_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/nics/ark.rst b/doc/guides/nics/ark.rst
new file mode 100644
index 0000000..ff3090a
--- /dev/null
+++ b/doc/guides/nics/ark.rst
@@ -0,0 +1,242 @@
+.. BSD LICENSE
+
+    Copyright (c) 2015-2017 Atomic Rules LLC
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Atomic Rules LLC nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ARK Poll Mode Driver
+====================
+
+The ARK PMD is a DPDK poll-mode driver for the Atomic Rules Arkville
+(ARK) family of devices.
+
+More information can be found at the `Atomic Rules website
+<http://atomicrules.com>`_.
+
+Overview
+--------
+
+The Atomic Rules Arkville product is DPDK and AXI compliant product
+that marshals packets across a PCIe conduit between host DPDK mbufs and
+FPGA AXI streams.
+
+The ARK PMD, and the spirit of the overall Arkville product,
+has been to take the DPDK API/ABI as a fixed specification;
+then implement much of the business logic in FPGA RTL circuits.
+The approach of *working backwards* from the DPDK API/ABI and having
+the GPP host software *dictate*, while the FPGA hardware *copes*,
+results in significant performance gains over a naive implementation.
+
+While this document describes the ARK PMD software, it is helpful to
+understand what the FPGA hardware is and is not. The Arkville RTL
+component provides a single PCIe Physical Function (PF) supporting
+some number of RX/Ingress and TX/Egress Queues. The ARK PMD controls
+the Arkville core through a dedicated opaque Core BAR (CBAR).
+To allow users full freedom for their own FPGA application IP,
+an independent FPGA Application BAR (ABAR) is provided.
+
+One popular way to imagine Arkville's FPGA hardware aspect is as the
+FPGA PCIe-facing side of a so-called Smart NIC. The Arkville core does
+not contain any MACs, and is link-speed independent, as well as
+agnostic to the number of physical ports the application chooses to
+use. The ARK driver exposes the familiar PMD interface to allow packet
+movement to and from mbufs across multiple queues.
+
+However, FPGA RTL applications could contain a universe of added
+functionality that an Arkville RTL core does not provide or can
+not anticipate. To allow for this expectation of user-defined
+innovation, the ARK PMD provides a dynamic mechanism of adding
+capabilities without having to modify the ARK PMD.
+
+The ARK PMD is intended to support all instances of the Arkville
+RTL Core, regardless of configuration, FPGA vendor, or target
+board. While specific capabilities such as the number of physical
+hardware queue-pairs are negotiated, the driver is designed to
+remain constant over a broad and extendable feature set.
+
+Intentionally, Arkville by itself DOES NOT provide common NIC
+capabilities such as offload or receive-side scaling (RSS).
+These capabilities would be viewed as a gate-level "tax" on
+Green-box FPGA applications that do not require such function.
+Instead, they can be added as needed with essentially no
+overhead to the FPGA Application.
+
+Data Path Interface
+-------------------
+
+Ingress RX and Egress TX operation is via the nominal DPDK API.
+The driver supports single-port, multi-queue for both RX and TX.
+
+Refer to ``ark_ethdev.h`` for the list of supported methods to
+act upon RX and TX Queues.
+
+Configuration Information
+-------------------------
+
+**DPDK Configuration Parameters**
+
+  The following configuration options are available for the ARK PMD:
+
+   * **CONFIG_RTE_LIBRTE_ARK_PMD** (default y): Enables or disables inclusion
+     of the ARK PMD driver in the DPDK compilation.
+
+   * **CONFIG_RTE_LIBRTE_ARK_PAD_TX** (default y):  When enabled TX
+     packets are padded to 60 bytes to support downstream MACS.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_RX** (default n): Enables or disables debug
+     logging and internal checking of RX ingress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TX** (default n): Enables or disables debug
+     logging and internal checking of TX egress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS** (default n): Enables or disables debug
+     logging of detailed packet and performance statistics gathered in
+     the PMD and FPGA.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE** (default n): Enables or disables debug
+     logging of detailed PMD events and status.
+
+
+Building DPDK
+-------------
+
+See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
+instructions on how to build DPDK.
+
+By default the ARK PMD library will be built into the DPDK library.
+
+For configuring and using UIO and VFIO frameworks, please also refer :ref:`the
+documentation that comes with DPDK suite <linux_gsg>`.
+
+Supported ARK RTL PCIe Instances
+--------------------------------
+
+ARK PMD supports the following Arkville RTL PCIe instances including:
+
+* ``1d6c:100d`` - AR-ARKA-FX0 [Arkville 32B DPDK Data Mover]
+* ``1d6c:100e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover]
+
+Supported Operating Systems
+---------------------------
+
+Any Linux distribution fulfilling the conditions described in ``System Requirements``
+section of :ref:`the DPDK documentation <linux_gsg>` or refer to *DPDK
+Release Notes*.  ARM and PowerPC architectures are not supported at this time.
+
+
+Supported Features
+------------------
+
+* Dynamic ARK PMD extensions
+* Multiple receive and transmit queues
+* Jumbo frames up to 9K
+* Hardware Statistics
+
+Unsupported Features
+--------------------
+
+Features that may be part of, or become part of, the Arkville RTL IP that are
+not currently supported or exposed by the ARK PMD include:
+
+* PCIe SR-IOV Virtual Functions (VFs)
+* Arkville's Packet Generator Control and Status
+* Arkville's Packet Director Control and Status
+* Arkville's Packet Checker Control and Status
+* Arkville's Timebase Management
+
+Pre-Requisites
+--------------
+
+#. Prepare the system as recommended by DPDK suite.  This includes environment
+   variables, hugepage configuration, tool-chains and configuration.
+
+#. Insert igb_uio kernel module using the command 'modprobe igb_uio'
+
+#. Bind the intended ARK device to igb_uio module
+
+At this point the system should be ready to run DPDK applications. Once the
+application runs to completion, the ARK PMD can be detached from igb_uio if necessary.
+
+Usage Example
+-------------
+
+This section demonstrates how to launch **testpmd** with Atomic Rules ARK
+devices managed by librte_pmd_ark.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe uio
+      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
+
+   .. note::
+
+      The ARK PMD driver depends upon the igb_uio user space I/O kernel module
+
+#. Mount and request huge pages:
+
+   .. code-block:: console
+
+      mount -t hugetlbfs nodev /mnt/huge
+      echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Bind UIO driver to ARK device at 0000:01:00.0 (using dpdk-devbind.py):
+
+   .. code-block:: console
+
+      ./usertools/dpdk-devbind.py --bind=igb_uio 0000:01:00.0
+
+   .. note::
+
+      The last argument to dpdk-devbind.py is the 4-tuple that identifies a specific PCIe
+      device. You can use lspci -d 1d6c: to identify all Atomic Rules devices in the system,
+      and thus determine the correct 4-tuple argument to dpdk-devbind.py
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+      EAL: PCI device 0000:01:00.0 on NUMA socket -1
+      EAL:   probe driver: 1d6c:100e rte_ark_pmd
+      EAL:   PCI memory mapped at 0x7f9b6c400000
+      PMD: eth_ark_dev_init(): Initializing 0:2:0.1
+      ARKP PMD CommitID: 378f3a67
+      Configuring Port 0 (socket 0)
+      Port 0: DC:3C:F6:00:00:01
+      Checking link statuses...
+      Port 0 Link Up - speed 100000 Mbps - full-duplex
+      Done
+      testpmd>
diff --git a/doc/guides/nics/features/ark.ini b/doc/guides/nics/features/ark.ini
new file mode 100644
index 0000000..dc8a0e2
--- /dev/null
+++ b/doc/guides/nics/features/ark.ini
@@ -0,0 +1,15 @@
+;
+; Supported features of the 'ark' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Queue start/stop     = Y
+Jumbo frame          = Y
+Scattered Rx         = Y
+Basic stats          = Y
+Stats per queue      = Y
+FW version           = Y
+Linux UIO            = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 87f9334..381d82c 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
     :numbered:
 
     overview
+    ark
     bnx2x
     bnxt
     cxgbe
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a16f25e..ea9868b 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -32,6 +32,7 @@
 include $(RTE_SDK)/mk/rte.vars.mk
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += bnx2x
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
 DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
diff --git a/drivers/net/ark/Makefile b/drivers/net/ark/Makefile
new file mode 100644
index 0000000..cf5d618
--- /dev/null
+++ b/drivers/net/ark/Makefile
@@ -0,0 +1,62 @@
+# BSD LICENSE
+#
+# Copyright (c) 2015-2017 Atomic Rules LLC
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of copyright holder nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_ark.a
+
+CFLAGS += -O3 -I./
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_ark_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-y += ark_ethdev.c
+
+
+# this lib depends upon:
+DEPDIRS-y += lib/librte_mbuf
+DEPDIRS-y += lib/librte_ether
+DEPDIRS-y += lib/librte_kvargs
+DEPDIRS-y += lib/librte_eal
+DEPDIRS-y += lib/librte_mempool
+
+LDLIBS += -lpthread
+LDLIBS += -ldl
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/ark/ark_debug.h b/drivers/net/ark/ark_debug.h
new file mode 100644
index 0000000..52b08a1
--- /dev/null
+++ b/drivers/net/ark/ark_debug.h
@@ -0,0 +1,71 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_DEBUG_H_
+#define _ARK_DEBUG_H_
+
+#include <inttypes.h>
+#include <rte_log.h>
+
+/* Format specifiers for string data pairs */
+#define ARK_SU32  "\n\t%-20s    %'20" PRIu32
+#define ARK_SU64  "\n\t%-20s    %'20" PRIu64
+#define ARK_SU64X "\n\t%-20s    %#20" PRIx64
+#define ARK_SPTR  "\n\t%-20s    %20p"
+
+#define ARK_TRACE_ON(fmt, ...) \
+	PMD_DRV_LOG(ERR, fmt, ##__VA_ARGS__)
+
+#define ARK_TRACE_OFF(fmt, ...) \
+	do {if (0) PMD_DRV_LOG(ERR, fmt, ##__VA_ARGS__); } while (0)
+
+/* Debug macro for reporting Packet stats */
+#ifdef RTE_LIBRTE_ARK_DEBUG_STATS
+#define ARK_DEBUG_STATS(fmt, ...) ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_STATS(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+/* Debug macro for tracing full behavior*/
+#ifdef RTE_LIBRTE_ARK_DEBUG_TRACE
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+/* tracing including the function name */
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt, __func__, ## args)
+
+
+#endif
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
new file mode 100644
index 0000000..6ae5ffc
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.c
@@ -0,0 +1,316 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dlfcn.h>
+
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "ark_global.h"
+#include "ark_debug.h"
+#include "ark_ethdev.h"
+
+/*  Internal prototypes */
+static int eth_ark_check_args(const char *params);
+static int eth_ark_dev_init(struct rte_eth_dev *dev);
+static int eth_ark_dev_uninit(struct rte_eth_dev *eth_dev);
+static int eth_ark_dev_configure(struct rte_eth_dev *dev);
+static void eth_ark_dev_info_get(struct rte_eth_dev *dev,
+				 struct rte_eth_dev_info *dev_info);
+
+#define ARK_DEV_TO_PCI(eth_dev)			\
+	RTE_DEV_TO_PCI((eth_dev)->device)
+
+#define ARK_MAX_ARG_LEN 256
+static uint32_t pkt_dir_v;
+static char pkt_gen_args[ARK_MAX_ARG_LEN];
+static char pkt_chkr_args[ARK_MAX_ARG_LEN];
+
+/*
+ * The packet generator is a functional block used to generate egress packet
+ * patterns.
+ */
+#define ARK_PKTGEN_ARG "Pkt_gen"
+
+/*
+ * The packet checker is a functional block used to test ingress packet
+ * patterns.
+ */
+#define ARK_PKTCHKR_ARG "Pkt_chkr"
+
+/*
+ * The packet director is used to select the internal ingress and egress
+ * packet paths.
+ */
+#define ARK_PKTDIR_ARG "Pkt_dir"
+
+#define ARK_RX_MAX_QUEUE (4096 * 4)
+#define ARK_RX_MIN_QUEUE (512)
+#define ARK_TX_MAX_QUEUE (4096 * 4)
+#define ARK_TX_MIN_QUEUE (256)
+
+static const char * const valid_arguments[] = {
+	ARK_PKTGEN_ARG,
+	ARK_PKTCHKR_ARG,
+	ARK_PKTDIR_ARG,
+	NULL
+};
+
+#define MAX_ARK_PHYS 16
+struct ark_adapter *gark[MAX_ARK_PHYS];
+
+static const struct rte_pci_id pci_id_ark_map[] = {
+	{RTE_PCI_DEVICE(0x1d6c, 0x100d)},
+	{RTE_PCI_DEVICE(0x1d6c, 0x100e)},
+	{.vendor_id = 0, /* sentinel */ },
+};
+
+static struct eth_driver rte_ark_pmd = {
+	.pci_drv = {
+		.probe = rte_eth_dev_pci_probe,
+		.remove = rte_eth_dev_pci_remove,
+		.id_table = pci_id_ark_map,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC
+	},
+	.eth_dev_init = eth_ark_dev_init,
+	.eth_dev_uninit = eth_ark_dev_uninit,
+	.dev_private_size = sizeof(struct ark_adapter),
+};
+
+static const struct eth_dev_ops ark_eth_dev_ops = {
+	.dev_configure = eth_ark_dev_configure,
+	.dev_infos_get = eth_ark_dev_info_get,
+};
+
+static int
+eth_ark_dev_init(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+	int ret = -1;
+
+	ark->eth_dev = dev;
+
+	ARK_DEBUG_TRACE("eth_ark_dev_init(struct rte_eth_dev *dev)\n");
+	gark[0] = ark;
+
+	pci_dev = ARK_DEV_TO_PCI(dev);
+	rte_eth_copy_pci_info(dev, pci_dev);
+
+	if (pci_dev->device.devargs)
+		eth_ark_check_args(pci_dev->device.devargs->args);
+	else
+		PMD_DRV_LOG(INFO, "No Device args found\n");
+
+
+	ark->bar0 = (uint8_t *)pci_dev->mem_resource[0].addr;
+	ark->a_bar = (uint8_t *)pci_dev->mem_resource[2].addr;
+
+	dev->dev_ops = &ark_eth_dev_ops;
+
+	return ret;
+}
+
+static int
+eth_ark_dev_uninit(struct rte_eth_dev *dev)
+{
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	dev->dev_ops = NULL;
+	dev->rx_pkt_burst = NULL;
+	dev->tx_pkt_burst = NULL;
+	return 0;
+}
+
+static int
+eth_ark_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	ARK_DEBUG_TRACE("ARKP: In %s\n", __func__);
+	return 0;
+}
+
+static void
+eth_ark_dev_info_get(struct rte_eth_dev *dev,
+		     struct rte_eth_dev_info *dev_info)
+{
+	/* device specific configuration */
+	memset(dev_info, 0, sizeof(*dev_info));
+
+	dev_info->max_rx_pktlen = (16 * 1024) - 128;
+	dev_info->min_rx_bufsize = 1024;
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = ARK_RX_MAX_QUEUE,
+		.nb_min = ARK_RX_MIN_QUEUE,
+		.nb_align = ARK_RX_MIN_QUEUE}; /* power of 2 */
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = ARK_TX_MAX_QUEUE,
+		.nb_min = ARK_TX_MIN_QUEUE,
+		.nb_align = ARK_TX_MIN_QUEUE}; /* power of 2 */
+
+	/* ARK PMD supports all line rates, how do we indicate that here ?? */
+	dev_info->speed_capa = (ETH_LINK_SPEED_1G |
+				ETH_LINK_SPEED_10G |
+				ETH_LINK_SPEED_25G |
+				ETH_LINK_SPEED_40G |
+				ETH_LINK_SPEED_50G |
+				ETH_LINK_SPEED_100G);
+	dev_info->pci_dev = ARK_DEV_TO_PCI(dev);
+}
+
+static inline int
+process_pktdir_arg(const char *key, const char *value,
+		   void *extra_args __rte_unused)
+{
+	ARK_DEBUG_TRACE("In process_pktdir_arg, key = %s, value = %s\n",
+			key, value);
+	pkt_dir_v = strtol(value, NULL, 16);
+	ARK_DEBUG_TRACE("pkt_dir_v = 0x%x\n", pkt_dir_v);
+	return 0;
+}
+
+static inline int
+process_file_args(const char *key, const char *value, void *extra_args)
+{
+	ARK_DEBUG_TRACE("**** IN process_pktgen_arg, key = %s, value = %s\n",
+			key, value);
+	char *args = (char *)extra_args;
+
+	/* Open the configuration file */
+	FILE *file = fopen(value, "r");
+	char line[ARK_MAX_ARG_LEN];
+	int first = 1;
+
+	if (file == NULL) {
+		PMD_DRV_LOG(ERR, "Unable to open config file %s\n", value);
+		return -1;
+	}
+
+	while (fgets(line, sizeof(line), file)) {
+		if (first) {
+			strncpy(args, line, ARK_MAX_ARG_LEN);
+			first = 0;
+		} else {
+			strncat(args, line,
+				ARK_MAX_ARG_LEN - strlen(args) - 1);
+		}
+	}
+	ARK_DEBUG_TRACE("file = %s\n", args);
+	fclose(file);
+	return 0;
+}
+
+static int
+eth_ark_check_args(const char *params)
+{
+	struct rte_kvargs *kvlist;
+	unsigned int k_idx;
+	struct rte_kvargs_pair *pair = NULL;
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return 0;
+
+	pkt_gen_args[0] = 0;
+	pkt_chkr_args[0] = 0;
+
+	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
+		pair = &kvlist->pairs[k_idx];
+		ARK_DEBUG_TRACE("**** Arg passed to PMD = %s:%s\n", pair->key,
+				pair->value);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTDIR_ARG,
+			       &process_pktdir_arg,
+			       NULL) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTDIR_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTGEN_ARG,
+			       &process_file_args,
+			       pkt_gen_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTGEN_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTCHKR_ARG,
+			       &process_file_args,
+			       pkt_chkr_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTCHKR_ARG);
+	}
+
+	ARK_DEBUG_TRACE("INFO: packet director set to 0x%x\n", pkt_dir_v);
+
+	return 1;
+}
+
+static int
+pmd_ark_probe(const char *name, const char *params)
+{
+	RTE_LOG(INFO, PMD, "Initializing pmd_ark for %s params = %s\n", name,
+		params);
+	eth_ark_check_args(params);
+	return 0;
+}
+
+static int
+pmd_ark_remove(const char *name)
+{
+	RTE_LOG(INFO, PMD, "Closing ark %s ethdev on numa socket %u\n", name,
+		rte_socket_id());
+	return 1;
+}
+
+/*
+ * Although Arkville is a physical device we take advantage of the virtual
+ * device initialization as a per test runtime initialization for
+ * regression testing.  Parameters are passed into the virtual device to
+ * configure the packet generator, packet director and packet checker.
+ */
+static struct rte_vdev_driver pmd_ark_drv = {
+	.probe = pmd_ark_probe,
+	.remove = pmd_ark_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_ark, pmd_ark_drv);
+RTE_PMD_REGISTER_ALIAS(net_ark, eth_ark);
+RTE_PMD_REGISTER_PCI(eth_ark, rte_ark_pmd.pci_drv);
+RTE_PMD_REGISTER_KMOD_DEP(net_ark, "* igb_uio | uio_pci_generic ");
+RTE_PMD_REGISTER_PCI_TABLE(eth_ark, pci_id_ark_map);
+RTE_PMD_REGISTER_PARAM_STRING(eth_ark,
+			      ARK_PKTGEN_ARG "=<filename> "
+			      ARK_PKTCHKR_ARG "=<filename> "
+			      ARK_PKTDIR_ARG "=<bitmap>");
+
+
diff --git a/drivers/net/ark/ark_ethdev.h b/drivers/net/ark/ark_ethdev.h
new file mode 100644
index 0000000..08d7fb1
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.h
@@ -0,0 +1,39 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_ETHDEV_H_
+#define _ARK_ETHDEV_H_
+
+/* STUB */
+
+#endif
diff --git a/drivers/net/ark/ark_global.h b/drivers/net/ark/ark_global.h
new file mode 100644
index 0000000..7cd62d5
--- /dev/null
+++ b/drivers/net/ark/ark_global.h
@@ -0,0 +1,108 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_GLOBAL_H_
+#define _ARK_GLOBAL_H_
+
+#include <time.h>
+#include <assert.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_cycles.h>
+#include <rte_kvargs.h>
+#include <rte_dev.h>
+#include <rte_version.h>
+
+#define ETH_ARK_ARG_MAXLEN	64
+#define ARK_SYSCTRL_BASE  0x0
+#define ARK_PKTGEN_BASE   0x10000
+#define ARK_MPU_RX_BASE   0x20000
+#define ARK_UDM_BASE      0x30000
+#define ARK_MPU_TX_BASE   0x40000
+#define ARK_DDM_BASE      0x60000
+#define ARK_CMAC_BASE     0x80000
+#define ARK_PKTDIR_BASE   0xa0000
+#define ARK_PKTCHKR_BASE  0x90000
+#define ARK_RCPACING_BASE 0xb0000
+#define ARK_EXTERNAL_BASE 0x100000
+#define ARK_MPU_QOFFSET   0x00100
+#define ARK_MAX_PORTS     8
+
+#define offset8(n)     n
+#define offset16(n)   ((n) / 2)
+#define offset32(n)   ((n) / 4)
+#define offset64(n)   ((n) / 8)
+
+/*
+ * Structure to store private data for each PF/VF instance.
+ */
+#define def_ptr(type, name) \
+	union type {		   \
+		uint64_t *t64;	   \
+		uint32_t *t32;	   \
+		uint16_t *t16;	   \
+		uint8_t  *t8;	   \
+		void     *v;	   \
+	} name
+
+struct ark_port {
+	struct rte_eth_dev *eth_dev;
+	int id;
+};
+
+struct ark_adapter {
+	/* User extension private data */
+	void *user_data;
+
+	struct ark_port port[ARK_MAX_PORTS];
+	int num_ports;
+
+	/* Common for both PF and VF */
+	struct rte_eth_dev *eth_dev;
+
+	void *d_handle;
+
+	/* Our Bar 0 */
+	uint8_t *bar0;
+
+	/* Application Bar */
+	uint8_t *a_bar;
+};
+
+typedef uint32_t *ark_t;
+
+#endif
diff --git a/drivers/net/ark/rte_pmd_ark_version.map b/drivers/net/ark/rte_pmd_ark_version.map
new file mode 100644
index 0000000..7f84780
--- /dev/null
+++ b/drivers/net/ark/rte_pmd_ark_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+	 local: *;
+
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0e0b600..da23898 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -104,6 +104,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)      += -lrte_pmd_bnx2x -lz
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNXT_PMD)       += -lrte_pmd_bnxt
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
1.9.1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/7] net/ark: PMD for Atomic Rules Arkville driver stub
  2017-03-21 21:43  3% [dpdk-dev] [PATCH v3 1/7] net/ark: PMD for Atomic Rules Arkville driver stub Ed Czeck
@ 2017-03-22 18:16  0% ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-03-22 18:16 UTC (permalink / raw)
  To: Ed Czeck, dev

On 3/21/2017 9:43 PM, Ed Czeck wrote:
> Enable Arkville on supported configurations
> Add overview documentation
> Minimum driver support for valid compile
> 
> 
> Signed-off-by: Ed Czeck <ed.czeck@atomicrules.com>
> ---
>  MAINTAINERS                                 |   8 +
>  config/common_base                          |  11 ++
>  config/defconfig_arm-armv7a-linuxapp-gcc    |   1 +
>  config/defconfig_ppc_64-power8-linuxapp-gcc |   1 +
>  doc/guides/nics/ark.rst                     | 237 +++++++++++++++++++++++
>  doc/guides/nics/features/ark.ini            |  15 ++
>  doc/guides/nics/index.rst                   |   1 +
>  drivers/net/Makefile                        |   1 +
>  drivers/net/ark/Makefile                    |  63 +++++++
>  drivers/net/ark/ark_debug.h                 |  74 ++++++++
>  drivers/net/ark/ark_ethdev.c                | 281 ++++++++++++++++++++++++++++
>  drivers/net/ark/ark_ethdev.h                |  39 ++++
>  drivers/net/ark/ark_global.h                | 108 +++++++++++
>  drivers/net/ark/rte_pmd_ark_version.map     |   4 +
>  mk/rte.app.mk                               |   1 +
>  15 files changed, 845 insertions(+)
>  create mode 100644 doc/guides/nics/ark.rst
>  create mode 100644 doc/guides/nics/features/ark.ini
>  create mode 100644 drivers/net/ark/Makefile
>  create mode 100644 drivers/net/ark/ark_debug.h
>  create mode 100644 drivers/net/ark/ark_ethdev.c
>  create mode 100644 drivers/net/ark/ark_ethdev.h
>  create mode 100644 drivers/net/ark/ark_global.h
>  create mode 100644 drivers/net/ark/rte_pmd_ark_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0c78b58..8043d75 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -278,6 +278,14 @@ M: Evgeny Schemeilin <evgenys@amazon.com>
>  F: drivers/net/ena/
>  F: doc/guides/nics/ena.rst
>  
> +Atomic Rules ark

Should prefer uppercase "ARK" here?

> +M: Shepard Siegel <shepard.siegel@atomicrules.com>
> +M: Ed Czeck       <ed.czeck@atomicrules.com>
> +M: John Miller    <john.miller@atomicrules.com>
> +F: /drivers/net/ark/

Can you please drop the leading "/"? There is a script,
"check-maintainers.sh", which breaks with that.

> +F: doc/guides/nics/ark.rst
> +F: doc/guides/nics/features/ark.ini
> +
>  Broadcom bnxt
>  M: Stephen Hurd <stephen.hurd@broadcom.com>
>  M: Ajit Khaparde <ajit.khaparde@broadcom.com>
> diff --git a/config/common_base b/config/common_base
> index 37aa1e1..0916c44 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -353,6 +353,17 @@ CONFIG_RTE_LIBRTE_QEDE_FW=""
>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
>  
>  #
> +# Compile ARK PMD
> +#
> +CONFIG_RTE_LIBRTE_ARK_PMD=y
> +CONFIG_RTE_LIBRTE_ARK_PAD_TX=y
> +CONFIG_RTE_LIBRTE_ARK_DEBUG_RX=n
> +CONFIG_RTE_LIBRTE_ARK_DEBUG_TX=n
> +CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS=n
> +CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=n
> +
> +

Extra line

> +#
>  # Compile the TAP PMD
>  # It is enabled by default for Linux only.
>  #
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index d9bd2a8..6d2b5e0 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -61,6 +61,7 @@ CONFIG_RTE_SCHED_VECTOR=n
>  
>  # cannot use those on ARM
>  CONFIG_RTE_KNI_KMOD=n
> +CONFIG_RTE_LIBRTE_ARK_PMD=n
>  CONFIG_RTE_LIBRTE_EM_PMD=n
>  CONFIG_RTE_LIBRTE_IGB_PMD=n
>  CONFIG_RTE_LIBRTE_CXGBE_PMD=n
> diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc
> index 35f7fb6..89bc396 100644
> --- a/config/defconfig_ppc_64-power8-linuxapp-gcc
> +++ b/config/defconfig_ppc_64-power8-linuxapp-gcc
> @@ -48,6 +48,7 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
>  
>  # Note: Initially, all of the PMD drivers compilation are turned off on Power
>  # Will turn on them only after the successful testing on Power
> +CONFIG_RTE_LIBRTE_ARK_PMD=n

Is it not tested or known that it is not supported?

>  CONFIG_RTE_LIBRTE_IXGBE_PMD=n
>  CONFIG_RTE_LIBRTE_I40E_PMD=n
>  CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
> diff --git a/doc/guides/nics/ark.rst b/doc/guides/nics/ark.rst
> new file mode 100644
> index 0000000..72fb8d6
> --- /dev/null
> +++ b/doc/guides/nics/ark.rst
> @@ -0,0 +1,237 @@
> +.. BSD LICENSE
> +
> +    Copyright (c) 2015-2017 Atomic Rules LLC
> +    All rights reserved.
> +
> +    Redistribution and use in source and binary forms, with or without
> +    modification, are permitted provided that the following conditions
> +    are met:
> +
> +    * Redistributions of source code must retain the above copyright
> +    notice, this list of conditions and the following disclaimer.
> +    * Redistributions in binary form must reproduce the above copyright
> +    notice, this list of conditions and the following disclaimer in
> +    the documentation and/or other materials provided with the
> +    distribution.
> +    * Neither the name of Atomic Rules LLC nor the names of its
> +    contributors may be used to endorse or promote products derived
> +    from this software without specific prior written permission.
> +
> +    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +ARK Poll Mode Driver
> +====================
> +
> +The ARK PMD is a DPDK poll-mode driver for the Atomic Rules Arkville
> +(ARK) family of devices.
> +
> +More information can be found at the `Atomic Rules website
> +<http://atomicrules.com>`_.
> +
> +Overview
> +--------
> +
> +The Atomic Rules Arkville product is DPDK and AXI compliant product
> +that marshals packets across a PCIe conduit between host DPDK mbufs and
> +FPGA AXI streams.
> +
> +The ARK PMD, and the spirit of the overall Arkville product,
> +has been to take the DPDK API/ABI as a fixed specification;
> +then implement much of the business logic in FPGA RTL circuits.
> +The approach of *working backwards* from the DPDK API/ABI and having
> +the GPP host software *dictate*, while the FPGA hardware *copes*,
> +results in significant performance gains over a naive implementation.
> +
> +While this document describes the ARK PMD software, it is helpful to
> +understand what the FPGA hardware is and is not. The Arkville RTL
> +component provides a single PCIe Physical Function (PF) supporting
> +some number of RX/Ingress and TX/Egress Queues. The ARK PMD controls
> +the Arkville core through a dedicated opaque Core BAR (CBAR).
> +To allow users full freedom for their own FPGA application IP,
> +an independent FPGA Application BAR (ABAR) is provided.
> +
> +One popular way to imagine Arkville's FPGA hardware aspect is as the
> +FPGA PCIe-facing side of a so-called Smart NIC. The Arkville core does
> +not contain any MACs, and is link-speed independent, as well as
> +agnostic to the number of physical ports the application chooses to
> +use. The ARK driver exposes the familiar PMD interface to allow packet
> +movement to and from mbufs across multiple queues.
> +
> +However FPGA RTL applications could contain a universe of added
> +functionality that an Arkville RTL core does not provide or can
> +not anticipate. To allow for this expectation of user-defined
> +innovation, the ARK PMD provides a dynamic mechanism of adding
> +capabilities without having to modify the ARK PMD.
> +
> +The ARK PMD is intended to support all instances of the Arkville
> +RTL Core, regardless of configuration, FPGA vendor, or target
> +board. While specific capabilities such as number of physical
> +hardware queue-pairs are negotiated; the driver is designed to
> +remain constant over a broad and extendable feature set.
> +
> +Intentionally, Arkville by itself DOES NOT provide common NIC
> +capabilities such as offload or receive-side scaling (RSS).
> +These capabilities would be viewed as a gate-level "tax" on
> +Green-box FPGA applications that do not require such function.
> +Instead, they can be added as needed with essentially no
> +overhead to the FPGA Application.
> +
> +Data Path Interface
> +-------------------
> +
> +Ingress RX and Egress TX operation is by the nominal DPDK API .
> +The driver supports single-port, multi-queue for both RX and TX.
> +
> +Refer to ``ark_ethdev.h`` for the list of supported methods to
> +act upon RX and TX Queues.
> +
> +Configuration Information
> +-------------------------
> +
> +**DPDK Configuration Parameters**
> +
> +  The following configuration options are available for the ARK PMD:
> +
> +   * **CONFIG_RTE_LIBRTE_ARK_PMD** (default y): Enables or disables inclusion
> +     of the ARK PMD driver in the DPDK compilation.

Missing "CONFIG_RTE_LIBRTE_ARK_PAD_TX"

> +
> +   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_RX** (default n): Enables or disables debug
> +     logging and internal checking of RX ingress logic within the ARK PMD driver.
> +
> +   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TX** (default n): Enables or disables debug
> +     logging and internal checking of TX egress logic within the ARK PMD driver.
> +
> +   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS** (default n): Enables or disables debug
> +     logging of detailed packet and performance statistics gathered in
> +     the PMD and FPGA.
> +
> +   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE** (default n): Enables or disables debug
> +     logging of detailed PMD events and status.
> +
> +

Can you also please document the device arguments in this file?

> +Building DPDK
> +-------------
> +
> +See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
> +instructions on how to build DPDK.
> +
> +By default the ARK PMD library will be built into the DPDK library.
> +
> +For configuring and using UIO and VFIO frameworks, please also refer :ref:`the
> +documentation that comes with DPDK suite <linux_gsg>`.
> +
> +Supported ARK RTL PCIe Instances
> +--------------------------------
> +
> +ARK PMD supports the following Arkville RTL PCIe instances including:
> +
> +* ``1d6c:100d`` - AR-ARKA-FX0 [Arkville 32B DPDK Data Mover]
> +* ``1d6c:100e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover]
> +
> +Supported Operating Systems
> +---------------------------
> +
> +Any Linux distribution fulfilling the conditions described in ``System Requirements``
> +section of :ref:`the DPDK documentation <linux_gsg>` or refer to *DPDK Release Notes*.
> +
> +Supported Features
> +------------------
> +
> +* Dynamic ARK PMD extensions
> +* Multiple receive and transmit queues
> +* Jumbo frames up to 9K
> +* Hardware Statistics
> +
> +Unsupported Features
> +--------------------
> +
> +Features that may be part of, or become part of, the Arkville RTL IP that are
> +not currently supported or exposed by the ARK PMD include:
> +
> +* PCIe SR-IOV Virtual Functions (VFs)
> +* Arkville's Packet Generator Control and Status
> +* Arkville's Packet Director Control and Status
> +* Arkville's Packet Checker Control and Status
> +* Arkville's Timebase Management
> +
> +Pre-Requisites
> +--------------
> +
> +#. Prepare the system as recommended by DPDK suite.  This includes environment
> +   variables, hugepages configuration, tool-chains and configuration
> +
> +#. Insert igb_uio kernel module using the command 'modprobe igb_uio'
> +
> +#. Bind the intended ARK device to igb_uio module
> +
> +At this point the system should be ready to run DPDK applications. Once the
> +application runs to completion, the ARK PMD can be detached from igb_uio if necessary.
> +
> +Usage Example
> +-------------
> +
> +This section demonstrates how to launch **testpmd** with Atomic Rules ARK
> +devices managed by librte_pmd_ark.
> +
> +#. Load the kernel modules:
> +
> +   .. code-block:: console
> +
> +      modprobe uio
> +      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
> +
> +   .. note::
> +
> +      The ARK PMD driver depends upon the igb_uio user space I/O kernel module
> +
> +#. Mount and request huge pages:
> +
> +   .. code-block:: console
> +
> +      mount -t hugetlbfs nodev /mnt/huge
> +      echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
> +
> +#. Bind UIO driver to ARK device at 0000:01:00.0 (using dpdk-devbind.py):
> +
> +   .. code-block:: console
> +
> +      ./usertools/dpdk-devbind.py --bind=igb_uio 0000:01:00.0
> +
> +   .. note::
> +
> +      The last argument to dpdk-devbind.py is the 4-tuple that indentifies a specific PCIe
> +      device. You can use lspci -d 1d6c: to indentify all Atomic Rules devices in the system,
> +      and thus determine the correct 4-tuple argument to dpdk-devbind.py
> +
> +#. Start testpmd with basic parameters:
> +
> +   .. code-block:: console
> +
> +      ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i
> +
> +   Example output:
> +
> +   .. code-block:: console
> +
> +      [...]
> +      EAL: PCI device 0000:01:00.0 on NUMA socket -1
> +      EAL:   probe driver: 1d6c:100e rte_ark_pmd
> +      EAL:   PCI memory mapped at 0x7f9b6c400000
> +      PMD: eth_ark_dev_init(): Initializing 0:2:0.1
> +      ARKP PMD CommitID: 378f3a67
> +      Configuring Port 0 (socket 0)
> +      Port 0: DC:3C:F6:00:00:01
> +      Checking link statuses...
> +      Port 0 Link Up - speed 100000 Mbps - full-duplex
> +      Done
> +      testpmd>
> diff --git a/doc/guides/nics/features/ark.ini b/doc/guides/nics/features/ark.ini
> new file mode 100644
> index 0000000..dc8a0e2
> --- /dev/null
> +++ b/doc/guides/nics/features/ark.ini
> @@ -0,0 +1,15 @@
> +;
> +; Supported features of the 'ark' poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Queue start/stop     = Y
> +Jumbo frame          = Y
> +Scattered Rx         = Y
> +Basic stats          = Y
> +Stats per queue      = Y
> +FW version           = Y

Features can be added with the patch that adds the functionality. I
believe the above features are not supported with the current patch.

> +Linux UIO            = Y
> +x86-64               = Y
> +Usage doc            = Y
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 87f9334..381d82c 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -36,6 +36,7 @@ Network Interface Controller Drivers
>      :numbered:
>  
>      overview
> +    ark
>      bnx2x
>      bnxt
>      cxgbe
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index a16f25e..ea9868b 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -32,6 +32,7 @@
>  include $(RTE_SDK)/mk/rte.vars.mk
>  
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
> +DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
>  DIRS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += bnx2x
>  DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
>  DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
> diff --git a/drivers/net/ark/Makefile b/drivers/net/ark/Makefile
> new file mode 100644
> index 0000000..217bd34
> --- /dev/null
> +++ b/drivers/net/ark/Makefile
> @@ -0,0 +1,63 @@
> +# BSD LICENSE
> +#
> +# Copyright (c) 2015-2017 Atomic Rules LLC
> +# All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +#
> +# * Redistributions of source code must retain the above copyright
> +#   notice, this list of conditions and the following disclaimer.
> +# * Redistributions in binary form must reproduce the above copyright
> +#   notice, this list of conditions and the following disclaimer in
> +#   the documentation and/or other materials provided with the
> +#   distribution.
> +# * Neither the name of copyright holder nor the names of its
> +#   contributors may be used to endorse or promote products derived
> +#   from this software without specific prior written permission.
> +#
> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_ark.a
> +
> +CFLAGS += -O3 -I./
> +CFLAGS += $(WERROR_FLAGS)
> +
> +EXPORT_MAP := rte_pmd_ark_version.map
> +
> +LIBABIVER := 1
> +
> +#
> +# all source are stored in SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD)

No need to put the config option in the comment; SRCS-y looks more appropriate.

> +#
> +
> +SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev.c
> +
> +
> +# this lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mbuf
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mempool

> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libpthread
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libdl

DEPDIRS is for internal library dependencies. Please use LDLIBS for
external dependencies, like:

LDLIBS += -lpthread
LDLIBS += -ldl

> +
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/ark/ark_debug.h b/drivers/net/ark/ark_debug.h
> new file mode 100644
> index 0000000..a108c28
> --- /dev/null
> +++ b/drivers/net/ark/ark_debug.h
> @@ -0,0 +1,74 @@
> +/*-
> + * BSD LICENSE
> + *
> + * Copyright (c) 2015-2017 Atomic Rules LLC
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * * Neither the name of copyright holder nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _ARK_DEBUG_H_
> +#define _ARK_DEBUG_H_
> +
> +#include <inttypes.h>
> +#include <rte_log.h>
> +
> +/* Format specifiers for string data pairs */
> +#define ARK_SU32  "\n\t%-20s    %'20" PRIu32
> +#define ARK_SU64  "\n\t%-20s    %'20" PRIu64
> +#define ARK_SU64X "\n\t%-20s    %#20" PRIx64
> +#define ARK_SPTR  "\n\t%-20s    %20p"
> +
> +#define ARK_TRACE_ON(fmt, ...) \
> +	PMD_DRV_LOG(ERR, fmt, ##__VA_ARGS__)
> +
> +#define ARK_TRACE_OFF(fmt, ...) \
> +	do {if (0) PMD_DRV_LOG(ERR, fmt, ##__VA_ARGS__); } while (0)
> +
> +/* Debug macro for reporting Packet stats */
> +#ifdef RTE_LIBRTE_ARK_DEBUG_STATS
> +#define ARK_DEBUG_STATS(fmt, ...) ARK_TRACE_ON(fmt, ##__VA_ARGS__)
> +#else
> +#define ARK_DEBUG_STATS(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
> +#endif
> +
> +/* Debug macro for tracing full behavior*/
> +#ifdef RTE_LIBRTE_ARK_DEBUG_TRACE
> +#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_ON(fmt, ##__VA_ARGS__)
> +#else
> +#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
> +#endif
> +
> +#ifdef ARK_STD_LOG

How is this define passed? Should it be something like an
RTE_LIBRTE_ARK_DEBUG_DRIVER config option?
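
If so, a sketch of the usual wiring (the option name above is just a
suggestion): add

CONFIG_RTE_LIBRTE_ARK_DEBUG_DRIVER=n

to config/common_base next to the other ARK debug options, and guard the
fprintf variant on the generated RTE_LIBRTE_ARK_DEBUG_DRIVER define
instead of ARK_STD_LOG.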

> +#define PMD_DRV_LOG(level, fmt, args...) \
> +	fprintf(stderr, fmt, args)

It is possible to use the rte_log functions instead of fprintf to stderr.
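
For example, keeping the raw output format but routing it through the
logging framework (untested sketch):

#define PMD_DRV_LOG(level, fmt, args...) \
	RTE_LOG(level, PMD, fmt, ## args)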

> +#else
> +#define PMD_DRV_LOG(level, fmt, args...) \
> +	RTE_LOG(level, PMD, "%s(): " fmt, __func__, ## args)
> +#endif
> +
> +#endif

CONFIG_RTE_LIBRTE_ARK_DEBUG_RX and CONFIG_RTE_LIBRTE_ARK_DEBUG_TX are not
used; if so, they can be removed.

> diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
> new file mode 100644
> index 0000000..124b73c
> --- /dev/null
> +++ b/drivers/net/ark/ark_ethdev.c
> @@ -0,0 +1,281 @@
> +/*-
> + * BSD LICENSE
> + *
> + * Copyright (c) 2015-2017 Atomic Rules LLC
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * * Neither the name of copyright holder nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <unistd.h>
> +#include <sys/stat.h>
> +#include <dlfcn.h>
> +
> +#include <rte_kvargs.h>
> +#include <rte_vdev.h>
> +
> +#include "ark_global.h"
> +#include "ark_debug.h"
> +#include "ark_ethdev.h"
> +
> +/*  Internal prototypes */
> +static int eth_ark_check_args(const char *params);
> +static int eth_ark_dev_init(struct rte_eth_dev *dev);
> +static int eth_ark_dev_uninit(struct rte_eth_dev *eth_dev);
> +static int eth_ark_dev_configure(struct rte_eth_dev *dev);
> +static void eth_ark_dev_info_get(struct rte_eth_dev *dev,
> +				 struct rte_eth_dev_info *dev_info);
> +
> +
> +#define ARK_DEV_TO_PCI(eth_dev)			\
> +	RTE_DEV_TO_PCI((eth_dev)->device)
> +
> +#define ARK_MAX_ARG_LEN 256
> +static uint32_t pkt_dir_v;
> +static char pkt_gen_args[ARK_MAX_ARG_LEN];
> +static char pkt_chkr_args[ARK_MAX_ARG_LEN];
> +
> +#define ARK_PKTGEN_ARG "Pkt_gen"
> +#define ARK_PKTCHKR_ARG "Pkt_chkr"
> +#define ARK_PKTDIR_ARG "Pkt_dir"

Is it possible to add one-line comments to the device arguments? For example,
what "Pkt_dir" (packet director) is for?

> +
> +static const char * const valid_arguments[] = {
> +	ARK_PKTGEN_ARG,
> +	ARK_PKTCHKR_ARG,
> +	ARK_PKTDIR_ARG,
> +	"iface",

Why not make this one a define too?
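
e.g. (the name is just a suggestion):

#define ARK_IFACE_ARG "iface"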

> +	NULL
> +};
> +
> +#define MAX_ARK_PHYS 16
> +struct ark_adapter *gark[MAX_ARK_PHYS];
> +
> +static const struct rte_pci_id pci_id_ark_map[] = {
> +	{RTE_PCI_DEVICE(0x1d6c, 0x100d)},
> +	{RTE_PCI_DEVICE(0x1d6c, 0x100e)},
> +	{.vendor_id = 0, /* sentinel */ },
> +};
> +
> +static struct eth_driver rte_ark_pmd = {
> +	.pci_drv = {
> +		.probe = rte_eth_dev_pci_probe,
> +		.remove = rte_eth_dev_pci_remove,
> +		.id_table = pci_id_ark_map,
> +		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC
> +	},
> +	.eth_dev_init = eth_ark_dev_init,
> +	.eth_dev_uninit = eth_ark_dev_uninit,
> +	.dev_private_size = sizeof(struct ark_adapter),
> +};
> +
> +static const struct eth_dev_ops ark_eth_dev_ops = {
> +	.dev_configure = eth_ark_dev_configure,
> +	.dev_infos_get = eth_ark_dev_info_get,
> +

Extra line.

> +};
> +
> +

Extra line.

> +static int
> +eth_ark_dev_init(struct rte_eth_dev *dev __rte_unused)
> +{
> +	return -1;					/* STUB */

You may want to set ark_eth_dev_ops here, since the ops are already implemented.

And for a proper .dev_infos_get implementation, you may want to have [1] here:

[1]
rte_eth_copy_pci_info(eth_dev, pci_dev);
eth_dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;

Also, you may want to parse the device arguments at this stage.
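
Putting those together, a minimal sketch of what the init could start
with (using the ARK_DEV_TO_PCI macro already defined in this file):

	struct rte_pci_device *pci_dev = ARK_DEV_TO_PCI(dev);

	dev->dev_ops = &ark_eth_dev_ops;
	rte_eth_copy_pci_info(dev, pci_dev);
	dev->data->dev_flags |= RTE_ETH_DEV_DETACHABLE;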

> +}
> +
> +
> +static int
> +eth_ark_dev_uninit(struct rte_eth_dev *dev)
> +{
> +	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
> +		return 0;
> +
> +	dev->dev_ops = NULL;
> +	dev->rx_pkt_burst = NULL;
> +	dev->tx_pkt_burst = NULL;
> +	return 0;
> +}
> +
> +static int
> +eth_ark_dev_configure(struct rte_eth_dev *dev __rte_unused)
> +{
> +	ARK_DEBUG_TRACE("ARKP: In %s\n", __func__);
> +	return 0;
> +}
> +
> +static void
> +eth_ark_dev_info_get(struct rte_eth_dev *dev,
> +		     struct rte_eth_dev_info *dev_info)
> +{
> +	/* device specific configuration */
> +	memset(dev_info, 0, sizeof(*dev_info));

The memset is not required, since it is already done by the ethdev
abstraction layer; in particular, the desc_lim values are overwritten
below anyway.

> +
> +	dev_info->max_rx_pktlen = (16 * 1024) - 128;
> +	dev_info->min_rx_bufsize = 1024;

Using macros instead of hardcoded values makes their meaning easier to understand.
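
e.g. (hypothetical names):

#define ARK_RX_MAX_PKT_LEN ((16 * 1024) - 128)
#define ARK_RX_MIN_BUFSIZE (1024)
...
	dev_info->max_rx_pktlen = ARK_RX_MAX_PKT_LEN;
	dev_info->min_rx_bufsize = ARK_RX_MIN_BUFSIZE;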

> +	dev_info->rx_offload_capa = 0;
> +	dev_info->tx_offload_capa = 0;
> +
> +	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
> +		.nb_max = 4096 * 4,
> +		.nb_min = 512,	/* HW Q size for RX */
> +		.nb_align = 2,};
> +
> +	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
> +		.nb_max = 4096 * 4,
> +		.nb_min = 256,	/* HW Q size for TX */
> +		.nb_align = 2,};
> +
> +	dev_info->rx_offload_capa = 0;
> +	dev_info->tx_offload_capa = 0;

Duplication; please check ~10 lines above.
Also, setting these to 0 is not required at all because of the memset.

> +
> +	/* ARK PMD supports all line rates, how do we indicate that here ?? */
> +	dev_info->speed_capa = (ETH_LINK_SPEED_1G |
> +				ETH_LINK_SPEED_10G |
> +				ETH_LINK_SPEED_25G |
> +				ETH_LINK_SPEED_40G |
> +				ETH_LINK_SPEED_50G |
> +				ETH_LINK_SPEED_100G);
> +	dev_info->pci_dev = ARK_DEV_TO_PCI(dev);
> +	dev_info->driver_name = dev->data->drv_name;

Setting driver_name is not required; the ethdev layer will overwrite this value.

And to get driver_name correct, rte_eth_copy_pci_info() should be
called; please check [1] above.

> +}
> +
> +
> +static inline int
> +process_pktdir_arg(const char *key, const char *value,
> +		   void *extra_args __rte_unused)
> +{
> +	ARK_DEBUG_TRACE("In process_pktdir_arg, key = %s, value = %s\n",
> +			key, value);

The general usage of DEBUG_TRACE is to provide a backtrace log, i.e.
function entrance/exit information. I guess that is why it is
controlled by a separate config option.
What you need here looks like regular debug logging, a PMD_DRV_LOG /
RTE_LOG variant.

> +	pkt_dir_v = strtol(value, NULL, 16);
> +	ARK_DEBUG_TRACE("pkt_dir_v = 0x%x\n", pkt_dir_v);
> +	return 0;
> +}
> +
> +static inline int
> +process_file_args(const char *key, const char *value, void *extra_args)
> +{
> +	ARK_DEBUG_TRACE("**** IN process_pktgen_arg, key = %s, value = %s\n",
> +			key, value);
> +	char *args = (char *)extra_args;
> +
> +	/* Open the configuration file */
> +	FILE *file = fopen(value, "r");
> +	char line[256];
> +	int first = 1;
> +
> +	while (fgets(line, sizeof(line), file)) {
> +		/* ARK_DEBUG_TRACE("%s\n", line); */

Please remove dead code.

> +		if (first) {
> +			strncpy(args, line, ARK_MAX_ARG_LEN);

Can this overflow the args variable? Is there any way to prevent a possible crash?
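
A bounded sketch that also guarantees NUL termination:

	if (first) {
		strncpy(args, line, ARK_MAX_ARG_LEN - 1);
		args[ARK_MAX_ARG_LEN - 1] = '\0';
		first = 0;
	} else {
		strncat(args, line,
			ARK_MAX_ARG_LEN - strlen(args) - 1);
	}

Note that fopen() above can also return NULL, which would crash the
fgets() loop; a NULL check with an error return seems needed as well.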

> +			first = 0;
> +		} else {
> +			strncat(args, line, ARK_MAX_ARG_LEN);
> +		}
> +	}
> +	ARK_DEBUG_TRACE("file = %s\n", args);
> +	fclose(file);
> +	return 0;
> +}
> +
> +static int
> +eth_ark_check_args(const char *params)
> +{
> +	struct rte_kvargs *kvlist;
> +	unsigned int k_idx;
> +	struct rte_kvargs_pair *pair = NULL;
> +
> +	kvlist = rte_kvargs_parse(params, valid_arguments);
> +	if (kvlist == NULL)
> +		return 0;
> +
> +	pkt_gen_args[0] = 0;
> +	pkt_chkr_args[0] = 0;
> +
> +	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
> +		pair = &kvlist->pairs[k_idx];
> +		ARK_DEBUG_TRACE("**** Arg passed to PMD = %s:%s\n", pair->key,
> +				pair->value);
> +	}
> +
> +	if (rte_kvargs_process(kvlist,
> +			       ARK_PKTDIR_ARG,
> +			       &process_pktdir_arg,
> +			       NULL) != 0) {
> +		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTDIR_ARG);
> +	}
> +
> +	if (rte_kvargs_process(kvlist,
> +			       ARK_PKTGEN_ARG,
> +			       &process_file_args,
> +			       pkt_gen_args) != 0) {
> +		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTGEN_ARG);
> +	}
> +
> +	if (rte_kvargs_process(kvlist,
> +			       ARK_PKTCHKR_ARG,
> +			       &process_file_args,
> +			       pkt_chkr_args) != 0) {
> +		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTCHKR_ARG);
> +	}

Not processing the "iface" device argument?
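
If it is meant to be handled, something like the following
(process_iface_arg is a hypothetical handler):

	if (rte_kvargs_process(kvlist,
			       "iface",
			       &process_iface_arg,
			       NULL) != 0) {
		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", "iface");
	}

Otherwise it could be dropped from valid_arguments.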

> +
> +	ARK_DEBUG_TRACE("INFO: packet director set to 0x%x\n", pkt_dir_v);
> +
> +	return 1;
> +}
> +
> +
> +/*
> + * PCIE

Can you elaborate on this comment?

> + */
> +static int
> +pmd_ark_probe(const char *name, const char *params)
> +{
> +	RTE_LOG(INFO, PMD, "Initializing pmd_ark for %s params = %s\n", name,
> +		params);
> +
> +	/* Parse off the v index */
> +
> +	eth_ark_check_args(params);
> +	return 0;
> +}
> +
> +static int
> +pmd_ark_remove(const char *name)
> +{
> +	RTE_LOG(INFO, PMD, "Closing ark %s ethdev on numa socket %u\n", name,
> +		rte_socket_id());
> +	return 1;
> +}
> +
> +static struct rte_vdev_driver pmd_ark_drv = {
> +	.probe = pmd_ark_probe,
> +	.remove = pmd_ark_remove,
> +};

Sorry, I am confused here.
Why do both virtual and physical initialization routines exist together?
This PMD is for a physical PCI device, right?
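
If the vdev path is not needed, a sketch keeping only the PCI
registration (and making the registered name consistently net_ark):

RTE_PMD_REGISTER_PCI(net_ark, rte_ark_pmd.pci_drv);
RTE_PMD_REGISTER_PCI_TABLE(net_ark, pci_id_ark_map);
RTE_PMD_REGISTER_KMOD_DEP(net_ark, "* igb_uio | uio_pci_generic");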

> +
> +RTE_PMD_REGISTER_VDEV(net_ark, pmd_ark_drv);
> +RTE_PMD_REGISTER_ALIAS(net_ark, eth_ark);
> +RTE_PMD_REGISTER_PCI(eth_ark, rte_ark_pmd.pci_drv);
> +RTE_PMD_REGISTER_KMOD_DEP(net_ark, "* igb_uio | uio_pci_generic ");
> +RTE_PMD_REGISTER_PCI_TABLE(eth_ark, pci_id_ark_map);

You can add the RTE_PMD_REGISTER_PARAM_STRING macro.
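
A sketch (the value descriptions are assumptions):

RTE_PMD_REGISTER_PARAM_STRING(net_ark,
			      ARK_PKTGEN_ARG "=<config file> "
			      ARK_PKTCHKR_ARG "=<config file> "
			      ARK_PKTDIR_ARG "=<bitmap>");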

> diff --git a/drivers/net/ark/ark_ethdev.h b/drivers/net/ark/ark_ethdev.h
> new file mode 100644
> index 0000000..08d7fb1
> --- /dev/null
> +++ b/drivers/net/ark/ark_ethdev.h
> @@ -0,0 +1,39 @@
> +/*-
> + * BSD LICENSE
> + *
> + * Copyright (c) 2015-2017 Atomic Rules LLC
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * * Neither the name of copyright holder nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _ARK_ETHDEV_H_
> +#define _ARK_ETHDEV_H_
> +
> +/* STUB */
> +
> +#endif
> diff --git a/drivers/net/ark/ark_global.h b/drivers/net/ark/ark_global.h
> new file mode 100644
> index 0000000..7cd62d5
> --- /dev/null
> +++ b/drivers/net/ark/ark_global.h
> @@ -0,0 +1,108 @@
> +/*-
> + * BSD LICENSE
> + *
> + * Copyright (c) 2015-2017 Atomic Rules LLC
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * * Neither the name of copyright holder nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _ARK_GLOBAL_H_
> +#define _ARK_GLOBAL_H_
> +
> +#include <time.h>
> +#include <assert.h>
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +#include <rte_memcpy.h>
> +#include <rte_string_fns.h>
> +#include <rte_cycles.h>
> +#include <rte_kvargs.h>
> +#include <rte_dev.h>
> +#include <rte_version.h>
> +
> +#define ETH_ARK_ARG_MAXLEN	64
> +#define ARK_SYSCTRL_BASE  0x0
> +#define ARK_PKTGEN_BASE   0x10000
> +#define ARK_MPU_RX_BASE   0x20000
> +#define ARK_UDM_BASE      0x30000
> +#define ARK_MPU_TX_BASE   0x40000
> +#define ARK_DDM_BASE      0x60000
> +#define ARK_CMAC_BASE     0x80000
> +#define ARK_PKTDIR_BASE   0xa0000
> +#define ARK_PKTCHKR_BASE  0x90000
> +#define ARK_RCPACING_BASE 0xb0000
> +#define ARK_EXTERNAL_BASE 0x100000
> +#define ARK_MPU_QOFFSET   0x00100
> +#define ARK_MAX_PORTS     8
> +
> +#define offset8(n)     n
> +#define offset16(n)   ((n) / 2)
> +#define offset32(n)   ((n) / 4)
> +#define offset64(n)   ((n) / 8)
> +
> +/*
> + * Structure to store private data for each PF/VF instance.
> + */
> +#define def_ptr(type, name) \
> +	union type {		   \
> +		uint64_t *t64;	   \
> +		uint32_t *t32;	   \
> +		uint16_t *t16;	   \
> +		uint8_t  *t8;	   \
> +		void     *v;	   \
> +	} name
> +
> +struct ark_port {
> +	struct rte_eth_dev *eth_dev;
> +	int id;
> +};
> +
> +struct ark_adapter {
> +	/* User extension private data */
> +	void *user_data;
> +
> +	struct ark_port port[ARK_MAX_PORTS];
> +	int num_ports;
> +
> +	/* Common for both PF and VF */
> +	struct rte_eth_dev *eth_dev;
> +
> +	void *d_handle;
> +
> +	/* Our Bar 0 */
> +	uint8_t *bar0;
> +
> +	/* Application Bar */
> +	uint8_t *a_bar;
> +};
> +
> +typedef uint32_t *ark_t;
> +
> +#endif
> diff --git a/drivers/net/ark/rte_pmd_ark_version.map b/drivers/net/ark/rte_pmd_ark_version.map
> new file mode 100644
> index 0000000..7f84780
> --- /dev/null
> +++ b/drivers/net/ark/rte_pmd_ark_version.map
> @@ -0,0 +1,4 @@
> +DPDK_2.0 {

This should be the release version, DPDK_17.05, i.e.:
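
DPDK_17.05 {
	local: *;
};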

> +	 local: *;
> +
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 0e0b600..da23898 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -104,6 +104,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
>  # plugins (link only if static libraries)
>  
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)      += -lrte_pmd_bnx2x -lz
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_BNXT_PMD)       += -lrte_pmd_bnxt
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
> 

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 1/7] net/ark: PMD for Atomic Rules Arkville driver stub
@ 2017-03-21 21:43  3% Ed Czeck
  2017-03-22 18:16  0% ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ed Czeck @ 2017-03-21 21:43 UTC (permalink / raw)
  To: dev; +Cc: Ed Czeck

Enable Arkville on supported configurations
Add overview documentation
Minimum driver support for valid compile


Signed-off-by: Ed Czeck <ed.czeck@atomicrules.com>
---
 MAINTAINERS                                 |   8 +
 config/common_base                          |  11 ++
 config/defconfig_arm-armv7a-linuxapp-gcc    |   1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |   1 +
 doc/guides/nics/ark.rst                     | 237 +++++++++++++++++++++++
 doc/guides/nics/features/ark.ini            |  15 ++
 doc/guides/nics/index.rst                   |   1 +
 drivers/net/Makefile                        |   1 +
 drivers/net/ark/Makefile                    |  63 +++++++
 drivers/net/ark/ark_debug.h                 |  74 ++++++++
 drivers/net/ark/ark_ethdev.c                | 281 ++++++++++++++++++++++++++++
 drivers/net/ark/ark_ethdev.h                |  39 ++++
 drivers/net/ark/ark_global.h                | 108 +++++++++++
 drivers/net/ark/rte_pmd_ark_version.map     |   4 +
 mk/rte.app.mk                               |   1 +
 15 files changed, 845 insertions(+)
 create mode 100644 doc/guides/nics/ark.rst
 create mode 100644 doc/guides/nics/features/ark.ini
 create mode 100644 drivers/net/ark/Makefile
 create mode 100644 drivers/net/ark/ark_debug.h
 create mode 100644 drivers/net/ark/ark_ethdev.c
 create mode 100644 drivers/net/ark/ark_ethdev.h
 create mode 100644 drivers/net/ark/ark_global.h
 create mode 100644 drivers/net/ark/rte_pmd_ark_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 0c78b58..8043d75 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -278,6 +278,14 @@ M: Evgeny Schemeilin <evgenys@amazon.com>
 F: drivers/net/ena/
 F: doc/guides/nics/ena.rst
 
+Atomic Rules ark
+M: Shepard Siegel <shepard.siegel@atomicrules.com>
+M: Ed Czeck       <ed.czeck@atomicrules.com>
+M: John Miller    <john.miller@atomicrules.com>
+F: /drivers/net/ark/
+F: doc/guides/nics/ark.rst
+F: doc/guides/nics/features/ark.ini
+
 Broadcom bnxt
 M: Stephen Hurd <stephen.hurd@broadcom.com>
 M: Ajit Khaparde <ajit.khaparde@broadcom.com>
diff --git a/config/common_base b/config/common_base
index 37aa1e1..0916c44 100644
--- a/config/common_base
+++ b/config/common_base
@@ -353,6 +353,17 @@ CONFIG_RTE_LIBRTE_QEDE_FW=""
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile ARK PMD
+#
+CONFIG_RTE_LIBRTE_ARK_PMD=y
+CONFIG_RTE_LIBRTE_ARK_PAD_TX=y
+CONFIG_RTE_LIBRTE_ARK_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=n
+
+
+#
 # Compile the TAP PMD
 # It is enabled by default for Linux only.
 #
diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index d9bd2a8..6d2b5e0 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -61,6 +61,7 @@ CONFIG_RTE_SCHED_VECTOR=n
 
 # cannot use those on ARM
 CONFIG_RTE_KNI_KMOD=n
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_EM_PMD=n
 CONFIG_RTE_LIBRTE_IGB_PMD=n
 CONFIG_RTE_LIBRTE_CXGBE_PMD=n
diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc
index 35f7fb6..89bc396 100644
--- a/config/defconfig_ppc_64-power8-linuxapp-gcc
+++ b/config/defconfig_ppc_64-power8-linuxapp-gcc
@@ -48,6 +48,7 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
 
 # Note: Initially, all of the PMD drivers compilation are turned off on Power
 # Will turn on them only after the successful testing on Power
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_IXGBE_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/nics/ark.rst b/doc/guides/nics/ark.rst
new file mode 100644
index 0000000..72fb8d6
--- /dev/null
+++ b/doc/guides/nics/ark.rst
@@ -0,0 +1,237 @@
+.. BSD LICENSE
+
+    Copyright (c) 2015-2017 Atomic Rules LLC
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Atomic Rules LLC nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ARK Poll Mode Driver
+====================
+
+The ARK PMD is a DPDK poll-mode driver for the Atomic Rules Arkville
+(ARK) family of devices.
+
+More information can be found at the `Atomic Rules website
+<http://atomicrules.com>`_.
+
+Overview
+--------
+
+The Atomic Rules Arkville product is a DPDK and AXI compliant product
+that marshals packets across a PCIe conduit between host DPDK mbufs and
+FPGA AXI streams.
+
+The ARK PMD, and the spirit of the overall Arkville product,
+has been to take the DPDK API/ABI as a fixed specification;
+then implement much of the business logic in FPGA RTL circuits.
+The approach of *working backwards* from the DPDK API/ABI and having
+the GPP host software *dictate*, while the FPGA hardware *copes*,
+results in significant performance gains over a naive implementation.
+
+While this document describes the ARK PMD software, it is helpful to
+understand what the FPGA hardware is and is not. The Arkville RTL
+component provides a single PCIe Physical Function (PF) supporting
+some number of RX/Ingress and TX/Egress Queues. The ARK PMD controls
+the Arkville core through a dedicated opaque Core BAR (CBAR).
+To allow users full freedom for their own FPGA application IP,
+an independent FPGA Application BAR (ABAR) is provided.
+
+One popular way to imagine Arkville's FPGA hardware aspect is as the
+FPGA PCIe-facing side of a so-called Smart NIC. The Arkville core does
+not contain any MACs, and is link-speed independent, as well as
+agnostic to the number of physical ports the application chooses to
+use. The ARK driver exposes the familiar PMD interface to allow packet
+movement to and from mbufs across multiple queues.
+
+However FPGA RTL applications could contain a universe of added
+functionality that an Arkville RTL core does not provide or can
+not anticipate. To allow for this expectation of user-defined
+innovation, the ARK PMD provides a dynamic mechanism of adding
+capabilities without having to modify the ARK PMD.
+
+The ARK PMD is intended to support all instances of the Arkville
+RTL Core, regardless of configuration, FPGA vendor, or target
+board. While specific capabilities such as the number of physical
+hardware queue-pairs are negotiated, the driver is designed to
+remain constant over a broad and extendable feature set.
+
+Intentionally, Arkville by itself DOES NOT provide common NIC
+capabilities such as offload or receive-side scaling (RSS).
+These capabilities would be viewed as a gate-level "tax" on
+Green-box FPGA applications that do not require such function.
+Instead, they can be added as needed with essentially no
+overhead to the FPGA Application.
+
+Data Path Interface
+-------------------
+
+Ingress RX and Egress TX operation is via the nominal DPDK API.
+The driver supports single-port, multi-queue for both RX and TX.
+
+Refer to ``ark_ethdev.h`` for the list of supported methods to
+act upon RX and TX Queues.
+
+Configuration Information
+-------------------------
+
+**DPDK Configuration Parameters**
+
+  The following configuration options are available for the ARK PMD:
+
+   * **CONFIG_RTE_LIBRTE_ARK_PMD** (default y): Enables or disables inclusion
+     of the ARK PMD driver in the DPDK compilation.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_RX** (default n): Enables or disables debug
+     logging and internal checking of RX ingress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TX** (default n): Enables or disables debug
+     logging and internal checking of TX egress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS** (default n): Enables or disables debug
+     logging of detailed packet and performance statistics gathered in
+     the PMD and FPGA.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE** (default n): Enables or disables debug
+     logging of detailed PMD events and status.
+
+
+Building DPDK
+-------------
+
+See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
+instructions on how to build DPDK.
+
+By default the ARK PMD library will be built into the DPDK library.
+
+For configuring and using UIO and VFIO frameworks, please also refer to
+:ref:`the documentation that comes with the DPDK suite <linux_gsg>`.
+
+Supported ARK RTL PCIe Instances
+--------------------------------
+
+ARK PMD supports the following Arkville RTL PCIe instances including:
+
+* ``1d6c:100d`` - AR-ARKA-FX0 [Arkville 32B DPDK Data Mover]
+* ``1d6c:100e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover]
+
+Supported Operating Systems
+---------------------------
+
+Any Linux distribution fulfilling the conditions described in the ``System Requirements``
+section of :ref:`the DPDK documentation <linux_gsg>`; also refer to the *DPDK Release Notes*.
+
+Supported Features
+------------------
+
+* Dynamic ARK PMD extensions
+* Multiple receive and transmit queues
+* Jumbo frames up to 9K
+* Hardware Statistics
+
+Unsupported Features
+--------------------
+
+Features that may be part of, or become part of, the Arkville RTL IP that are
+not currently supported or exposed by the ARK PMD include:
+
+* PCIe SR-IOV Virtual Functions (VFs)
+* Arkville's Packet Generator Control and Status
+* Arkville's Packet Director Control and Status
+* Arkville's Packet Checker Control and Status
+* Arkville's Timebase Management
+
+Pre-Requisites
+--------------
+
+#. Prepare the system as recommended by the DPDK suite.  This includes environment
+   variables, hugepages configuration, tool-chains and configuration
+
+#. Insert igb_uio kernel module using the command 'modprobe igb_uio'
+
+#. Bind the intended ARK device to igb_uio module
+
+At this point the system should be ready to run DPDK applications. Once the
+application runs to completion, the ARK PMD can be detached from igb_uio if necessary.
+
+Usage Example
+-------------
+
+This section demonstrates how to launch **testpmd** with Atomic Rules ARK
+devices managed by librte_pmd_ark.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe uio
+      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
+
+   .. note::
+
+      The ARK PMD driver depends upon the igb_uio user space I/O kernel module
+
+#. Mount and request huge pages:
+
+   .. code-block:: console
+
+      mount -t hugetlbfs nodev /mnt/huge
+      echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Bind UIO driver to ARK device at 0000:01:00.0 (using dpdk-devbind.py):
+
+   .. code-block:: console
+
+      ./usertools/dpdk-devbind.py --bind=igb_uio 0000:01:00.0
+
+   .. note::
+
+      The last argument to dpdk-devbind.py is the 4-tuple that identifies a specific PCIe
+      device. You can use lspci -d 1d6c: to identify all Atomic Rules devices in the system,
+      and thus determine the correct 4-tuple argument to dpdk-devbind.py
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+      EAL: PCI device 0000:01:00.0 on NUMA socket -1
+      EAL:   probe driver: 1d6c:100e rte_ark_pmd
+      EAL:   PCI memory mapped at 0x7f9b6c400000
+      PMD: eth_ark_dev_init(): Initializing 0:2:0.1
+      ARKP PMD CommitID: 378f3a67
+      Configuring Port 0 (socket 0)
+      Port 0: DC:3C:F6:00:00:01
+      Checking link statuses...
+      Port 0 Link Up - speed 100000 Mbps - full-duplex
+      Done
+      testpmd>
diff --git a/doc/guides/nics/features/ark.ini b/doc/guides/nics/features/ark.ini
new file mode 100644
index 0000000..dc8a0e2
--- /dev/null
+++ b/doc/guides/nics/features/ark.ini
@@ -0,0 +1,15 @@
+;
+; Supported features of the 'ark' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Queue start/stop     = Y
+Jumbo frame          = Y
+Scattered Rx         = Y
+Basic stats          = Y
+Stats per queue      = Y
+FW version           = Y
+Linux UIO            = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 87f9334..381d82c 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
     :numbered:
 
     overview
+    ark
     bnx2x
     bnxt
     cxgbe
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a16f25e..ea9868b 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -32,6 +32,7 @@
 include $(RTE_SDK)/mk/rte.vars.mk
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += bnx2x
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
 DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
diff --git a/drivers/net/ark/Makefile b/drivers/net/ark/Makefile
new file mode 100644
index 0000000..217bd34
--- /dev/null
+++ b/drivers/net/ark/Makefile
@@ -0,0 +1,63 @@
+# BSD LICENSE
+#
+# Copyright (c) 2015-2017 Atomic Rules LLC
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of copyright holder nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_ark.a
+
+CFLAGS += -O3 -I./
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_ark_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD)
+#
+
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev.c
+
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mempool
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libpthread
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libdl
+
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/ark/ark_debug.h b/drivers/net/ark/ark_debug.h
new file mode 100644
index 0000000..a108c28
--- /dev/null
+++ b/drivers/net/ark/ark_debug.h
@@ -0,0 +1,74 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_DEBUG_H_
+#define _ARK_DEBUG_H_
+
+#include <inttypes.h>
+#include <rte_log.h>
+
+/* Format specifiers for string data pairs */
+#define ARK_SU32  "\n\t%-20s    %'20" PRIu32
+#define ARK_SU64  "\n\t%-20s    %'20" PRIu64
+#define ARK_SU64X "\n\t%-20s    %#20" PRIx64
+#define ARK_SPTR  "\n\t%-20s    %20p"
+
+#define ARK_TRACE_ON(fmt, ...) \
+	PMD_DRV_LOG(ERR, fmt, ##__VA_ARGS__)
+
+#define ARK_TRACE_OFF(fmt, ...) \
+	do {if (0) PMD_DRV_LOG(ERR, fmt, ##__VA_ARGS__); } while (0)
+
+/* Debug macro for reporting Packet stats */
+#ifdef RTE_LIBRTE_ARK_DEBUG_STATS
+#define ARK_DEBUG_STATS(fmt, ...) ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_STATS(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+/* Debug macro for tracing full behavior*/
+#ifdef RTE_LIBRTE_ARK_DEBUG_TRACE
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+#ifdef ARK_STD_LOG
+#define PMD_DRV_LOG(level, fmt, args...) \
+	fprintf(stderr, fmt, args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt, __func__, ## args)
+#endif
+
+#endif
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
new file mode 100644
index 0000000..124b73c
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.c
@@ -0,0 +1,281 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dlfcn.h>
+
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "ark_global.h"
+#include "ark_debug.h"
+#include "ark_ethdev.h"
+
+/*  Internal prototypes */
+static int eth_ark_check_args(const char *params);
+static int eth_ark_dev_init(struct rte_eth_dev *dev);
+static int eth_ark_dev_uninit(struct rte_eth_dev *eth_dev);
+static int eth_ark_dev_configure(struct rte_eth_dev *dev);
+static void eth_ark_dev_info_get(struct rte_eth_dev *dev,
+				 struct rte_eth_dev_info *dev_info);
+
+
+#define ARK_DEV_TO_PCI(eth_dev)			\
+	RTE_DEV_TO_PCI((eth_dev)->device)
+
+#define ARK_MAX_ARG_LEN 256
+static uint32_t pkt_dir_v;
+static char pkt_gen_args[ARK_MAX_ARG_LEN];
+static char pkt_chkr_args[ARK_MAX_ARG_LEN];
+
+#define ARK_PKTGEN_ARG "Pkt_gen"
+#define ARK_PKTCHKR_ARG "Pkt_chkr"
+#define ARK_PKTDIR_ARG "Pkt_dir"
+
+static const char * const valid_arguments[] = {
+	ARK_PKTGEN_ARG,
+	ARK_PKTCHKR_ARG,
+	ARK_PKTDIR_ARG,
+	"iface",
+	NULL
+};
+
+#define MAX_ARK_PHYS 16
+struct ark_adapter *gark[MAX_ARK_PHYS];
+
+static const struct rte_pci_id pci_id_ark_map[] = {
+	{RTE_PCI_DEVICE(0x1d6c, 0x100d)},
+	{RTE_PCI_DEVICE(0x1d6c, 0x100e)},
+	{.vendor_id = 0, /* sentinel */ },
+};
+
+static struct eth_driver rte_ark_pmd = {
+	.pci_drv = {
+		.probe = rte_eth_dev_pci_probe,
+		.remove = rte_eth_dev_pci_remove,
+		.id_table = pci_id_ark_map,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC
+	},
+	.eth_dev_init = eth_ark_dev_init,
+	.eth_dev_uninit = eth_ark_dev_uninit,
+	.dev_private_size = sizeof(struct ark_adapter),
+};
+
+static const struct eth_dev_ops ark_eth_dev_ops = {
+	.dev_configure = eth_ark_dev_configure,
+	.dev_infos_get = eth_ark_dev_info_get,
+
+};
+
+
+static int
+eth_ark_dev_init(struct rte_eth_dev *dev __rte_unused)
+{
+	return -1;					/* STUB */
+}
+
+
+static int
+eth_ark_dev_uninit(struct rte_eth_dev *dev)
+{
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	dev->dev_ops = NULL;
+	dev->rx_pkt_burst = NULL;
+	dev->tx_pkt_burst = NULL;
+	return 0;
+}
+
+static int
+eth_ark_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	ARK_DEBUG_TRACE("ARKP: In %s\n", __func__);
+	return 0;
+}
+
+static void
+eth_ark_dev_info_get(struct rte_eth_dev *dev,
+		     struct rte_eth_dev_info *dev_info)
+{
+	/* device specific configuration */
+	memset(dev_info, 0, sizeof(*dev_info));
+
+	dev_info->max_rx_pktlen = (16 * 1024) - 128;
+	dev_info->min_rx_bufsize = 1024;
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = 4096 * 4,
+		.nb_min = 512,	/* HW Q size for RX */
+		.nb_align = 2,};
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = 4096 * 4,
+		.nb_min = 256,	/* HW Q size for TX */
+		.nb_align = 2,};
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	/* ARK PMD supports all line rates, how do we indicate that here ?? */
+	dev_info->speed_capa = (ETH_LINK_SPEED_1G |
+				ETH_LINK_SPEED_10G |
+				ETH_LINK_SPEED_25G |
+				ETH_LINK_SPEED_40G |
+				ETH_LINK_SPEED_50G |
+				ETH_LINK_SPEED_100G);
+	dev_info->pci_dev = ARK_DEV_TO_PCI(dev);
+	dev_info->driver_name = dev->data->drv_name;
+}
+
+
+static inline int
+process_pktdir_arg(const char *key, const char *value,
+		   void *extra_args __rte_unused)
+{
+	ARK_DEBUG_TRACE("In process_pktdir_arg, key = %s, value = %s\n",
+			key, value);
+	pkt_dir_v = strtol(value, NULL, 16);
+	ARK_DEBUG_TRACE("pkt_dir_v = 0x%x\n", pkt_dir_v);
+	return 0;
+}
+
+static inline int
+process_file_args(const char *key, const char *value, void *extra_args)
+{
+	ARK_DEBUG_TRACE("**** IN process_pktgen_arg, key = %s, value = %s\n",
+			key, value);
+	char *args = (char *)extra_args;
+
+	/* Open the configuration file */
+	FILE *file = fopen(value, "r");
+	char line[256];
+	int first = 1;
+
+	while (fgets(line, sizeof(line), file)) {
+		/* ARK_DEBUG_TRACE("%s\n", line); */
+		if (first) {
+			strncpy(args, line, ARK_MAX_ARG_LEN);
+			first = 0;
+		} else {
+			strncat(args, line, ARK_MAX_ARG_LEN);
+		}
+	}
+	ARK_DEBUG_TRACE("file = %s\n", args);
+	fclose(file);
+	return 0;
+}
+
+static int
+eth_ark_check_args(const char *params)
+{
+	struct rte_kvargs *kvlist;
+	unsigned int k_idx;
+	struct rte_kvargs_pair *pair = NULL;
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return 0;
+
+	pkt_gen_args[0] = 0;
+	pkt_chkr_args[0] = 0;
+
+	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
+		pair = &kvlist->pairs[k_idx];
+		ARK_DEBUG_TRACE("**** Arg passed to PMD = %s:%s\n", pair->key,
+				pair->value);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTDIR_ARG,
+			       &process_pktdir_arg,
+			       NULL) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTDIR_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTGEN_ARG,
+			       &process_file_args,
+			       pkt_gen_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTGEN_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+			       ARK_PKTCHKR_ARG,
+			       &process_file_args,
+			       pkt_chkr_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTCHKR_ARG);
+	}
+
+	ARK_DEBUG_TRACE("INFO: packet director set to 0x%x\n", pkt_dir_v);
+
+	return 1;
+}
+
+
+/*
+ * PCIE
+ */
+static int
+pmd_ark_probe(const char *name, const char *params)
+{
+	RTE_LOG(INFO, PMD, "Initializing pmd_ark for %s params = %s\n", name,
+		params);
+
+	/* Parse off the v index */
+
+	eth_ark_check_args(params);
+	return 0;
+}
+
+static int
+pmd_ark_remove(const char *name)
+{
+	RTE_LOG(INFO, PMD, "Closing ark %s ethdev on numa socket %u\n", name,
+		rte_socket_id());
+	return 1;
+}
+
+static struct rte_vdev_driver pmd_ark_drv = {
+	.probe = pmd_ark_probe,
+	.remove = pmd_ark_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_ark, pmd_ark_drv);
+RTE_PMD_REGISTER_ALIAS(net_ark, eth_ark);
+RTE_PMD_REGISTER_PCI(eth_ark, rte_ark_pmd.pci_drv);
+RTE_PMD_REGISTER_KMOD_DEP(net_ark, "* igb_uio | uio_pci_generic ");
+RTE_PMD_REGISTER_PCI_TABLE(eth_ark, pci_id_ark_map);
diff --git a/drivers/net/ark/ark_ethdev.h b/drivers/net/ark/ark_ethdev.h
new file mode 100644
index 0000000..08d7fb1
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.h
@@ -0,0 +1,39 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_ETHDEV_H_
+#define _ARK_ETHDEV_H_
+
+/* STUB */
+
+#endif
diff --git a/drivers/net/ark/ark_global.h b/drivers/net/ark/ark_global.h
new file mode 100644
index 0000000..7cd62d5
--- /dev/null
+++ b/drivers/net/ark/ark_global.h
@@ -0,0 +1,108 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_GLOBAL_H_
+#define _ARK_GLOBAL_H_
+
+#include <time.h>
+#include <assert.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_cycles.h>
+#include <rte_kvargs.h>
+#include <rte_dev.h>
+#include <rte_version.h>
+
+#define ETH_ARK_ARG_MAXLEN	64
+#define ARK_SYSCTRL_BASE  0x0
+#define ARK_PKTGEN_BASE   0x10000
+#define ARK_MPU_RX_BASE   0x20000
+#define ARK_UDM_BASE      0x30000
+#define ARK_MPU_TX_BASE   0x40000
+#define ARK_DDM_BASE      0x60000
+#define ARK_CMAC_BASE     0x80000
+#define ARK_PKTDIR_BASE   0xa0000
+#define ARK_PKTCHKR_BASE  0x90000
+#define ARK_RCPACING_BASE 0xb0000
+#define ARK_EXTERNAL_BASE 0x100000
+#define ARK_MPU_QOFFSET   0x00100
+#define ARK_MAX_PORTS     8
+
+#define offset8(n)     n
+#define offset16(n)   ((n) / 2)
+#define offset32(n)   ((n) / 4)
+#define offset64(n)   ((n) / 8)
+
+/*
+ * Structure to store private data for each PF/VF instance.
+ */
+#define def_ptr(type, name) \
+	union type {		   \
+		uint64_t *t64;	   \
+		uint32_t *t32;	   \
+		uint16_t *t16;	   \
+		uint8_t  *t8;	   \
+		void     *v;	   \
+	} name
+
+struct ark_port {
+	struct rte_eth_dev *eth_dev;
+	int id;
+};
+
+struct ark_adapter {
+	/* User extension private data */
+	void *user_data;
+
+	struct ark_port port[ARK_MAX_PORTS];
+	int num_ports;
+
+	/* Common for both PF and VF */
+	struct rte_eth_dev *eth_dev;
+
+	void *d_handle;
+
+	/* Our Bar 0 */
+	uint8_t *bar0;
+
+	/* Application Bar */
+	uint8_t *a_bar;
+};
+
+typedef uint32_t *ark_t;
+
+#endif
diff --git a/drivers/net/ark/rte_pmd_ark_version.map b/drivers/net/ark/rte_pmd_ark_version.map
new file mode 100644
index 0000000..7f84780
--- /dev/null
+++ b/drivers/net/ark/rte_pmd_ark_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+	 local: *;
+
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0e0b600..da23898 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -104,6 +104,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)      += -lrte_pmd_bnx2x -lz
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNXT_PMD)       += -lrte_pmd_bnxt
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
1.9.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2] net/ark: poll-mode driver for AtomicRules Arkville
@ 2017-03-20 21:14  1% Ed Czeck
  0 siblings, 0 replies; 200+ results
From: Ed Czeck @ 2017-03-20 21:14 UTC (permalink / raw)
  To: dev; +Cc: Ed Czeck, Shepard Siegel, John Miller

This is the PMD for Atomic Rules' Arkville ARK family of devices.
See doc/guides/nics/ark.rst for detailed description.

v2:
* Fixed all observed compiler errors
* Fixed all observed checkpatch messages except for the PRIu64 system macro

Signed-off-by: Shepard Siegel <shepard.siegel@atomicrules.com>
Signed-off-by: John Miller <john.miller@atomicrules.com>
Signed-off-by: Ed Czeck <ed.czeck@atomicrules.com>
---
 MAINTAINERS                                 |    8 +
 config/common_base                          |   10 +
 config/defconfig_arm-armv7a-linuxapp-gcc    |    1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |    1 +
 doc/guides/nics/ark.rst                     |  237 +++++++
 doc/guides/nics/features/ark.ini            |   15 +
 doc/guides/nics/index.rst                   |    1 +
 drivers/net/Makefile                        |    1 +
 drivers/net/ark/Makefile                    |   72 ++
 drivers/net/ark/ark_ddm.c                   |  150 ++++
 drivers/net/ark/ark_ddm.h                   |  154 ++++
 drivers/net/ark/ark_debug.h                 |   74 ++
 drivers/net/ark/ark_ethdev.c                | 1015 +++++++++++++++++++++++++++
 drivers/net/ark/ark_ethdev.h                |   75 ++
 drivers/net/ark/ark_ethdev_rx.c             |  667 ++++++++++++++++++
 drivers/net/ark/ark_ethdev_tx.c             |  492 +++++++++++++
 drivers/net/ark/ark_ext.h                   |   79 +++
 drivers/net/ark/ark_global.h                |  159 +++++
 drivers/net/ark/ark_mpu.c                   |  167 +++++
 drivers/net/ark/ark_mpu.h                   |  143 ++++
 drivers/net/ark/ark_pktchkr.c               |  460 ++++++++++++
 drivers/net/ark/ark_pktchkr.h               |  114 +++
 drivers/net/ark/ark_pktdir.c                |   79 +++
 drivers/net/ark/ark_pktdir.h                |   68 ++
 drivers/net/ark/ark_pktgen.c                |  482 +++++++++++++
 drivers/net/ark/ark_pktgen.h                |  106 +++
 drivers/net/ark/ark_rqp.c                   |   92 +++
 drivers/net/ark/ark_rqp.h                   |   75 ++
 drivers/net/ark/ark_udm.c                   |  221 ++++++
 drivers/net/ark/ark_udm.h                   |  175 +++++
 drivers/net/ark/rte_pmd_ark_version.map     |    4 +
 mk/rte.app.mk                               |    1 +
 32 files changed, 5398 insertions(+)
 create mode 100644 doc/guides/nics/ark.rst
 create mode 100644 doc/guides/nics/features/ark.ini
 create mode 100644 drivers/net/ark/Makefile
 create mode 100644 drivers/net/ark/ark_ddm.c
 create mode 100644 drivers/net/ark/ark_ddm.h
 create mode 100644 drivers/net/ark/ark_debug.h
 create mode 100644 drivers/net/ark/ark_ethdev.c
 create mode 100644 drivers/net/ark/ark_ethdev.h
 create mode 100644 drivers/net/ark/ark_ethdev_rx.c
 create mode 100644 drivers/net/ark/ark_ethdev_tx.c
 create mode 100644 drivers/net/ark/ark_ext.h
 create mode 100644 drivers/net/ark/ark_global.h
 create mode 100644 drivers/net/ark/ark_mpu.c
 create mode 100644 drivers/net/ark/ark_mpu.h
 create mode 100644 drivers/net/ark/ark_pktchkr.c
 create mode 100644 drivers/net/ark/ark_pktchkr.h
 create mode 100644 drivers/net/ark/ark_pktdir.c
 create mode 100644 drivers/net/ark/ark_pktdir.h
 create mode 100644 drivers/net/ark/ark_pktgen.c
 create mode 100644 drivers/net/ark/ark_pktgen.h
 create mode 100644 drivers/net/ark/ark_rqp.c
 create mode 100644 drivers/net/ark/ark_rqp.h
 create mode 100644 drivers/net/ark/ark_udm.c
 create mode 100644 drivers/net/ark/ark_udm.h
 create mode 100644 drivers/net/ark/rte_pmd_ark_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 39bc78e..6f6136a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -280,6 +280,14 @@ M: Evgeny Schemeilin <evgenys@amazon.com>
 F: drivers/net/ena/
 F: doc/guides/nics/ena.rst
 
+Atomic Rules ark
+M: Shepard Siegel <shepard.siegel@atomicrules.com>
+M: Ed Czeck       <ed.czeck@atomicrules.com>
+M: John Miller    <john.miller@atomicrules.com>
+F: /drivers/net/ark/
+F: doc/guides/nics/ark.rst
+F: doc/guides/nics/features/ark.ini
+
 Broadcom bnxt
 M: Stephen Hurd <stephen.hurd@broadcom.com>
 M: Ajit Khaparde <ajit.khaparde@broadcom.com>
diff --git a/config/common_base b/config/common_base
index aeee13e..e64cd83 100644
--- a/config/common_base
+++ b/config/common_base
@@ -348,6 +348,16 @@ CONFIG_RTE_LIBRTE_QEDE_FW=""
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile ARK PMD
+#
+CONFIG_RTE_LIBRTE_ARK_PMD=y
+CONFIG_RTE_LIBRTE_ARK_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=n
+
+
+#
 # Compile the TAP PMD
 # It is enabled by default for Linux only.
 #
diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index d9bd2a8..6d2b5e0 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -61,6 +61,7 @@ CONFIG_RTE_SCHED_VECTOR=n
 
 # cannot use those on ARM
 CONFIG_RTE_KNI_KMOD=n
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_EM_PMD=n
 CONFIG_RTE_LIBRTE_IGB_PMD=n
 CONFIG_RTE_LIBRTE_CXGBE_PMD=n
diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc
index 35f7fb6..89bc396 100644
--- a/config/defconfig_ppc_64-power8-linuxapp-gcc
+++ b/config/defconfig_ppc_64-power8-linuxapp-gcc
@@ -48,6 +48,7 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
 
 # Note: Initially, all of the PMD drivers compilation are turned off on Power
 # Will turn on them only after the successful testing on Power
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_IXGBE_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/nics/ark.rst b/doc/guides/nics/ark.rst
new file mode 100644
index 0000000..72fb8d6
--- /dev/null
+++ b/doc/guides/nics/ark.rst
@@ -0,0 +1,237 @@
+.. BSD LICENSE
+
+    Copyright (c) 2015-2017 Atomic Rules LLC
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Atomic Rules LLC nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ARK Poll Mode Driver
+====================
+
+The ARK PMD is a DPDK poll-mode driver for the Atomic Rules Arkville
+(ARK) family of devices.
+
+More information can be found at the `Atomic Rules website
+<http://atomicrules.com>`_.
+
+Overview
+--------
+
+The Atomic Rules Arkville product is a DPDK- and AXI-compliant product
+that marshals packets across a PCIe conduit between host DPDK mbufs and
+FPGA AXI streams.
+
+The design philosophy of the ARK PMD, and of the overall Arkville
+product, has been to take the DPDK API/ABI as a fixed specification
+and then implement much of the business logic in FPGA RTL circuits.
+The approach of *working backwards* from the DPDK API/ABI and having
+the GPP host software *dictate*, while the FPGA hardware *copes*,
+results in significant performance gains over a naive implementation.
+
+While this document describes the ARK PMD software, it is helpful to
+understand what the FPGA hardware is and is not. The Arkville RTL
+component provides a single PCIe Physical Function (PF) supporting
+some number of RX/Ingress and TX/Egress Queues. The ARK PMD controls
+the Arkville core through a dedicated opaque Core BAR (CBAR).
+To allow users full freedom for their own FPGA application IP,
+an independent FPGA Application BAR (ABAR) is provided.
+
+One popular way to imagine Arkville's FPGA hardware aspect is as the
+FPGA PCIe-facing side of a so-called Smart NIC. The Arkville core does
+not contain any MACs, and is link-speed independent, as well as
+agnostic to the number of physical ports the application chooses to
+use. The ARK driver exposes the familiar PMD interface to allow packet
+movement to and from mbufs across multiple queues.
+
+However, FPGA RTL applications may contain a universe of added
+functionality that the Arkville RTL core does not provide or cannot
+anticipate. To allow for this expectation of user-defined
+innovation, the ARK PMD provides a dynamic mechanism for adding
+capabilities without having to modify the ARK PMD itself, as
+sketched below.
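+
+As a minimal sketch (the loading behavior is taken from
+``ark_ethdev.c``, which loads the shared object named by the
+``ARK_EXT_PATH`` environment variable with ``dlopen()`` and resolves
+optional entry points such as ``dev_init`` by name; the extension body
+itself is hypothetical -- see also ``ark_ext.h``), a user extension
+could look like:
+
+.. code-block:: c
+
+   /* Hypothetical user extension; only the dev_init hook is shown.
+    * Build this as a shared object and point ARK_EXT_PATH at it.
+    */
+   #include <stdlib.h>
+   #include <rte_ethdev.h>
+
+   void *
+   dev_init(struct rte_eth_dev *dev, void *a_bar, int port_id)
+   {
+           (void)dev;
+           (void)a_bar;
+           (void)port_id;
+           /* Return per-port private state; NULL disables the extension. */
+           return calloc(1, sizeof(uint64_t));
+   }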
+
+The ARK PMD is intended to support all instances of the Arkville
+RTL Core, regardless of configuration, FPGA vendor, or target
+board. While specific capabilities, such as the number of physical
+hardware queue-pairs, are negotiated, the driver is designed to
+remain constant over a broad and extendable feature set.
+
+Intentionally, Arkville by itself DOES NOT provide common NIC
+capabilities such as offload or receive-side scaling (RSS).
+These capabilities would be viewed as a gate-level "tax" on
+Green-box FPGA applications that do not require such functions.
+Instead, they can be added as needed with essentially no
+overhead to the FPGA Application.
+
+Data Path Interface
+-------------------
+
+Ingress RX and Egress TX operation uses the nominal DPDK API.
+The driver supports single-port, multi-queue operation for both RX and TX.
+
+Refer to ``ark_ethdev.h`` for the list of supported methods to
+act upon RX and TX Queues.
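+
+Since the data path uses the standard DPDK API, the usual burst calls
+apply unchanged. A minimal RX/TX sketch (``port_id`` and ``queue_id``
+are placeholders):
+
+.. code-block:: c
+
+   struct rte_mbuf *pkts[32];
+   uint16_t nb_rx, nb_tx;
+
+   /* Receive a burst from an ARK RX queue ... */
+   nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
+   /* ... and forward it back out of an ARK TX queue. */
+   nb_tx = rte_eth_tx_burst(port_id, queue_id, pkts, nb_rx);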
+
+Configuration Information
+-------------------------
+
+**DPDK Configuration Parameters**
+
+  The following configuration options are available for the ARK PMD:
+
+   * **CONFIG_RTE_LIBRTE_ARK_PMD** (default y): Enables or disables inclusion
+     of the ARK PMD driver in the DPDK compilation.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_RX** (default n): Enables or disables debug
+     logging and internal checking of RX ingress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TX** (default n): Enables or disables debug
+     logging and internal checking of TX egress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS** (default n): Enables or disables debug
+     logging of detailed packet and performance statistics gathered in
+     the PMD and FPGA.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE** (default n): Enables or disables debug
+     logging of detailed PMD events and status.
+
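+  For example, to enable detailed PMD tracing in a development build,
+  set the corresponding option in the build configuration:
+
+  .. code-block:: console
+
+     CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=y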
+
+Building DPDK
+-------------
+
+See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
+instructions on how to build DPDK.
+
+By default the ARK PMD library will be built into the DPDK library.
+
+For configuring and using UIO and VFIO frameworks, please also refer to
+:ref:`the documentation that comes with the DPDK suite <linux_gsg>`.
+
+Supported ARK RTL PCIe Instances
+--------------------------------
+
+The ARK PMD supports the following Arkville RTL PCIe instances:
+
+* ``1d6c:100d`` - AR-ARKA-FX0 [Arkville 32B DPDK Data Mover]
+* ``1d6c:100e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover]
+
+Supported Operating Systems
+---------------------------
+
+Any Linux distribution fulfilling the conditions described in the
+``System Requirements`` section of :ref:`the DPDK documentation
+<linux_gsg>`; also refer to the *DPDK Release Notes*.
+
+Supported Features
+------------------
+
+* Dynamic ARK PMD extensions
+* Multiple receive and transmit queues
+* Jumbo frames up to 9K
+* Hardware Statistics
+
+Unsupported Features
+--------------------
+
+Features that may be part of, or become part of, the Arkville RTL IP that are
+not currently supported or exposed by the ARK PMD include:
+
+* PCIe SR-IOV Virtual Functions (VFs)
+* Arkville's Packet Generator Control and Status
+* Arkville's Packet Director Control and Status
+* Arkville's Packet Checker Control and Status
+* Arkville's Timebase Management
+
+Pre-Requisites
+--------------
+
+#. Prepare the system as recommended by the DPDK suite.  This includes
+   environment variables, hugepages configuration, tool-chains and configuration.
+
+#. Insert the igb_uio kernel module using the command ``modprobe igb_uio``.
+
+#. Bind the intended ARK device to the igb_uio module.
+
+At this point the system should be ready to run DPDK applications. Once the
+application runs to completion, the ARK device can be detached from igb_uio
+if necessary.
+
+Usage Example
+-------------
+
+This section demonstrates how to launch **testpmd** with Atomic Rules ARK
+devices managed by librte_pmd_ark.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe uio
+      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
+
+   .. note::
+
+      The ARK PMD driver depends upon the igb_uio user space I/O kernel module.
+
+#. Mount and request huge pages:
+
+   .. code-block:: console
+
+      mount -t hugetlbfs nodev /mnt/huge
+      echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Bind UIO driver to ARK device at 0000:01:00.0 (using dpdk-devbind.py):
+
+   .. code-block:: console
+
+      ./usertools/dpdk-devbind.py --bind=igb_uio 0000:01:00.0
+
+   .. note::
+
+      The last argument to dpdk-devbind.py is the 4-tuple that identifies a specific PCIe
+      device. You can use ``lspci -d 1d6c:`` to identify all Atomic Rules devices in the
+      system, and thus determine the correct 4-tuple argument to dpdk-devbind.py.
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+      EAL: PCI device 0000:01:00.0 on NUMA socket -1
+      EAL:   probe driver: 1d6c:100e rte_ark_pmd
+      EAL:   PCI memory mapped at 0x7f9b6c400000
+      PMD: eth_ark_dev_init(): Initializing 0:2:0.1
+      ARKP PMD CommitID: 378f3a67
+      Configuring Port 0 (socket 0)
+      Port 0: DC:3C:F6:00:00:01
+      Checking link statuses...
+      Port 0 Link Up - speed 100000 Mbps - full-duplex
+      Done
+      testpmd>
diff --git a/doc/guides/nics/features/ark.ini b/doc/guides/nics/features/ark.ini
new file mode 100644
index 0000000..dc8a0e2
--- /dev/null
+++ b/doc/guides/nics/features/ark.ini
@@ -0,0 +1,15 @@
+;
+; Supported features of the 'ark' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Queue start/stop     = Y
+Jumbo frame          = Y
+Scattered Rx         = Y
+Basic stats          = Y
+Stats per queue      = Y
+FW version           = Y
+Linux UIO            = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 87f9334..381d82c 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
     :numbered:
 
     overview
+    ark
     bnx2x
     bnxt
     cxgbe
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a16f25e..ea9868b 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -32,6 +32,7 @@
 include $(RTE_SDK)/mk/rte.vars.mk
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += bnx2x
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
 DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
diff --git a/drivers/net/ark/Makefile b/drivers/net/ark/Makefile
new file mode 100644
index 0000000..615dfa2
--- /dev/null
+++ b/drivers/net/ark/Makefile
@@ -0,0 +1,72 @@
+# BSD LICENSE
+#
+# Copyright (c) 2015-2017 Atomic Rules LLC
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of copyright holder nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_ark.a
+
+CFLAGS += -O3 -I./
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_ark_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD)
+#
+
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev_rx.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev_tx.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_pktgen.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_pktchkr.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_pktdir.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_mpu.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ddm.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_udm.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_rqp.c
+
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mempool
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libpthread
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libdl
+
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/ark/ark_ddm.c b/drivers/net/ark/ark_ddm.c
new file mode 100644
index 0000000..86bb2b5
--- /dev/null
+++ b/drivers/net/ark/ark_ddm.c
@@ -0,0 +1,150 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_debug.h"
+#include "ark_ddm.h"
+
+/* ************************************************************************* */
+int
+ark_ddm_verify(struct ark_ddm_t *ddm)
+{
+	if (sizeof(struct ark_ddm_t) != ARK_DDM_EXPECTED_SIZE) {
+		fprintf(stderr, "  DDM structure looks incorrect %d vs %zd\n",
+			ARK_DDM_EXPECTED_SIZE, sizeof(struct ark_ddm_t));
+		return -1;
+	}
+
+	if (ddm->cfg.const0 != ARK_DDM_CONST) {
+		fprintf(stderr, "  DDM module not found as expected 0x%08x\n",
+			ddm->cfg.const0);
+		return -1;
+	}
+	return 0;
+}
+
+void
+ark_ddm_start(struct ark_ddm_t *ddm)
+{
+	ddm->cfg.command = 1;
+}
+
+int
+ark_ddm_stop(struct ark_ddm_t *ddm, const int wait)
+{
+	int cnt = 0;
+
+	ddm->cfg.command = 2;
+	while (wait && (ddm->cfg.stop_flushed & 0x01) == 0) {
+		if (cnt++ > 1000)
+			return 1;
+
+		usleep(10);
+	}
+	return 0;
+}
+
+void
+ark_ddm_reset(struct ark_ddm_t *ddm)
+{
+	int status;
+
+	/* reset only works if ddm has stopped properly. */
+	status = ark_ddm_stop(ddm, 1);
+
+	if (status != 0) {
+		ARK_DEBUG_TRACE("ARKP: %s  stop failed  doing forced reset\n",
+				__func__);
+		ddm->cfg.command = 4;
+		usleep(10);
+	}
+	ddm->cfg.command = 3;
+}
+
+void
+ark_ddm_setup(struct ark_ddm_t *ddm, phys_addr_t cons_addr, uint32_t interval)
+{
+	ddm->setup.cons_write_index_addr = cons_addr;
+	ddm->setup.write_index_interval = interval / 4;	/* 4 ns period */
+}
+
+void
+ark_ddm_stats_reset(struct ark_ddm_t *ddm)
+{
+	ddm->cfg.tlp_stats_clear = 1;
+}
+
+void
+ark_ddm_dump(struct ark_ddm_t *ddm, const char *msg)
+{
+	ARK_DEBUG_TRACE("ARKP DDM Dump: %s Stopped: %d\n", msg,
+	ark_ddm_is_stopped(ddm)
+	);
+}
+
+void
+ark_ddm_dump_stats(struct ark_ddm_t *ddm, const char *msg)
+{
+	struct ark_ddm_stats_t *stats = &ddm->stats;
+
+	ARK_DEBUG_STATS("ARKP DDM Stats: %s"
+					ARK_SU64 ARK_SU64 ARK_SU64
+					"\n", msg,
+	"Bytes:", stats->tx_byte_count,
+	"Packets:", stats->tx_pkt_count, "MBufs", stats->tx_mbuf_count);
+}
+
+int
+ark_ddm_is_stopped(struct ark_ddm_t *ddm)
+{
+	return (ddm->cfg.stop_flushed & 0x01) != 0;
+}
+
+uint64_t
+ark_ddm_queue_byte_count(struct ark_ddm_t *ddm)
+{
+	return ddm->queue_stats.byte_count;
+}
+
+uint64_t
+ark_ddm_queue_pkt_count(struct ark_ddm_t *ddm)
+{
+	return ddm->queue_stats.pkt_count;
+}
+
+void
+ark_ddm_queue_reset_stats(struct ark_ddm_t *ddm)
+{
+	ddm->queue_stats.byte_count = 1;
+}
diff --git a/drivers/net/ark/ark_ddm.h b/drivers/net/ark/ark_ddm.h
new file mode 100644
index 0000000..8208b12
--- /dev/null
+++ b/drivers/net/ark/ark_ddm.h
@@ -0,0 +1,154 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_DDM_H_
+#define _ARK_DDM_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/* DDM core hardware structures */
+#define ARK_DDM_CFG 0x0000
+#define ARK_DDM_CONST 0xfacecafe
+struct ark_ddm_cfg_t {
+	uint32_t r0;
+	volatile uint32_t tlp_stats_clear;
+	uint32_t const0;
+	volatile uint32_t tag_max;
+	volatile uint32_t command;
+	volatile uint32_t stop_flushed;
+};
+
+#define ARK_DDM_STATS 0x0020
+struct ark_ddm_stats_t {
+	volatile uint64_t tx_byte_count;
+	volatile uint64_t tx_pkt_count;
+	volatile uint64_t tx_mbuf_count;
+};
+
+#define ARK_DDM_MRDQ 0x0040
+struct ark_ddm_mrdq_t {
+	volatile uint32_t mrd_q1;
+	volatile uint32_t mrd_q2;
+	volatile uint32_t mrd_q3;
+	volatile uint32_t mrd_q4;
+	volatile uint32_t mrd_full;
+};
+
+#define ARK_DDM_CPLDQ 0x0068
+struct ark_ddm_cpldq_t {
+	volatile uint32_t cpld_q1;
+	volatile uint32_t cpld_q2;
+	volatile uint32_t cpld_q3;
+	volatile uint32_t cpld_q4;
+	volatile uint32_t cpld_full;
+};
+
+#define ARK_DDM_MRD_PS 0x0090
+struct ark_ddm_mrd_ps_t {
+	volatile uint32_t mrd_ps_min;
+	volatile uint32_t mrd_ps_max;
+	volatile uint32_t mrd_full_ps_min;
+	volatile uint32_t mrd_full_ps_max;
+	volatile uint32_t mrd_dw_ps_min;
+	volatile uint32_t mrd_dw_ps_max;
+};
+
+#define ARK_DDM_QUEUE_STATS 0x00a8
+struct ark_ddm_qstats_t {
+	volatile uint64_t byte_count;
+	volatile uint64_t pkt_count;
+	volatile uint64_t mbuf_count;
+};
+
+#define ARK_DDM_CPLD_PS 0x00c0
+struct ark_ddm_cpld_ps_t {
+	volatile uint32_t cpld_ps_min;
+	volatile uint32_t cpld_ps_max;
+	volatile uint32_t cpld_full_ps_min;
+	volatile uint32_t cpld_full_ps_max;
+	volatile uint32_t cpld_dw_ps_min;
+	volatile uint32_t cpld_dw_ps_max;
+};
+
+#define ARK_DDM_SETUP  0x00e0
+struct ark_ddm_setup_t {
+	phys_addr_t cons_write_index_addr;
+	uint32_t write_index_interval;	/* 4ns each */
+	volatile uint32_t cons_index;
+};
+
+/* Consolidated register map; reserved arrays pad each block to its
+ * hardware byte offset.
+ */
+struct ark_ddm_t {
+	struct ark_ddm_cfg_t cfg;
+	uint8_t reserved0[(ARK_DDM_STATS - ARK_DDM_CFG) -
+					  sizeof(struct ark_ddm_cfg_t)];
+	struct ark_ddm_stats_t stats;
+	uint8_t reserved1[(ARK_DDM_MRDQ - ARK_DDM_STATS) -
+					  sizeof(struct ark_ddm_stats_t)];
+	struct ark_ddm_mrdq_t mrdq;
+	uint8_t reserved2[(ARK_DDM_CPLDQ - ARK_DDM_MRDQ) -
+					  sizeof(struct ark_ddm_mrdq_t)];
+	struct ark_ddm_cpldq_t cpldq;
+	uint8_t reserved3[(ARK_DDM_MRD_PS - ARK_DDM_CPLDQ) -
+					  sizeof(struct ark_ddm_cpldq_t)];
+	struct ark_ddm_mrd_ps_t mrd_ps;
+	struct ark_ddm_qstats_t queue_stats;
+	struct ark_ddm_cpld_ps_t cpld_ps;
+	uint8_t reserved5[(ARK_DDM_SETUP - ARK_DDM_CPLD_PS) -
+					  sizeof(struct ark_ddm_cpld_ps_t)];
+	struct ark_ddm_setup_t setup;
+	uint8_t reserved_p[(256 - ARK_DDM_SETUP)
+					  - sizeof(struct ark_ddm_setup_t)];
+};
+
+#define ARK_DDM_EXPECTED_SIZE 256
+#define ARK_DDM_QOFFSET ARK_DDM_EXPECTED_SIZE
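+/* Per-queue DDM register blocks are spaced ARK_DDM_QOFFSET bytes apart */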
+
+/* DDM function prototype */
+int ark_ddm_verify(struct ark_ddm_t *ddm);
+void ark_ddm_start(struct ark_ddm_t *ddm);
+int ark_ddm_stop(struct ark_ddm_t *ddm, const int wait);
+void ark_ddm_reset(struct ark_ddm_t *ddm);
+void ark_ddm_stats_reset(struct ark_ddm_t *ddm);
+void ark_ddm_setup(struct ark_ddm_t *ddm, phys_addr_t cons_addr,
+	uint32_t interval);
+void ark_ddm_dump_stats(struct ark_ddm_t *ddm, const char *msg);
+void ark_ddm_dump(struct ark_ddm_t *ddm, const char *msg);
+int ark_ddm_is_stopped(struct ark_ddm_t *ddm);
+uint64_t ark_ddm_queue_byte_count(struct ark_ddm_t *ddm);
+uint64_t ark_ddm_queue_pkt_count(struct ark_ddm_t *ddm);
+void ark_ddm_queue_reset_stats(struct ark_ddm_t *ddm);
+
+#endif
diff --git a/drivers/net/ark/ark_debug.h b/drivers/net/ark/ark_debug.h
new file mode 100644
index 0000000..8a7f83a
--- /dev/null
+++ b/drivers/net/ark/ark_debug.h
@@ -0,0 +1,74 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_DEBUG_H_
+#define _ARK_DEBUG_H_
+
+#include <inttypes.h>
+#include <rte_log.h>
+
+/* Format specifiers for string data pairs */
+#define ARK_SU32  "\n\t%-20s    %'20" PRIu32
+#define ARK_SU64  "\n\t%-20s    %'20" PRIu64
+#define ARK_SU64X "\n\t%-20s    %#20" PRIx64
+#define ARK_SPTR  "\n\t%-20s    %20p"
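+/* Example: ARK_DEBUG_STATS("Stats:" ARK_SU64 "\n", "Bytes:", byte_count); */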
+
+#define ARK_TRACE_ON(fmt, ...) \
+	fprintf(stderr, fmt, ##__VA_ARGS__)
+
+#define ARK_TRACE_OFF(fmt, ...) \
+	do {if (0) fprintf(stderr, fmt, ##__VA_ARGS__); } while (0)
+
+/* Debug macro for reporting Packet stats */
+#ifdef RTE_LIBRTE_ARK_DEBUG_STATS
+#define ARK_DEBUG_STATS(fmt, ...) ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_STATS(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+/* Debug macro for tracing full behavior*/
+#ifdef RTE_LIBRTE_ARK_DEBUG_TRACE
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+#ifdef ARK_STD_LOG
+#define PMD_DRV_LOG(level, fmt, args...) \
+	fprintf(stderr, fmt, args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt, __func__, ## args)
+#endif
+
+#endif
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
new file mode 100644
index 0000000..6dfe2ce
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.c
@@ -0,0 +1,1015 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dlfcn.h>
+
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "ark_global.h"
+#include "ark_debug.h"
+#include "ark_ethdev.h"
+#include "ark_mpu.h"
+#include "ark_ddm.h"
+#include "ark_udm.h"
+#include "ark_rqp.h"
+#include "ark_pktdir.h"
+#include "ark_pktgen.h"
+#include "ark_pktchkr.h"
+
+/*  Internal prototypes */
+static int eth_ark_check_args(const char *params);
+static int eth_ark_dev_init(struct rte_eth_dev *dev);
+static int ark_config_device(struct rte_eth_dev *dev);
+static int eth_ark_dev_uninit(struct rte_eth_dev *eth_dev);
+static int eth_ark_dev_configure(struct rte_eth_dev *dev);
+static int eth_ark_dev_start(struct rte_eth_dev *dev);
+static void eth_ark_dev_stop(struct rte_eth_dev *dev);
+static void eth_ark_dev_close(struct rte_eth_dev *dev);
+static void eth_ark_dev_info_get(struct rte_eth_dev *dev,
+	struct rte_eth_dev_info *dev_info);
+static int eth_ark_dev_link_update(struct rte_eth_dev *dev,
+	int wait_to_complete);
+static int eth_ark_dev_set_link_up(struct rte_eth_dev *dev);
+static int eth_ark_dev_set_link_down(struct rte_eth_dev *dev);
+static void eth_ark_dev_stats_get(struct rte_eth_dev *dev,
+	struct rte_eth_stats *stats);
+static void eth_ark_dev_stats_reset(struct rte_eth_dev *dev);
+static void eth_ark_set_default_mac_addr(struct rte_eth_dev *dev,
+	struct ether_addr *mac_addr);
+static void eth_ark_macaddr_add(struct rte_eth_dev *dev,
+	struct ether_addr *mac_addr, uint32_t index, uint32_t pool);
+static void eth_ark_macaddr_remove(struct rte_eth_dev *dev,
+	uint32_t index);
+
+#define ARK_DEV_TO_PCI(eth_dev) \
+	RTE_DEV_TO_PCI((eth_dev)->device)
+
+#define ARK_MAX_ARG_LEN 256
+static uint32_t pkt_dir_v;
+static char pkt_gen_args[ARK_MAX_ARG_LEN];
+static char pkt_chkr_args[ARK_MAX_ARG_LEN];
+
+#define ARK_PKTGEN_ARG "Pkt_gen"
+#define ARK_PKTCHKR_ARG "Pkt_chkr"
+#define ARK_PKTDIR_ARG "Pkt_dir"
+
+static const char * const valid_arguments[] = {
+	ARK_PKTGEN_ARG,
+	ARK_PKTCHKR_ARG,
+	ARK_PKTDIR_ARG,
+	"iface",
+	NULL
+};
+
+#define MAX_ARK_PHYS 16
+struct ark_adapter *gark[MAX_ARK_PHYS];
+
+static const struct rte_pci_id pci_id_ark_map[] = {
+	{RTE_PCI_DEVICE(0x1d6c, 0x100d)},
+	{RTE_PCI_DEVICE(0x1d6c, 0x100e)},
+	{.vendor_id = 0, /* sentinel */ },
+};
+
+static struct eth_driver rte_ark_pmd = {
+	.pci_drv = {
+		.probe = rte_eth_dev_pci_probe,
+		.remove = rte_eth_dev_pci_remove,
+		.id_table = pci_id_ark_map,
+		.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC},
+	.eth_dev_init = eth_ark_dev_init,
+	.eth_dev_uninit = eth_ark_dev_uninit,
+	.dev_private_size = sizeof(struct ark_adapter),
+};
+
+static const struct eth_dev_ops ark_eth_dev_ops = {
+	.dev_configure = eth_ark_dev_configure,
+	.dev_start = eth_ark_dev_start,
+	.dev_stop = eth_ark_dev_stop,
+	.dev_close = eth_ark_dev_close,
+
+	.dev_infos_get = eth_ark_dev_info_get,
+
+	.rx_queue_setup = eth_ark_dev_rx_queue_setup,
+	.rx_queue_count = eth_ark_dev_rx_queue_count,
+	.tx_queue_setup = eth_ark_tx_queue_setup,
+
+	.link_update = eth_ark_dev_link_update,
+	.dev_set_link_up = eth_ark_dev_set_link_up,
+	.dev_set_link_down = eth_ark_dev_set_link_down,
+
+	.rx_queue_start = eth_ark_rx_start_queue,
+	.rx_queue_stop = eth_ark_rx_stop_queue,
+
+	.tx_queue_start = eth_ark_tx_queue_start,
+	.tx_queue_stop = eth_ark_tx_queue_stop,
+
+	.stats_get = eth_ark_dev_stats_get,
+	.stats_reset = eth_ark_dev_stats_reset,
+
+	.mac_addr_add = eth_ark_macaddr_add,
+	.mac_addr_remove = eth_ark_macaddr_remove,
+	.mac_addr_set = eth_ark_set_default_mac_addr,
+
+};
+
+int
+ark_get_port_id(struct rte_eth_dev *dev, struct ark_adapter *ark)
+{
+	int n = ark->num_ports;
+	int i;
+
+	/* There has to be a smarter way to do this ... */
+	for (i = 0; i < n; i++) {
+		if (ark->port[i].eth_dev == dev)
+			return i;
+	}
+	ARK_DEBUG_TRACE("ARK: Device is NOT associated with a port !!");
+	return -1;
+}
+
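+/* Load the optional user extension shared object named by the
+ * ARK_EXT_PATH environment variable and resolve its entry points
+ * via dlsym(); missing symbols simply leave the hooks NULL.
+ */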
+static
+int
+check_for_ext(struct rte_eth_dev *dev __rte_unused,
+			  struct ark_adapter *ark __rte_unused)
+{
+	int found = 0;
+
+	/* Get the env */
+	const char *dllpath = getenv("ARK_EXT_PATH");
+
+	if (dllpath == NULL) {
+		ARK_DEBUG_TRACE("ARK EXT NO dll path specified\n");
+		return 0;
+	}
+	ARK_DEBUG_TRACE("ARK EXT found dll path at %s\n", dllpath);
+
+	/* Open and load the .so */
+	ark->d_handle = dlopen(dllpath, RTLD_LOCAL | RTLD_LAZY);
+	if (ark->d_handle == NULL)
+		PMD_DRV_LOG(ERR, "Could not load user extension %s\n", dllpath);
+	else
+		ARK_DEBUG_TRACE("SUCCESS: loaded user extension %s\n", dllpath);
+
+	/* Get the entry points */
+	ark->user_ext.dev_init =
+		(void *(*)(struct rte_eth_dev *, void *, int))
+		dlsym(ark->d_handle, "dev_init");
+	ARK_DEBUG_TRACE("device ext init pointer = %p\n",
+					ark->user_ext.dev_init);
+	ark->user_ext.dev_get_port_count =
+		(int (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_get_port_count");
+	ark->user_ext.dev_uninit =
+		(void (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_uninit");
+	ark->user_ext.dev_configure =
+		(int (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_configure");
+	ark->user_ext.dev_start =
+		(int (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_start");
+	ark->user_ext.dev_stop =
+		(void (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_stop");
+	ark->user_ext.dev_close =
+		(void (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_close");
+	ark->user_ext.link_update =
+		(int (*)(struct rte_eth_dev *, int, void *))
+		dlsym(ark->d_handle,
+			  "link_update");
+	ark->user_ext.dev_set_link_up =
+		(int (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_set_link_up");
+	ark->user_ext.dev_set_link_down =
+		(int (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "dev_set_link_down");
+	ark->user_ext.stats_get =
+		(void (*)(struct rte_eth_dev *, struct rte_eth_stats *,
+				  void *)) dlsym(ark->d_handle, "stats_get");
+	ark->user_ext.stats_reset =
+		(void (*)(struct rte_eth_dev *, void *))
+		dlsym(ark->d_handle,
+			  "stats_reset");
+	ark->user_ext.mac_addr_add =
+		(void (*)(struct rte_eth_dev *, struct ether_addr *, uint32_t,
+				  uint32_t, void *)) dlsym(ark->d_handle, "mac_addr_add");
+	ark->user_ext.mac_addr_remove =
+		(void (*)(struct rte_eth_dev *, uint32_t,
+				  void *)) dlsym(ark->d_handle, "mac_addr_remove");
+	ark->user_ext.mac_addr_set =
+		(void (*)(struct rte_eth_dev *, struct ether_addr *,
+				  void *)) dlsym(ark->d_handle, "mac_addr_set");
+
+	return found;
+}
+
+static int
+eth_ark_dev_init(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+	int ret;
+
+	ark->eth_dev = dev;
+
+	ARK_DEBUG_TRACE("eth_ark_dev_init(struct rte_eth_dev *dev)\n");
+	gark[0] = ark;
+
+	/* Check to see if there is an extension that we need to load */
+	check_for_ext(dev, ark);
+	pci_dev = ARK_DEV_TO_PCI(dev);
+	rte_eth_copy_pci_info(dev, pci_dev);
+
+	if (pci_dev->device.devargs)
+		eth_ark_check_args(pci_dev->device.devargs->args);
+	else
+		PMD_DRV_LOG(INFO, "No Device args found\n");
+
+	/* Use dummy function until setup */
+	dev->rx_pkt_burst = &eth_ark_recv_pkts_noop;
+	dev->tx_pkt_burst = &eth_ark_xmit_pkts_noop;
+
+	ark->bar0 = (uint8_t *)pci_dev->mem_resource[0].addr;
+	ark->a_bar = (uint8_t *)pci_dev->mem_resource[2].addr;
+
+	ark->sysctrl.v  = (void *)&ark->bar0[ARK_SYSCTRL_BASE];
+	ark->mpurx.v  = (void *)&ark->bar0[ARK_MPU_RX_BASE];
+	ark->udm.v  = (void *)&ark->bar0[ARK_UDM_BASE];
+	ark->mputx.v  = (void *)&ark->bar0[ARK_MPU_TX_BASE];
+	ark->ddm.v  = (void *)&ark->bar0[ARK_DDM_BASE];
+	ark->cmac.v  = (void *)&ark->bar0[ARK_CMAC_BASE];
+	ark->external.v  = (void *)&ark->bar0[ARK_EXTERNAL_BASE];
+	ark->pktdir.v  = (void *)&ark->bar0[ARK_PKTDIR_BASE];
+	ark->pktgen.v  = (void *)&ark->bar0[ARK_PKTGEN_BASE];
+	ark->pktchkr.v  = (void *)&ark->bar0[ARK_PKTCHKR_BASE];
+
+	ark->rqpacing =
+		(struct ark_rqpace_t *)(ark->bar0 + ARK_RCPACING_BASE);
+	ark->started = 0;
+
+	ARK_DEBUG_TRACE
+		("Sys Ctrl Const = 0x%x  DEV Commit_iD: %08x\n",
+		 ark->sysctrl.t32[4],
+		 rte_be_to_cpu_32(ark->sysctrl.t32[0x20 / 4]));
+	PMD_DRV_LOG(INFO, "ARKP PMD  Commit_iD: %08x\n",
+				rte_be_to_cpu_32(ark->sysctrl.t32[0x20 / 4]));
+
+	/* If HW sanity test fails, return an error */
+	if (ark->sysctrl.t32[4] != 0xcafef00d) {
+		PMD_DRV_LOG
+			(ERR,
+			 "HW Sanity test has failed, expected constant 0x%x, read 0x%x (%s)\n",
+			 0xcafef00d, ark->sysctrl.t32[4], __func__);
+		return -1;
+	}
+
+	PMD_DRV_LOG
+		(INFO,
+		 "HW Sanity test has PASSED, expected constant 0x%x, read 0x%x (%s)\n",
+		 0xcafef00d, ark->sysctrl.t32[4], __func__);
+
+	/* We are a single function multi-port device. */
+	const unsigned int numa_node = rte_socket_id();
+	struct ether_addr adr;
+
+	ret = ark_config_device(dev);
+	dev->dev_ops = &ark_eth_dev_ops;
+
+	dev->data->mac_addrs = rte_zmalloc("ark", ETHER_ADDR_LEN, 0);
+	if (!dev->data->mac_addrs) {
+		PMD_DRV_LOG(ERR,
+			    "Failed to allocate memory for storing MAC address");
+		return -1;
+	}
+	ether_addr_copy((struct ether_addr *)&adr, &dev->data->mac_addrs[0]);
+
+	if (ark->user_ext.dev_init) {
+		ark->user_data = ark->user_ext.dev_init(dev, ark->a_bar, 0);
+		if (!ark->user_data) {
+			PMD_DRV_LOG(INFO,
+						"Failed to initialize PMD extension !!, continuing without it\n");
+			memset(&ark->user_ext, 0, sizeof(struct ark_user_ext));
+			dlclose(ark->d_handle);
+		}
+	}
+
+	/*
+	 * We will create additional devices based on the number of requested
+	 * ports
+	 */
+	int pc = 1;
+	int p;
+
+	if (ark->user_ext.dev_get_port_count) {
+		pc = ark->user_ext.dev_get_port_count(dev, ark->user_data);
+		ark->num_ports = pc;
+	} else {
+		ark->num_ports = 1;
+	}
+	for (p = 0; p < pc; p++) {
+		struct ark_port *port;
+
+		port = &ark->port[p];
+		struct rte_eth_dev_data *data = NULL;
+
+		port->id = p;
+
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		snprintf(name, sizeof(name), "arketh%d",
+				 dev->data->port_id + p);
+
+		if (p == 0) {
+			/* First port is already allocated by DPDK */
+			port->eth_dev = ark->eth_dev;
+			continue;
+		}
+
+		/* reserve an ethdev entry */
+		port->eth_dev = rte_eth_dev_allocate(name);
+		if (!port->eth_dev) {
+			PMD_DRV_LOG(ERR, "Could not allocate eth_dev for port %d\n",
+						p);
+			goto error;
+		}
+
+		data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+		if (!data) {
+			PMD_DRV_LOG(ERR, "Could not allocate eth_dev for port %d\n",
+						p);
+			goto error;
+		}
+		data->port_id = ark->eth_dev->data->port_id + p;
+		port->eth_dev->data = data;
+		port->eth_dev->device = &pci_dev->device;
+		port->eth_dev->data->dev_private = ark;
+		port->eth_dev->driver = ark->eth_dev->driver;
+		port->eth_dev->dev_ops = ark->eth_dev->dev_ops;
+		port->eth_dev->tx_pkt_burst = ark->eth_dev->tx_pkt_burst;
+		port->eth_dev->rx_pkt_burst = ark->eth_dev->rx_pkt_burst;
+
+		rte_eth_copy_pci_info(port->eth_dev, pci_dev);
+
+		port->eth_dev->data->mac_addrs =
+			rte_zmalloc(name, ETHER_ADDR_LEN, 0);
+		if (!port->eth_dev->data->mac_addrs) {
+			PMD_DRV_LOG(ERR,
+						"Memory allocation for MAC failed !, exiting\n");
+			goto error;
+		}
+		ether_addr_copy
+			((struct ether_addr *)&adr,
+			 &port->eth_dev->data->mac_addrs[0]);
+
+		if (ark->user_ext.dev_init)
+			ark->user_data =
+				ark->user_ext.dev_init(dev, ark->a_bar, p);
+	}
+
+	return ret;
+
+ error:
+	if (dev->data->mac_addrs)
+		rte_free(dev->data->mac_addrs);
+
+	/* Port 0 reuses the DPDK-allocated eth_dev; free only the
+	 * per-port allocations, and free mac_addrs before the data
+	 * structure it hangs off.
+	 */
+	for (p = 1; p < pc; p++) {
+		if (ark->port[p].eth_dev == NULL)
+			continue;
+		if (ark->port[p].eth_dev->data) {
+			rte_free(ark->port[p].eth_dev->data->mac_addrs);
+			rte_free(ark->port[p].eth_dev->data);
+		}
+	}
+
+	return -1;
+}
+
+/*
+ * Initial device configuration when the device is opened:
+ * set up the DDM and UDM.
+ * Called once per PCIe device.
+ */
+static int
+ark_config_device(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	uint16_t num_q, i;
+	struct ark_mpu_t *mpu;
+
+	/*
+	 * Make sure that the packet director, generator and checker are in a
+	 * known state
+	 */
+	ark->start_pg = 0;
+	ark->pg = ark_pmd_pktgen_init(ark->pktgen.v, 0, 1);
+	ark_pmd_pktgen_reset(ark->pg);
+	ark->pc = ark_pmd_pktchkr_init(ark->pktchkr.v, 0, 1);
+	ark_pmd_pktchkr_stop(ark->pc);
+	ark->pd = ark_pmd_pktdir_init(ark->pktdir.v);
+
+	/* Verify HW */
+	if (ark_udm_verify(ark->udm.v))
+		return -1;
+	if (ark_ddm_verify(ark->ddm.v))
+		return -1;
+
+	/* UDM */
+	if (ark_udm_reset(ark->udm.v)) {
+		PMD_DRV_LOG(ERR, "Unable to stop and reset UDM\n");
+		return -1;
+	}
+	/* Keep in reset until the MPU are cleared */
+
+	/* MPU reset */
+	mpu = ark->mpurx.v;
+	num_q = ark_api_num_queues(mpu);
+	ark->rx_queues = num_q;
+	for (i = 0; i < num_q; i++) {
+		ark_mpu_reset(mpu);
+		mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+	}
+
+	ark_udm_stop(ark->udm.v, 0);
+	ark_udm_configure(ark->udm.v,
+					  RTE_PKTMBUF_HEADROOM,
+					  RTE_MBUF_DEFAULT_DATAROOM,
+					  ARK_RX_WRITE_TIME_NS);
+	ark_udm_stats_reset(ark->udm.v);
+	ark_udm_stop(ark->udm.v, 0);
+
+	/* TX -- DDM */
+	if (ark_ddm_stop(ark->ddm.v, 1))
+		PMD_DRV_LOG(ERR, "Unable to stop DDM\n");
+
+	mpu = ark->mputx.v;
+	num_q = ark_api_num_queues(mpu);
+	ark->tx_queues = num_q;
+	for (i = 0; i < num_q; i++) {
+		ark_mpu_reset(mpu);
+		mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+	}
+
+	ark_ddm_reset(ark->ddm.v);
+	ark_ddm_stats_reset(ark->ddm.v);
+	/* ark_ddm_dump(ark->ddm.v, "Config"); */
+	/* ark_ddm_dump_stats(ark->ddm.v, "Config"); */
+
+	/* MPU reset */
+	ark_ddm_stop(ark->ddm.v, 0);
+	ark_rqp_stats_reset(ark->rqpacing);
+
+	return 0;
+}
+
+static int
+eth_ark_dev_uninit(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	if (ark->user_ext.dev_uninit)
+		ark->user_ext.dev_uninit(dev, ark->user_data);
+
+	ark_pmd_pktgen_uninit(ark->pg);
+	ark_pmd_pktchkr_uninit(ark->pc);
+
+	dev->dev_ops = NULL;
+	dev->rx_pkt_burst = NULL;
+	dev->tx_pkt_burst = NULL;
+	if (dev->data->mac_addrs)
+		rte_free(dev->data->mac_addrs);
+	if (dev->data)
+		rte_free(dev->data);
+
+	return 0;
+}
+
+static int
+eth_ark_dev_configure(struct rte_eth_dev *dev __rte_unused)
+{
+	ARK_DEBUG_TRACE
+		("ARKP: In eth_ark_dev_configure(struct rte_eth_dev *dev)\n");
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	eth_ark_dev_set_link_up(dev);
+	if (ark->user_ext.dev_configure)
+		return ark->user_ext.dev_configure(dev, ark->user_data);
+	return 0;
+}
+
+static void *
+delay_pg_start(void *arg)
+{
+	struct ark_adapter *ark = (struct ark_adapter *)arg;
+
+	/* This function is used exclusively for regression testing.  We
+	 * perform a blind sleep here to ensure that the external test
+	 * application has time to set up the test before we generate packets.
+	 */
+	usleep(100000);
+	ark_pmd_pktgen_run(ark->pg);
+	return NULL;
+}
+
+static int
+eth_ark_dev_start(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	int i;
+
+	ARK_DEBUG_TRACE("ARKP: In eth_ark_dev_start\n");
+
+	/* RX Side */
+	/* start UDM */
+	ark_udm_start(ark->udm.v);
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		eth_ark_rx_start_queue(dev, i);
+
+	/* TX Side */
+	for (i = 0; i < dev->data->nb_tx_queues; i++)
+		eth_ark_tx_queue_start(dev, i);
+
+	/* start DDM */
+	ark_ddm_start(ark->ddm.v);
+
+	ark->started = 1;
+	/* set xmit and receive function */
+	dev->rx_pkt_burst = &eth_ark_recv_pkts;
+	dev->tx_pkt_burst = &eth_ark_xmit_pkts;
+
+	if (ark->start_pg)
+		ark_pmd_pktchkr_run(ark->pc);
+
+	if (ark->start_pg && (ark_get_port_id(dev, ark) == 0)) {
+		pthread_t thread;
+
+		/* Delay the packet generator start so the external test
+		 * application has time to set up; see delay_pg_start().
+		 */
+		pthread_create(&thread, NULL, delay_pg_start, ark);
+	}
+
+	if (ark->user_ext.dev_start)
+		ark->user_ext.dev_start(dev, ark->user_data);
+
+	return 0;
+}
+
+static void
+eth_ark_dev_stop(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+	int status;
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	struct ark_mpu_t *mpu;
+
+	ARK_DEBUG_TRACE("ARKP: In eth_ark_dev_stop\n");
+
+	if (ark->started == 0)
+		return;
+	ark->started = 0;
+
+	/* Stop the extension first */
+	if (ark->user_ext.dev_stop)
+		ark->user_ext.dev_stop(dev, ark->user_data);
+
+	/* Stop the packet generator */
+	if (ark->start_pg)
+		ark_pmd_pktgen_pause(ark->pg);
+
+	dev->rx_pkt_burst = &eth_ark_recv_pkts_noop;
+	dev->tx_pkt_burst = &eth_ark_xmit_pkts_noop;
+
+	/* STOP TX Side */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		status = eth_ark_tx_queue_stop(dev, i);
+		if (status != 0) {
+			uint8_t port = dev->data->port_id;
+			PMD_DRV_LOG(ERR,
+						"ARKP tx_queue stop anomaly port %u, queue %u\n",
+						port, i);
+		}
+	}
+
+	/* Stop DDM */
+	/* Wait up to 0.1 second; each stop attempt is up to 1000 * 10 microseconds */
+	for (i = 0; i < 10; i++) {
+		status = ark_ddm_stop(ark->ddm.v, 1);
+		if (status == 0)
+			break;
+	}
+	if (status || i != 0) {
+		PMD_DRV_LOG(ERR, "DDM stop anomaly. status: %d iter: %u. (%s)\n",
+					status, i, __func__);
+		ark_ddm_dump(ark->ddm.v, "Stop anomaly");
+
+		mpu = ark->mputx.v;
+		for (i = 0; i < ark->tx_queues; i++) {
+			ark_mpu_dump(mpu, "DDM failure dump", i);
+			mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+		}
+	}
+
+	/* STOP RX Side */
+	/* Stop UDM */
+	for (i = 0; i < 10; i++) {
+		status = ark_udm_stop(ark->udm.v, 1);
+		if (status == 0)
+			break;
+	}
+	if (status || i != 0) {
+		PMD_DRV_LOG(ERR, "UDM stop anomaly. status %d iter: %u. (%s)\n",
+					status, i, __func__);
+		ark_udm_dump(ark->udm.v, "Stop anomaly");
+
+		mpu = ark->mpurx.v;
+		for (i = 0; i < ark->rx_queues; i++) {
+			ark_mpu_dump(mpu, "UDM Stop anomaly", i);
+			mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+		}
+	}
+
+	ark_udm_dump_stats(ark->udm.v, "Post stop");
+	ark_udm_dump_perf(ark->udm.v, "Post stop");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		eth_ark_rx_dump_queue(dev, i, __func__);
+
+	/* Stop the packet checker if it is running */
+	if (ark->start_pg) {
+		ark_pmd_pktchkr_dump_stats(ark->pc);
+		ark_pmd_pktchkr_stop(ark->pc);
+	}
+}
+
+static void
+eth_ark_dev_close(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	uint16_t i;
+
+	if (ark->user_ext.dev_close)
+		ark->user_ext.dev_close(dev, ark->user_data);
+
+	eth_ark_dev_stop(dev);
+	eth_ark_udm_force_close(dev);
+
+	/*
+	 * TODO This should only be called once for the device during shutdown
+	 */
+	ark_rqp_dump(ark->rqpacing);
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+		eth_ark_tx_queue_release(dev->data->tx_queues[i]);
+		dev->data->tx_queues[i] = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		eth_ark_dev_rx_queue_release(dev->data->rx_queues[i]);
+		dev->data->rx_queues[i] = 0;
+	}
+}
+
+static void
+eth_ark_dev_info_get(struct rte_eth_dev *dev,
+					 struct rte_eth_dev_info *dev_info)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+	struct ark_mpu_t *tx_mpu = RTE_PTR_ADD(ark->bar0, ARK_MPU_TX_BASE);
+	struct ark_mpu_t *rx_mpu = RTE_PTR_ADD(ark->bar0, ARK_MPU_RX_BASE);
+
+	uint16_t ports = ark->num_ports;
+
+	/* device specific configuration */
+	memset(dev_info, 0, sizeof(*dev_info));
+
+	dev_info->max_rx_queues = ark_api_num_queues_per_port(rx_mpu, ports);
+	dev_info->max_tx_queues = ark_api_num_queues_per_port(tx_mpu, ports);
+	dev_info->max_mac_addrs = 0;
+	dev_info->if_index = 0;
+	dev_info->max_rx_pktlen = (16 * 1024) - 128;
+	dev_info->min_rx_bufsize = 1024;
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = 4096 * 4,
+		.nb_min = 512,	/* HW Q size for RX */
+		.nb_align = 2,};
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+		.nb_max = 4096 * 4,
+		.nb_min = 256,	/* HW Q size for TX */
+		.nb_align = 2,};
+
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	/* ARK PMD supports all line rates, how do we indicate that here ?? */
+	dev_info->speed_capa =
+		ETH_LINK_SPEED_1G | ETH_LINK_SPEED_10G | ETH_LINK_SPEED_25G |
+		ETH_LINK_SPEED_40G | ETH_LINK_SPEED_50G | ETH_LINK_SPEED_100G;
+	dev_info->pci_dev = ARK_DEV_TO_PCI(dev);
+	dev_info->driver_name = dev->data->drv_name;
+}
+
+static int
+eth_ark_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete)
+{
+	ARK_DEBUG_TRACE("ARKP: link status = %d\n",
+					dev->data->dev_link.link_status);
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	if (ark->user_ext.link_update) {
+		return ark->user_ext.link_update
+			(dev, wait_to_complete,
+			 ark->user_data);
+	}
+	return 0;
+}
+
+static int
+eth_ark_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = 1;
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	if (ark->user_ext.dev_set_link_up)
+		return ark->user_ext.dev_set_link_up(dev, ark->user_data);
+	return 0;
+}
+
+static int
+eth_ark_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = 0;
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	if (ark->user_ext.dev_set_link_down)
+		return ark->user_ext.dev_set_link_down(dev, ark->user_data);
+	return 0;
+}
+
+static void
+eth_ark_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	uint16_t i;
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	stats->ipackets = 0;
+	stats->ibytes = 0;
+	stats->opackets = 0;
+	stats->obytes = 0;
+	stats->imissed = 0;
+	stats->oerrors = 0;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++)
+		eth_tx_queue_stats_get(dev->data->tx_queues[i], stats);
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		eth_rx_queue_stats_get(dev->data->rx_queues[i], stats);
+	if (ark->user_ext.stats_get)
+		ark->user_ext.stats_get(dev, stats, ark->user_data);
+}
+
+static void
+eth_ark_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++)
+		eth_tx_queue_stats_reset(dev->data->tx_queues[i]);
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		eth_rx_queue_stats_reset(dev->data->rx_queues[i]);
+	if (ark->user_ext.stats_reset)
+		ark->user_ext.stats_reset(dev, ark->user_data);
+}
+
+static void
+eth_ark_macaddr_add(struct rte_eth_dev *dev,
+					struct ether_addr *mac_addr,
+					uint32_t index,
+					uint32_t pool)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	if (ark->user_ext.mac_addr_add)
+		ark->user_ext.mac_addr_add
+			(dev,
+			 mac_addr,
+			 index,
+			 pool,
+			 ark->user_data);
+}
+
+static void
+eth_ark_macaddr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	if (ark->user_ext.mac_addr_remove)
+		ark->user_ext.mac_addr_remove(dev, index, ark->user_data);
+}
+
+static void
+eth_ark_set_default_mac_addr(struct rte_eth_dev *dev,
+			 struct ether_addr *mac_addr)
+{
+	struct ark_adapter *ark =
+		(struct ark_adapter *)dev->data->dev_private;
+
+	if (ark->user_ext.mac_addr_set)
+		ark->user_ext.mac_addr_set(dev, mac_addr, ark->user_data);
+}
+
+static inline int
+process_pktdir_arg(const char *key, const char *value,
+				   void *extra_args __rte_unused)
+{
+	ARK_DEBUG_TRACE("**** IN process_pktdir_arg, key = %s, value = %s\n",
+					key, value);
+	pkt_dir_v = strtol(value, NULL, 16);
+	ARK_DEBUG_TRACE("pkt_dir_v = 0x%x\n", pkt_dir_v);
+	return 0;
+}
+
+static inline int
+process_file_args(const char *key, const char *value, void *extra_args)
+{
+	ARK_DEBUG_TRACE("**** IN process_pktgen_arg, key = %s, value = %s\n",
+					key, value);
+	char *args = (char *)extra_args;
+
+	/* Open the configuration file */
+	FILE *file = fopen(value, "r");
+	char line[256];
+	int first = 1;
+
+	if (file == NULL) {
+		PMD_DRV_LOG(ERR, "Unable to open config file %s\n", value);
+		return -1;
+	}
+
+	while (fgets(line, sizeof(line), file)) {
+		/* ARK_DEBUG_TRACE("%s\n", line); */
+		if (first) {
+			strncpy(args, line, ARK_MAX_ARG_LEN);
+			first = 0;
+		} else {
+			strncat(args, line, ARK_MAX_ARG_LEN);
+		}
+	}
+	ARK_DEBUG_TRACE("file = %s\n", args);
+	fclose(file);
+	return 0;
+}
+
+static int
+eth_ark_check_args(const char *params)
+{
+	struct rte_kvargs *kvlist;
+	unsigned int k_idx;
+	struct rte_kvargs_pair *pair = NULL;
+
+	/*
+	 * TODO: the index of gark[index] should be associated with phy dev
+	 * map
+	 */
+	struct ark_adapter *ark = gark[0];
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+		return 0;
+
+	pkt_gen_args[0] = 0;
+	pkt_chkr_args[0] = 0;
+
+	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
+		pair = &kvlist->pairs[k_idx];
+		ARK_DEBUG_TRACE("**** Arg passed to PMD = %s:%s\n", pair->key,
+						pair->value);
+	}
+
+	if (rte_kvargs_process(kvlist,
+						   ARK_PKTDIR_ARG,
+						   &process_pktdir_arg,
+						   NULL) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTDIR_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+						   ARK_PKTGEN_ARG,
+						   &process_file_args,
+						   pkt_gen_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTGEN_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist,
+						   ARK_PKTCHKR_ARG,
+						   &process_file_args,
+						   pkt_chkr_args) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTCHKR_ARG);
+	}
+
+	/* Setup the packet director */
+	ark_pmd_pktdir_setup(ark->pd, pkt_dir_v);
+	ARK_DEBUG_TRACE("INFO: packet director set to 0x%x\n", pkt_dir_v);
+
+	/* Setup the packet generator */
+	if (pkt_gen_args[0]) {
+		PMD_DRV_LOG(INFO, "Setting up the packet generator\n");
+		ark_pmd_pktgen_parse(pkt_gen_args);
+		ark_pmd_pktgen_reset(ark->pg);
+		ark_pmd_pktgen_setup(ark->pg);
+		ark->start_pg = 1;
+	}
+
+	/* Setup the packet checker */
+	if (pkt_chkr_args[0]) {
+		ark_pmd_pktchkr_parse(pkt_chkr_args);
+		ark_pmd_pktchkr_setup(ark->pc);
+	}
+
+	return 1;
+}
+
+static int
+pmd_ark_probe(const char *name, const char *params)
+{
+	RTE_LOG(INFO, PMD, "Initializing pmd_ark for %s params = %s\n", name,
+			params);
+
+	/* Parse off the v index */
+
+	eth_ark_check_args(params);
+	return 0;
+}
+
+static int
+pmd_ark_remove(const char *name)
+{
+	RTE_LOG(INFO, PMD, "Closing ark %s ethdev on numa socket %u\n", name,
+			rte_socket_id());
+	return 1;
+}
+
+static struct rte_vdev_driver pmd_ark_drv = {
+	.probe = pmd_ark_probe,
+	.remove = pmd_ark_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_ark, pmd_ark_drv);
+RTE_PMD_REGISTER_ALIAS(net_ark, eth_ark);
+RTE_PMD_REGISTER_PCI(eth_ark, rte_ark_pmd.pci_drv);
+RTE_PMD_REGISTER_KMOD_DEP(net_ark, "* igb_uio | uio_pci_generic ");
+RTE_PMD_REGISTER_PCI_TABLE(eth_ark, pci_id_ark_map);
diff --git a/drivers/net/ark/ark_ethdev.h b/drivers/net/ark/ark_ethdev.h
new file mode 100644
index 0000000..9167181
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.h
@@ -0,0 +1,75 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_ETHDEV_H_
+#define _ARK_ETHDEV_H_
+
+int ark_get_port_id(struct rte_eth_dev *dev, struct ark_adapter *ark);
+
+/* RX functions */
+int eth_ark_dev_rx_queue_setup(struct rte_eth_dev *dev,
+	uint16_t queue_idx,
+	uint16_t nb_desc,
+	unsigned int socket_id,
+	const struct rte_eth_rxconf *rx_conf, struct rte_mempool *mp);
+uint32_t eth_ark_dev_rx_queue_count(struct rte_eth_dev *dev,
+	uint16_t rx_queue_id);
+int eth_ark_rx_stop_queue(struct rte_eth_dev *dev, uint16_t queue_id);
+int eth_ark_rx_start_queue(struct rte_eth_dev *dev, uint16_t queue_id);
+uint16_t eth_ark_recv_pkts_noop(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts);
+uint16_t eth_ark_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts);
+void eth_ark_dev_rx_queue_release(void *rx_queue);
+void eth_rx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats);
+void eth_rx_queue_stats_reset(void *vqueue);
+void eth_ark_rx_dump_queue(struct rte_eth_dev *dev, uint16_t queue_id,
+	const char *msg);
+
+void eth_ark_udm_force_close(struct rte_eth_dev *dev);
+
+/* TX functions */
+uint16_t eth_ark_xmit_pkts_noop(void *txq, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+uint16_t eth_ark_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+int eth_ark_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
+	uint16_t nb_desc, unsigned int socket_id,
+	const struct rte_eth_txconf *tx_conf);
+void eth_ark_tx_queue_release(void *tx_queue);
+int eth_ark_tx_queue_stop(struct rte_eth_dev *dev, uint16_t queue_id);
+int eth_ark_tx_queue_start(struct rte_eth_dev *dev, uint16_t queue_id);
+void eth_tx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats);
+void eth_tx_queue_stats_reset(void *vqueue);
+
+#endif
diff --git a/drivers/net/ark/ark_ethdev_rx.c b/drivers/net/ark/ark_ethdev_rx.c
new file mode 100644
index 0000000..3a1c778
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev_rx.c
@@ -0,0 +1,667 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_global.h"
+#include "ark_debug.h"
+#include "ark_ethdev.h"
+#include "ark_mpu.h"
+#include "ark_udm.h"
+
+#define ARK_RX_META_SIZE 32
+#define ARK_RX_META_OFFSET (RTE_PKTMBUF_HEADROOM - ARK_RX_META_SIZE)
+#define ARK_RX_MAX_NOCHAIN (RTE_MBUF_DEFAULT_DATAROOM)
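+
+/*
+ * Layout note (derived from the offsets above, not in the original
+ * patch): the 32-byte RX meta block written by the hardware sits at the
+ * tail of the mbuf headroom, immediately before the packet data:
+ *
+ *   buf_addr .. [ARK_RX_META_OFFSET] [32B meta] [data at
+ *   RTE_PKTMBUF_HEADROOM]
+ */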
+
+#ifdef RTE_LIBRTE_ARK_DEBUG_RX
+#define ARK_RX_DEBUG 1
+#define ARK_FULL_DEBUG 1
+#else
+#define ARK_RX_DEBUG 0
+#define ARK_FULL_DEBUG 0
+#endif
+
+/* Forward declarations */
+struct ark_rx_queue;
+struct ark_rx_meta;
+
+static void dump_mbuf_data(struct rte_mbuf *mbuf, uint16_t lo, uint16_t hi);
+static void ark_ethdev_rx_dump(const char *name, struct ark_rx_queue *queue);
+static uint32_t eth_ark_rx_jumbo(struct ark_rx_queue *queue,
+	struct ark_rx_meta *meta, struct rte_mbuf *mbuf0, uint32_t cons_index);
+static inline int eth_ark_rx_seed_mbufs(struct ark_rx_queue *queue);
+
+/* ************************************************************************* */
+struct ark_rx_queue {
+	/* array of mbufs to populate */
+	struct rte_mbuf **reserve_q;
+	/* array of physical addresses of the mbuf data pointers */
+	/* (the array pointer itself is a virtual address) */
+	phys_addr_t *paddress_q;
+	struct rte_mempool *mb_pool;
+
+	struct ark_udm_t *udm;
+	struct ark_mpu_t *mpu;
+
+	uint32_t queue_size;
+	uint32_t queue_mask;
+
+	uint32_t seed_index;		/* 1 set with an empty mbuf */
+	uint32_t cons_index;		/* 3 consumed by the driver */
+
+	/* The queue Id is used to identify the HW Q */
+	uint16_t phys_qid;
+
+	/* The queue Index is used within the dpdk device structures */
+	uint16_t queue_index;
+
+	uint32_t pad1;
+
+	/* separate cache line */
+	/* second cache line - fields only used in slow path */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	volatile uint32_t prod_index;	/* 2 filled by the HW */
+} __rte_cache_aligned;
+
+/* ************************************************************************* */
+
+/* MATCHES struct in UDMDefines.bsv */
+
+/* TODO move to ark_udm.h */
+struct ark_rx_meta {
+	uint64_t timestamp;
+	uint64_t user_data;
+	uint8_t port;
+	uint8_t dst_queue;
+	uint16_t pkt_len;
+};
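+
+/*
+ * Note (editor's arithmetic): the fields above total 8 + 8 + 1 + 1 + 2 =
+ * 20 bytes; the 32-byte ARK_RX_META_SIZE reservation in the headroom
+ * leaves the remainder reserved.
+ */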
+
+/* ************************************************************************* */
+
+/* TODO  pick a better function name */
+static int
+eth_ark_rx_queue_setup(struct rte_eth_dev *dev,
+	struct ark_rx_queue *queue,
+	uint16_t rx_queue_id __rte_unused, uint16_t rx_queue_idx)
+{
+	phys_addr_t queue_base;
+	phys_addr_t phys_addr_q_base;
+	phys_addr_t phys_addr_prod_index;
+
+	queue_base = rte_malloc_virt2phy(queue);
+	phys_addr_prod_index = queue_base +
+		offsetof(struct ark_rx_queue, prod_index);
+
+	phys_addr_q_base = rte_malloc_virt2phy(queue->paddress_q);
+
+	/* Verify HW */
+	if (ark_mpu_verify(queue->mpu, sizeof(phys_addr_t))) {
+		PMD_DRV_LOG(ERR, "ARKP: Illegal configuration rx queue\n");
+		return -1;
+	}
+
+	/* Stop and Reset and configure MPU */
+	ark_mpu_configure(queue->mpu, phys_addr_q_base, queue->queue_size, 0);
+
+	ark_udm_write_addr(queue->udm, phys_addr_prod_index);
+
+	/* advance the valid pointer, but don't start until the queue starts */
+	ark_mpu_reset_stats(queue->mpu);
+
+	/* The seed is the producer index for the HW */
+	ark_mpu_set_producer(queue->mpu, queue->seed_index);
+	dev->data->rx_queue_state[rx_queue_idx] = RTE_ETH_QUEUE_STATE_STOPPED;
+
+	return 0;
+}
+
+static inline void
+eth_ark_rx_update_cons_index(struct ark_rx_queue *queue, uint32_t cons_index)
+{
+	queue->cons_index = cons_index;
+	eth_ark_rx_seed_mbufs(queue);
+	ark_mpu_set_producer(queue->mpu, queue->seed_index);
+}
+
+/* ************************************************************************* */
+int
+eth_ark_dev_rx_queue_setup(struct rte_eth_dev *dev,
+	uint16_t queue_idx,
+	uint16_t nb_desc,
+	unsigned int socket_id,
+	const struct rte_eth_rxconf *rx_conf, struct rte_mempool *mb_pool)
+{
+	struct ark_adapter *ark = (struct ark_adapter *)dev->data->dev_private;
+	static int warning1;		/* = 0 */
+
+	struct ark_rx_queue *queue;
+	uint32_t i;
+	int status;
+
+	int port = ark_get_port_id(dev, ark);
+	int qidx = port + queue_idx;	/* TODO FIXME */
+
+	/* TODO: We may already be set up; check here if there is nothing to do */
+	/* Free memory prior to re-allocation if needed */
+	if (dev->data->rx_queues[queue_idx] != NULL) {
+		/* TODO: release any allocated queues */
+		dev->data->rx_queues[queue_idx] = NULL;
+	}
+
+	if (rx_conf != NULL && warning1 == 0) {
+		warning1 = 1;
+		PMD_DRV_LOG(INFO,
+			"ARKP: Arkville PMD ignores rte_eth_rxconf argument.\n");
+	}
+
+	if (RTE_PKTMBUF_HEADROOM < ARK_RX_META_SIZE) {
+		PMD_DRV_LOG(ERR,
+			"Error: DPDK Arkville requires head room > %d bytes (%s)\n",
+			ARK_RX_META_SIZE, __func__);
+		return -1;		/* ERROR CODE */
+	}
+
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_DRV_LOG(ERR,
+			"DPDK Arkville configuration queue size must be power of two %u (%s)\n",
+			nb_desc, __func__);
+		return -1;		/* ERROR CODE */
+	}
+
+	/* Allocate queue struct */
+	queue = rte_zmalloc_socket("Ark_rXQueue", sizeof(struct ark_rx_queue),
+				   64, socket_id);
+	if (queue == 0) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory in %s\n",
+			    __func__);
+		return -ENOMEM;
+	}
+
+	/* NOTE zmalloc is used, no need to 0 indexes, etc. */
+	queue->mb_pool = mb_pool;
+	queue->phys_qid = qidx;
+	queue->queue_index = queue_idx;
+	queue->queue_size = nb_desc;
+	queue->queue_mask = nb_desc - 1;
+
+	queue->reserve_q =
+		rte_zmalloc_socket("Ark_rXQueue mbuf",
+				   nb_desc * sizeof(struct rte_mbuf *),
+				   64, socket_id);
+	queue->paddress_q =
+		rte_zmalloc_socket("Ark_rXQueue paddr",
+				   nb_desc * sizeof(phys_addr_t),
+				   64, socket_id);
+	if (queue->reserve_q == 0 || queue->paddress_q == 0) {
+		PMD_DRV_LOG(ERR, "Failed to allocate queue memory in %s\n",
+			    __func__);
+		rte_free(queue->reserve_q);
+		rte_free(queue->paddress_q);
+		rte_free(queue);
+		return -ENOMEM;
+	}
+
+	dev->data->rx_queues[queue_idx] = queue;
+	queue->udm = RTE_PTR_ADD(ark->udm.v, qidx * ARK_UDM_QOFFSET);
+	queue->mpu = RTE_PTR_ADD(ark->mpurx.v, qidx * ARK_MPU_QOFFSET);
+
+	/* populate mbuf reserve */
+	status = eth_ark_rx_seed_mbufs(queue);
+
+	/* MPU Setup */
+	if (status == 0)
+		status = eth_ark_rx_queue_setup(dev, queue, qidx, queue_idx);
+
+	if (unlikely(status != 0)) {
+		PMD_DRV_LOG(ERR, "ARKP Failed to initialize RX queue %d %s\n",
+			    qidx, __func__);
+		/* free the seeded mbufs; rte_pktmbuf_free(NULL) is a no-op */
+		for (i = 0; i < nb_desc; ++i)
+			rte_pktmbuf_free(queue->reserve_q[i]);
+		rte_free(queue->reserve_q);
+		rte_free(queue->paddress_q);
+		rte_free(queue);
+		return -1;		/* ERROR CODE */
+	}
+
+	return 0;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_recv_pkts_noop(void *rx_queue __rte_unused,
+	struct rte_mbuf **rx_pkts __rte_unused, uint16_t nb_pkts __rte_unused)
+{
+	return 0;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct ark_rx_queue *queue;
+	register uint32_t cons_index, prod_index;
+	uint16_t nb;
+	uint64_t rx_bytes = 0;
+	struct rte_mbuf *mbuf;
+	struct ark_rx_meta *meta;
+
+	queue = (struct ark_rx_queue *)rx_queue;
+	if (unlikely(queue == 0))
+		return 0;
+	if (unlikely(nb_pkts == 0))
+		return 0;
+	prod_index = queue->prod_index;
+	cons_index = queue->cons_index;
+	nb = 0;
+
+	while (prod_index != cons_index) {
+		mbuf = queue->reserve_q[cons_index & queue->queue_mask];
+		/* prefetch mbuf ? */
+		rte_mbuf_prefetch_part1(mbuf);
+		rte_mbuf_prefetch_part2(mbuf);
+
+		/* META DATA buried in buffer */
+		meta = RTE_PTR_ADD(mbuf->buf_addr, ARK_RX_META_OFFSET);
+
+		mbuf->port = meta->port;
+		mbuf->pkt_len = meta->pkt_len;
+		mbuf->data_len = meta->pkt_len;
+		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+		mbuf->udata64 = meta->user_data;
+		if (ARK_RX_DEBUG) {	/* debug use */
+			if ((meta->pkt_len > (1024 * 16)) ||
+				(meta->pkt_len == 0)) {
+				PMD_DRV_LOG(INFO,
+						"ARKP RX: Bad Meta Q: %u cons: %u prod: %u\n",
+						queue->phys_qid,
+						cons_index,
+						queue->prod_index);
+
+				PMD_DRV_LOG(INFO, "       :  cons: %u prod: %u seed_index %u\n",
+						cons_index,
+						queue->prod_index,
+						queue->seed_index);
+
+				PMD_DRV_LOG(INFO, "       :  UDM prod: %u  len: %u\n",
+						queue->udm->rt_cfg.prod_idx,
+						meta->pkt_len);
+				ark_mpu_dump(queue->mpu,
+							 "    ",
+							 queue->phys_qid);
+
+				dump_mbuf_data(mbuf, 0, 256);
+				/* it's FUBAR so fix it */
+				mbuf->pkt_len = 63;
+				meta->pkt_len = 63;
+			}
+			mbuf->seqn = cons_index;
+		}
+
+		rx_bytes += meta->pkt_len;	/* TEMP stats */
+
+		if (unlikely(meta->pkt_len > ARK_RX_MAX_NOCHAIN))
+			cons_index = eth_ark_rx_jumbo
+				(queue, meta, mbuf, cons_index + 1);
+		else
+			cons_index += 1;
+
+		rx_pkts[nb] = mbuf;
+		nb++;
+		if (nb >= nb_pkts)
+			break;
+	}
+
+	if (unlikely(nb != 0))
+		/* report next free to FPGA */
+		eth_ark_rx_update_cons_index(queue, cons_index);
+
+	return nb;
+}
+
+/* ************************************************************************* */
+static uint32_t
+eth_ark_rx_jumbo(struct ark_rx_queue *queue,
+	struct ark_rx_meta *meta, struct rte_mbuf *mbuf0, uint32_t cons_index)
+{
+	struct rte_mbuf *mbuf_prev;
+	struct rte_mbuf *mbuf;
+
+	uint16_t remaining;
+	uint16_t data_len;
+	uint8_t segments;
+
+	/* first buf populated by caller */
+	mbuf_prev = mbuf0;
+	segments = 1;
+	data_len = RTE_MIN(meta->pkt_len, RTE_MBUF_DEFAULT_DATAROOM);
+	remaining = meta->pkt_len - data_len;
+	mbuf0->data_len = data_len;
+
+	/* TODO check that the data does not exceed prod_index! */
+	while (remaining != 0) {
+		data_len =
+			RTE_MIN(remaining,
+					RTE_MBUF_DEFAULT_DATAROOM +
+					RTE_PKTMBUF_HEADROOM);
+
+		remaining -= data_len;
+		segments += 1;
+
+		mbuf = queue->reserve_q[cons_index & queue->queue_mask];
+		mbuf_prev->next = mbuf;
+		mbuf_prev = mbuf;
+		mbuf->data_len = data_len;
+		mbuf->data_off = 0;
+		if (ARK_RX_DEBUG)
+			mbuf->seqn = cons_index;	/* for debug only */
+
+		cons_index += 1;
+	}
+
+	mbuf0->nb_segs = segments;
+	return cons_index;
+}
+
+/* Drain the internal queue allowing hw to clear out. */
+static void
+eth_ark_rx_queue_drain(struct ark_rx_queue *queue)
+{
+	register uint32_t cons_index;
+	struct rte_mbuf *mbuf;
+
+	cons_index = queue->cons_index;
+
+	/* NOT performance optimized, since this is a one-shot call */
+	while ((cons_index ^ queue->prod_index) & queue->queue_mask) {
+		mbuf = queue->reserve_q[cons_index & queue->queue_mask];
+		rte_pktmbuf_free(mbuf);
+		cons_index++;
+		eth_ark_rx_update_cons_index(queue, cons_index);
+	}
+}
+
+uint32_t
+eth_ark_dev_rx_queue_count(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+	return (queue->prod_index - queue->cons_index);	/* mod arith */
+}
+
+/* ************************************************************************* */
+int
+eth_ark_rx_start_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+	if (queue == 0)
+		return -1;
+
+	dev->data->rx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
+
+	ark_mpu_set_producer(queue->mpu, queue->seed_index);
+	ark_mpu_start(queue->mpu);
+
+	ark_udm_queue_enable(queue->udm, 1);
+
+	return 0;
+}
+
+/* ************************************************************************* */
+
+/* Queue can be restarted.   data remains
+ */
+int
+eth_ark_rx_stop_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+	if (queue == 0)
+		return -1;
+
+	ark_udm_queue_enable(queue->udm, 0);
+
+	dev->data->rx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
+
+	return 0;
+}
+
+/* ************************************************************************* */
+static inline int
+eth_ark_rx_seed_mbufs(struct ark_rx_queue *queue)
+{
+	uint32_t limit = queue->cons_index + queue->queue_size;
+	uint32_t seed_index = queue->seed_index;
+
+	uint32_t count = 0;
+	uint32_t seed_m = queue->seed_index & queue->queue_mask;
+
+	uint32_t nb = limit - seed_index;
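+	/* nb = number of free ring slots, i.e. cons_index + queue_size -
+	 * seed_index in modular arithmetic
+	 */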
+
+	/* Handle wrap around -- remainder is filled on the next call */
+	if (unlikely(seed_m + nb > queue->queue_size))
+		nb = queue->queue_size - seed_m;
+
+	struct rte_mbuf **mbufs = &queue->reserve_q[seed_m];
+	int status = rte_pktmbuf_alloc_bulk(queue->mb_pool, mbufs, nb);
+
+	if (unlikely(status != 0))
+		return -1;
+
+	if (ARK_RX_DEBUG) {		/* DEBUG */
+		while (count != nb) {
+			struct rte_mbuf *mbuf_init =
+				queue->reserve_q[seed_m + count];
+
+			memset(mbuf_init->buf_addr, -1, 512);
+			*((uint32_t *)mbuf_init->buf_addr) = seed_index + count;
+			*(uint16_t *)RTE_PTR_ADD(mbuf_init->buf_addr, 4) =
+				queue->phys_qid;
+			count++;
+		}
+		count = 0;
+	}
+	/* DEBUG */
+	queue->seed_index += nb;
+
+	/* Duff's device https://en.wikipedia.org/wiki/Duff's_device */
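+	/*
+	 * Copy nb mbuf physical addresses into paddress_q with 4x loop
+	 * unrolling; the switch on (nb % 4) jumps into the middle of the
+	 * unrolled while body so the leftover iterations run on the
+	 * first pass.
+	 */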
+	switch (nb % 4) {
+	case 0:
+	while (count != nb) {
+		queue->paddress_q[seed_m++] = (*mbufs++)->buf_physaddr;
+		count++;
+		/* FALLTHROUGH */
+	case 3:
+		queue->paddress_q[seed_m++] = (*mbufs++)->buf_physaddr;
+		count++;
+		/* FALLTHROUGH */
+	case 2:
+		queue->paddress_q[seed_m++] = (*mbufs++)->buf_physaddr;
+		count++;
+		/* FALLTHROUGH */
+	case 1:
+		queue->paddress_q[seed_m++] = (*mbufs++)->buf_physaddr;
+		count++;
+		/* FALLTHROUGH */
+
+	} /* while (count != nb) */
+	} /* switch */
+
+	return 0;
+}
+
+void
+eth_ark_rx_dump_queue(struct rte_eth_dev *dev, uint16_t queue_id,
+	const char *msg)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+
+	ark_ethdev_rx_dump(msg, queue);
+}
+
+/* ************************************************************************* */
+
+/* Call on device closed no user API, queue is stopped */
+void
+eth_ark_dev_rx_queue_release(void *vqueue)
+{
+	struct ark_rx_queue *queue;
+	uint32_t i;
+
+	queue = (struct ark_rx_queue *)vqueue;
+	if (queue == 0)
+		return;
+
+	ark_udm_queue_enable(queue->udm, 0);
+	/* Stop the MPU since pointer are going away */
+	ark_mpu_stop(queue->mpu);
+
+	/* Need to clear out mbufs here, dropping packets along the way */
+	eth_ark_rx_queue_drain(queue);
+
+	for (i = 0; i < queue->queue_size; ++i)
+		rte_pktmbuf_free(queue->reserve_q[i]);
+
+	rte_free(queue->reserve_q);
+	rte_free(queue->paddress_q);
+	rte_free(queue);
+}
+
+void
+eth_rx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats)
+{
+	struct ark_rx_queue *queue;
+	struct ark_udm_t *udm;
+
+	queue = vqueue;
+	if (queue == 0)
+		return;
+	udm = queue->udm;
+
+	uint64_t ibytes = ark_udm_bytes(udm);
+	uint64_t ipackets = ark_udm_packets(udm);
+	uint64_t idropped = ark_udm_dropped(queue->udm);
+
+	stats->q_ipackets[queue->queue_index] = ipackets;
+	stats->q_ibytes[queue->queue_index] = ibytes;
+	stats->q_errors[queue->queue_index] = idropped;
+	stats->ipackets += ipackets;
+	stats->ibytes += ibytes;
+	stats->imissed += idropped;
+}
+
+void
+eth_rx_queue_stats_reset(void *vqueue)
+{
+	struct ark_rx_queue *queue;
+
+	queue = vqueue;
+	if (queue == 0)
+		return;
+
+	ark_mpu_reset_stats(queue->mpu);
+	ark_udm_queue_stats_reset(queue->udm);
+}
+
+void
+eth_ark_udm_force_close(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *)dev->data->dev_private;
+	struct ark_rx_queue *queue;
+	uint32_t index;
+	uint16_t i;
+
+	if (!ark_udm_is_flushed(ark->udm.v)) {
+		/* restart the MPUs */
+		fprintf(stderr, "ARK: %s UDM not flushed\n", __func__);
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			queue = (struct ark_rx_queue *)dev->data->rx_queues[i];
+			if (queue == 0)
+				continue;
+
+			ark_mpu_start(queue->mpu);
+			/* Add some buffers */
+			index = 100000 + queue->seed_index;
+			ark_mpu_set_producer(queue->mpu, index);
+		}
+		/* Wait to allow data to pass */
+		usleep(100);
+
+		ARK_DEBUG_TRACE("UDM forced flush attempt, stopped = %d\n",
+				ark_udm_is_flushed(ark->udm.v));
+	}
+	ark_udm_reset(ark->udm.v);
+}
+
+static void
+ark_ethdev_rx_dump(const char *name, struct ark_rx_queue *queue)
+{
+	if (queue == NULL)
+		return;
+	ARK_DEBUG_TRACE("RX QUEUE %d -- %s", queue->phys_qid, name);
+	ARK_DEBUG_TRACE(ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 "\n",
+			"queue_size", queue->queue_size,
+			"seed_index", queue->seed_index,
+			"prod_index", queue->prod_index,
+			"cons_index", queue->cons_index);
+
+	ark_mpu_dump(queue->mpu, name, queue->phys_qid);
+	ark_mpu_dump_setup(queue->mpu, queue->phys_qid);
+	ark_udm_dump(queue->udm, name);
+	ark_udm_dump_setup(queue->udm, queue->phys_qid);
+}
+
+static void
+dump_mbuf_data(struct rte_mbuf *mbuf, uint16_t lo, uint16_t hi)
+{
+	uint16_t i, j;
+
+	fprintf(stderr, " MBUF: %p len %d, off: %d, seq: %u\n", mbuf,
+		mbuf->pkt_len, mbuf->data_off, mbuf->seqn);
+	for (i = lo; i < hi; i += 16) {
+		uint8_t *dp = RTE_PTR_ADD(mbuf->buf_addr, i);
+
+		fprintf(stderr, "  %6d:  ", i);
+		for (j = 0; j < 16; j++)
+			fprintf(stderr, " %02x", dp[j]);
+
+		fprintf(stderr, "\n");
+	}
+}
diff --git a/drivers/net/ark/ark_ethdev_tx.c b/drivers/net/ark/ark_ethdev_tx.c
new file mode 100644
index 0000000..2b14feb
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev_tx.c
@@ -0,0 +1,492 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_global.h"
+#include "ark_mpu.h"
+#include "ark_ddm.h"
+#include "ark_ethdev.h"
+#include "ark_debug.h"
+
+#define ARK_TX_META_SIZE   32
+#define ARK_TX_META_OFFSET (RTE_PKTMBUF_HEADROOM - ARK_TX_META_SIZE)
+#define ARK_TX_MAX_NOCHAIN (RTE_MBUF_DEFAULT_DATAROOM)
+#define ARK_TX_PAD_TO_60   1
+
+#ifdef RTE_LIBRTE_ARK_DEBUG_TX
+#define ARK_TX_DEBUG       1
+#define ARK_TX_DEBUG_JUMBO 1
+#else
+#define ARK_TX_DEBUG       0
+#define ARK_TX_DEBUG_JUMBO 0
+#endif
+
+/* ************************************************************************* */
+
+/* struct fixed in FPGA -- 16 bytes */
+
+/* TODO move to ark_ddm.h */
+struct ark_tx_meta {
+	uint64_t physaddr;
+	uint32_t delta_ns;
+	uint16_t data_len;		/* of this MBUF */
+#define   ARK_DDM_EOP   0x01
+#define   ARK_DDM_SOP   0x02
+	uint8_t flags;		/* bit 0 indicates last mbuf in chain. */
+	uint8_t reserved[1];
+};
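+/* Note: 8 + 4 + 2 + 1 + 1 = 16 bytes, matching the fixed FPGA layout. */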
+
+/* ************************************************************************* */
+struct ark_tx_queue {
+	struct ark_tx_meta *meta_q;
+	struct rte_mbuf **bufs;
+
+	/* handles for hw objects */
+	struct ark_mpu_t *mpu;
+	struct ark_ddm_t *ddm;
+
+	/* Stats HW tracks bytes and packets, need to count send errors */
+	uint64_t tx_errors;
+
+	uint32_t queue_size;
+	uint32_t queue_mask;
+
+	/* 3 indexes into the paired data rings. */
+	uint32_t prod_index;		/* where to put the next one */
+	uint32_t free_index;		/* mbuf has been freed */
+
+	/* The queue Id is used to identify the HW Q */
+	uint16_t phys_qid;
+	/* The queue Index within the dpdk device structures */
+	uint16_t queue_index;
+
+	uint32_t pad[1];
+
+	/* second cache line - fields only used in slow path */
+	MARKER cacheline1 __rte_cache_min_aligned;
+	uint32_t cons_index;		/* hw is done, can be freed */
+} __rte_cache_aligned;
+
+/* Forward declarations */
+static int eth_ark_tx_jumbo(struct ark_tx_queue *queue,
+	struct rte_mbuf *mbuf);
+static int eth_ark_tx_hw_queue_config(struct ark_tx_queue *queue);
+static void free_completed_tx(struct ark_tx_queue *queue);
+
+static inline void
+ark_tx_hw_queue_stop(struct ark_tx_queue *queue)
+{
+	ark_mpu_stop(queue->mpu);
+}
+
+/* ************************************************************************* */
+static inline void
+eth_ark_tx_meta_from_mbuf(struct ark_tx_meta *meta,
+	const struct rte_mbuf *mbuf, uint8_t flags)
+{
+	meta->physaddr = rte_mbuf_data_dma_addr(mbuf);
+	meta->delta_ns = 0;
+	meta->data_len = rte_pktmbuf_data_len(mbuf);
+	meta->flags = flags;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_xmit_pkts_noop(void *vtxq __rte_unused,
+	struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts __rte_unused)
+{
+	return 0;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_xmit_pkts(void *vtxq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct ark_tx_queue *queue;
+	struct rte_mbuf *mbuf;
+	struct ark_tx_meta *meta;
+
+	uint32_t idx;
+	uint32_t prod_index_limit;
+	int stat;
+	uint16_t nb;
+
+	queue = (struct ark_tx_queue *)vtxq;
+
+	/* free any packets after the HW is done with them */
+	free_completed_tx(queue);
+
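+	/* the ring is full once prod_index has advanced queue_size past
+	 * free_index, i.e. every slot still holds an un-freed mbuf
+	 */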
+	prod_index_limit = queue->queue_size + queue->free_index;
+
+	for (nb = 0;
+		 (nb < nb_pkts) && (queue->prod_index != prod_index_limit);
+		 ++nb) {
+		mbuf = tx_pkts[nb];
+
+		if (ARK_TX_PAD_TO_60) {
+			if (unlikely(rte_pktmbuf_pkt_len(mbuf) < 60)) {
+				/* this packet even if it is small can be split,
+				 * be sure to add to the end
+				 */
+				uint16_t to_add =
+					60 - rte_pktmbuf_pkt_len(mbuf);
+				char *appended =
+					rte_pktmbuf_append(mbuf, to_add);
+
+				if (appended == 0) {
+					/* This packet is in error,
+					 * we cannot send it so just
+					 * count it and delete it.
+					 */
+					queue->tx_errors += 1;
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				memset(appended, 0, to_add);
+			}
+		}
+
+		if (unlikely(mbuf->nb_segs != 1)) {
+			stat = eth_ark_tx_jumbo(queue, mbuf);
+			if (unlikely(stat != 0))
+				break;		/* Queue is full */
+		} else {
+			idx = queue->prod_index & queue->queue_mask;
+			queue->bufs[idx] = mbuf;
+			meta = &queue->meta_q[idx];
+			eth_ark_tx_meta_from_mbuf(meta,
+				  mbuf,
+				  ARK_DDM_SOP |
+				  ARK_DDM_EOP);
+			queue->prod_index++;
+		}
+	}
+
+	if (ARK_TX_DEBUG) {
+		if (nb != nb_pkts) {
+			PMD_DRV_LOG(ERR,
+				"ARKP TX: Failure to send: req: %u sent: %u prod: "
+				"%u cons: %u free: %u\n",
+				nb_pkts, nb, queue->prod_index,
+				queue->cons_index,
+				queue->free_index);
+			ark_mpu_dump(queue->mpu,
+						 "TX Failure MPU: ",
+						 queue->phys_qid);
+		}
+	}
+
+	/* let fpga know producer index.  */
+	if (likely(nb != 0))
+		ark_mpu_set_producer(queue->mpu, queue->prod_index);
+
+	return nb;
+}
+
+/* ************************************************************************* */
+static int
+eth_ark_tx_jumbo(struct ark_tx_queue *queue, struct rte_mbuf *mbuf)
+{
+	struct rte_mbuf *next;
+	struct ark_tx_meta *meta;
+	uint32_t free_queue_space;
+	uint32_t idx;
+	uint8_t flags = ARK_DDM_SOP;
+
+	free_queue_space = queue->queue_mask -
+		(queue->prod_index - queue->free_index);
+	if (unlikely(free_queue_space < mbuf->nb_segs))
+		return -1;
+
+	if (ARK_TX_DEBUG_JUMBO) {
+		PMD_DRV_LOG(ERR,
+			"ARKP JUMBO TX len: %u segs: %u prod: "
+			"%u cons: %u free: %u free_space: %u\n",
+			mbuf->pkt_len, mbuf->nb_segs,
+			queue->prod_index, queue->cons_index,
+			queue->free_index, free_queue_space);
+	}
+
+	while (mbuf != NULL) {
+		next = mbuf->next;
+
+		idx = queue->prod_index & queue->queue_mask;
+		queue->bufs[idx] = mbuf;
+		meta = &queue->meta_q[idx];
+
+		flags |= (next == NULL) ? ARK_DDM_EOP : 0;
+		eth_ark_tx_meta_from_mbuf(meta, mbuf, flags);
+		queue->prod_index++;
+
+		flags &= ~ARK_DDM_SOP;	/* drop SOP flags */
+		mbuf = next;
+	}
+
+	return 0;
+}
+
+/* ************************************************************************* */
+int
+eth_ark_tx_queue_setup(struct rte_eth_dev *dev,
+	uint16_t queue_idx,
+	uint16_t nb_desc,
+	unsigned int socket_id,
+	const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct ark_adapter *ark = (struct ark_adapter *)dev->data->dev_private;
+	struct ark_tx_queue *queue;
+	int status;
+
+	/* TODO: divide the Q's evenly with the Vports */
+	int port = ark_get_port_id(dev, ark);
+	int qidx = port + queue_idx;	/* FIXME for multi queue */
+
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_DRV_LOG(ERR,
+					"DPDK Arkville configuration queue size must be power of two %u (%s)\n",
+					nb_desc, __func__);
+		return -1;
+	}
+
+	/* Allocate queue struct */
+	queue =
+		rte_zmalloc_socket("Ark_tXQueue",
+			   sizeof(struct ark_tx_queue),
+			   64,
+			   socket_id);
+	if (queue == 0) {
+		PMD_DRV_LOG(ERR, "ARKP Failed to allocate tx "
+				"queue memory in %s\n",
+				__func__);
+		return -ENOMEM;
+	}
+
+	/* we use zmalloc no need to initialize fields */
+	queue->queue_size = nb_desc;
+	queue->queue_mask = nb_desc - 1;
+	queue->phys_qid = qidx;
+	queue->queue_index = queue_idx;
+	dev->data->tx_queues[queue_idx] = queue;
+
+	queue->meta_q =
+		rte_zmalloc_socket("Ark_tXQueue meta",
+				   nb_desc * sizeof(struct ark_tx_meta),
+				   64, socket_id);
+	queue->bufs =
+		rte_zmalloc_socket("Ark_tXQueue bufs",
+				   nb_desc * sizeof(struct rte_mbuf *),
+				   64, socket_id);
+
+	if (queue->meta_q == 0 || queue->bufs == 0) {
+		PMD_DRV_LOG(ERR, "Failed to allocate queue memory in %s\n",
+			    __func__);
+		rte_free(queue->meta_q);
+		rte_free(queue->bufs);
+		rte_free(queue);
+		return -ENOMEM;
+	}
+
+	queue->ddm = RTE_PTR_ADD(ark->ddm.v, qidx * ARK_DDM_QOFFSET);
+	queue->mpu = RTE_PTR_ADD(ark->mputx.v, qidx * ARK_MPU_QOFFSET);
+
+	status = eth_ark_tx_hw_queue_config(queue);
+
+	if (unlikely(status != 0)) {
+		rte_free(queue->meta_q);
+		rte_free(queue->bufs);
+		rte_free(queue);
+		return -1;		/* ERROR CODE */
+	}
+
+	return 0;
+}
+
+/* ************************************************************************* */
+static int
+eth_ark_tx_hw_queue_config(struct ark_tx_queue *queue)
+{
+	phys_addr_t queue_base, ring_base, prod_index_addr;
+	uint32_t write_interval_ns;
+
+	/* Verify HW -- MPU */
+	if (ark_mpu_verify(queue->mpu, sizeof(struct ark_tx_meta)))
+		return -1;
+
+	queue_base = rte_malloc_virt2phy(queue);
+	ring_base = rte_malloc_virt2phy(queue->meta_q);
+	prod_index_addr =
+		queue_base + offsetof(struct ark_tx_queue, cons_index);
+
+	ark_mpu_stop(queue->mpu);
+	ark_mpu_reset(queue->mpu);
+
+	/* Stop and Reset and configure MPU */
+	ark_mpu_configure(queue->mpu, ring_base, queue->queue_size, 1);
+
+	/*
+	 * Adjust the write interval based on queue size --
+	 * increase pcie traffic
+	 * when low mbuf count
+	 */
+	switch (queue->queue_size) {
+	case 128:
+		write_interval_ns = 500;
+		break;
+	case 256:
+		write_interval_ns = 500;
+		break;
+	case 512:
+		write_interval_ns = 1000;
+		break;
+	default:
+		write_interval_ns = 2000;
+		break;
+	}
+
+	/* Completion address in UDM */
+	ark_ddm_setup(queue->ddm, prod_index_addr, write_interval_ns);
+
+	return 0;
+}
+
+/* ************************************************************************* */
+void
+eth_ark_tx_queue_release(void *vtx_queue)
+{
+	struct ark_tx_queue *queue;
+
+	queue = (struct ark_tx_queue *)vtx_queue;
+
+	ark_tx_hw_queue_stop(queue);
+
+	queue->cons_index = queue->prod_index;
+	free_completed_tx(queue);
+
+	rte_free(queue->meta_q);
+	rte_free(queue->bufs);
+	rte_free(queue);
+}
+
+/* ************************************************************************* */
+int
+eth_ark_tx_queue_stop(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_tx_queue *queue;
+	int cnt = 0;
+
+	queue = dev->data->tx_queues[queue_id];
+
+	/* Wait for DDM to send out all packets. */
+	while (queue->cons_index != queue->prod_index) {
+		usleep(100);
+		if (cnt++ > 10000)
+			return -1;
+	}
+
+	ark_mpu_stop(queue->mpu);
+	free_completed_tx(queue);
+
+	dev->data->tx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
+
+	return 0;
+}
+
+int
+eth_ark_tx_queue_start(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_tx_queue *queue;
+
+	queue = dev->data->tx_queues[queue_id];
+	if (dev->data->tx_queue_state[queue_id] == RTE_ETH_QUEUE_STATE_STARTED)
+		return 0;
+
+	ark_mpu_start(queue->mpu);
+	dev->data->tx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
+
+	return 0;
+}
+
+/* ************************************************************************* */
+static void
+free_completed_tx(struct ark_tx_queue *queue)
+{
+	struct rte_mbuf *mbuf;
+	struct ark_tx_meta *meta;
+	uint32_t top_index;
+
+	top_index = queue->cons_index;	/* read once */
+	while (queue->free_index != top_index) {
+		meta = &queue->meta_q[queue->free_index & queue->queue_mask];
+		mbuf = queue->bufs[queue->free_index & queue->queue_mask];
+
+		if (likely((meta->flags & ARK_DDM_SOP) != 0)) {
+			/* ref count of the mbuf is checked in this call. */
+			rte_pktmbuf_free(mbuf);
+		}
+		queue->free_index++;
+	}
+}
+
+/* ************************************************************************* */
+void
+eth_tx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats)
+{
+	struct ark_tx_queue *queue;
+	struct ark_ddm_t *ddm;
+	uint64_t bytes, pkts;
+
+	queue = vqueue;
+	ddm = queue->ddm;
+
+	bytes = ark_ddm_queue_byte_count(ddm);
+	pkts = ark_ddm_queue_pkt_count(ddm);
+
+	stats->q_opackets[queue->queue_index] = pkts;
+	stats->q_obytes[queue->queue_index] = bytes;
+	stats->opackets += pkts;
+	stats->obytes += bytes;
+	stats->oerrors += queue->tx_errors;
+}
+
+void
+eth_tx_queue_stats_reset(void *vqueue)
+{
+	struct ark_tx_queue *queue;
+	struct ark_ddm_t *ddm;
+
+	queue = vqueue;
+	ddm = queue->ddm;
+
+	ark_ddm_queue_reset_stats(ddm);
+	queue->tx_errors = 0;
+}
diff --git a/drivers/net/ark/ark_ext.h b/drivers/net/ark/ark_ext.h
new file mode 100644
index 0000000..0786d1f
--- /dev/null
+++ b/drivers/net/ark/ark_ext.h
@@ -0,0 +1,79 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_EXT_H_
+#define _ARK_EXT_H_
+
+/*
+ * Called post PMD init.
+ * The implementation returns its private data that gets passed into
+ * all other functions as user_data
+ * The ARK extension implementation MUST implement this function
+ */
+void *dev_init(struct rte_eth_dev *dev, void *a_bar, int port_id);
+
+/* Called during device shutdown */
+void dev_uninit(struct rte_eth_dev *dev, void *user_data);
+
+/* This call is optional and allows the
+ * extension to specify the number of supported ports.
+ */
+uint8_t dev_get_port_count(struct rte_eth_dev *dev, void *user_data);
+
+/*
+ * The following functions are optional and are directly mapped
+ * from the DPDK PMD ops structure.
+ * Each function if implemented is called after the ARK PMD
+ * implementation executes.
+ */
+int dev_configure(struct rte_eth_dev *dev, void *user_data);
+int dev_start(struct rte_eth_dev *dev, void *user_data);
+void dev_stop(struct rte_eth_dev *dev, void *user_data);
+void dev_close(struct rte_eth_dev *dev, void *user_data);
+int link_update(struct rte_eth_dev *dev, int wait_to_complete,
+	void *user_data);
+int dev_set_link_up(struct rte_eth_dev *dev, void *user_data);
+int dev_set_link_down(struct rte_eth_dev *dev, void *user_data);
+void stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats,
+	void *user_data);
+void stats_reset(struct rte_eth_dev *dev, void *user_data);
+void mac_addr_add(struct rte_eth_dev *dev,
+	struct ether_addr *macadr,
+				  uint32_t index,
+				  uint32_t pool,
+				  void *user_data);
+void mac_addr_remove(struct rte_eth_dev *dev, uint32_t index, void *user_data);
+void mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+	void *user_data);
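+
+/*
+ * Illustrative sketch (hypothetical names, not part of this patch):
+ * a minimal extension only has to provide dev_init; its return value
+ * is handed back as user_data to every other hook, e.g.
+ *
+ *	struct my_ext_state { int ports; };
+ *
+ *	void *dev_init(struct rte_eth_dev *dev, void *a_bar, int port_id)
+ *	{
+ *		return rte_zmalloc("my_ext", sizeof(struct my_ext_state), 0);
+ *	}
+ */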
+
+#endif
diff --git a/drivers/net/ark/ark_global.h b/drivers/net/ark/ark_global.h
new file mode 100644
index 0000000..398f647
--- /dev/null
+++ b/drivers/net/ark/ark_global.h
@@ -0,0 +1,159 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_GLOBAL_H_
+#define _ARK_GLOBAL_H_
+
+#include <time.h>
+#include <assert.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_cycles.h>
+#include <rte_kvargs.h>
+#include <rte_dev.h>
+#include <rte_version.h>
+
+#include "ark_pktdir.h"
+#include "ark_pktgen.h"
+#include "ark_pktchkr.h"
+
+#define ETH_ARK_ARG_MAXLEN	64
+#define ARK_SYSCTRL_BASE  0x0
+#define ARK_PKTGEN_BASE   0x10000
+#define ARK_MPU_RX_BASE   0x20000
+#define ARK_UDM_BASE      0x30000
+#define ARK_MPU_TX_BASE   0x40000
+#define ARK_DDM_BASE      0x60000
+#define ARK_CMAC_BASE     0x80000
+#define ARK_PKTDIR_BASE   0xa0000
+#define ARK_PKTCHKR_BASE  0x90000
+#define ARK_RCPACING_BASE 0xb0000
+#define ARK_EXTERNAL_BASE 0x100000
+#define ARK_MPU_QOFFSET   0x00100
+#define ARK_MAX_PORTS     8
+
+#define offset8(n)     n
+#define offset16(n)   ((n) / 2)
+#define offset32(n)   ((n) / 4)
+#define offset64(n)   ((n) / 8)
+
+/*
+ * Structure to store private data for each PF/VF instance.
+ */
+#define def_ptr(type, name) \
+	union type {		   \
+		uint64_t *t64;	   \
+		uint32_t *t32;	   \
+		uint16_t *t16;	   \
+		uint8_t  *t8;	   \
+		void     *v;	   \
+	} name
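+
+/*
+ * Example: def_ptr(UDM, udm) declares a union member 'udm' so one BAR
+ * offset can be addressed as 8/16/32/64-bit registers or as a raw
+ * pointer, e.g. RTE_PTR_ADD(ark->udm.v, qidx * ARK_UDM_QOFFSET).
+ */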
+
+struct ark_port {
+	struct rte_eth_dev *eth_dev;
+	int id;
+};
+
+struct ark_user_ext {
+	void *(*dev_init)(struct rte_eth_dev *, void *abar, int port_id);
+	void (*dev_uninit)(struct rte_eth_dev *, void *);
+	uint8_t (*dev_get_port_count)(struct rte_eth_dev *, void *);
+	int (*dev_configure)(struct rte_eth_dev *, void *);
+	int (*dev_start)(struct rte_eth_dev *, void *);
+	void (*dev_stop)(struct rte_eth_dev *, void *);
+	void (*dev_close)(struct rte_eth_dev *, void *);
+	int (*link_update)(struct rte_eth_dev *, int wait_to_complete, void *);
+	int (*dev_set_link_up)(struct rte_eth_dev *, void *);
+	int (*dev_set_link_down)(struct rte_eth_dev *, void *);
+	void (*stats_get)(struct rte_eth_dev *, struct rte_eth_stats *, void *);
+	void (*stats_reset)(struct rte_eth_dev *, void *);
+	void (*mac_addr_add)(struct rte_eth_dev *,
+						  struct ether_addr *,
+						 uint32_t,
+						 uint32_t,
+						 void *);
+	void (*mac_addr_remove)(struct rte_eth_dev *, uint32_t, void *);
+	void (*mac_addr_set)(struct rte_eth_dev *, struct ether_addr *, void *);
+};
+
+struct ark_adapter {
+	/* User extension private data */
+	void *user_data;
+
+	/* Pointers to packet generator and checker */
+	int start_pg;
+	ark_pkt_gen_t pg;
+	ark_pkt_chkr_t pc;
+	ark_pkt_dir_t pd;
+
+	struct ark_port port[ARK_MAX_PORTS];
+	int num_ports;
+
+	/* Common for both PF and VF */
+	struct rte_eth_dev *eth_dev;
+
+	void *d_handle;
+	struct ark_user_ext user_ext;
+
+	/* Our Bar 0 */
+	uint8_t *bar0;
+
+	/* A Bar */
+	uint8_t *a_bar;
+
+	/* Arkville demo block offsets */
+	def_ptr(sys_ctrl, sysctrl);
+	def_ptr(pkt_gen, pktgen);
+	def_ptr(mpu_rx, mpurx);
+	def_ptr(UDM, udm);
+	def_ptr(mpu_tx, mputx);
+	def_ptr(DDM, ddm);
+	def_ptr(CMAC, cmac);
+	def_ptr(external, external);
+	def_ptr(pkt_dir, pktdir);
+	def_ptr(pkt_chkr, pktchkr);
+
+	int started;
+	uint16_t rx_queues;
+	uint16_t tx_queues;
+
+	struct ark_rqpace_t *rqpacing;
+};
+
+typedef uint32_t *ark_t;
+
+#endif
diff --git a/drivers/net/ark/ark_mpu.c b/drivers/net/ark/ark_mpu.c
new file mode 100644
index 0000000..7f60cbc
--- /dev/null
+++ b/drivers/net/ark/ark_mpu.c
@@ -0,0 +1,167 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_debug.h"
+#include "ark_mpu.h"
+
+uint16_t
+ark_api_num_queues(struct ark_mpu_t *mpu)
+{
+	return mpu->hw.num_queues;
+}
+
+uint16_t
+ark_api_num_queues_per_port(struct ark_mpu_t *mpu, uint16_t ark_ports)
+{
+	return mpu->hw.num_queues / ark_ports;
+}
+
+int
+ark_mpu_verify(struct ark_mpu_t *mpu, uint32_t obj_size)
+{
+	uint32_t version;
+
+	version = mpu->id.vernum & 0x0000ff00;
+	if ((mpu->id.idnum != 0x2055504d) || (mpu->hw.obj_size != obj_size) ||
+	    version != 0x00003100) {
+		fprintf(stderr,
+			"   MPU module not found as expected %08x \"%c%c%c%c"
+			"%c%c%c%c\"\n", mpu->id.idnum,
+			mpu->id.id[0], mpu->id.id[1],
+			mpu->id.id[2], mpu->id.id[3],
+			mpu->id.ver[0], mpu->id.ver[1],
+			mpu->id.ver[2], mpu->id.ver[3]);
+		fprintf(stderr,
+			"   MPU HW num_queues: %u hw_depth %u, obj_size: %u, "
+			"obj_per_mrr: %u Expected size %u\n",
+			mpu->hw.num_queues, mpu->hw.hw_depth,
+			mpu->hw.obj_size, mpu->hw.obj_per_mrr, obj_size);
+		return -1;
+	}
+	return 0;
+}
+
+void
+ark_mpu_stop(struct ark_mpu_t *mpu)
+{
+	mpu->cfg.command = MPU_CMD_STOP;
+}
+
+void
+ark_mpu_start(struct ark_mpu_t *mpu)
+{
+	mpu->cfg.command = MPU_CMD_RUN;	/* run state */
+}
+
+int
+ark_mpu_reset(struct ark_mpu_t *mpu)
+{
+	int cnt = 0;
+
+	mpu->cfg.command = MPU_CMD_RESET;	/* reset */
+
+	while (mpu->cfg.command != MPU_CMD_IDLE) {
+		if (cnt++ > 1000)
+			break;
+		usleep(10);
+	}
+	if (mpu->cfg.command != MPU_CMD_IDLE) {
+		mpu->cfg.command = MPU_CMD_FORCE_RESET;	/* forced reset */
+		usleep(10);
+	}
+	ark_mpu_reset_stats(mpu);
+	return mpu->cfg.command != MPU_CMD_IDLE;
+}
+
+void
+ark_mpu_reset_stats(struct ark_mpu_t *mpu)
+{
+	mpu->stats.pci_request = 1;	/* reset stats */
+}
+
+int
+ark_mpu_configure(struct ark_mpu_t *mpu, phys_addr_t ring, uint32_t ring_size,
+	int is_tx)
+{
+	ark_mpu_reset(mpu);
+
+	if (!rte_is_power_of_2(ring_size)) {
+		fprintf(stderr, "ARKP Invalid ring size for MPU %d\n",
+			ring_size);
+		return -1;
+	}
+
+	mpu->cfg.ring_base = ring;
+	mpu->cfg.ring_size = ring_size;
+	mpu->cfg.ring_mask = ring_size - 1;
+	mpu->cfg.min_host_move = is_tx ? 1 : mpu->hw.obj_per_mrr;
+	mpu->cfg.min_hw_move = mpu->hw.obj_per_mrr;
+	mpu->cfg.sw_prod_index = 0;
+	mpu->cfg.hw_cons_index = 0;
+	return 0;
+}
+
+void
+ark_mpu_dump(struct ark_mpu_t *mpu, const char *code, uint16_t qid)
+{
+	/* DUMP to see that we have started */
+	ARK_DEBUG_TRACE
+		("ARKP MPU: %s Q: %3u sw_prod %u, hw_cons: %u\n", code, qid,
+		 mpu->cfg.sw_prod_index, mpu->cfg.hw_cons_index);
+	ARK_DEBUG_TRACE
+		("ARKP MPU: %s state: %d count %d, reserved %d data 0x%08x_%08x 0x%08x_%08x\n",
+		 code, mpu->debug.state, mpu->debug.count, mpu->debug.reserved,
+		 mpu->debug.peek[1], mpu->debug.peek[0], mpu->debug.peek[3],
+		 mpu->debug.peek[2]
+		 );
+	ARK_DEBUG_STATS
+		("ARKP MPU: %s Q: %3u" ARK_SU64 ARK_SU64 ARK_SU64 ARK_SU64
+		 ARK_SU64 ARK_SU64 ARK_SU64 "\n", code, qid,
+		 "PCI Request:", mpu->stats.pci_request,
+		 "Queue_empty", mpu->stats.q_empty,
+		 "Queue_q1", mpu->stats.q_q1,
+		 "Queue_q2", mpu->stats.q_q2,
+		 "Queue_q3", mpu->stats.q_q3,
+		 "Queue_q4", mpu->stats.q_q4,
+		 "Queue_full", mpu->stats.q_full
+		 );
+}
+
+void
+ark_mpu_dump_setup(struct ark_mpu_t *mpu, uint16_t q_id)
+{
+	ARK_DEBUG_TRACE
+		("MPU Setup Q: %u"
+		 ARK_SU64X "\n", q_id,
+		 "ring_base", mpu->cfg.ring_base
+		 );
+}
diff --git a/drivers/net/ark/ark_mpu.h b/drivers/net/ark/ark_mpu.h
new file mode 100644
index 0000000..376c042
--- /dev/null
+++ b/drivers/net/ark/ark_mpu.h
@@ -0,0 +1,143 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_MPU_H_
+#define _ARK_MPU_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/*
+ * MPU hardware structures
+ */
+
+#define ARK_MPU_ID 0x00
+struct ark_mpu_id_t {
+	union {
+		char id[4];
+		uint32_t idnum;
+	};
+	union {
+		char ver[4];
+		uint32_t vernum;
+	};
+	uint32_t phys_id;
+	uint32_t mrr_code;
+};
+
+#define ARK_MPU_HW 0x010
+struct ark_mpu_hw_t {
+	uint16_t num_queues;
+	uint16_t reserved;
+	uint32_t hw_depth;
+	uint32_t obj_size;
+	uint32_t obj_per_mrr;
+};
+
+#define ARK_MPU_CFG 0x040
+struct ark_mpu_cfg_t {
+	phys_addr_t ring_base;	/* phys_addr_t is a uint64_t */
+	uint32_t ring_size;
+	uint32_t ring_mask;
+	uint32_t min_host_move;
+	uint32_t min_hw_move;
+	volatile uint32_t sw_prod_index;
+	volatile uint32_t hw_cons_index;
+	volatile uint32_t command;
+};
+enum ARK_MPU_COMMAND {
+	MPU_CMD_IDLE = 1,
+	MPU_CMD_RUN = 2,
+	MPU_CMD_STOP = 4,
+	MPU_CMD_RESET = 8,
+	MPU_CMD_FORCE_RESET = 16,
+	MPU_COMMAND_LIMIT = 0xffffffff
+};
+
+#define ARK_MPU_STATS 0x080
+struct ark_mpu_stats_t {
+	volatile uint64_t pci_request;
+	volatile uint64_t q_empty;
+	volatile uint64_t q_q1;
+	volatile uint64_t q_q2;
+	volatile uint64_t q_q3;
+	volatile uint64_t q_q4;
+	volatile uint64_t q_full;
+};
+
+#define ARK_MPU_DEBUG 0x0C0
+struct ark_mpu_debug_t {
+	volatile uint32_t state;
+	uint32_t reserved;
+	volatile uint32_t count;
+	volatile uint32_t take;
+	volatile uint32_t peek[4];
+};
+
+/*  Consolidated structure */
+struct ark_mpu_t {
+	struct ark_mpu_id_t id;
+	uint8_t reserved0[(ARK_MPU_HW - ARK_MPU_ID)
+					  - sizeof(struct ark_mpu_id_t)];
+	struct ark_mpu_hw_t hw;
+	uint8_t reserved1[(ARK_MPU_CFG - ARK_MPU_HW) -
+					  sizeof(struct ark_mpu_hw_t)];
+	struct ark_mpu_cfg_t cfg;
+	uint8_t reserved2[(ARK_MPU_STATS - ARK_MPU_CFG) -
+					  sizeof(struct ark_mpu_cfg_t)];
+	struct ark_mpu_stats_t stats;
+	uint8_t reserved3[(ARK_MPU_DEBUG - ARK_MPU_STATS) -
+					  sizeof(struct ark_mpu_stats_t)];
+	struct ark_mpu_debug_t debug;
+};
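+
+/*
+ * Note (illustrative, not part of the original patch): the reserved
+ * arrays pad each block to its fixed register offset; a compile-time
+ * check such as
+ *	RTE_BUILD_BUG_ON(offsetof(struct ark_mpu_t, cfg) != ARK_MPU_CFG);
+ * would catch accidental layout drift.
+ */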
+
+uint16_t ark_api_num_queues(struct ark_mpu_t *mpu);
+uint16_t ark_api_num_queues_per_port(struct ark_mpu_t *mpu,
+	uint16_t ark_ports);
+int ark_mpu_verify(struct ark_mpu_t *mpu, uint32_t obj_size);
+void ark_mpu_stop(struct ark_mpu_t *mpu);
+void ark_mpu_start(struct ark_mpu_t *mpu);
+int ark_mpu_reset(struct ark_mpu_t *mpu);
+int ark_mpu_configure(struct ark_mpu_t *mpu, phys_addr_t ring,
+	uint32_t ring_size, int is_tx);
+
+void ark_mpu_dump(struct ark_mpu_t *mpu, const char *msg, uint16_t idx);
+void ark_mpu_dump_setup(struct ark_mpu_t *mpu, uint16_t qid);
+void ark_mpu_reset_stats(struct ark_mpu_t *mpu);
+
+static inline void
+ark_mpu_set_producer(struct ark_mpu_t *mpu, uint32_t idx)
+{
+	mpu->cfg.sw_prod_index = idx;
+}
+
+#endif
diff --git a/drivers/net/ark/ark_pktchkr.c b/drivers/net/ark/ark_pktchkr.c
new file mode 100644
index 0000000..1f390de
--- /dev/null
+++ b/drivers/net/ark/ark_pktchkr.c
@@ -0,0 +1,460 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <getopt.h>
+#include <sys/time.h>
+#include <locale.h>
+#include <unistd.h>
+
+#include "ark_pktchkr.h"
+#include "ark_debug.h"
+
+static int set_arg(char *arg, char *val);
+static int ark_pmd_pktchkr_is_gen_forever(ark_pkt_chkr_t handle);
+
+#define ARK_MAX_STR_LEN 64
+union OPTV {
+	int INT;
+	int BOOL;
+	uint64_t LONG;
+	char STR[ARK_MAX_STR_LEN];
+};
+
+enum OPTYPE {
+	OTINT,
+	OTLONG,
+	OTBOOL,
+	OTSTRING
+};
+
+struct OPTIONS {
+	char opt[ARK_MAX_STR_LEN];
+	enum OPTYPE t;
+	union OPTV v;
+};
+
+static struct OPTIONS toptions[] = {
+	{{"configure"}, OTBOOL, {1} },
+	{{"port"}, OTINT, {0} },
+	{{"mac-dump"}, OTBOOL, {0} },
+	{{"dg-mode"}, OTBOOL, {1} },
+	{{"run"}, OTBOOL, {0} },
+	{{"stop"}, OTBOOL, {0} },
+	{{"dump"}, OTBOOL, {0} },
+	{{"en_resync"}, OTBOOL, {0} },
+	{{"tuser_err_val"}, OTINT, {1} },
+	{{"gen_forever"}, OTBOOL, {0} },
+	{{"en_slaved_start"}, OTBOOL, {0} },
+	{{"vary_length"}, OTBOOL, {0} },
+	{{"incr_payload"}, OTINT, {0} },
+	{{"incr_first_byte"}, OTBOOL, {0} },
+	{{"ins_seq_num"}, OTBOOL, {0} },
+	{{"ins_time_stamp"}, OTBOOL, {1} },
+	{{"ins_udp_hdr"}, OTBOOL, {0} },
+	{{"num_pkts"}, OTLONG, .v.LONG = 10000000000000ULL},
+	{{"payload_byte"}, OTINT, {0x55} },
+	{{"pkt_spacing"}, OTINT, {60} },
+	{{"pkt_size_min"}, OTINT, {2005} },
+	{{"pkt_size_max"}, OTINT, {1514} },
+	{{"pkt_size_incr"}, OTINT, {1} },
+	{{"eth_type"}, OTINT, {0x0800} },
+	{{"src_mac_addr"}, OTLONG, .v.LONG = 0xdc3cf6425060ULL},
+	{{"dst_mac_addr"}, OTLONG, .v.LONG = 0x112233445566ULL},
+	{{"hdr_dW0"}, OTINT, {0x0016e319} },
+	{{"hdr_dW1"}, OTINT, {0x27150004} },
+	{{"hdr_dW2"}, OTINT, {0x76967bda} },
+	{{"hdr_dW3"}, OTINT, {0x08004500} },
+	{{"hdr_dW4"}, OTINT, {0x005276ed} },
+	{{"hdr_dW5"}, OTINT, {0x40004006} },
+	{{"hdr_dW6"}, OTINT, {0x56cfc0a8} },
+	{{"start_offset"}, OTINT, {0} },
+	{{"dst_ip"}, OTSTRING, .v.STR = "169.254.10.240"},
+	{{"dst_port"}, OTINT, {65536} },
+	{{"src_port"}, OTINT, {65536} },
+};
+
+ark_pkt_chkr_t
+ark_pmd_pktchkr_init(void *addr, int ord, int l2_mode)
+{
+	struct ark_pkt_chkr_inst *inst =
+		rte_malloc("ark_pkt_chkr_inst",
+		sizeof(struct ark_pkt_chkr_inst), 0);
+	inst->sregs = (struct ark_pkt_chkr_stat_regs *)addr;
+	inst->cregs =
+		(struct ark_pkt_chkr_ctl_regs *)(((uint8_t *)addr) + 0x100);
+	inst->ordinal = ord;
+	inst->l2_mode = l2_mode;
+	return inst;
+}
+
+void
+ark_pmd_pktchkr_uninit(ark_pkt_chkr_t handle)
+{
+	rte_free(handle);
+}
+
+void
+ark_pmd_pktchkr_run(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->sregs->pkt_start_stop = 0;
+	inst->sregs->pkt_start_stop = 0x1;
+}
+
+int
+ark_pmd_pktchkr_stopped(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+	uint32_t r = inst->sregs->pkt_start_stop;
+
+	return (((r >> 16) & 1) == 1);
+}
+
+void
+ark_pmd_pktchkr_stop(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+	int wait_cycle = 10;
+
+	inst->sregs->pkt_start_stop = 0;
+	while (!ark_pmd_pktchkr_stopped(handle) && (wait_cycle > 0)) {
+		usleep(1000);
+		wait_cycle--;
+		ARK_DEBUG_TRACE("Waiting for pktchk %d to stop...\n",
+				inst->ordinal);
+	}
+	ARK_DEBUG_TRACE("pktchk %d stopped.\n", inst->ordinal);
+}
+
+int
+ark_pmd_pktchkr_is_running(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+	uint32_t r = inst->sregs->pkt_start_stop;
+
+	return ((r & 1) == 1);
+}
+
+static void
+ark_pmd_pktchkr_set_pkt_ctrl(ark_pkt_chkr_t handle, uint32_t gen_forever,
+	uint32_t vary_length, uint32_t incr_payload, uint32_t incr_first_byte,
+	uint32_t ins_seq_num, uint32_t ins_udp_hdr, uint32_t en_resync,
+	uint32_t tuser_err_val, uint32_t ins_time_stamp)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+	uint32_t r = (tuser_err_val << 16) | (en_resync << 0);
+
+	inst->sregs->pkt_ctrl = r;
+	if (!inst->l2_mode)
+		ins_udp_hdr = 0;
+	r = (gen_forever << 24) | (vary_length << 16) |
+	(incr_payload << 12) | (incr_first_byte << 8) |
+	(ins_time_stamp << 5) | (ins_seq_num << 4) | ins_udp_hdr;
+	inst->cregs->pkt_ctrl = r;
+}
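+
+/*
+ * Worked example (illustrative): with only tuser_err_val = 1 and
+ * ins_time_stamp = 1, the two register writes below come out as
+ *   sregs->pkt_ctrl = (1 << 16) = 0x00010000
+ *   cregs->pkt_ctrl = (1 << 5)  = 0x00000020
+ */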
+
+static int
+ark_pmd_pktchkr_is_gen_forever(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+	uint32_t r = inst->cregs->pkt_ctrl;
+
+	return (((r >> 24) & 1) == 1);
+}
+
+int
+ark_pmd_pktchkr_wait_done(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+	int wait_cycle = 10;
+
+	if (ark_pmd_pktchkr_is_gen_forever(handle)) {
+		ARK_DEBUG_TRACE
+			("Error: wait_done will not terminate because gen_forever=1\n");
+		return -1;
+	}
+
+	while (!ark_pmd_pktchkr_stopped(handle) && (wait_cycle > 0)) {
+		usleep(1000);
+		wait_cycle--;
+		ARK_DEBUG_TRACE
+			("Waiting for packet checker %d's internal pktgen to finish sending...\n",
+			 inst->ordinal);
+	}
+	ARK_DEBUG_TRACE("pktchk %d's pktgen done.\n", inst->ordinal);
+	return 0;
+}
+
+int
+ark_pmd_pktchkr_get_pkts_sent(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	return inst->cregs->pkts_sent;
+}
+
+void
+ark_pmd_pktchkr_set_payload_byte(ark_pkt_chkr_t handle, uint32_t b)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->pkt_payload = b;
+}
+
+void
+ark_pmd_pktchkr_set_pkt_size_min(ark_pkt_chkr_t handle, uint32_t x)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->pkt_size_min = x;
+}
+
+void
+ark_pmd_pktchkr_set_pkt_size_max(ark_pkt_chkr_t handle, uint32_t x)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->pkt_size_max = x;
+}
+
+void
+ark_pmd_pktchkr_set_pkt_size_incr(ark_pkt_chkr_t handle, uint32_t x)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->pkt_size_incr = x;
+}
+
+void
+ark_pmd_pktchkr_set_num_pkts(ark_pkt_chkr_t handle, uint32_t x)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->num_pkts = x;
+}
+
+void
+ark_pmd_pktchkr_set_src_mac_addr(ark_pkt_chkr_t handle, uint64_t mac_addr)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->src_mac_addr_h = (mac_addr >> 32) & 0xffff;
+	inst->cregs->src_mac_addr_l = mac_addr & 0xffffffff;
+}
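+
+/*
+ * Example (illustrative): mac_addr = 0x112233445566 splits into
+ * src_mac_addr_h = 0x1122 and src_mac_addr_l = 0x33445566.
+ */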
+
+void
+ark_pmd_pktchkr_set_dst_mac_addr(ark_pkt_chkr_t handle, uint64_t mac_addr)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->dst_mac_addr_h = (mac_addr >> 32) & 0xffff;
+	inst->cregs->dst_mac_addr_l = mac_addr & 0xffffffff;
+}
+
+void
+ark_pmd_pktchkr_set_eth_type(ark_pkt_chkr_t handle, uint32_t x)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	inst->cregs->eth_type = x;
+}
+
+void
+ark_pmd_pktchkr_set_hdr_dW(ark_pkt_chkr_t handle, uint32_t *hdr)
+{
+	uint32_t i;
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	for (i = 0; i < 7; i++)
+		inst->cregs->hdr_dw[i] = hdr[i];
+}
+
+void
+ark_pmd_pktchkr_dump_stats(ark_pkt_chkr_t handle)
+{
+	struct ark_pkt_chkr_inst *inst = (struct ark_pkt_chkr_inst *)handle;
+
+	fprintf(stderr, "pkts_rcvd      = (%'u)\n", inst->sregs->pkts_rcvd);
+	fprintf(stderr, "bytes_rcvd     = (%'" PRIu64 ")\n",
+			inst->sregs->bytes_rcvd);
+	fprintf(stderr, "pkts_ok        = (%'u)\n",
+			inst->sregs->pkts_ok);
+	fprintf(stderr, "pkts_mismatch  = (%'u)\n",
+			inst->sregs->pkts_mismatch);
+	fprintf(stderr, "pkts_err       = (%'u)\n",
+			inst->sregs->pkts_err);
+	fprintf(stderr, "first_mismatch = (%'u)\n",
+			inst->sregs->first_mismatch);
+	fprintf(stderr, "resync_events  = (%'u)\n",
+			inst->sregs->resync_events);
+	fprintf(stderr, "pkts_missing   = (%'u)\n",
+			inst->sregs->pkts_missing);
+	fprintf(stderr, "min_latency    = (%'u)\n",
+			inst->sregs->min_latency);
+	fprintf(stderr, "max_latency    = (%'u)\n",
+			inst->sregs->max_latency);
+}
+
+static struct OPTIONS *
+options(const char *id)
+{
+	unsigned int i;
+
+	for (i = 0; i < sizeof(toptions) / sizeof(struct OPTIONS); i++) {
+		if (strcmp(id, toptions[i].opt) == 0)
+			return &toptions[i];
+	}
+	PMD_DRV_LOG(ERR,
+		    "pktchkr: Could not find requested option !!, option = %s\n",
+		    id);
+	return NULL;
+}
+
+static int
+set_arg(char *arg, char *val)
+{
+	struct OPTIONS *o = options(arg);
+
+	if (o) {
+		switch (o->t) {
+		case OTINT:
+		case OTBOOL:
+			o->v.INT = atoi(val);
+			break;
+		case OTLONG:
+			o->v.LONG = atoll(val);
+			break;
+		case OTSTRING:
+			strncpy(o->v.STR, val, ARK_MAX_STR_LEN - 1);
+			o->v.STR[ARK_MAX_STR_LEN - 1] = '\0';
+			break;
+		}
+		return 1;
+	}
+	return 0;
+}
+
+/******
+ * Arg format = "opt0=v opt_n=v ..." (tokens are split on '=' and
+ * whitespace)
+ ******/
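+/*
+ * Example (hypothetical option values; the buffer must be writable
+ * because the parser uses strtok()):
+ *   char args[] = "run=1 num_pkts=1000 pkt_size_min=64";
+ *   ark_pmd_pktchkr_parse(args);
+ */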
+void
+ark_pmd_pktchkr_parse(char *args)
+{
+	char *argv, *v;
+	const char toks[] = "=\n\t\v\f \r";
+
+	argv = strtok(args, toks);
+	v = strtok(NULL, toks);
+	while (argv && v) {
+		set_arg(argv, v);
+		argv = strtok(NULL, toks);
+		v = strtok(NULL, toks);
+	}
+}
+
+static int32_t parse_ipv4_string(char const *ip_address);
+static int32_t
+parse_ipv4_string(char const *ip_address)
+{
+	unsigned int ip[4];
+
+	if (sscanf(ip_address,
+					"%u.%u.%u.%u",
+					&ip[0], &ip[1], &ip[2], &ip[3]) != 4)
+		return 0;
+	return ip[3] + ip[2] * 0x100 + ip[1] * 0x10000ul + ip[0] * 0x1000000ul;
+}
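+
+/*
+ * Worked example (illustrative): "169.254.10.240" parses to 0xa9fe0af0,
+ * i.e. (169 << 24) | (254 << 16) | (10 << 8) | 240.
+ */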
+
+void
+ark_pmd_pktchkr_setup(ark_pkt_chkr_t handle)
+{
+	uint32_t hdr[7];
+	int32_t dst_ip = parse_ipv4_string(options("dst_ip")->v.STR);
+
+	if (!options("stop")->v.BOOL && options("configure")->v.BOOL) {
+		ark_pmd_pktchkr_set_payload_byte(handle,
+			options("payload_byte")->v.INT);
+		ark_pmd_pktchkr_set_src_mac_addr(handle,
+			options("src_mac_addr")->v.LONG);
+		ark_pmd_pktchkr_set_dst_mac_addr(handle,
+			options("dst_mac_addr")->v.LONG);
+
+		ark_pmd_pktchkr_set_eth_type(handle,
+			options("eth_type")->v.INT);
+		if (options("dg-mode")->v.BOOL) {
+			hdr[0] = options("hdr_dW0")->v.INT;
+			hdr[1] = options("hdr_dW1")->v.INT;
+			hdr[2] = options("hdr_dW2")->v.INT;
+			hdr[3] = options("hdr_dW3")->v.INT;
+			hdr[4] = options("hdr_dW4")->v.INT;
+			hdr[5] = options("hdr_dW5")->v.INT;
+			hdr[6] = options("hdr_dW6")->v.INT;
+		} else {
+			hdr[0] = dst_ip;
+			hdr[1] = options("dst_port")->v.INT;
+			hdr[2] = options("src_port")->v.INT;
+			hdr[3] = 0;
+			hdr[4] = 0;
+			hdr[5] = 0;
+			hdr[6] = 0;
+		}
+		ark_pmd_pktchkr_set_hdr_dW(handle, hdr);
+		ark_pmd_pktchkr_set_num_pkts(handle,
+			options("num_pkts")->v.LONG);
+		ark_pmd_pktchkr_set_pkt_size_min(handle,
+			options("pkt_size_min")->v.INT);
+		ark_pmd_pktchkr_set_pkt_size_max(handle,
+			options("pkt_size_max")->v.INT);
+		ark_pmd_pktchkr_set_pkt_size_incr(handle,
+			options("pkt_size_incr")->v.INT);
+		ark_pmd_pktchkr_set_pkt_ctrl(handle,
+			options("gen_forever")->v.BOOL,
+			options("vary_length")->v.BOOL,
+			options("incr_payload")->v.BOOL,
+			options("incr_first_byte")->v.BOOL,
+			options("ins_seq_num")->v.INT,
+			options("ins_udp_hdr")->v.BOOL,
+			options("en_resync")->v.BOOL,
+			options("tuser_err_val")->v.INT,
+			options("ins_time_stamp")->v.INT);
+	}
+
+	if (options("stop")->v.BOOL)
+		ark_pmd_pktchkr_stop(handle);
+
+	if (options("run")->v.BOOL) {
+		ARK_DEBUG_TRACE("Starting packet checker on port %d\n",
+				options("port")->v.INT);
+		ark_pmd_pktchkr_run(handle);
+	}
+}
diff --git a/drivers/net/ark/ark_pktchkr.h b/drivers/net/ark/ark_pktchkr.h
new file mode 100644
index 0000000..5d62bc6
--- /dev/null
+++ b/drivers/net/ark/ark_pktchkr.h
@@ -0,0 +1,114 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_PKTCHKR_H_
+#define _ARK_PKTCHKR_H_
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_eal.h>
+
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+
+#define ARK_PKTCHKR_BASE_ADR  0x90000
+
+typedef void *ark_pkt_chkr_t;
+
+struct ark_pkt_chkr_stat_regs {
+	uint32_t r0;
+	uint32_t pkt_start_stop;
+	uint32_t pkt_ctrl;
+	uint32_t pkts_rcvd;
+	uint64_t bytes_rcvd;
+	uint32_t pkts_ok;
+	uint32_t pkts_mismatch;
+	uint32_t pkts_err;
+	uint32_t first_mismatch;
+	uint32_t resync_events;
+	uint32_t pkts_missing;
+	uint32_t min_latency;
+	uint32_t max_latency;
+} __attribute__ ((packed));
+
+struct ark_pkt_chkr_ctl_regs {
+	uint32_t pkt_ctrl;
+	uint32_t pkt_payload;
+	uint32_t pkt_size_min;
+	uint32_t pkt_size_max;
+	uint32_t pkt_size_incr;
+	uint32_t num_pkts;
+	uint32_t pkts_sent;
+	uint32_t src_mac_addr_l;
+	uint32_t src_mac_addr_h;
+	uint32_t dst_mac_addr_l;
+	uint32_t dst_mac_addr_h;
+	uint32_t eth_type;
+	uint32_t hdr_dw[7];
+} __attribute__ ((packed));
+
+struct ark_pkt_chkr_inst {
+	struct rte_eth_dev_info *dev_info;
+	volatile struct ark_pkt_chkr_stat_regs *sregs;
+	volatile struct ark_pkt_chkr_ctl_regs *cregs;
+	int l2_mode;
+	int ordinal;
+};
+
+/*  packet checker functions */
+ark_pkt_chkr_t ark_pmd_pktchkr_init(void *addr, int ord, int l2_mode);
+void ark_pmd_pktchkr_uninit(ark_pkt_chkr_t handle);
+void ark_pmd_pktchkr_run(ark_pkt_chkr_t handle);
+int ark_pmd_pktchkr_stopped(ark_pkt_chkr_t handle);
+void ark_pmd_pktchkr_stop(ark_pkt_chkr_t handle);
+int ark_pmd_pktchkr_is_running(ark_pkt_chkr_t handle);
+int ark_pmd_pktchkr_get_pkts_sent(ark_pkt_chkr_t handle);
+void ark_pmd_pktchkr_set_payload_byte(ark_pkt_chkr_t handle, uint32_t b);
+void ark_pmd_pktchkr_set_pkt_size_min(ark_pkt_chkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_set_pkt_size_max(ark_pkt_chkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_set_pkt_size_incr(ark_pkt_chkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_set_num_pkts(ark_pkt_chkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_set_src_mac_addr(ark_pkt_chkr_t handle, uint64_t mac_addr);
+void ark_pmd_pktchkr_set_dst_mac_addr(ark_pkt_chkr_t handle, uint64_t mac_addr);
+void ark_pmd_pktchkr_set_eth_type(ark_pkt_chkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_set_hdr_dW(ark_pkt_chkr_t handle, uint32_t *hdr);
+void ark_pmd_pktchkr_parse(char *args);
+void ark_pmd_pktchkr_setup(ark_pkt_chkr_t handle);
+void ark_pmd_pktchkr_dump_stats(ark_pkt_chkr_t handle);
+int ark_pmd_pktchkr_wait_done(ark_pkt_chkr_t handle);
+
+#endif
diff --git a/drivers/net/ark/ark_pktdir.c b/drivers/net/ark/ark_pktdir.c
new file mode 100644
index 0000000..3a6ccdf
--- /dev/null
+++ b/drivers/net/ark/ark_pktdir.c
@@ -0,0 +1,79 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include "ark_global.h"
+
+ark_pkt_dir_t
+ark_pmd_pktdir_init(void *base)
+{
+	struct ark_pkt_dir_inst *inst =
+		rte_malloc("ark_pkt_dir_inst",
+			   sizeof(struct ark_pkt_dir_inst), 0);
+	inst->regs->ctrl = 0x00110110;	/* POR state */
+	return inst;
+}
+
+void
+ark_pmd_pktdir_uninit(ark_pkt_dir_t handle)
+{
+	struct ark_pkt_dir_inst *inst = (struct ark_pkt_dir_inst *)handle;
+
+	rte_free(inst);
+}
+
+void
+ark_pmd_pktdir_setup(ark_pkt_dir_t handle, uint32_t v)
+{
+	struct ark_pkt_dir_inst *inst = (struct ark_pkt_dir_inst *)handle;
+
+	inst->regs->ctrl = v;
+}
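+
+/*
+ * Usage sketch (illustrative): restore the power-on routing value noted
+ * in ark_pmd_pktdir_init() after a test run:
+ *   ark_pmd_pktdir_setup(handle, 0x00110110);
+ */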
+
+uint32_t
+ark_pmd_pktdir_status(ark_pkt_dir_t handle)
+{
+	struct ark_pkt_dir_inst *inst = (struct ark_pkt_dir_inst *)handle;
+
+	return inst->regs->ctrl;
+}
+
+uint32_t
+ark_pmd_pktdir_stall_cnt(ark_pkt_dir_t handle)
+{
+	struct ark_pkt_dir_inst *inst = (struct ark_pkt_dir_inst *)handle;
+
+	return inst->regs->stall_cnt;
+}
diff --git a/drivers/net/ark/ark_pktdir.h b/drivers/net/ark/ark_pktdir.h
new file mode 100644
index 0000000..3f01e5c
--- /dev/null
+++ b/drivers/net/ark/ark_pktdir.h
@@ -0,0 +1,68 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_PKTDIR_H_
+#define _ARK_PKTDIR_H_
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_eal.h>
+
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+
+#define ARK_PKTDIR_BASE_ADR  0xa0000
+
+typedef void *ark_pkt_dir_t;
+
+struct ark_pkt_dir_regs {
+	uint32_t ctrl;
+	uint32_t status;
+	uint32_t stall_cnt;
+} __attribute__ ((packed));
+
+struct ark_pkt_dir_inst {
+	volatile struct ark_pkt_dir_regs *regs;
+};
+
+ark_pkt_dir_t ark_pmd_pktdir_init(void *base);
+void ark_pmd_pktdir_uninit(ark_pkt_dir_t handle);
+void ark_pmd_pktdir_setup(ark_pkt_dir_t handle, uint32_t v);
+uint32_t ark_pmd_pktdir_stall_cnt(ark_pkt_dir_t handle);
+uint32_t ark_pmd_pktdir_status(ark_pkt_dir_t handle);
+
+#endif
diff --git a/drivers/net/ark/ark_pktgen.c b/drivers/net/ark/ark_pktgen.c
new file mode 100644
index 0000000..7679064
--- /dev/null
+++ b/drivers/net/ark/ark_pktgen.c
@@ -0,0 +1,482 @@
+/*
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <getopt.h>
+#include <sys/time.h>
+#include <locale.h>
+#include <unistd.h>
+
+#include "ark_pktgen.h"
+#include "ark_debug.h"
+
+#define ARK_MAX_STR_LEN 64
+union OPTV {
+	int INT;
+	int BOOL;
+	uint64_t LONG;
+	char STR[ARK_MAX_STR_LEN];
+};
+
+enum OPTYPE {
+	OTINT,
+	OTLONG,
+	OTBOOL,
+	OTSTRING
+};
+
+struct OPTIONS {
+	char opt[ARK_MAX_STR_LEN];
+	enum OPTYPE t;
+	union OPTV v;
+};
+
+static struct OPTIONS toptions[] = {
+	{{"configure"}, OTBOOL, {1} },
+	{{"port"}, OTINT, {0} },
+	{{"dg-mode"}, OTBOOL, {1} },
+	{{"run"}, OTBOOL, {0} },
+	{{"pause"}, OTBOOL, {0} },
+	{{"reset"}, OTBOOL, {0} },
+	{{"dump"}, OTBOOL, {0} },
+	{{"gen_forever"}, OTBOOL, {0} },
+	{{"en_slaved_start"}, OTBOOL, {0} },
+	{{"vary_length"}, OTBOOL, {0} },
+	{{"incr_payload"}, OTBOOL, {0} },
+	{{"incr_first_byte"}, OTBOOL, {0} },
+	{{"ins_seq_num"}, OTBOOL, {0} },
+	{{"ins_time_stamp"}, OTBOOL, {1} },
+	{{"ins_udp_hdr"}, OTBOOL, {0} },
+	{{"num_pkts"}, OTLONG, .v.LONG = 100000000},
+	{{"payload_byte"}, OTINT, {0x55} },
+	{{"pkt_spacing"}, OTINT, {130} },
+	{{"pkt_size_min"}, OTINT, {2006} },
+	{{"pkt_size_max"}, OTINT, {1514} },
+	{{"pkt_size_incr"}, OTINT, {1} },
+	{{"eth_type"}, OTINT, {0x0800} },
+	{{"src_mac_addr"}, OTLONG, .v.LONG = 0xdc3cf6425060ULL},
+	{{"dst_mac_addr"}, OTLONG, .v.LONG = 0x112233445566ULL},
+	{{"hdr_dW0"}, OTINT, {0x0016e319} },
+	{{"hdr_dW1"}, OTINT, {0x27150004} },
+	{{"hdr_dW2"}, OTINT, {0x76967bda} },
+	{{"hdr_dW3"}, OTINT, {0x08004500} },
+	{{"hdr_dW4"}, OTINT, {0x005276ed} },
+	{{"hdr_dW5"}, OTINT, {0x40004006} },
+	{{"hdr_dW6"}, OTINT, {0x56cfc0a8} },
+	{{"start_offset"}, OTINT, {0} },
+	{{"bytes_per_cycle"}, OTINT, {10} },
+	{{"shaping"}, OTBOOL, {0} },
+	{{"dst_ip"}, OTSTRING, .v.STR = "169.254.10.240"},
+	{{"dst_port"}, OTINT, {65536} },
+	{{"src_port"}, OTINT, {65536} },
+};
+
+ark_pkt_gen_t
+ark_pmd_pktgen_init(void *adr, int ord, int l2_mode)
+{
+	struct ark_pkt_gen_inst *inst =
+		rte_malloc("ark_pkt_gen_inst_pmd",
+			   sizeof(struct ark_pkt_gen_inst), 0);
+	inst->regs = (struct ark_pkt_gen_regs *)adr;
+	inst->ordinal = ord;
+	inst->l2_mode = l2_mode;
+	return inst;
+}
+
+void
+ark_pmd_pktgen_uninit(ark_pkt_gen_t handle)
+{
+	rte_free(handle);
+}
+
+void
+ark_pmd_pktgen_run(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+
+	inst->regs->pkt_start_stop = 1;
+}
+
+uint32_t
+ark_pmd_pktgen_paused(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	uint32_t r = inst->regs->pkt_start_stop;
+
+	return (((r >> 16) & 1) == 1);
+}
+
+void
+ark_pmd_pktgen_pause(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	int cnt = 0;
+
+	inst->regs->pkt_start_stop = 0;
+
+	while (!ark_pmd_pktgen_paused(handle)) {
+		usleep(1000);
+		if (cnt++ > 100) {
+			PMD_DRV_LOG(ERR, "pktgen %d failed to pause.\n",
+						inst->ordinal);
+			break;
+		}
+	}
+	ARK_DEBUG_TRACE("pktgen %d paused.\n", inst->ordinal);
+}
+
+void
+ark_pmd_pktgen_reset(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+
+	if (!ark_pmd_pktgen_is_running(handle) &&
+		!ark_pmd_pktgen_paused(handle)) {
+		ARK_DEBUG_TRACE
+			("pktgen %d is not running and is not paused. No need to reset.\n",
+			 inst->ordinal);
+		return;
+	}
+
+	if (ark_pmd_pktgen_is_running(handle) &&
+		!ark_pmd_pktgen_paused(handle)) {
+		ARK_DEBUG_TRACE("pktgen %d is not paused. Pausing first.\n",
+						inst->ordinal);
+		ark_pmd_pktgen_pause(handle);
+	}
+
+	ARK_DEBUG_TRACE("Resetting pktgen %d.\n", inst->ordinal);
+	inst->regs->pkt_start_stop = (1 << 8);
+}
+
+uint32_t
+ark_pmd_pktgen_tx_done(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	uint32_t r = inst->regs->pkt_start_stop;
+
+	return (((r >> 24) & 1) == 1);
+}
+
+uint32_t
+ark_pmd_pktgen_is_running(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	uint32_t r = inst->regs->pkt_start_stop;
+
+	return ((r & 1) == 1);
+}
+
+uint32_t
+ark_pmd_pktgen_is_gen_forever(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	uint32_t r = inst->regs->pkt_ctrl;
+
+	return (((r >> 24) & 1) == 1);
+}
+
+void
+ark_pmd_pktgen_wait_done(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	int wait_cycle = 10;
+
+	if (ark_pmd_pktgen_is_gen_forever(handle))
+		PMD_DRV_LOG(ERR,
+			"wait_done will not terminate because gen_forever=1\n");
+
+	while (!ark_pmd_pktgen_tx_done(handle) && (wait_cycle > 0)) {
+		usleep(1000);
+		wait_cycle--;
+		ARK_DEBUG_TRACE("Waiting for pktgen %d to finish sending...\n",
+						inst->ordinal);
+	}
+	ARK_DEBUG_TRACE("pktgen %d done.\n", inst->ordinal);
+}
+
+uint32_t
+ark_pmd_pktgen_get_pkts_sent(ark_pkt_gen_t handle)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	return inst->regs->pkts_sent;
+}
+
+void
+ark_pmd_pktgen_set_payload_byte(ark_pkt_gen_t handle, uint32_t b)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->pkt_payload = b;
+}
+
+void
+ark_pmd_pktgen_set_pkt_spacing(ark_pkt_gen_t handle, uint32_t x)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->pkt_spacing = x;
+}
+
+void
+ark_pmd_pktgen_set_pkt_size_min(ark_pkt_gen_t handle, uint32_t x)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->pkt_size_min = x;
+}
+
+void
+ark_pmd_pktgen_set_pkt_size_max(ark_pkt_gen_t handle, uint32_t x)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->pkt_size_max = x;
+}
+
+void
+ark_pmd_pktgen_set_pkt_size_incr(ark_pkt_gen_t handle, uint32_t x)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->pkt_size_incr = x;
+}
+
+void
+ark_pmd_pktgen_set_num_pkts(ark_pkt_gen_t handle, uint32_t x)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->num_pkts = x;
+}
+
+void
+ark_pmd_pktgen_set_src_mac_addr(ark_pkt_gen_t handle, uint64_t mac_addr)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->src_mac_addr_h = (mac_addr >> 32) & 0xffff;
+	inst->regs->src_mac_addr_l = mac_addr & 0xffffffff;
+}
+
+void
+ark_pmd_pktgen_set_dst_mac_addr(ark_pkt_gen_t handle, uint64_t mac_addr)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->dst_mac_addr_h = (mac_addr >> 32) & 0xffff;
+	inst->regs->dst_mac_addr_l = mac_addr & 0xffffffff;
+}
+
+void
+ark_pmd_pktgen_set_eth_type(ark_pkt_gen_t handle, uint32_t x)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+	inst->regs->eth_type = x;
+}
+
+void
+ark_pmd_pktgen_set_hdr_dW(ark_pkt_gen_t handle, uint32_t *hdr)
+{
+	uint32_t i;
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+
+	for (i = 0; i < 7; i++)
+		inst->regs->hdr_dw[i] = hdr[i];
+}
+
+void
+ark_pmd_pktgen_set_start_offset(ark_pkt_gen_t handle, uint32_t x)
+{
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+
+	inst->regs->start_offset = x;
+}
+
+static struct OPTIONS *
+options(const char *id)
+{
+	unsigned int i;
+
+	for (i = 0; i < sizeof(toptions) / sizeof(struct OPTIONS); i++) {
+		if (strcmp(id, toptions[i].opt) == 0)
+			return &toptions[i];
+	}
+
+	PMD_DRV_LOG(ERR,
+		    "pktgen: Could not find requested option !!, option = %s\n",
+		    id);
+	return NULL;
+}
+
+static int pmd_set_arg(char *arg, char *val);
+static int
+pmd_set_arg(char *arg, char *val)
+{
+	struct OPTIONS *o = options(arg);
+
+	if (o) {
+		switch (o->t) {
+		case OTINT:
+		case OTBOOL:
+			o->v.INT = atoi(val);
+			break;
+		case OTLONG:
+			o->v.LONG = atoll(val);
+			break;
+		case OTSTRING:
+			strncpy(o->v.STR, val, ARK_MAX_STR_LEN - 1);
+			o->v.STR[ARK_MAX_STR_LEN - 1] = '\0';
+			break;
+		}
+		return 1;
+	}
+	return 0;
+}
+
+/******
+ * Arg format = "opt0=v opt_n=v ..." (tokens are split on '=' and
+ * whitespace)
+ ******/
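+/*
+ * Example (hypothetical option values; the buffer must be writable
+ * because the parser uses strtok()):
+ *   char args[] = "run=1 shaping=1 bytes_per_cycle=10";
+ *   ark_pmd_pktgen_parse(args);
+ */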
+void
+ark_pmd_pktgen_parse(char *args)
+{
+	char *argv, *v;
+	const char toks[] = " =\n\t\v\f \r";
+
+	argv = strtok(args, toks);
+	v = strtok(NULL, toks);
+	while (argv && v) {
+		pmd_set_arg(argv, v);
+		argv = strtok(NULL, toks);
+		v = strtok(NULL, toks);
+	}
+}
+
+static int32_t parse_ipv4_string(char const *ip_address);
+static int32_t
+parse_ipv4_string(char const *ip_address)
+{
+	unsigned int ip[4];
+
+	if (sscanf(ip_address,
+			   "%u.%u.%u.%u",
+			   &ip[0], &ip[1], &ip[2], &ip[3]) != 4)
+		return 0;
+	return ip[3] + ip[2] * 0x100 + ip[1] * 0x10000ul + ip[0] * 0x1000000ul;
+}
+
+static void
+ark_pmd_pktgen_set_pkt_ctrl(ark_pkt_gen_t handle, uint32_t gen_forever,
+	uint32_t en_slaved_start, uint32_t vary_length, uint32_t incr_payload,
+	uint32_t incr_first_byte, uint32_t ins_seq_num, uint32_t ins_udp_hdr,
+	uint32_t ins_time_stamp)
+{
+	uint32_t r;
+	struct ark_pkt_gen_inst *inst = (struct ark_pkt_gen_inst *)handle;
+
+	if (!inst->l2_mode)
+		ins_udp_hdr = 0;
+
+	r = (gen_forever << 24) | (en_slaved_start << 20) |
+		(vary_length << 16) |
+	(incr_payload << 12) | (incr_first_byte << 8) |
+	(ins_time_stamp << 5) | (ins_seq_num << 4) | ins_udp_hdr;
+
+	inst->regs->bytes_per_cycle = options("bytes_per_cycle")->v.INT;
+	if (options("shaping")->v.BOOL)
+		r = r | (1 << 28);	/* enable shaping */
+
+	inst->regs->pkt_ctrl = r;
+}
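+
+/*
+ * Worked example (illustrative): gen_forever = 1 with the "shaping"
+ * option set yields pkt_ctrl = (1 << 28) | (1 << 24) = 0x11000000.
+ */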
+
+void
+ark_pmd_pktgen_setup(ark_pkt_gen_t handle)
+{
+	uint32_t hdr[7];
+	int32_t dst_ip = parse_ipv4_string(options("dst_ip")->v.STR);
+
+	if (!options("pause")->v.BOOL &&
+	    !options("reset")->v.BOOL &&
+	    options("configure")->v.BOOL) {
+		ark_pmd_pktgen_set_payload_byte(handle,
+			options("payload_byte")->v.INT);
+		ark_pmd_pktgen_set_src_mac_addr(handle,
+			options("src_mac_addr")->v.LONG);
+		ark_pmd_pktgen_set_dst_mac_addr(handle,
+			options("dst_mac_addr")->v.LONG);
+		ark_pmd_pktgen_set_eth_type(handle,
+			options("eth_type")->v.INT);
+
+		if (options("dg-mode")->v.BOOL) {
+			hdr[0] = options("hdr_dW0")->v.INT;
+			hdr[1] = options("hdr_dW1")->v.INT;
+			hdr[2] = options("hdr_dW2")->v.INT;
+			hdr[3] = options("hdr_dW3")->v.INT;
+			hdr[4] = options("hdr_dW4")->v.INT;
+			hdr[5] = options("hdr_dW5")->v.INT;
+			hdr[6] = options("hdr_dW6")->v.INT;
+		} else {
+			hdr[0] = dst_ip;
+			hdr[1] = options("dst_port")->v.INT;
+			hdr[2] = options("src_port")->v.INT;
+			hdr[3] = 0;
+			hdr[4] = 0;
+			hdr[5] = 0;
+			hdr[6] = 0;
+		}
+		ark_pmd_pktgen_set_hdr_dW(handle, hdr);
+		ark_pmd_pktgen_set_num_pkts(handle,
+			options("num_pkts")->v.LONG);
+		ark_pmd_pktgen_set_pkt_size_min(handle,
+			options("pkt_size_min")->v.INT);
+		ark_pmd_pktgen_set_pkt_size_max(handle,
+			options("pkt_size_max")->v.INT);
+		ark_pmd_pktgen_set_pkt_size_incr(handle,
+			options("pkt_size_incr")->v.INT);
+		ark_pmd_pktgen_set_pkt_spacing(handle,
+			options("pkt_spacing")->v.INT);
+		ark_pmd_pktgen_set_start_offset(handle,
+			options("start_offset")->v.INT);
+		ark_pmd_pktgen_set_pkt_ctrl(handle,
+			options("gen_forever")->v.BOOL,
+			options("en_slaved_start")->v.BOOL,
+			options("vary_length")->v.BOOL,
+			options("incr_payload")->v.BOOL,
+			options("incr_first_byte")->v.BOOL,
+			options("ins_seq_num")->v.INT,
+			options("ins_udp_hdr")->v.BOOL,
+			options("ins_time_stamp")->v.INT);
+	}
+
+	if (options("pause")->v.BOOL)
+		ark_pmd_pktgen_pause(handle);
+
+	if (options("reset")->v.BOOL)
+		ark_pmd_pktgen_reset(handle);
+
+	if (options("run")->v.BOOL) {
+		ARK_DEBUG_TRACE("Starting packet generator on port %d\n",
+				options("port")->v.INT);
+		ark_pmd_pktgen_run(handle);
+	}
+}
diff --git a/drivers/net/ark/ark_pktgen.h b/drivers/net/ark/ark_pktgen.h
new file mode 100644
index 0000000..92994ce
--- /dev/null
+++ b/drivers/net/ark/ark_pktgen.h
@@ -0,0 +1,106 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_PKTGEN_H_
+#define _ARK_PKTGEN_H_
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_eal.h>
+
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+
+#define ARK_PKTGEN_BASE_ADR  0x10000
+
+typedef void *ark_pkt_gen_t;
+
+struct ark_pkt_gen_regs {
+	uint32_t r0;
+	volatile uint32_t pkt_start_stop;
+	volatile uint32_t pkt_ctrl;
+	uint32_t pkt_payload;
+	uint32_t pkt_spacing;
+	uint32_t pkt_size_min;
+	uint32_t pkt_size_max;
+	uint32_t pkt_size_incr;
+	volatile uint32_t num_pkts;
+	volatile uint32_t pkts_sent;
+	uint32_t src_mac_addr_l;
+	uint32_t src_mac_addr_h;
+	uint32_t dst_mac_addr_l;
+	uint32_t dst_mac_addr_h;
+	uint32_t eth_type;
+	uint32_t hdr_dw[7];
+	uint32_t start_offset;
+	uint32_t bytes_per_cycle;
+} __attribute__ ((packed));
+
+struct ark_pkt_gen_inst {
+	struct rte_eth_dev_info *dev_info;
+	struct ark_pkt_gen_regs *regs;
+	int l2_mode;
+	int ordinal;
+};
+
+/*  packet generator functions */
+ark_pkt_gen_t ark_pmd_pktgen_init(void *arg, int ord, int l2_mode);
+void ark_pmd_pktgen_uninit(ark_pkt_gen_t handle);
+void ark_pmd_pktgen_run(ark_pkt_gen_t handle);
+void ark_pmd_pktgen_pause(ark_pkt_gen_t handle);
+uint32_t ark_pmd_pktgen_paused(ark_pkt_gen_t handle);
+uint32_t ark_pmd_pktgen_is_gen_forever(ark_pkt_gen_t handle);
+uint32_t ark_pmd_pktgen_is_running(ark_pkt_gen_t handle);
+uint32_t ark_pmd_pktgen_tx_done(ark_pkt_gen_t handle);
+void ark_pmd_pktgen_reset(ark_pkt_gen_t handle);
+void ark_pmd_pktgen_wait_done(ark_pkt_gen_t handle);
+uint32_t ark_pmd_pktgen_get_pkts_sent(ark_pkt_gen_t handle);
+void ark_pmd_pktgen_set_payload_byte(ark_pkt_gen_t handle, uint32_t b);
+void ark_pmd_pktgen_set_pkt_spacing(ark_pkt_gen_t handle, uint32_t x);
+void ark_pmd_pktgen_set_pkt_size_min(ark_pkt_gen_t handle, uint32_t x);
+void ark_pmd_pktgen_set_pkt_size_max(ark_pkt_gen_t handle, uint32_t x);
+void ark_pmd_pktgen_set_pkt_size_incr(ark_pkt_gen_t handle, uint32_t x);
+void ark_pmd_pktgen_set_num_pkts(ark_pkt_gen_t handle, uint32_t x);
+void ark_pmd_pktgen_set_src_mac_addr(ark_pkt_gen_t handle, uint64_t mac_addr);
+void ark_pmd_pktgen_set_dst_mac_addr(ark_pkt_gen_t handle, uint64_t mac_addr);
+void ark_pmd_pktgen_set_eth_type(ark_pkt_gen_t handle, uint32_t x);
+void ark_pmd_pktgen_set_hdr_dW(ark_pkt_gen_t handle, uint32_t *hdr);
+void ark_pmd_pktgen_set_start_offset(ark_pkt_gen_t handle, uint32_t x);
+void ark_pmd_pktgen_parse(char *argv);
+void ark_pmd_pktgen_setup(ark_pkt_gen_t handle);
+
+#endif
diff --git a/drivers/net/ark/ark_rqp.c b/drivers/net/ark/ark_rqp.c
new file mode 100644
index 0000000..ece6044
--- /dev/null
+++ b/drivers/net/ark/ark_rqp.c
@@ -0,0 +1,92 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_rqp.h"
+#include "ark_debug.h"
+
+/* ************************************************************************* */
+void
+ark_rqp_stats_reset(struct ark_rqpace_t *rqp)
+{
+	rqp->stats_clear = 1;
+	/* POR 992 */
+	/* rqp->cpld_max = 992; */
+	/* POR 64 */
+	/* rqp->cplh_max = 64; */
+}
+
+/* ************************************************************************* */
+void
+ark_rqp_dump(struct ark_rqpace_t *rqp)
+{
+	if (rqp->err_count_other != 0)
+		fprintf(stderr,
+			"ARKP RQP Errors noted: ctrl: %d cplh_hmax %d cpld_max %d"
+			ARK_SU32 ARK_SU32 "\n",
+			rqp->ctrl, rqp->cplh_max, rqp->cpld_max,
+			"Error Count", rqp->err_cnt,
+			"Error General", rqp->err_count_other);
+
+	ARK_DEBUG_STATS
+		("ARKP RQP Dump: ctrl: %d cplh_hmax %d cpld_max %d" ARK_SU32
+		 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32
+		 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32
+		 ARK_SU32 ARK_SU32 ARK_SU32
+		 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 "\n",
+		 rqp->ctrl, rqp->cplh_max, rqp->cpld_max,
+		 "Error Count", rqp->err_cnt,
+		 "Error General", rqp->err_count_other,
+		 "stall_pS", rqp->stall_ps,
+		 "stall_pS Min", rqp->stall_ps_min,
+		 "stall_pS Max", rqp->stall_ps_max,
+		 "req_pS", rqp->req_ps,
+		 "req_pS Min", rqp->req_ps_min,
+		 "req_pS Max", rqp->req_ps_max,
+		 "req_dWPS", rqp->req_dw_ps,
+		 "req_dWPS Min", rqp->req_dw_ps_min,
+		 "req_dWPS Max", rqp->req_dw_ps_max,
+		 "cpl_pS", rqp->cpl_ps,
+		 "cpl_pS Min", rqp->cpl_ps_min,
+		 "cpl_pS Max", rqp->cpl_ps_max,
+		 "cpl_dWPS", rqp->cpl_dw_ps,
+		 "cpl_dWPS Min", rqp->cpl_dw_ps_min,
+		 "cpl_dWPS Max", rqp->cpl_dw_ps_max,
+		 "cplh pending", rqp->cplh_pending,
+		 "cpld pending", rqp->cpld_pending,
+		 "cplh pending max", rqp->cplh_pending_max,
+		 "cpld pending max", rqp->cpld_pending_max);
+}
diff --git a/drivers/net/ark/ark_rqp.h b/drivers/net/ark/ark_rqp.h
new file mode 100644
index 0000000..4376d76
--- /dev/null
+++ b/drivers/net/ark/ark_rqp.h
@@ -0,0 +1,75 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_RQP_H_
+#define _ARK_RQP_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/*
+ * RQ Pacing core hardware structure
+ */
+struct ark_rqpace_t {
+	volatile uint32_t ctrl;
+	volatile uint32_t stats_clear;
+	volatile uint32_t cplh_max;
+	volatile uint32_t cpld_max;
+	volatile uint32_t err_cnt;
+	volatile uint32_t stall_ps;
+	volatile uint32_t stall_ps_min;
+	volatile uint32_t stall_ps_max;
+	volatile uint32_t req_ps;
+	volatile uint32_t req_ps_min;
+	volatile uint32_t req_ps_max;
+	volatile uint32_t req_dw_ps;
+	volatile uint32_t req_dw_ps_min;
+	volatile uint32_t req_dw_ps_max;
+	volatile uint32_t cpl_ps;
+	volatile uint32_t cpl_ps_min;
+	volatile uint32_t cpl_ps_max;
+	volatile uint32_t cpl_dw_ps;
+	volatile uint32_t cpl_dw_ps_min;
+	volatile uint32_t cpl_dw_ps_max;
+	volatile uint32_t cplh_pending;
+	volatile uint32_t cpld_pending;
+	volatile uint32_t cplh_pending_max;
+	volatile uint32_t cpld_pending_max;
+	volatile uint32_t err_count_other;
+};
+
+void ark_rqp_dump(struct ark_rqpace_t *rqp);
+void ark_rqp_stats_reset(struct ark_rqpace_t *rqp);
+
+#endif
diff --git a/drivers/net/ark/ark_udm.c b/drivers/net/ark/ark_udm.c
new file mode 100644
index 0000000..decd48c
--- /dev/null
+++ b/drivers/net/ark/ark_udm.c
@@ -0,0 +1,221 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_debug.h"
+#include "ark_udm.h"
+
+int
+ark_udm_verify(struct ark_udm_t *udm)
+{
+	if (sizeof(struct ark_udm_t) != ARK_UDM_EXPECT_SIZE) {
+		fprintf(stderr, "  UDM structure looks incorrect %d vs %zu\n",
+				ARK_UDM_EXPECT_SIZE, sizeof(struct ark_udm_t));
+		return -1;
+	}
+
+	if (udm->setup.const0 != ARK_UDM_CONST) {
+		fprintf(stderr, "  UDM module not found as expected 0x%08x\n",
+				udm->setup.const0);
+		return -1;
+	}
+	return 0;
+}
+
+int
+ark_udm_stop(struct ark_udm_t *udm, const int wait)
+{
+	int cnt = 0;
+
+	udm->cfg.command = 2;
+
+	while (wait && (udm->cfg.stop_flushed & 0x01) == 0) {
+		if (cnt++ > 1000)
+			return 1;
+
+		usleep(10);
+	}
+	return 0;
+}
+
+int
+ark_udm_reset(struct ark_udm_t *udm)
+{
+	int status;
+
+	status = ark_udm_stop(udm, 1);
+	if (status != 0) {
+		ARK_DEBUG_TRACE("ARKP: %s  stop failed  doing forced reset\n",
+						__func__);
+		udm->cfg.command = 4;
+		usleep(10);
+		udm->cfg.command = 3;
+		status = ark_udm_stop(udm, 0);
+		ARK_DEBUG_TRACE
+			("ARKP: %s  stop status %d post failure and forced reset\n",
+			 __func__, status);
+	} else {
+		udm->cfg.command = 3;
+	}
+
+	return status;
+}
+
+void
+ark_udm_start(struct ark_udm_t *udm)
+{
+	udm->cfg.command = 1;
+}
+
+void
+ark_udm_stats_reset(struct ark_udm_t *udm)
+{
+	udm->pcibp.pci_clear = 1;
+	udm->tlp_ps.tlp_clear = 1;
+}
+
+void
+ark_udm_configure(struct ark_udm_t *udm, uint32_t headroom, uint32_t dataroom,
+	uint32_t write_interval_ns)
+{
+	/* headroom and data room are in DWs in the UDM */
+	udm->cfg.dataroom = dataroom / 4;
+	udm->cfg.headroom = headroom / 4;
+
+	/* 4 NS period ns */
+	udm->rt_cfg.write_interval = write_interval_ns / 4;
+}
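+
+/*
+ * Worked example (illustrative): headroom = 32 and dataroom = 2048 bytes
+ * program 8 and 512 DWs; write_interval_ns = ARK_RX_WRITE_TIME_NS (2500)
+ * programs a write_interval of 625 of the 4 ns cycles.
+ */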
+
+void
+ark_udm_write_addr(struct ark_udm_t *udm, phys_addr_t addr)
+{
+	udm->rt_cfg.hw_prod_addr = addr;
+}
+
+int
+ark_udm_is_flushed(struct ark_udm_t *udm)
+{
+	return (udm->cfg.stop_flushed & 0x01) != 0;
+}
+
+uint64_t
+ark_udm_dropped(struct ark_udm_t *udm)
+{
+	return udm->qstats.q_pkt_drop;
+}
+
+uint64_t
+ark_udm_bytes(struct ark_udm_t *udm)
+{
+	return udm->qstats.q_byte_count;
+}
+
+uint64_t
+ark_udm_packets(struct ark_udm_t *udm)
+{
+	return udm->qstats.q_ff_packet_count;
+}
+
+void
+ark_udm_dump_stats(struct ark_udm_t *udm, const char *msg)
+{
+	ARK_DEBUG_STATS("ARKP UDM Stats: %s" ARK_SU64 ARK_SU64 ARK_SU64 ARK_SU64
+		ARK_SU64 "\n", msg, "Pkts Received", udm->stats.rx_packet_count,
+		"Pkts Finalized", udm->stats.rx_sent_packets, "Pkts Dropped",
+		udm->tlp.pkt_drop, "Bytes Count", udm->stats.rx_byte_count, "MBuf Count",
+		udm->stats.rx_mbuf_count);
+}
+
+void
+ark_udm_dump_queue_stats(struct ark_udm_t *udm, const char *msg, uint16_t qid)
+{
+	ARK_DEBUG_STATS
+		("ARKP UDM Queue %3u Stats: %s"
+		 ARK_SU64 ARK_SU64
+		 ARK_SU64 ARK_SU64
+		 ARK_SU64 "\n",
+		 qid, msg,
+		 "Pkts Received", udm->qstats.q_packet_count,
+		 "Pkts Finalized", udm->qstats.q_ff_packet_count,
+		 "Pkts Dropped", udm->qstats.q_pkt_drop,
+		 "Bytes Count", udm->qstats.q_byte_count,
+		 "MBuf Count", udm->qstats.q_mbuf_count);
+}
+
+void
+ark_udm_dump(struct ark_udm_t *udm, const char *msg)
+{
+	ARK_DEBUG_TRACE("ARKP UDM Dump: %s Stopped: %d\n", msg,
+			udm->cfg.stop_flushed);
+}
+
+void
+ark_udm_dump_setup(struct ark_udm_t *udm, uint16_t q_id)
+{
+	ARK_DEBUG_TRACE
+		("UDM Setup Q: %u"
+		 ARK_SU64X ARK_SU32 "\n",
+		 q_id,
+		 "hw_prod_addr", udm->rt_cfg.hw_prod_addr,
+		 "prod_idx", udm->rt_cfg.prod_idx);
+}
+
+void
+ark_udm_dump_perf(struct ark_udm_t *udm, const char *msg)
+{
+	struct ark_udm_pcibp_t *bp = &udm->pcibp;
+
+	ARK_DEBUG_STATS
+		("ARKP UDM Performance %s"
+		 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 ARK_SU32 "\n",
+		 msg,
+		 "PCI Empty", bp->pci_empty,
+		 "PCI Q1", bp->pci_q1,
+		 "PCI Q2", bp->pci_q2,
+		 "PCI Q3", bp->pci_q3,
+		 "PCI Q4", bp->pci_q4,
+		 "PCI Full", bp->pci_full);
+}
+
+void
+ark_udm_queue_stats_reset(struct ark_udm_t *udm)
+{
+	udm->qstats.q_byte_count = 1;
+}
+
+void
+ark_udm_queue_enable(struct ark_udm_t *udm, int enable)
+{
+	udm->qstats.q_enable = enable ? 1 : 0;
+}
diff --git a/drivers/net/ark/ark_udm.h b/drivers/net/ark/ark_udm.h
new file mode 100644
index 0000000..d05b925
--- /dev/null
+++ b/drivers/net/ark/ark_udm.h
@@ -0,0 +1,175 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_UDM_H_
+#define _ARK_UDM_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/*
+ * UDM hardware structures
+ */
+
+#define ARK_RX_WRITE_TIME_NS 2500
+#define ARK_UDM_SETUP 0
+#define ARK_UDM_CONST 0xbacecace
+struct ark_udm_setup_t {
+	uint32_t r0;
+	uint32_t r4;
+	volatile uint32_t cycle_count;
+	uint32_t const0;
+};
+
+#define ARK_UDM_CFG 0x010
+struct ark_udm_cfg_t {
+	volatile uint32_t stop_flushed;	/* RO */
+	volatile uint32_t command;
+	uint32_t dataroom;
+	uint32_t headroom;
+};
+
+typedef enum {
+	ARK_UDM_START = 0x1,
+	ARK_UDM_STOP = 0x2,
+	ARK_UDM_RESET = 0x3
+} ark_udm_commands;
+
+#define ARK_UDM_STATS 0x020
+struct ark_udm_stats_t {
+	volatile uint64_t rx_byte_count;
+	volatile uint64_t rx_packet_count;
+	volatile uint64_t rx_mbuf_count;
+	volatile uint64_t rx_sent_packets;
+};
+
+#define ARK_UDM_PQ 0x040
+struct ark_udm_queue_stats_t {
+	volatile uint64_t q_byte_count;
+	volatile uint64_t q_packet_count;	/* includes drops */
+	volatile uint64_t q_mbuf_count;
+	volatile uint64_t q_ff_packet_count;
+	volatile uint64_t q_pkt_drop;
+	uint32_t q_enable;
+};
+
+#define ARK_UDM_TLP 0x0070
+struct ark_udm_tlp_t {
+	volatile uint64_t pkt_drop;	/* global */
+	volatile uint32_t tlp_q1;
+	volatile uint32_t tlp_q2;
+	volatile uint32_t tlp_q3;
+	volatile uint32_t tlp_q4;
+	volatile uint32_t tlp_full;
+};
+
+#define ARK_UDM_PCIBP 0x00a0
+struct ark_udm_pcibp_t {
+	volatile uint32_t pci_clear;
+	volatile uint32_t pci_empty;
+	volatile uint32_t pci_q1;
+	volatile uint32_t pci_q2;
+	volatile uint32_t pci_q3;
+	volatile uint32_t pci_q4;
+	volatile uint32_t pci_full;
+};
+
+#define ARK_UDM_TLP_PS 0x00bc
+struct ark_udm_tlp_ps_t {
+	volatile uint32_t tlp_clear;
+	volatile uint32_t tlp_ps_min;
+	volatile uint32_t tlp_ps_max;
+	volatile uint32_t tlp_full_ps_min;
+	volatile uint32_t tlp_full_ps_max;
+	volatile uint32_t tlp_dw_ps_min;
+	volatile uint32_t tlp_dw_ps_max;
+	volatile uint32_t tlp_pldw_ps_min;
+	volatile uint32_t tlp_pldw_ps_max;
+};
+
+#define ARK_UDM_RT_CFG 0x00e0
+struct ark_udm_rt_cfg_t {
+	phys_addr_t hw_prod_addr;
+	uint32_t write_interval;	/* 4ns cycles */
+	volatile uint32_t prod_idx;	/* RO */
+};
+
+/*  Consolidated structure */
+struct ark_udm_t {
+	struct ark_udm_setup_t setup;
+	struct ark_udm_cfg_t cfg;
+	struct ark_udm_stats_t stats;
+	struct ark_udm_queue_stats_t qstats;
+	uint8_t reserved1[(ARK_UDM_TLP - ARK_UDM_PQ) -
+					  sizeof(struct ark_udm_queue_stats_t)];
+	struct ark_udm_tlp_t tlp;
+	uint8_t reserved2[(ARK_UDM_PCIBP - ARK_UDM_TLP) -
+					  sizeof(struct ark_udm_tlp_t)];
+	struct ark_udm_pcibp_t pcibp;
+	struct ark_udm_tlp_ps_t tlp_ps;
+	struct ark_udm_rt_cfg_t rt_cfg;
+	int8_t reserved3[(0x100 - ARK_UDM_RT_CFG) -
+					  sizeof(struct ark_udm_rt_cfg_t)];
+};
+
+#define ARK_UDM_EXPECT_SIZE (0x00fc + 4)
+#define ARK_UDM_QOFFSET ARK_UDM_EXPECT_SIZE
+
+int ark_udm_verify(struct ark_udm_t *udm);
+int ark_udm_stop(struct ark_udm_t *udm, int wait);
+void ark_udm_start(struct ark_udm_t *udm);
+int ark_udm_reset(struct ark_udm_t *udm);
+void ark_udm_configure(struct ark_udm_t *udm,
+					   uint32_t headroom,
+					   uint32_t dataroom,
+					   uint32_t write_interval_ns);
+void ark_udm_write_addr(struct ark_udm_t *udm, phys_addr_t addr);
+void ark_udm_stats_reset(struct ark_udm_t *udm);
+void ark_udm_dump_stats(struct ark_udm_t *udm, const char *msg);
+void ark_udm_dump_queue_stats(struct ark_udm_t *udm, const char *msg,
+							  uint16_t qid);
+void ark_udm_dump(struct ark_udm_t *udm, const char *msg);
+void ark_udm_dump_perf(struct ark_udm_t *udm, const char *msg);
+void ark_udm_dump_setup(struct ark_udm_t *udm, uint16_t q_id);
+int ark_udm_is_flushed(struct ark_udm_t *udm);
+
+/* Per queue data */
+uint64_t ark_udm_dropped(struct ark_udm_t *udm);
+uint64_t ark_udm_bytes(struct ark_udm_t *udm);
+uint64_t ark_udm_packets(struct ark_udm_t *udm);
+
+void ark_udm_queue_stats_reset(struct ark_udm_t *udm);
+void ark_udm_queue_enable(struct ark_udm_t *udm, int enable);
+
+#endif
diff --git a/drivers/net/ark/rte_pmd_ark_version.map b/drivers/net/ark/rte_pmd_ark_version.map
new file mode 100644
index 0000000..7f84780
--- /dev/null
+++ b/drivers/net/ark/rte_pmd_ark_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+	 local: *;
+
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0e0b600..da23898 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -104,6 +104,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)      += -lrte_pmd_bnx2x -lz
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNXT_PMD)       += -lrte_pmd_bnxt
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
1.9.1

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v11 08/18] lib: add symbol versioning to distributor
  2017-03-20 10:08  2%                 ` [dpdk-dev] [PATCH v11 0/18] distributor lib performance enhancements David Hunt
  2017-03-20 10:08  1%                   ` [dpdk-dev] [PATCH v11 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-20 10:08  2%                   ` David Hunt
  2017-03-27 13:02  3%                     ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number (LIBABIVER) in the Makefile.
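
The intent of the versioning is to keep the original packet-at-a-time
functions available to binaries built against DPDK 2.0, at the DPDK_2.0
version node, while newly built applications bind by default to the
burst-capable DPDK_17.05 implementations. As a rough sketch (GNU
toolchain assumed; this snippet is illustrative, not part of the patch),
an application can even pin itself to the legacy behaviour explicitly:

/* Hypothetical application-side snippet: force this reference to
 * resolve to the 2.0 ABI rather than the 17.05 default. */
__asm__(".symver rte_distributor_create, rte_distributor_create@DPDK_2.0");

Anything linked without such a directive picks up the default version.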

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..06df13d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v11 01/18] lib: rename legacy distributor lib files
  2017-03-20 10:08  2%                 ` [dpdk-dev] [PATCH v11 0/18] distributor lib performance enhancements David Hunt
@ 2017-03-20 10:08  1%                   ` David Hunt
  2017-03-20 10:08  2%                   ` [dpdk-dev] [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
  1 sibling, 0 replies; 200+ results
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move the files out of the way so that we can replace them with
new versions of the distributor library. The files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v11 0/18] distributor lib performance enhancements
  2017-03-15  6:19  1%               ` [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-20 10:08  2%                 ` David Hunt
  2017-03-20 10:08  1%                   ` [dpdk-dev] [PATCH v11 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-20 10:08  2%                   ` [dpdk-dev] [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
  0 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch set aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
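
As a rough picture only (the identifiers below are illustrative, not the
library's real internal names), each worker's exchange area is a single
64-byte cache line of eight 64-bit slots, with the mbuf pointer in the
high bits of each slot and the handshake flags in the low bits:

#include <stdint.h>
#include <rte_memory.h>

#define SKETCH_BURST 8	/* 8 x 8B slots == one 64B cache line */

/* Illustrative layout -- not the actual internal definition. */
struct dist_burst_line {
	/* mbuf pointer shifted up, handshake flag bits in the LSBs */
	volatile int64_t slot[SKETCH_BURST];
} __rte_cache_aligned;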

The flow matching algorithm has had significant rework: it now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.
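
In outline, the scalar matcher just compares each incoming tag against
every worker's in-flight and backlog tags (a simplified sketch, not the
exact library code; the array sizes are illustrative):

#include <stdint.h>

/* Return the worker already pinned to this flow tag, or -1 if none. */
static int
match_flow(uint16_t tag, uint16_t flows[][8], unsigned int num_workers)
{
	unsigned int w, i;

	for (w = 0; w < num_workers; w++)
		for (i = 0; i < 8; i++)
			if (flows[w][i] == tag)
				return (int)w;
	return -1;
}

The vector version performs the same comparisons, but on 8 tags per SSE2
instruction instead of one at a time.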

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate version at run time,
depending on the presence of the SSE2 CPU flag. On non-x86 platforms,
the scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
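
The selection itself is the usual function-pointer dispatch.
rte_cpu_get_flag_enabled() is the real EAL call; the matcher names below
are placeholders, not the series' actual identifiers:

#include <rte_cpuflags.h>

struct rte_distributor;
/* Placeholder matcher implementations (sketch only). */
static void find_match_scalar(struct rte_distributor *d);
static void find_match_vec(struct rte_distributor *d);

static void (*find_match)(struct rte_distributor *d);

static void
select_match_fn(void)
{
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		find_match = find_match_vec;	/* SSE2 burst matcher */
	else
		find_match = find_match_scalar;	/* portable fallback */
}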

v11 changes:
   * Fixed RTE_DIST_BURST_SIZE so it compiles on Arm platforms
   * Fixed compile issue in rte_distributor_match_generic.c on Arm platforms
   * Tweaked distributor_app docs based on review and added John's Ack

v10 changes:
   * Addressed all review comments from v9 (thanks, Bruce)
   * Inherited v9 series Ack by Bruce, except new suggested addition
     for example app documentation (17/18)
   * Squashed the two patches containing distributor structs and code
   * Renamed confusing rte_distributor_v1705.h to rte_distributor_next.h
   * Added usleep in main so as to be a little more gentle with that core
   * Fixed some patch titles and improved some descriptions
   * Updated sample app guide documentation
   * Removed un-needed code limiting Tx rings and cleaned up patch

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical ordering of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split down the updates to example app more
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them down into easier to review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker at a time.
   For performance in matching, flow IDs are 15 bits.
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
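
   A minimal sketch of the implications (the alg_type flag name is as used
   in this series; num_workers and flow_id are placeholders):

   /* Packet-at-a-time instance for apps needing full 32-bit tags. */
   struct rte_distributor *d = rte_distributor_create("dist",
		rte_socket_id(), num_workers, RTE_DISTRIBUTOR_SINGLE);

   /* In burst mode, mask the tag down to 15 bits before distributing. */
   mbuf->hash.usr = flow_id & 0x7fff;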

Performance Gains
   Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICs to 2 x 40Gb/s traffic generator channels, 64B packets
   separate cores for Rx, Tx and distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new distributor code
[04/18] lib: add SIMD flow matching to distributor
[05/18] test/distributor: extra params for autotests
[06/18] lib: switch distributor over to new API
[07/18] lib: make v20 header file private
[08/18] lib: add symbol versioning to distributor
[09/18] test: test single and burst distributor API
[10/18] test: add perf test for distributor burst mode
[11/18] examples/distributor: allow for extra stats
[12/18] examples/distributor: wait for ports to come up
[13/18] examples/distributor: add dedicated core for dist
[14/18] examples/distributor: tweaks for performance
[15/18] examples/distributor: give Rx thread a core
[16/18] doc: distributor library changes for new burst API
[17/18] doc: distributor app changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v2 00/13] introduce fail-safe PMD
  @ 2017-03-18 19:51  3%             ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2017-03-18 19:51 UTC (permalink / raw)
  To: Gaëtan Rivet
  Cc: Thomas Monjalon, Bruce Richardson, dev, Adrien Mazarguil, techboard

On Fri, Mar 17, 2017 at 11:56:21AM +0100, Gaëtan Rivet wrote:
> On Thu, Mar 16, 2017 at 04:50:43PM -0400, Neil Horman wrote:
> > On Wed, Mar 15, 2017 at 03:25:37PM +0100, Gaëtan Rivet wrote:
> > > On Wed, Mar 15, 2017 at 12:15:56PM +0100, Thomas Monjalon wrote:
> > > > 2017-03-15 03:28, Bruce Richardson:
> > > > > On Tue, Mar 14, 2017 at 03:49:47PM +0100, Gaëtan Rivet wrote:
> > > > > > - In the bonding, the init and configuration steps are still the
> > > > > >  responsibility of the application and no one else. The bonding PMD
> > > > > >  captures the device, re-applies its configuration upon dev_configure()
> > > > > >  which is actually re-applying part of the configuration already  present
> > > > > > within the slave eth_dev (cf rte_eth_dev_config_restore).
> > > > > >
> > > > > > - In the fail-safe, the init and configuration are both the
> > > > > >  responsibilities of the fail-safe PMD itself, not the application
> > > > > >  anymore. This handling of these responsibilities in lieu of the
> > > > > >  application is the whole point of the "deferred hot-plug" support, of
> > > > > >  proposing a simple implementation to the user.
> > > > > >
> > > > > > This change in responsibilities is the bulk of the fail-safe code. It
> > > > > > would have to be added as-is to the bonding. Verifying the correctness
> > > > > > of the sync of the initialization phase (acceptable states of a device
> > > > > > following several events registered by the fail-safe PMD) and the
> > > > > > configuration items between the state the application believes it is in
> > > > > > and the fail-safe knows it is in, is the bulk of the fail-safe code.
> > > > > >
> > > > > > This function is not overlapping with that of the bonding. The reason I
> > > > > > did not add this whole architecture to the bonding is that when I tried
> > > > > > to do so, I found that I only had two possibilities:
> > > > > >
> > > > > > - The current slave handling path is kept, and we only add a new one
> > > > > >  with additional functionalities: full init and conf handling with
> > > > > >  extended parsing capabilities.
> > > > > >
> > > > > > - The current slave handling is scrapped and replaced entirely by the new
> > > > > >  slave management. The old capturing of existing device is not done
> > > > > >  anymore.
> > > > > >
> > > > > > The first solution is not acceptable, because we effectively end up with
> > > > > > a maintenance nightmare by having to validate two types of slaves with
> > > > > > differing capabilities, differing initialization paths and differing
> > > > > > configuration code.  This is extremely awkward and architecturally
> > > > > > unsound. This is essentially the same as having the exact code of the
> > > > > > fail-safe as an aside in the bonding, maintaining exactly the same
> > > > > > breadth of code while having muddier interfaces and organization.
> > > > > >
> > > > > > The second solution is not acceptable, because we are bending the whole
> > > > > > existing bonding API to our whim. We could just as well simply rename
> > > > > > the fail-safe PMD as bonding, add a few grouping capabilities and call
> > > > > > it a day. This is not acceptable for users.
> > > > > >
> > > > > If the first solution is indeed not an option, why do you think this
> > > > > second one would be unacceptable for users? If the functionality remains
> > > > > the same, I don't see how it matters much for users which driver
> > > > > provides it or where the code originates.
> > > > >
> > > 
> > > The problem with the second solution is also that bonding is not only a PMD.
> > > It exposes its own public API that existing applications rely on, see
> > > rte_eth_bond_*() definitions in rte_eth_bond.h.
> > > 
> > > Although bonding instances can be set up through command-line options,
> > > target "users" are mainly applications explicitly written to use it.
> > > This must be preserved if for no other reason than that it hasn't been deprecated.
> > > 
> > I fail to see how either of your points is relevant.  The fact that the bonding
> > PMD exposes an API to the application has no bearing on its ability to implement
> > a hot plug function.
> > 
> 
> This depends on the API making sense in the context of the new
> functionality.
> 
Well, the API should always make sense in the context of any added
functionality, but it seems to me that's just another way of saying you might
need to make some modifications to the bonding API.  I'm not saying that's
necessarily going to have to be the case, but while I'm a big proponent of ABI
stability, I would support an API change to add valid functionality if it were a
big enough feature.  Though again, I don't think that's really necessary if you
rethink the model a little bit.

> This API allows adding and removing slaves to/from a bonding instance and configuring them.
> In the fail-safe arch, it is not possible to add and remove slaves from the
> grouping. Doing so would mean adding and removing devices from internal EAL
> structures.
> 
Ok, so you update the bonding option parser to include a fail-safe mode, which
only supports two slaves, one of which is an instance of the null PMD and the
other is to be added at a later date in response to a hot plug event.  In
fail-safe mode, after initial option parsing, bonds return an error to the
application when/if it attempts to remove a slave from the bond.  That doesn't
seem so hard to me.
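
Purely as an illustration of that model (no such bonding option exists
today; 'mode=failsafe' is hypothetical, while the vdev name and slave=
keys mirror existing bonding devargs), it might look like:

--vdev 'net_bonding0,mode=failsafe,slave=net_null0,slave=0000:84:00.0'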

> It is also invalid to try to configure a fail-safe slave. An application
> only configures a fail-safe device, which will in turn configure its slaves.
> This separation follows from the nature of a device failover.
> 
See my previous mail: you augment the null PMD to allow storage of arbitrary
configuration strings or key/value pairs when operating as a bonded slave.  The
bond is then capable of retrieving configuration from that null instance and
pushing it to the real slave on a hot plug event.
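
Again illustrative only ('stash=' is not an existing null-PMD option):
the stashed key/value pairs would simply ride along on the null slave's
devargs until a real device appears to consume them, e.g.:

--vdev 'net_null0,stash=rxq:4;txq:4;mtu:9000'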

> As seen previously, the fail-safe PMD handles different responsibilities
> from the bonding PMD. It is thus necessary to make different assumptions
> concerning what it can and cannot do with a slave.
> 
Only because you seem to refuse to think about the failsafe model in any other
way than the one you have implemented.

> > > Also, trying to implement this API for the device failover function would
> > > imply a device capture down to the devargs parsing level. This means that
> > > a PMD could request taking over a device, messing with the internals of the
> > > EAL: devargs list and busses lists of devices. This seems unacceptable.
> > > 
> > Why?  You just said yourself above that, while there is a devargs interface to
> > the bonding driver, there is also an API, which is the more used method to
> > configure bonding.  I'm not sure I agree with that, but I think it's beside the
> > point.  Your PMD also requires configuration, and it appears necessary that
> > you do so from the command line (you need to specifically enumerate the
> > subdevices that you intend to provide failsafe behavior to).  I see no reason
> > why such a feature can't be added to bonding, and the null PMD used as a
> > stand-in device, should the enumerated device not yet exist.
> > 
> > To your argument regarding taking over a device, I don't see how you find
> > that unacceptable, as it is precisely what the bonding driver does today, in the
> > sense that it allows an application to assign a master/slave relationship to
> > devices right now.  I see no reason that we can't convey the right and ability
> > for bonding to do that dynamically based on configuration.
> > 
> 
> No, the bonding PMD does not take over a device. It only cares about the
> ether layer for its link failover. It does not care about parsing parameters
> of a slave, probing devices, detaching drivers. It does not remove a device
> from the pci_device_list in the EAL for example.
> 
Again, please take a moment and think about how else your failsafe model might
be implemented in the context of bonding.  Right now, you are asserting that
your failsafe model can't be implemented in any other way because of the
decisions you have made.  If you change how you think of failsafe, you'll see it
can be done in other ways.

> Doing so would imply exposing private internal structures from the EAL,
> messing with elements reserved while doing the EAL init. This requires
> controlling a priority in the device initialization order to create the
> over-arching ones last (which is a hacky solution). It would wreak havoc
> with the DPDK arch.
> 
> The fail-safe PMD does not rely on the EAL for handling its slaves. This is
> what I explained before, when I touched upon the differing responsibilities
> implied by the differences in nature between a link failover and a device
> failover.
> 
> > > The bonding API is thus in conflict with the concept of a device failover in
> > > the context of the current DPDK arch.
> > > 
> > I really don't see how you get to this from your argument above.
> > 
> 
> The current DPDK arch does not expose EAL elements to be modified by PMDs,
> and with good reason. In this context, it is not possible to handle slaves
> correctly for a device failover in the bonding PMD,
> because the bonding PMD from the get-go expects the EAL to handle its slaves
> on a device level.
> 
> > > > > Despite all the discussion, it still just doesn't make sense to me to
> > > > > have more than one DPDK driver to handle failover - be it link or
> > > > > device. If nothing else, it's going to be awkward to explain to users
> > > > > that if they want fail-over for when a link goes down they have to use
> > > > > driver A, but if they want fail-over when a NIC gets hotplugged they use
> > > > > driver B, and if they want both kinds of failover - which would surely
> > > > > be the expected case - they need to use both drivers. The usability is
> > > > > a problem here.
> > > 
> > > Having both kinds of failover in the same PMD will always lead to the first
> > > solution in some form or another.
> > > 
> > It really isn't, because you can model hotplug behavior as a trivial form of the
> > failover that bonding does now (i.e. failover between a null device and a
> > preferred real device).
> > 
> 
> The preferred real device still has to be created / destroyed. It still
> relies on EAL entry points for handling. It still puts additional
> responsibilities on a PMD. Those responsibilities are expressed in sub
> layers clearly defined in the fail-safe PMD. You would have to create these
> sub-layers in some form in the bonding for it to be able to create a
> preferred real device at some point. This additional way of handling slaves
> has already been discussed as inducing a messy architecture to the bonding
> PMD.
> 
> > > I am sure we can document all this in a way that does not cause users
> > > confusion, with the help of community feedback such as yours.
> > > 
> > > Perhaps "net_failsafe" is a misnomer? We also thought about "net_persistent"
> > > or "net_hotplug". Any other ideas?
> > > 
> > > It is also possible for me to remove the failover support from this series,
> > > only providing deferred hot-plug handling at first. I could then send the
> > > failover support as separate patches to better assert that it is a useful,
> > > secondary feature that is essentially free to implement.
> > > 
> > I think that's solving the wrong problem.  I've no issue with the functionality
> > in this patch; it's really the implementation that we are all arguing against.
> > 
> > > >
> > > > It seems everybody agrees on the need for the failsafe code.
> > > > We are just discussing the right place to implement it.
> > > >
> > > > Gaetan, moving this code in the bonding PMD means replacing the bonding
> > > > API design by the failsafe design, right?
> > > > With the failsafe design in the bonding PMD, is it possible to keep other
> > > > bonding features?
> > > 
> > > As seen previously, the bonding API is incompatible with device failover.
> > > 
> > It's not been seen previously; you asserted it to be so, and I certainly disagree
> > with that assertion.  I think others might too.
> > 
> 
> I also explained at length my assertion. I can certainly expand further if
> necessary, but you need to point out the elements you disagree with.
> 
> > Additionally, it's not really in line with this discussion, but in looking at
> > your hotplug detection code, I think it somewhat lacking.  Currently you seem to
> > implement this with a timer that wakes up and checks for device existence, which
> > is pretty substandard in my mind.  That's going to waste CPU cycles that might
> > lead to packet loss.  I'd really prefer to see you augment the EAL library with
> > event handling code (it can tie into udev on Linux and kqueue on BSD), and
> > create a generic event hook that we can use to detect device adds/removes
> > without having to wake up constantly to see if anything has changed.
> > 
> > 
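For reference, the hook being suggested above is only a handful of
libudev calls (a sketch with error handling omitted, not existing EAL
code; eal_udev_event_loop() is a made-up name):

#include <libudev.h>
#include <poll.h>

static void
eal_udev_event_loop(void)
{
	struct udev *udev = udev_new();
	struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");
	struct pollfd pfd;

	udev_monitor_filter_add_match_subsystem_devtype(mon, "pci", NULL);
	udev_monitor_enable_receiving(mon);
	pfd.fd = udev_monitor_get_fd(mon);
	pfd.events = POLLIN;

	while (poll(&pfd, 1, -1) > 0) {
		struct udev_device *dev = udev_monitor_receive_device(mon);

		if (dev == NULL)
			continue;
		/* udev_device_get_action(dev) yields "add" or "remove";
		 * this is where registered EAL hotplug callbacks would fire. */
		udev_device_unref(dev);
	}
}
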
> 
> I think it's fine. We can discuss it further once we agree on the form the
> hot-plug implementation will take in the DPDK.
> 


Well, until then, I feel like we're talking past one another at this point, and
so I'll just say this driver is a NAK for me.

Neil


> > > Having some features enabled solely for one kind of failover, while having
> > > specific code paths for both, seems unnecessarily complicated to me;
> > > following suit with my previous points about the first solution.
> > > 
> > > >
> > > > In case we do not have a consensus in the following days, I suggest to add
> > > > this topic in the next techboard meeting agenda.
> 
> Best regards,
> -- 
> Gaëtan Rivet
> 6WIND
> 

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] net/ark: poll-mode driver for AtomicRules Arkville
@ 2017-03-17 21:15  1% Ed Czeck
  0 siblings, 0 replies; 200+ results
From: Ed Czeck @ 2017-03-17 21:15 UTC (permalink / raw)
  To: dev; +Cc: Ed Czeck, Shepard Siegel, John Miller

This is the PMD for Atomic Rules' Arkville ARK family of devices.
See doc/guides/nics/ark.rst for a detailed description.


Signed-off-by: Shepard Siegel <shepard.siegel@atomicrules.com>
Signed-off-by: John Miller <john.miller@atomicrules.com>
Signed-off-by: Ed Czeck <ed.czeck@atomicrules.com>
---
 MAINTAINERS                                 |   8 +
 config/common_base                          |  10 +
 config/defconfig_arm-armv7a-linuxapp-gcc    |   1 +
 config/defconfig_ppc_64-power8-linuxapp-gcc |   1 +
 doc/guides/nics/ark.rst                     | 238 +++++++
 doc/guides/nics/features/ark.ini            |  15 +
 doc/guides/nics/index.rst                   |   1 +
 drivers/net/Makefile                        |   1 +
 drivers/net/ark/Makefile                    |  73 +++
 drivers/net/ark/ark_ddm.c                   | 150 +++++
 drivers/net/ark/ark_ddm.h                   | 154 +++++
 drivers/net/ark/ark_debug.h                 |  72 ++
 drivers/net/ark/ark_ethdev.c                | 982 ++++++++++++++++++++++++++++
 drivers/net/ark/ark_ethdev.h                |  75 +++
 drivers/net/ark/ark_ethdev.o.pmd.c          |   2 +
 drivers/net/ark/ark_ethdev_rx.c             | 671 +++++++++++++++++++
 drivers/net/ark/ark_ethdev_tx.c             | 479 ++++++++++++++
 drivers/net/ark/ark_ext.h                   |  71 ++
 drivers/net/ark/ark_global.h                | 164 +++++
 drivers/net/ark/ark_mpu.c                   | 168 +++++
 drivers/net/ark/ark_mpu.h                   | 143 ++++
 drivers/net/ark/ark_pktchkr.c               | 445 +++++++++++++
 drivers/net/ark/ark_pktchkr.h               | 114 ++++
 drivers/net/ark/ark_pktdir.c                |  79 +++
 drivers/net/ark/ark_pktdir.h                |  68 ++
 drivers/net/ark/ark_pktgen.c                | 477 ++++++++++++++
 drivers/net/ark/ark_pktgen.h                | 106 +++
 drivers/net/ark/ark_rqp.c                   |  93 +++
 drivers/net/ark/ark_rqp.h                   |  75 +++
 drivers/net/ark/ark_udm.c                   | 221 +++++++
 drivers/net/ark/ark_udm.h                   | 175 +++++
 drivers/net/ark/rte_pmd_ark_version.map     |   4 +
 mk/rte.app.mk                               |   1 +
 33 files changed, 5337 insertions(+)
 create mode 100644 doc/guides/nics/ark.rst
 create mode 100644 doc/guides/nics/features/ark.ini
 create mode 100644 drivers/net/ark/Makefile
 create mode 100644 drivers/net/ark/ark_ddm.c
 create mode 100644 drivers/net/ark/ark_ddm.h
 create mode 100644 drivers/net/ark/ark_debug.h
 create mode 100644 drivers/net/ark/ark_ethdev.c
 create mode 100644 drivers/net/ark/ark_ethdev.h
 create mode 100644 drivers/net/ark/ark_ethdev.o.pmd.c
 create mode 100644 drivers/net/ark/ark_ethdev_rx.c
 create mode 100644 drivers/net/ark/ark_ethdev_tx.c
 create mode 100644 drivers/net/ark/ark_ext.h
 create mode 100644 drivers/net/ark/ark_global.h
 create mode 100644 drivers/net/ark/ark_mpu.c
 create mode 100644 drivers/net/ark/ark_mpu.h
 create mode 100644 drivers/net/ark/ark_pktchkr.c
 create mode 100644 drivers/net/ark/ark_pktchkr.h
 create mode 100644 drivers/net/ark/ark_pktdir.c
 create mode 100644 drivers/net/ark/ark_pktdir.h
 create mode 100644 drivers/net/ark/ark_pktgen.c
 create mode 100644 drivers/net/ark/ark_pktgen.h
 create mode 100644 drivers/net/ark/ark_rqp.c
 create mode 100644 drivers/net/ark/ark_rqp.h
 create mode 100644 drivers/net/ark/ark_udm.c
 create mode 100644 drivers/net/ark/ark_udm.h
 create mode 100644 drivers/net/ark/rte_pmd_ark_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 39bc78e..6f6136a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -280,6 +280,14 @@ M: Evgeny Schemeilin <evgenys@amazon.com>
 F: drivers/net/ena/
 F: doc/guides/nics/ena.rst
 
+Atomic Rules ark
+M: Shepard Siegel <shepard.siegel@atomicrules.com>
+M: Ed Czeck       <ed.czeck@atomicrules.com>
+M: John Miller    <john.miller@atomicrules.com>
+F: /drivers/net/ark/
+F: doc/guides/nics/ark.rst
+F: doc/guides/nics/features/ark.ini
+
 Broadcom bnxt
 M: Stephen Hurd <stephen.hurd@broadcom.com>
 M: Ajit Khaparde <ajit.khaparde@broadcom.com>
diff --git a/config/common_base b/config/common_base
index aeee13e..e64cd83 100644
--- a/config/common_base
+++ b/config/common_base
@@ -348,6 +348,16 @@ CONFIG_RTE_LIBRTE_QEDE_FW=""
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
 
 #
+# Compile ARK PMD
+#
+CONFIG_RTE_LIBRTE_ARK_PMD=y
+CONFIG_RTE_LIBRTE_ARK_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS=n
+CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE=n
+
+
+#
 # Compile the TAP PMD
 # It is enabled by default for Linux only.
 #
diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index d9bd2a8..6d2b5e0 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -61,6 +61,7 @@ CONFIG_RTE_SCHED_VECTOR=n
 
 # cannot use those on ARM
 CONFIG_RTE_KNI_KMOD=n
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_EM_PMD=n
 CONFIG_RTE_LIBRTE_IGB_PMD=n
 CONFIG_RTE_LIBRTE_CXGBE_PMD=n
diff --git a/config/defconfig_ppc_64-power8-linuxapp-gcc b/config/defconfig_ppc_64-power8-linuxapp-gcc
index 35f7fb6..89bc396 100644
--- a/config/defconfig_ppc_64-power8-linuxapp-gcc
+++ b/config/defconfig_ppc_64-power8-linuxapp-gcc
@@ -48,6 +48,7 @@ CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n
 
 # Note: Initially, all of the PMD drivers compilation are turned off on Power
 # Will turn on them only after the successful testing on Power
+CONFIG_RTE_LIBRTE_ARK_PMD=n
 CONFIG_RTE_LIBRTE_IXGBE_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
 CONFIG_RTE_LIBRTE_VIRTIO_PMD=y
diff --git a/doc/guides/nics/ark.rst b/doc/guides/nics/ark.rst
new file mode 100644
index 0000000..e1e1c5c
--- /dev/null
+++ b/doc/guides/nics/ark.rst
@@ -0,0 +1,238 @@
+.. BSD LICENSE
+
+    Copyright (c) 2015-2017 Atomic Rules LLC
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Atomic Rules LLC nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ARK Poll Mode Driver
+====================
+
+The ARK PMD is a DPDK poll-mode driver for the Atomic Rules Arkville
+(ARK) family of devices.
+
+More information can be found at the `Atomic Rules website
+<http://atomicrules.com>`_.
+
+Overview
+--------
+
+The Atomic Rules Arkville product is a DPDK- and AXI-compliant product
+that marshals packets across a PCIe conduit between host DPDK mbufs and
+FPGA AXI streams.
+
+The ARK PMD, and the spirit of the overall Arkville product,
+has been to take the DPDK API/ABI as a fixed specification;
+then implement much of the business logic in FPGA RTL circuits.
+The approach of *working backwards* from the DPDK API/ABI and having
+the GPP host software *dictate*, while the FPGA hardware *copes*,
+results in significant performance gains over a naive implementation.
+
+While this document describes the ARK PMD software, it is helpful to
+understand what the FPGA hardware is and is not. The Arkville RTL
+component provides a single PCIe Physical Function (PF) supporting
+some number of RX/Ingress and TX/Egress Queues. The ARK PMD controls
+the Arkville core through a dedicated opaque Core BAR (CBAR).
+To allow users full freedom for their own FPGA application IP,
+an independent FPGA Application BAR (ABAR) is provided.
+
+One popular way to imagine Arkville's FPGA hardware aspect is as the
+FPGA PCIe-facing side of a so-called Smart NIC. The Arkville core does
+not contain any MACs, and is link-speed independent, as well as
+agnostic to the number of physical ports the application chooses to
+use. The ARK driver exposes the familiar PMD interface to allow packet
+movement to and from mbufs across multiple queues.
+
+However, FPGA RTL applications may contain a universe of added
+functionality that the Arkville RTL core does not provide or cannot
+anticipate. To allow for this expectation of user-defined
+innovation, the ARK PMD provides a dynamic mechanism for adding
+capabilities without modifying the ARK PMD itself.
+
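+As a minimal sketch of such a user extension (not the normative API,
+but following the hook signatures resolved by ``check_for_ext()`` in
+``ark_ethdev.c``), the PMD locates the shared object through the
+``ARK_EXT_PATH`` environment variable and looks up entry points such as
+``dev_init`` and ``dev_get_port_count`` by name:
+
+.. code-block:: c
+
+   #include <rte_ethdev.h>
+
+   /* Called once per port; the returned pointer is handed back to
+    * every later hook as user_data. */
+   void *
+   dev_init(struct rte_eth_dev *dev, void *abar, int port_id)
+   {
+           (void)dev;
+           (void)port_id;
+           /* Stash the Application BAR for later hooks. */
+           return abar;
+   }
+
+   int
+   dev_get_port_count(struct rte_eth_dev *dev, void *user_data)
+   {
+           (void)dev;
+           (void)user_data;
+           return 1; /* single-port example */
+   }
+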
+The ARK PMD is intended to support all instances of the Arkville
+RTL Core, regardless of configuration, FPGA vendor, or target
+board. While specific capabilities such as number of physical
+hardware queue-pairs are negotiated; the driver is designed to
+remain constant over a broad and extendable feature set.
+
+Intentionally, Arkville by itself DOES NOT provide common NIC
+capabilities such as offload or receive-side scaling (RSS).
+These capabilities would be viewed as a gate-level "tax" on
+Green-box FPGA applications that do not require such function.
+Instead, they can be added as needed with essentially no
+overhead to the FPGA Application.
+
+Data Path Interface
+-------------------
+
+Ingress RX and Egress TX operations use the nominal DPDK API.
+The driver supports single-port, multi-queue for both RX and TX.
+
+Refer to ``ark_ethdev.h`` for the list of supported methods to
+act upon RX and TX Queues.
+
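+As a minimal sketch (standard ethdev calls, nothing ARK-specific), a
+polling loop over one queue pair looks like:
+
+.. code-block:: c
+
+   #include <rte_ethdev.h>
+   #include <rte_mbuf.h>
+
+   #define BURST 32
+
+   static void
+   fwd_loop(uint8_t port)
+   {
+           struct rte_mbuf *pkts[BURST];
+
+           for (;;) {
+                   uint16_t nb = rte_eth_rx_burst(port, 0, pkts, BURST);
+                   uint16_t tx = rte_eth_tx_burst(port, 0, pkts, nb);
+
+                   /* Free anything the TX ring did not accept. */
+                   while (tx < nb)
+                           rte_pktmbuf_free(pkts[tx++]);
+           }
+   }
+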
+Configuration Information
+-------------------------
+
+**DPDK Configuration Parameters**
+
+  The following configuration options are available for the ARK PMD:
+
+   * **CONFIG_RTE_LIBRTE_ARK_PMD** (default y): Enables or disables inclusion
+     of the ARK PMD driver in the DPDK compilation.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_RX** (default n): Enables or disables debug
+     logging and internal checking of RX ingress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TX** (default n): Enables or disables debug
+     logging and internal checking of TX egress logic within the ARK PMD driver.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_STATS** (default n): Enables or disables debug
+     logging of detailed packet and performance statistics gathered in
+     the PMD and FPGA.
+
+   * **CONFIG_RTE_LIBRTE_ARK_DEBUG_TRACE** (default n): Enables or disables debug
+     logging of detailed PMD events and status.
+
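+  The PMD also accepts device arguments, parsed in ``ark_ethdev.c`` and
+  intended primarily for internal regression testing: ``PktDir`` (a hex
+  packet-director value) plus ``PktGen`` and ``PktChkr`` (paths to
+  configuration files for the built-in packet generator and checker).
+  Assuming the standard EAL whitelist syntax, an example would be
+  ``-w 0000:01:00.0,PktDir=0x3``.
+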
+
+Building DPDK
+-------------
+
+See the :ref:`DPDK Getting Started Guide for Linux <linux_gsg>` for
+instructions on how to build DPDK.
+
+By default the ARK PMD library will be built into the DPDK library.
+
+For configuring and using UIO and VFIO frameworks, please also refer :ref:`the
+documentation that comes with DPDK suite <linux_gsg>`.
+
+Supported ARK RTL PCIe Instances
+--------------------------------
+
+The ARK PMD supports the following Arkville RTL PCIe instances:
+
+* ``1d6c:100d`` - AR-ARKA-FX0 [Arkville 32B DPDK Data Mover]
+* ``1d6c:100e`` - AR-ARKA-FX1 [Arkville 64B DPDK Data Mover]
+
+Supported Operating Systems
+---------------------------
+
+Any Linux distribution fulfilling the conditions described in the ``System
+Requirements`` section of :ref:`the DPDK documentation <linux_gsg>`. See also
+the *DPDK Release Notes*.
+
+Supported Features
+------------------
+
+* Dynamic ARK PMD extensions
+* Multiple receive and transmit queues
+* Jumbo frames up to 9K
+* Hardware Statistics
+
+Unsupported Features
+--------------------
+
+Features that may be part of, or become part of, the Arkville RTL IP that are
+not currently supported or exposed by the ARK PMD include:
+
+* PCIe SR-IOV Virtual Functions (VFs)
+* Arkville's Packet Generator Control and Status
+* Arkville's Packet Director Control and Status
+* Arkville's Packet Checker Control and Status
+* Arkville's Timebase Management
+
+Pre-Requisites
+--------------
+
+#. Prepare the system as recommended by the DPDK suite.  This includes environment
+   variables, hugepage configuration and tool-chains.
+
+#. Insert the igb_uio kernel module using the command ``modprobe igb_uio``.
+
+#. Bind the intended ARK device to the igb_uio module.
+
+At this point the system should be ready to run DPDK applications. Once the
+application runs to completion, the ARK device can be unbound from igb_uio if necessary.
+
+Usage Example
+-------------
+
+This section demonstrates how to launch **testpmd** with Atomic Rules ARK
+devices managed by librte_pmd_ark.
+
+#. Load the kernel modules:
+
+   .. code-block:: console
+
+      modprobe uio
+      insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
+
+   .. note::
+
+      The ARK PMD driver depends upon the igb_uio user space I/O kernel module
+
+#. Mount and request huge pages:
+
+   .. code-block:: console
+
+      mount -t hugetlbfs nodev /mnt/huge
+      echo 256 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Bind UIO driver to ARK device at 0000:01:00.0 (using dpdk-devbind.py):
+
+   .. code-block:: console
+
+      ./usertools/dpdk-devbind.py --bind=igb_uio 0000:01:00.0
+
+   .. note::
+
+      The last argument to dpdk-devbind.py is the 4-tuple that identifies a specific PCIe
+      device. You can use lspci -d 1d6c: to identify all Atomic Rules devices in the system,
+      and thus determine the correct 4-tuple argument to dpdk-devbind.py.
+
+#. Start testpmd with basic parameters:
+
+   .. code-block:: console
+
+      ./x86_64-native-linuxapp-gcc/app/testpmd -l 0-3 -n 4 -- -i
+
+   Example output:
+
+   .. code-block:: console
+
+      [...]
+      EAL: PCI device 0000:01:00.0 on NUMA socket -1
+      EAL:   probe driver: 1d6c:100e rte_ark_pmd
+      EAL:   PCI memory mapped at 0x7f9b6c400000
+      PMD: eth_ark_dev_init(): Initializing 0:2:0.1
+      ARKP PMD CommitID: 378f3a67
+      Configuring Port 0 (socket 0)
+      Port 0: DC:3C:F6:00:00:01
+      Checking link statuses...
+      Port 0 Link Up - speed 100000 Mbps - full-duplex
+      Done
+      testpmd>
+
diff --git a/doc/guides/nics/features/ark.ini b/doc/guides/nics/features/ark.ini
new file mode 100644
index 0000000..dc8a0e2
--- /dev/null
+++ b/doc/guides/nics/features/ark.ini
@@ -0,0 +1,15 @@
+;
+; Supported features of the 'ark' poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Queue start/stop     = Y
+Jumbo frame          = Y
+Scattered Rx         = Y
+Basic stats          = Y
+Stats per queue      = Y
+FW version           = Y
+Linux UIO            = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 87f9334..381d82c 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
     :numbered:
 
     overview
+    ark
     bnx2x
     bnxt
     cxgbe
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index a16f25e..ea9868b 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -32,6 +32,7 @@
 include $(RTE_SDK)/mk/rte.vars.mk
 
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += af_packet
+DIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark
 DIRS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += bnx2x
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += bonding
 DIRS-$(CONFIG_RTE_LIBRTE_CXGBE_PMD) += cxgbe
diff --git a/drivers/net/ark/Makefile b/drivers/net/ark/Makefile
new file mode 100644
index 0000000..5d70ef3
--- /dev/null
+++ b/drivers/net/ark/Makefile
@@ -0,0 +1,73 @@
+# BSD LICENSE
+#
+# Copyright (c) 2015-2017 Atomic Rules LLC
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of copyright holder nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_ark.a
+
+CFLAGS += -O3 -I./
+CFLAGS += $(WERROR_FLAGS)
+
+EXPORT_MAP := rte_pmd_ark_version.map
+
+LIBABIVER := 1
+
+#
+# all source are stored in SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD)
+#
+
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev_rx.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ethdev_tx.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_pktgen.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_pktchkr.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_pktdir.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_mpu.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_ddm.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_udm.c
+SRCS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += ark_rqp.c
+
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_kvargs
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/librte_mempool
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libpthread
+DEPDIRS-$(CONFIG_RTE_LIBRTE_ARK_PMD) += lib/libdl
+
+
+include $(RTE_SDK)/mk/rte.lib.mk
+
diff --git a/drivers/net/ark/ark_ddm.c b/drivers/net/ark/ark_ddm.c
new file mode 100644
index 0000000..a8e20a2
--- /dev/null
+++ b/drivers/net/ark/ark_ddm.c
@@ -0,0 +1,150 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_debug.h"
+#include "ark_ddm.h"
+
+/* ************************************************************************* */
+int
+ark_ddm_verify(struct ark_ddm_t *ddm)
+{
+	if (sizeof(struct ark_ddm_t) != ARK_DDM_EXPECTED_SIZE) {
+	fprintf(stderr, "  DDM structure looks incorrect %#x vs %#lx\n",
+		ARK_DDM_EXPECTED_SIZE, sizeof(struct ark_ddm_t));
+	return -1;
+	}
+
+	if (ddm->cfg.const0 != ARK_DDM_CONST) {
+	fprintf(stderr, "  DDM module not found as expected 0x%08x\n",
+		ddm->cfg.const0);
+	return -1;
+	}
+	return 0;
+}
+
+void
+ark_ddm_start(struct ark_ddm_t *ddm)
+{
+	ddm->cfg.command = 1;
+}
+
+int
+ark_ddm_stop(struct ark_ddm_t *ddm, const int wait)
+{
+	int cnt = 0;
+
+	ddm->cfg.command = 2;
+	while (wait && (ddm->cfg.stopFlushed & 0x01) == 0) {
+	if (cnt++ > 1000)
+		return 1;
+
+	usleep(10);
+	}
+	return 0;
+}
+
+void
+ark_ddm_reset(struct ark_ddm_t *ddm)
+{
+	int status;
+
+	/* reset only works if ddm has stopped properly. */
+	status = ark_ddm_stop(ddm, 1);
+
+	if (status != 0) {
+	ARK_DEBUG_TRACE("ARKP: %s  stop failed  doing forced reset\n",
+		__func__);
+	ddm->cfg.command = 4;
+	usleep(10);
+	}
+	ddm->cfg.command = 3;
+}
+
+void
+ark_ddm_setup(struct ark_ddm_t *ddm, phys_addr_t consAddr, uint32_t interval)
+{
+	ddm->setup.consWriteIndexAddr = consAddr;
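+	/* interval is given in ns and converted to the HW's 4 ns tick period. */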
+	ddm->setup.writeIndexInterval = interval / 4;	/* 4 ns period */
+}
+
+void
+ark_ddm_stats_reset(struct ark_ddm_t *ddm)
+{
+	ddm->cfg.tlpStatsClear = 1;
+}
+
+void
+ark_ddm_dump(struct ark_ddm_t *ddm, const char *msg)
+{
+	ARK_DEBUG_TRACE("ARKP DDM Dump: %s Stopped: %d\n", msg,
+	ark_ddm_is_stopped(ddm)
+	);
+}
+
+void
+ark_ddm_dump_stats(struct ark_ddm_t *ddm, const char *msg)
+{
+	struct ark_ddm_stats_t *stats = &ddm->stats;
+
+	ARK_DEBUG_STATS("ARKP DDM Stats: %s"
+					FMT_SU64 FMT_SU64 FMT_SU64
+					"\n", msg,
+	"Bytes:", stats->txByteCount,
+	"Packets:", stats->txPktCount, "MBufs", stats->txMbufCount);
+}
+
+int
+ark_ddm_is_stopped(struct ark_ddm_t *ddm)
+{
+	return (ddm->cfg.stopFlushed & 0x01) != 0;
+}
+
+uint64_t
+ark_ddm_queue_byte_count(struct ark_ddm_t *ddm)
+{
+	return ddm->queue_stats.byteCount;
+}
+
+uint64_t
+ark_ddm_queue_pkt_count(struct ark_ddm_t *ddm)
+{
+	return ddm->queue_stats.pktCount;
+}
+
+void
+ark_ddm_queue_reset_stats(struct ark_ddm_t *ddm)
+{
+	ddm->queue_stats.byteCount = 1;
+}
diff --git a/drivers/net/ark/ark_ddm.h b/drivers/net/ark/ark_ddm.h
new file mode 100644
index 0000000..311dbc5
--- /dev/null
+++ b/drivers/net/ark/ark_ddm.h
@@ -0,0 +1,154 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_DDM_H_
+#define _ARK_DDM_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/* DDM core hardware structures */
+#define ARK_DDM_CFG 0x0000
+#define ARK_DDM_CONST 0xfacecafe
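+/* cfg.command values written by ark_ddm.c: 1 = start, 2 = stop,
+ * 3 = reset, 4 = forced reset. */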
+struct ark_ddm_cfg_t {
+	uint32_t r0;
+	volatile uint32_t tlpStatsClear;
+	uint32_t const0;
+	volatile uint32_t tag_max;
+	volatile uint32_t command;
+	volatile uint32_t stopFlushed;
+};
+
+#define ARK_DDM_STATS 0x0020
+struct ark_ddm_stats_t {
+	volatile uint64_t txByteCount;
+	volatile uint64_t txPktCount;
+	volatile uint64_t txMbufCount;
+};
+
+#define ARK_DDM_MRDQ 0x0040
+struct ark_ddm_mrdq_t {
+	volatile uint32_t mrd_q1;
+	volatile uint32_t mrd_q2;
+	volatile uint32_t mrd_q3;
+	volatile uint32_t mrd_q4;
+	volatile uint32_t mrd_full;
+};
+
+#define ARK_DDM_CPLDQ 0x0068
+struct ark_ddm_cpldq_t {
+	volatile uint32_t cpld_q1;
+	volatile uint32_t cpld_q2;
+	volatile uint32_t cpld_q3;
+	volatile uint32_t cpld_q4;
+	volatile uint32_t cpld_full;
+};
+
+#define ARK_DDM_MRD_PS 0x0090
+struct ark_ddm_mrd_ps_t {
+	volatile uint32_t mrd_ps_min;
+	volatile uint32_t mrd_ps_max;
+	volatile uint32_t mrd_full_ps_min;
+	volatile uint32_t mrd_full_ps_max;
+	volatile uint32_t mrd_dw_ps_min;
+	volatile uint32_t mrd_dw_ps_max;
+};
+
+#define ARK_DDM_QUEUE_STATS 0x00a8
+struct ark_ddm_qstats_t {
+	volatile uint64_t byteCount;
+	volatile uint64_t pktCount;
+	volatile uint64_t mbufCount;
+};
+
+#define ARK_DDM_CPLD_PS 0x00c0
+struct ark_ddm_cpld_ps_t {
+	volatile uint32_t cpld_ps_min;
+	volatile uint32_t cpld_ps_max;
+	volatile uint32_t cpld_full_ps_min;
+	volatile uint32_t cpld_full_ps_max;
+	volatile uint32_t cpld_dw_ps_min;
+	volatile uint32_t cpld_dw_ps_max;
+};
+
+#define ARK_DDM_SETUP  0x00e0
+struct ark_ddm_setup_t {
+	phys_addr_t consWriteIndexAddr;
+	uint32_t writeIndexInterval;	/* 4ns each */
+	volatile uint32_t consIndex;
+};
+
+/*  Consolidated structure */
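+/* The reservedN members pad each register block out to the fixed HW
+ * byte offsets (ARK_DDM_* above); ark_ddm_verify() checks that the
+ * total equals ARK_DDM_EXPECTED_SIZE. */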
+struct ark_ddm_t {
+	struct ark_ddm_cfg_t cfg;
+	uint8_t reserved0[(ARK_DDM_STATS - ARK_DDM_CFG) -
+					  sizeof(struct ark_ddm_cfg_t)];
+	struct ark_ddm_stats_t stats;
+	uint8_t reserved1[(ARK_DDM_MRDQ - ARK_DDM_STATS) -
+					  sizeof(struct ark_ddm_stats_t)];
+	struct ark_ddm_mrdq_t mrdq;
+	uint8_t reserved2[(ARK_DDM_CPLDQ - ARK_DDM_MRDQ) -
+					  sizeof(struct ark_ddm_mrdq_t)];
+	struct ark_ddm_cpldq_t cpldq;
+	uint8_t reserved3[(ARK_DDM_MRD_PS - ARK_DDM_CPLDQ) -
+					  sizeof(struct ark_ddm_cpldq_t)];
+	struct ark_ddm_mrd_ps_t mrd_ps;
+	struct ark_ddm_qstats_t queue_stats;
+	struct ark_ddm_cpld_ps_t cpld_ps;
+	uint8_t reserved5[(ARK_DDM_SETUP - ARK_DDM_CPLD_PS) -
+					  sizeof(struct ark_ddm_cpld_ps_t)];
+	struct ark_ddm_setup_t setup;
+	uint8_t reservedP[(256 - ARK_DDM_SETUP)
+					  - sizeof(struct ark_ddm_setup_t)];
+};
+
+#define ARK_DDM_EXPECTED_SIZE 256
+#define ARK_DDM_QOFFSET ARK_DDM_EXPECTED_SIZE
+
+/* DDM function prototype */
+int ark_ddm_verify(struct ark_ddm_t *ddm);
+void ark_ddm_start(struct ark_ddm_t *ddm);
+int ark_ddm_stop(struct ark_ddm_t *ddm, const int wait);
+void ark_ddm_reset(struct ark_ddm_t *ddm);
+void ark_ddm_stats_reset(struct ark_ddm_t *ddm);
+void ark_ddm_setup(struct ark_ddm_t *ddm, phys_addr_t consAddr,
+	uint32_t interval);
+void ark_ddm_dump_stats(struct ark_ddm_t *ddm, const char *msg);
+void ark_ddm_dump(struct ark_ddm_t *ddm, const char *msg);
+int ark_ddm_is_stopped(struct ark_ddm_t *ddm);
+uint64_t ark_ddm_queue_byte_count(struct ark_ddm_t *ddm);
+uint64_t ark_ddm_queue_pkt_count(struct ark_ddm_t *ddm);
+void ark_ddm_queue_reset_stats(struct ark_ddm_t *ddm);
+
+#endif
diff --git a/drivers/net/ark/ark_debug.h b/drivers/net/ark/ark_debug.h
new file mode 100644
index 0000000..d50557f
--- /dev/null
+++ b/drivers/net/ark/ark_debug.h
@@ -0,0 +1,72 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_DEBUG_H_
+#define _ARK_DEBUG_H_
+
+#include <rte_log.h>
+
+/* Format specifiers for string data pairs */
+#define FMT_SU32  "\n\t%-20s    %'20u"
+#define FMT_SU64  "\n\t%-20s    %'20lu"
+#define FMT_SPTR  "\n\t%-20s    %20p"
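+/* Each FMT_* specifier pairs a left-justified label string with a
+ * grouped value, e.g. ARK_DEBUG_STATS("Stats:" FMT_SU64, "Bytes:", n); */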
+
+#define ARK_TRACE_ON(fmt, ...) \
+  fprintf(stderr, fmt, ##__VA_ARGS__)
+
+#define ARK_TRACE_OFF(fmt, ...) \
+  do {if (0) fprintf(stderr, fmt, ##__VA_ARGS__); } while (0)
+
+/* Debug macro for reporting Packet stats */
+#ifdef RTE_LIBRTE_ARK_DEBUG_STATS
+#define ARK_DEBUG_STATS(fmt, ...) ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_STATS(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+/* Debug macro for tracing full behavior*/
+#ifdef RTE_LIBRTE_ARK_DEBUG_TRACE
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_ON(fmt, ##__VA_ARGS__)
+#else
+#define ARK_DEBUG_TRACE(fmt, ...)  ARK_TRACE_OFF(fmt, ##__VA_ARGS__)
+#endif
+
+#ifdef ARK_STD_LOG
+#define PMD_DRV_LOG(level, fmt, args...) \
+  fprintf(stderr, fmt, args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt, __func__, ## args)
+#endif
+
+#endif
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
new file mode 100644
index 0000000..8479435
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.c
@@ -0,0 +1,982 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/stat.h>
+#include <dlfcn.h>
+
+#include <rte_kvargs.h>
+#include <rte_vdev.h>
+
+#include "ark_global.h"
+#include "ark_debug.h"
+#include "ark_ethdev.h"
+#include "ark_mpu.h"
+#include "ark_ddm.h"
+#include "ark_udm.h"
+#include "ark_rqp.h"
+#include "ark_pktdir.h"
+#include "ark_pktgen.h"
+#include "ark_pktchkr.h"
+
+/*  Internal prototypes */
+static int eth_ark_check_args(const char *params);
+static int eth_ark_dev_init(struct rte_eth_dev *dev);
+static int ark_config_device(struct rte_eth_dev *dev);
+static int eth_ark_dev_uninit(struct rte_eth_dev *eth_dev);
+static int eth_ark_dev_configure(struct rte_eth_dev *dev);
+static int eth_ark_dev_start(struct rte_eth_dev *dev);
+static void eth_ark_dev_stop(struct rte_eth_dev *dev);
+static void eth_ark_dev_close(struct rte_eth_dev *dev);
+static void eth_ark_dev_info_get(struct rte_eth_dev *dev,
+	struct rte_eth_dev_info *dev_info);
+static int eth_ark_dev_link_update(struct rte_eth_dev *dev,
+	int wait_to_complete);
+static int eth_ark_dev_set_link_up(struct rte_eth_dev *dev);
+static int eth_ark_dev_set_link_down(struct rte_eth_dev *dev);
+static void eth_ark_dev_stats_get(struct rte_eth_dev *dev,
+	struct rte_eth_stats *stats);
+static void eth_ark_dev_stats_reset(struct rte_eth_dev *dev);
+static void eth_ark_set_default_mac_addr(struct rte_eth_dev *dev,
+	struct ether_addr *mac_addr);
+static void eth_ark_macaddr_add(struct rte_eth_dev *dev,
+	struct ether_addr *mac_addr, uint32_t index, uint32_t pool);
+static void eth_ark_macaddr_remove(struct rte_eth_dev *dev, uint32_t index);
+
+#define ARK_DEV_TO_PCI(eth_dev) \
+	RTE_DEV_TO_PCI((eth_dev)->device)
+
+#define ARK_MAX_ARG_LEN 256
+static uint32_t pktDirV;
+static char pktGenArgs[ARK_MAX_ARG_LEN];
+static char pktChkrArgs[ARK_MAX_ARG_LEN];
+
+#define ARK_PKTGEN_ARG "PktGen"
+#define ARK_PKTCHKR_ARG "PktChkr"
+#define ARK_PKTDIR_ARG "PktDir"
+
+static const char *valid_arguments[] = {
+	ARK_PKTGEN_ARG,
+	ARK_PKTCHKR_ARG,
+	ARK_PKTDIR_ARG,
+	"iface",
+	NULL
+};
+
+#define MAX_ARK_PHYS 16
+struct ark_adapter *gark[MAX_ARK_PHYS];
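+/* One slot per physical ARK device; only gark[0] is populated for now
+ * (see the TODO in eth_ark_check_args()). */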
+
+static const struct rte_pci_id pci_id_ark_map[] = {
+	{RTE_PCI_DEVICE(0x1d6c, 0x100d)},
+	{RTE_PCI_DEVICE(0x1d6c, 0x100e)},
+	{.vendor_id = 0, /* sentinel */ },
+};
+
+static struct eth_driver rte_ark_pmd = {
+	.pci_drv = {
+		.probe = rte_eth_dev_pci_probe,
+		.remove = rte_eth_dev_pci_remove,
+		.id_table = pci_id_ark_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC},
+	.eth_dev_init = eth_ark_dev_init,
+	.eth_dev_uninit = eth_ark_dev_uninit,
+	.dev_private_size = sizeof(struct ark_adapter),
+};
+
+static const struct eth_dev_ops ark_eth_dev_ops = {
+	.dev_configure = eth_ark_dev_configure,
+	.dev_start = eth_ark_dev_start,
+	.dev_stop = eth_ark_dev_stop,
+	.dev_close = eth_ark_dev_close,
+
+	.dev_infos_get = eth_ark_dev_info_get,
+
+	.rx_queue_setup = eth_ark_dev_rx_queue_setup,
+	.rx_queue_count = eth_ark_dev_rx_queue_count,
+	.tx_queue_setup = eth_ark_tx_queue_setup,
+
+	.link_update = eth_ark_dev_link_update,
+	.dev_set_link_up = eth_ark_dev_set_link_up,
+	.dev_set_link_down = eth_ark_dev_set_link_down,
+
+	.rx_queue_start = eth_ark_rx_start_queue,
+	.rx_queue_stop = eth_ark_rx_stop_queue,
+
+	.tx_queue_start = eth_ark_tx_queue_start,
+	.tx_queue_stop = eth_ark_tx_queue_stop,
+
+	.stats_get = eth_ark_dev_stats_get,
+	.stats_reset = eth_ark_dev_stats_reset,
+
+	.mac_addr_add = eth_ark_macaddr_add,
+	.mac_addr_remove = eth_ark_macaddr_remove,
+	.mac_addr_set = eth_ark_set_default_mac_addr,
+
+};
+
+int
+ark_get_port_id(struct rte_eth_dev *dev, struct ark_adapter *ark)
+{
+	int n = ark->num_ports;
+	int i;
+
+	/* There has to be a smarter way to do this ... */
+	for (i = 0; i < n; i++) {
+	if (ark->port[i].eth_dev == dev)
+		return i;
+	}
+	ARK_DEBUG_TRACE("ARK: Device is NOT associated with a port !!");
+	return -1;
+}
+
+static int
+check_for_ext(struct rte_eth_dev *dev __rte_unused,
+	struct ark_adapter *ark)
+{
+	int found = 0;
+
+	/* Get the env */
+	const char *dllpath = getenv("ARK_EXT_PATH");
+
+	if (dllpath == NULL) {
+	ARK_DEBUG_TRACE("ARK EXT NO dll path specified \n");
+	return 0;
+	}
+	ARK_DEBUG_TRACE("ARK EXT found dll path at %s\n", dllpath);
+
+	/* Open and load the .so */
+	ark->dHandle = dlopen(dllpath, RTLD_LOCAL | RTLD_LAZY);
+	if (ark->dHandle == NULL) {
+	PMD_DRV_LOG(ERR, "Could not load user extension %s\n", dllpath);
+	return -1;
+	}
+	ARK_DEBUG_TRACE("SUCCESS: loaded user extension %s\n", dllpath);
+
+	/* Get the entry points */
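+	/* Each hook is optional: dlsym() returns NULL for a missing symbol
+	 * and every call site tests the pointer before invoking it. */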
+	ark->user_ext.dev_init =
+	(void *(*)(struct rte_eth_dev *, void *, int)) dlsym(ark->dHandle,
+	"dev_init");
+	ARK_DEBUG_TRACE("device ext init pointer = %p\n", ark->user_ext.dev_init);
+	ark->user_ext.dev_get_port_count =
+	(int (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_get_port_count");
+	ark->user_ext.dev_uninit =
+	(void (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_uninit");
+	ark->user_ext.dev_configure =
+	(int (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_configure");
+	ark->user_ext.dev_start =
+	(int (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_start");
+	ark->user_ext.dev_stop =
+	(void (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_stop");
+	ark->user_ext.dev_close =
+	(void (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_close");
+	ark->user_ext.link_update =
+	(int (*)(struct rte_eth_dev *, int, void *)) dlsym(ark->dHandle,
+	"link_update");
+	ark->user_ext.dev_set_link_up =
+	(int (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_set_link_up");
+	ark->user_ext.dev_set_link_down =
+	(int (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"dev_set_link_down");
+	ark->user_ext.stats_get =
+	(void (*)(struct rte_eth_dev *, struct rte_eth_stats *,
+		void *)) dlsym(ark->dHandle, "stats_get");
+	ark->user_ext.stats_reset =
+	(void (*)(struct rte_eth_dev *, void *)) dlsym(ark->dHandle,
+	"stats_reset");
+	ark->user_ext.mac_addr_add =
+	(void (*)(struct rte_eth_dev *, struct ether_addr *, uint32_t,
+		uint32_t, void *)) dlsym(ark->dHandle, "mac_addr_add");
+	ark->user_ext.mac_addr_remove =
+	(void (*)(struct rte_eth_dev *, uint32_t, void *)) dlsym(ark->dHandle,
+	"mac_addr_remove");
+	ark->user_ext.mac_addr_set =
+	(void (*)(struct rte_eth_dev *, struct ether_addr *,
+		void *)) dlsym(ark->dHandle, "mac_addr_set");
+
+	return found;
+}
+
+static int
+eth_ark_dev_init(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	struct rte_pci_device *pci_dev;
+	int ret;
+
+	ark->eth_dev = dev;
+
+	ARK_DEBUG_TRACE("eth_ark_dev_init(struct rte_eth_dev *dev)");
+	gark[0] = ark;
+
+	/* Check to see if there is an extension that we need to load */
+	check_for_ext(dev, ark);
+	pci_dev = ARK_DEV_TO_PCI(dev);
+	rte_eth_copy_pci_info(dev, pci_dev);
+
+	if (pci_dev->device.devargs)
+	eth_ark_check_args(pci_dev->device.devargs->args);
+	else
+	PMD_DRV_LOG(INFO, "No Device args found\n");
+
+	/* Use dummy function until setup */
+	dev->rx_pkt_burst = &eth_ark_recv_pkts_noop;
+	dev->tx_pkt_burst = &eth_ark_xmit_pkts_noop;
+
+	ark->bar0 = (uint8_t *) pci_dev->mem_resource[0].addr;
+	ark->Abar = (uint8_t *) pci_dev->mem_resource[2].addr;
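+	/* BAR0 is the opaque Core BAR (CBAR); BAR2 is the independent FPGA
+	 * Application BAR (ABAR) described in ark.rst. */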
+
+	SetPtr(bar0, ark, sysctrl, ARK_SYSCTRL_BASE);
+	SetPtr(bar0, ark, mpurx, ARK_MPURx_BASE);
+	SetPtr(bar0, ark, udm, ARK_UDM_BASE);
+	SetPtr(bar0, ark, mputx, ARK_MPUTx_BASE);
+	SetPtr(bar0, ark, ddm, ARK_DDM_BASE);
+	SetPtr(bar0, ark, cmac, ARK_CMAC_BASE);
+	SetPtr(bar0, ark, external, ARK_EXTERNAL_BASE);
+	SetPtr(bar0, ark, pktdir, ARK_PKTDIR_BASE);
+	SetPtr(bar0, ark, pktgen, ARK_PKTGEN_BASE);
+	SetPtr(bar0, ark, pktchkr, ARK_PKTCHKR_BASE);
+
+	ark->rqpacing = (struct ark_rqpace_t *) (ark->bar0 + ARK_RCPACING_BASE);
+	ark->started = 0;
+
+	ARK_DEBUG_TRACE("Sys Ctrl Const = 0x%x  DEV CommitID: %08x\n",
+	ark->sysctrl.t32[4], rte_be_to_cpu_32(ark->sysctrl.t32[0x20 / 4]));
+	PMD_DRV_LOG(INFO, "ARKP PMD  CommitID: %08x\n",
+	rte_be_to_cpu_32(ark->sysctrl.t32[0x20 / 4]));
+
+	/* If HW sanity test fails, return an error */
+	if (ark->sysctrl.t32[4] != 0xcafef00d) {
+	PMD_DRV_LOG(ERR,
+		"HW Sanity test has failed, expected constant 0x%x, read 0x%x (%s)\n",
+		0xcafef00d, ark->sysctrl.t32[4], __func__);
+	return -1;
+	} else {
+	PMD_DRV_LOG(INFO,
+		"HW Sanity test has PASSED, expected constant 0x%x, read 0x%x (%s)\n",
+		0xcafef00d, ark->sysctrl.t32[4], __func__);
+	}
+
+	/* We are a single function multi-port device. */
+	const unsigned int numa_node = rte_socket_id();
+	struct ether_addr adr;
+
+	ret = ark_config_device(dev);
+	dev->dev_ops = &ark_eth_dev_ops;
+
+	dev->data->mac_addrs = rte_zmalloc("ark", ETHER_ADDR_LEN, 0);
+	if (!dev->data->mac_addrs) {
+	PMD_DRV_LOG(ERR, "Failed to allocate memory for storing MAC address\n");
+	return -1;
+	}
+	ether_addr_copy((struct ether_addr *) &adr, &dev->data->mac_addrs[0]);
+
+	if (ark->user_ext.dev_init) {
+	ark->user_data = ark->user_ext.dev_init(dev, ark->Abar, 0);
+	if (!ark->user_data) {
+		PMD_DRV_LOG(INFO,
+		"Failed to initialize PMD extension !!, continuing without it\n");
+		memset(&ark->user_ext, 0, sizeof(struct ark_user_ext));
+		dlclose(ark->dHandle);
+	}
+	}
+
+	/* We will create additional devices based on the number of requested
+	 * ports */
+	int pc = 1;
+	int p;
+
+	if (ark->user_ext.dev_get_port_count) {
+	pc = ark->user_ext.dev_get_port_count(dev, ark->user_data);
+	ark->num_ports = pc;
+	} else {
+	ark->num_ports = 1;
+	}
+	for (p = 0; p < pc; p++) {
+	struct ark_port *port;
+
+	port = &ark->port[p];
+	struct rte_eth_dev_data *data = NULL;
+
+	port->id = p;
+
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	snprintf(name, sizeof(name), "arketh%d", dev->data->port_id + p);
+
+	if (p == 0) {
+		/* First port is already allocated by DPDK */
+		port->eth_dev = ark->eth_dev;
+		continue;
+	}
+
+	/* reserve an ethdev entry */
+	port->eth_dev = rte_eth_dev_allocate(name);
+	if (!port->eth_dev) {
+		PMD_DRV_LOG(ERR, "Could not allocate eth_dev for port %d\n", p);
+		goto error;
+	}
+
+	data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
+	if (!data) {
+		PMD_DRV_LOG(ERR, "Could not allocate eth_dev for port %d\n", p);
+		goto error;
+	}
+	data->port_id = ark->eth_dev->data->port_id + p;
+	port->eth_dev->data = data;
+	port->eth_dev->device = &pci_dev->device;
+	port->eth_dev->data->dev_private = ark;
+	port->eth_dev->driver = ark->eth_dev->driver;
+	port->eth_dev->dev_ops = ark->eth_dev->dev_ops;
+	port->eth_dev->tx_pkt_burst = ark->eth_dev->tx_pkt_burst;
+	port->eth_dev->rx_pkt_burst = ark->eth_dev->rx_pkt_burst;
+
+	rte_eth_copy_pci_info(port->eth_dev, pci_dev);
+
+	port->eth_dev->data->mac_addrs = rte_zmalloc(name, ETHER_ADDR_LEN, 0);
+	if (!port->eth_dev->data->mac_addrs) {
+		PMD_DRV_LOG(ERR, "Memory allocation for MAC failed !, exiting\n");
+		goto error;
+	}
+	ether_addr_copy((struct ether_addr *) &adr,
+		&port->eth_dev->data->mac_addrs[0]);
+
+	if (ark->user_ext.dev_init) {
+		ark->user_data = ark->user_ext.dev_init(dev, ark->Abar, p);
+	}
+	}
+
+	return ret;
+
+error:
+	if (dev->data->mac_addrs)
+	rte_free(dev->data->mac_addrs);
+
+	/* Port 0 shares dev->data, which is owned by the ethdev layer. */
+	for (p = 1; p < pc; p++) {
+	if (ark->port[p].eth_dev == NULL || ark->port[p].eth_dev->data == NULL)
+		continue;
+	rte_free(ark->port[p].eth_dev->data->mac_addrs);
+	rte_free(ark->port[p].eth_dev->data);
+	}
+
+	return -1;
+
+}
+
+/* Initial device configuration when device is opened
+ setup the DDM, and UDM
+ Called once per PCIE device
+*/
+static int
+ark_config_device(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	uint16_t numQ, i;
+	struct ark_mpu_t *mpu;
+
+	/* Make sure that the packet director, generator and checker are in a
+	 * known state */
+	ark->start_pg = 0;
+	ark->pg = ark_pmd_pktgen_init(ark->pktgen.v, 0, 1);
+	ark_pmd_pktgen_reset(ark->pg);
+	ark->pc = ark_pmd_pktchkr_init(ark->pktchkr.v, 0, 1);
+	ark_pmd_pktchkr_stop(ark->pc);
+	ark->pd = ark_pmd_pktdir_init(ark->pktdir.v);
+
+	/* Verify HW */
+	if (ark_udm_verify(ark->udm.v)) {
+	return -1;
+	}
+	if (ark_ddm_verify(ark->ddm.v)) {
+	return -1;
+	}
+
+	/* UDM */
+	if (ark_udm_reset(ark->udm.v)) {
+	PMD_DRV_LOG(ERR, "Unable to stop and reset UDM \n");
+	return -1;
+	}
+	/* Keep in reset until the MPU are cleared */
+
+	/* MPU reset */
+	mpu = ark->mpurx.v;
+	numQ = ark_api_num_queues(mpu);
+	ark->rxQueues = numQ;
+	for (i = 0; i < numQ; i++) {
+	ark_mpu_reset(mpu);
+	mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+	}
+
+	ark_udm_stop(ark->udm.v, 0);
+	ark_udm_configure(ark->udm.v, RTE_PKTMBUF_HEADROOM,
+	RTE_MBUF_DEFAULT_DATAROOM, ARK_RX_WRITE_TIME_NS);
+	ark_udm_stats_reset(ark->udm.v);
+	ark_udm_stop(ark->udm.v, 0);
+
+	/* TX -- DDM */
+	if (ark_ddm_stop(ark->ddm.v, 1)) {
+	PMD_DRV_LOG(ERR, "Unable to stop DDM \n");
+	};
+
+	mpu = ark->mputx.v;
+	numQ = ark_api_num_queues(mpu);
+	ark->txQueues = numQ;
+	for (i = 0; i < numQ; i++) {
+	ark_mpu_reset(mpu);
+	mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+	}
+
+	ark_ddm_reset(ark->ddm.v);
+	ark_ddm_stats_reset(ark->ddm.v);
+	/* ark_ddm_dump(ark->ddm.v, "Config"); */
+	/* ark_ddm_dump_stats(ark->ddm.v, "Config"); */
+
+	/* MPU reset */
+	ark_ddm_stop(ark->ddm.v, 0);
+	ark_rqp_stats_reset(ark->rqpacing);
+
+	return 0;
+}
+
+static int
+eth_ark_dev_uninit(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+	return 0;
+
+	if (ark->user_ext.dev_uninit) {
+	ark->user_ext.dev_uninit(dev, ark->user_data);
+	}
+
+	ark_pmd_pktgen_uninit(ark->pg);
+	ark_pmd_pktchkr_uninit(ark->pc);
+
+	dev->dev_ops = NULL;
+	dev->rx_pkt_burst = NULL;
+	dev->tx_pkt_burst = NULL;
+	if (dev->data->mac_addrs)
+	rte_free(dev->data->mac_addrs);
+	if (dev->data)
+	rte_free(dev->data);
+
+	return 0;
+}
+
+static int
+eth_ark_dev_configure(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	ARK_DEBUG_TRACE("ARKP: In eth_ark_dev_configure\n");
+
+	eth_ark_dev_set_link_up(dev);
+	if (ark->user_ext.dev_configure) {
+	return ark->user_ext.dev_configure(dev, ark->user_data);
+	}
+	return 0;
+}
+
+static void *
+delay_pg_start(void *arg)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) arg;
+
+	/* This function is used exclusively for regression testing; we perform a
+	 * blind sleep here to ensure that the external test application has time
+	 * to set up the test before we generate packets. */
+	usleep(100000);
+	ark_pmd_pktgen_run(ark->pg);
+	return NULL;
+}
+
+static int
+eth_ark_dev_start(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	int i;
+
+	ARK_DEBUG_TRACE("ARKP: In eth_ark_dev_start\n");
+
+	/* RX Side */
+	/* start UDM */
+	ark_udm_start(ark->udm.v);
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+	eth_ark_rx_start_queue(dev, i);
+	}
+
+	/* TX Side */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+	eth_ark_tx_queue_start(dev, i);
+	}
+
+	/* start DDM */
+	ark_ddm_start(ark->ddm.v);
+
+	ark->started = 1;
+	/* set xmit and receive function */
+	dev->rx_pkt_burst = &eth_ark_recv_pkts;
+	dev->tx_pkt_burst = &eth_ark_xmit_pkts;
+
+	if (ark->start_pg) {
+	ark_pmd_pktchkr_run(ark->pc);
+	}
+
+	if (ark->start_pg && (ark_get_port_id(dev, ark) == 0)) {
+	pthread_t thread;
+
+	/* Start the packet generator from a separate thread after a short
+	 * delay, giving the regression-test harness time to finish setup
+	 * (see delay_pg_start()). */
+	pthread_create(&thread, NULL, delay_pg_start, ark);
+	}
+
+	if (ark->user_ext.dev_start) {
+	ark->user_ext.dev_start(dev, ark->user_data);
+	}
+
+	return 0;
+}
+
+static void
+eth_ark_dev_stop(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+	int status;
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	struct ark_mpu_t *mpu;
+
+	ARK_DEBUG_TRACE("ARKP: In eth_ark_dev_stop\n");
+
+	if (ark->started == 0)
+	return;
+	ark->started = 0;
+
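+	/* Teardown order: extension hooks first, then the packet generator,
+	 * the TX path (DDM/MPU), and finally the RX path (UDM), dumping
+	 * state on any stop anomaly. */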
+	/* Stop the extension first */
+	if (ark->user_ext.dev_stop) {
+	ark->user_ext.dev_stop(dev, ark->user_data);
+	}
+
+	/* Stop the packet generator */
+	if (ark->start_pg) {
+	ark_pmd_pktgen_pause(ark->pg);
+	}
+
+	dev->rx_pkt_burst = &eth_ark_recv_pkts_noop;
+	dev->tx_pkt_burst = &eth_ark_xmit_pkts_noop;
+
+	/* STOP TX Side */
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+	status = eth_ark_tx_queue_stop(dev, i);
+	if (status != 0) {
+		uint8_t port = dev->data->port_id;
+
+		fprintf(stderr, "ARKP tx_queue stop anomaly port %u, queue %u\n",
+		port, i);
+	}
+	}
+
+	/* Stop DDM */
+	/* Wait up to 0.1 second; each stop attempt is up to 1000 * 10 microseconds */
+	for (i = 0; i < 10; i++) {
+	status = ark_ddm_stop(ark->ddm.v, 1);
+	if (status == 0)
+		break;
+	}
+	if (status || i != 0) {
+	PMD_DRV_LOG(ERR, "DDM stop anomaly. status: %d iter: %u. (%s)\n",
+		status, i, __func__);
+	ark_ddm_dump(ark->ddm.v, "Stop anomaly");
+
+	mpu = ark->mputx.v;
+	for (i = 0; i < ark->txQueues; i++) {
+		ark_mpu_dump(mpu, "DDM failure dump", i);
+		mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+	}
+	}
+
+	/* STOP RX Side */
+	/* Stop UDM */
+	for (i = 0; i < 10; i++) {
+	status = ark_udm_stop(ark->udm.v, 1);
+	if (status == 0)
+		break;
+	}
+	if (status || i != 0) {
+	PMD_DRV_LOG(ERR, "UDM stop anomaly. status %d iter: %u. (%s)\n",
+		status, i, __func__);
+	ark_udm_dump(ark->udm.v, "Stop anomaly");
+
+	mpu = ark->mpurx.v;
+	for (i = 0; i < ark->rxQueues; i++) {
+		ark_mpu_dump(mpu, "UDM Stop anomaly", i);
+		mpu = RTE_PTR_ADD(mpu, ARK_MPU_QOFFSET);
+	}
+	}
+
+	ark_udm_dump_stats(ark->udm.v, "Post stop");
+	ark_udm_dump_perf(ark->udm.v, "Post stop");
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+	eth_ark_rx_dump_queue(dev, i, __func__);
+	}
+
+	/* Stop the packet checker if it is running */
+	if (ark->start_pg) {
+	ark_pmd_pktchkr_dump_stats(ark->pc);
+	ark_pmd_pktchkr_stop(ark->pc);
+	}
+}
+
+static void
+eth_ark_dev_close(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	uint16_t i;
+
+	if (ark->user_ext.dev_close) {
+	ark->user_ext.dev_close(dev, ark->user_data);
+	}
+
+	eth_ark_dev_stop(dev);
+	eth_ark_udm_force_close(dev);
+
+	/* TODO This should only be called once for the device during shutdown */
+	ark_rqp_dump(ark->rqpacing);
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+	eth_ark_tx_queue_release(dev->data->tx_queues[i]);
+	dev->data->tx_queues[i] = 0;
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+	eth_ark_dev_rx_queue_release(dev->data->rx_queues[i]);
+	dev->data->rx_queues[i] = 0;
+	}
+
+}
+
+static void
+eth_ark_dev_info_get(struct rte_eth_dev *dev,
+	struct rte_eth_dev_info *dev_info)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	struct ark_mpu_t *tx_mpu = RTE_PTR_ADD(ark->bar0, ARK_MPUTx_BASE);
+	struct ark_mpu_t *rx_mpu = RTE_PTR_ADD(ark->bar0, ARK_MPURx_BASE);
+
+	uint16_t ports = ark->num_ports;
+
+	/* device specific configuration */
+	memset(dev_info, 0, sizeof(*dev_info));
+
+	dev_info->max_rx_queues = ark_api_num_queues_per_port(rx_mpu, ports);
+	dev_info->max_tx_queues = ark_api_num_queues_per_port(tx_mpu, ports);
+	dev_info->max_mac_addrs = 0;
+	dev_info->if_index = 0;
+	dev_info->max_rx_pktlen = (16 * 1024) - 128;
+	dev_info->min_rx_bufsize = 1024;
+	dev_info->rx_offload_capa = 0;
+	dev_info->tx_offload_capa = 0;
+
+	dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
+	.nb_max = 4096 * 4,
+	.nb_min = 512,	/* HW Q size for RX */
+	.nb_align = 2,};
+
+	dev_info->tx_desc_lim = (struct rte_eth_desc_lim) {
+	.nb_max = 4096 * 4,
+	.nb_min = 256,	/* HW Q size for TX */
+	.nb_align = 2,};
+
+	/* ARK PMD supports all line rates, how do we indicate that here ?? */
+	dev_info->speed_capa =
+	ETH_LINK_SPEED_1G | ETH_LINK_SPEED_10G | ETH_LINK_SPEED_25G |
+	ETH_LINK_SPEED_40G | ETH_LINK_SPEED_50G | ETH_LINK_SPEED_100G;
+	dev_info->pci_dev = ARK_DEV_TO_PCI(dev);
+	dev_info->driver_name = dev->data->drv_name;
+
+}
+
+static int
+eth_ark_dev_link_update(struct rte_eth_dev *dev, int wait_to_complete)
+{
+	ARK_DEBUG_TRACE("ARKP: link status = %d\n",
+	dev->data->dev_link.link_status);
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	if (ark->user_ext.link_update) {
+	return ark->user_ext.link_update(dev, wait_to_complete,
+		ark->user_data);
+	}
+	return 0;
+}
+
+static int
+eth_ark_dev_set_link_up(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = 1;
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	if (ark->user_ext.dev_set_link_up) {
+	return ark->user_ext.dev_set_link_up(dev, ark->user_data);
+	}
+	return 0;
+}
+
+static int
+eth_ark_dev_set_link_down(struct rte_eth_dev *dev)
+{
+	dev->data->dev_link.link_status = 0;
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	if (ark->user_ext.dev_set_link_down) {
+	return ark->user_ext.dev_set_link_down(dev, ark->user_data);
+	}
+	return 0;
+}
+
+static void
+eth_ark_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+	uint16_t i;
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	stats->ipackets = 0;
+	stats->ibytes = 0;
+	stats->opackets = 0;
+	stats->obytes = 0;
+	stats->imissed = 0;
+	stats->oerrors = 0;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+	eth_tx_queue_stats_get(dev->data->tx_queues[i], stats);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+	eth_rx_queue_stats_get(dev->data->rx_queues[i], stats);
+	}
+
+	if (ark->user_ext.stats_get) {
+	ark->user_ext.stats_get(dev, stats, ark->user_data);
+	}
+
+}
+
+static void
+eth_ark_dev_stats_reset(struct rte_eth_dev *dev)
+{
+	uint16_t i;
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	for (i = 0; i < dev->data->nb_tx_queues; i++) {
+	eth_tx_queue_stats_reset(dev->data->tx_queues[i]);
+	}
+
+	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+	eth_rx_queue_stats_reset(dev->data->rx_queues[i]);
+	}
+
+	if (ark->user_ext.stats_reset) {
+	ark->user_ext.stats_reset(dev, ark->user_data);
+	}
+
+}
+
+static void
+eth_ark_macaddr_add(struct rte_eth_dev *dev,
+	struct ether_addr *mac_addr, uint32_t index, uint32_t pool)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	if (ark->user_ext.mac_addr_add) {
+	ark->user_ext.mac_addr_add(dev, mac_addr, index, pool, ark->user_data);
+	}
+}
+
+static void
+eth_ark_macaddr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	if (ark->user_ext.mac_addr_remove) {
+	ark->user_ext.mac_addr_remove(dev, index, ark->user_data);
+	}
+}
+
+static void
+eth_ark_set_default_mac_addr(struct rte_eth_dev *dev,
+	struct ether_addr *mac_addr)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+
+	if (ark->user_ext.mac_addr_set) {
+	ark->user_ext.mac_addr_set(dev, mac_addr, ark->user_data);
+	}
+}
+
+static inline int
+process_pktdir_arg(const char *key, const char *value,
+	void *extra_args __rte_unused)
+{
+	ARK_DEBUG_TRACE("**** IN process_pktdir_arg, key = %s, value = %s\n", key,
+	value);
+	pktDirV = strtol(value, NULL, 16);
+	ARK_DEBUG_TRACE("pktDirV = 0x%x\n", pktDirV);
+	return 0;
+}
+
+static inline int
+process_file_args(const char *key, const char *value, void *extra_args)
+{
+	ARK_DEBUG_TRACE("**** IN process_pktgen_arg, key = %s, value = %s\n", key,
+	value);
+	char *args = (char *) extra_args;
+
+	/* Open the configuration file */
+	FILE *file = fopen(value, "r");
+	char line[256];
+	int first = 1;
+
+	if (file == NULL) {
+	PMD_DRV_LOG(ERR, "Unable to open config file %s\n", value);
+	return -1;
+	}
+
+	while (fgets(line, sizeof(line), file)) {
+	if (first) {
+		strncpy(args, line, ARK_MAX_ARG_LEN - 1);
+		args[ARK_MAX_ARG_LEN - 1] = '\0';
+		first = 0;
+	} else {
+		strncat(args, line, ARK_MAX_ARG_LEN - strlen(args) - 1);
+	}
+	}
+	ARK_DEBUG_TRACE("file = %s\n", args);
+	fclose(file);
+	return 0;
+}
+
+static int
+eth_ark_check_args(const char *params)
+{
+	struct rte_kvargs *kvlist;
+	unsigned k_idx;
+	struct rte_kvargs_pair *pair = NULL;
+
+	/* TODO: the index of gark[index] should be associated with phy dev map */
+	struct ark_adapter *ark = gark[0];
+
+	kvlist = rte_kvargs_parse(params, valid_arguments);
+	if (kvlist == NULL)
+	return 0;
+
+	pktGenArgs[0] = 0;
+	pktChkrArgs[0] = 0;
+
+	for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
+		pair = &kvlist->pairs[k_idx];
+		ARK_DEBUG_TRACE("**** Arg passed to PMD = %s:%s\n", pair->key,
+			pair->value);
+	}
+
+	if (rte_kvargs_process(kvlist, ARK_PKTDIR_ARG,
+		&process_pktdir_arg, NULL) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTDIR_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist, ARK_PKTGEN_ARG,
+		&process_file_args, pktGenArgs) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTGEN_ARG);
+	}
+
+	if (rte_kvargs_process(kvlist, ARK_PKTCHKR_ARG,
+		&process_file_args, pktChkrArgs) != 0) {
+		PMD_DRV_LOG(ERR, "Unable to parse arg %s\n", ARK_PKTCHKR_ARG);
+	}
+
+	/* Setup the packet director */
+	ark_pmd_pktdir_setup(ark->pd, pktDirV);
+	ARK_DEBUG_TRACE("INFO: packet director set to 0x%x\n", pktDirV);
+
+	/* Setup the packet generator */
+	if (pktGenArgs[0]) {
+		PMD_DRV_LOG(INFO, "Setting up the packet generator\n");
+		ark_pmd_pktgen_parse(pktGenArgs);
+		ark_pmd_pktgen_reset(ark->pg);
+		ark_pmd_pktgen_setup(ark->pg);
+		ark->start_pg = 1;
+	}
+
+	/* Setup the packet checker */
+	if (pktChkrArgs[0]) {
+		ark_pmd_pktchkr_parse(pktChkrArgs);
+		ark_pmd_pktchkr_setup(ark->pc);
+	}
+
+	rte_kvargs_free(kvlist);
+
+	return 1;
+}
+
+static int
+pmd_ark_probe(const char *name, const char *params)
+{
+	RTE_LOG(INFO, PMD, "Initializing pmd_ark for %s params = %s\n", name,
+	params);
+
+	/* Parse off the v index */
+
+	eth_ark_check_args(params);
+	return 0;
+}
+
+static int
+pmd_ark_remove(const char *name)
+{
+	RTE_LOG(INFO, PMD, "Closing ark %s ethdev on numa socket %u\n", name,
+	rte_socket_id());
+	return 1;
+}
+
+static struct rte_vdev_driver pmd_ark_drv = {
+	.probe = pmd_ark_probe,
+	.remove = pmd_ark_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_ark, pmd_ark_drv);
+RTE_PMD_REGISTER_ALIAS(net_ark, eth_ark);
+RTE_PMD_REGISTER_PCI(eth_ark, rte_ark_pmd.pci_drv);
+RTE_PMD_REGISTER_KMOD_DEP(net_ark, "* igb_uio | uio_pci_generic ");
+RTE_PMD_REGISTER_PCI_TABLE(eth_ark, pci_id_ark_map);
diff --git a/drivers/net/ark/ark_ethdev.h b/drivers/net/ark/ark_ethdev.h
new file mode 100644
index 0000000..9167181
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev.h
@@ -0,0 +1,75 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_ETHDEV_H_
+#define _ARK_ETHDEV_H_
+
+int ark_get_port_id(struct rte_eth_dev *dev, struct ark_adapter *ark);
+
+/* RX functions */
+int eth_ark_dev_rx_queue_setup(struct rte_eth_dev *dev,
+	uint16_t queue_idx,
+	uint16_t nb_desc,
+	unsigned int socket_id,
+	const struct rte_eth_rxconf *rx_conf, struct rte_mempool *mp);
+uint32_t eth_ark_dev_rx_queue_count(struct rte_eth_dev *dev,
+	uint16_t rx_queue_id);
+int eth_ark_rx_stop_queue(struct rte_eth_dev *dev, uint16_t queue_id);
+int eth_ark_rx_start_queue(struct rte_eth_dev *dev, uint16_t queue_id);
+uint16_t eth_ark_recv_pkts_noop(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts);
+uint16_t eth_ark_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+	uint16_t nb_pkts);
+void eth_ark_dev_rx_queue_release(void *rx_queue);
+void eth_rx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats);
+void eth_rx_queue_stats_reset(void *vqueue);
+void eth_ark_rx_dump_queue(struct rte_eth_dev *dev, uint16_t queue_id,
+	const char *msg);
+
+void eth_ark_udm_force_close(struct rte_eth_dev *dev);
+
+/* TX functions */
+uint16_t eth_ark_xmit_pkts_noop(void *txq, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+uint16_t eth_ark_xmit_pkts(void *txq, struct rte_mbuf **tx_pkts,
+	uint16_t nb_pkts);
+int eth_ark_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
+	uint16_t nb_desc, unsigned int socket_id,
+	const struct rte_eth_txconf *tx_conf);
+void eth_ark_tx_queue_release(void *tx_queue);
+int eth_ark_tx_queue_stop(struct rte_eth_dev *dev, uint16_t queue_id);
+int eth_ark_tx_queue_start(struct rte_eth_dev *dev, uint16_t queue_id);
+void eth_tx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats);
+void eth_tx_queue_stats_reset(void *vqueue);
+
+#endif
diff --git a/drivers/net/ark/ark_ethdev_rx.c b/drivers/net/ark/ark_ethdev_rx.c
new file mode 100644
index 0000000..3c3ba0f
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev_rx.c
@@ -0,0 +1,671 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_global.h"
+#include "ark_debug.h"
+#include "ark_ethdev.h"
+#include "ark_mpu.h"
+#include "ark_udm.h"
+
+#define ARK_RX_META_SIZE 32
+#define ARK_RX_META_OFFSET (RTE_PKTMBUF_HEADROOM - ARK_RX_META_SIZE)
+#define ARK_RX_MAX_NOCHAIN (RTE_MBUF_DEFAULT_DATAROOM)
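+
+/* The FPGA deposits a 32-byte metadata block directly ahead of the packet
+ * data, i.e. in the tail of the mbuf headroom; ARK_RX_META_OFFSET is where
+ * that block begins relative to buf_addr.
+ */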
+
+#ifdef RTE_LIBRTE_ARK_DEBUG_RX
+#define ARK_RX_DEBUG 1
+#define ARK_FULL_DEBUG 1
+#else
+#define ARK_RX_DEBUG 0
+#define ARK_FULL_DEBUG 0
+#endif
+
+/* Forward declarations */
+struct ark_rx_queue;
+struct ark_rx_meta;
+
+static void dump_mbuf_data(struct rte_mbuf *mbuf, uint16_t lo, uint16_t hi);
+static void ark_ethdev_rx_dump(const char *name, struct ark_rx_queue *queue);
+static uint32_t eth_ark_rx_jumbo(struct ark_rx_queue *queue,
+	struct ark_rx_meta *meta, struct rte_mbuf *mbuf0, uint32_t consIndex);
+static inline int eth_ark_rx_seed_mbufs(struct ark_rx_queue *queue);
+
+/* ************************************************************************* */
+struct ark_rx_queue {
+
+	/* array of mbufs to populate */
+	struct rte_mbuf **reserveQ;
+	/* array of physical addresses of the mbuf data pointers; */
+	/* the array itself lives at a virtual address */
+	phys_addr_t *paddressQ;
+	struct rte_mempool *mb_pool;
+
+	struct ark_udm_t *udm;
+	struct ark_mpu_t *mpu;
+
+	uint32_t queueSize;
+	uint32_t queueMask;
+
+	uint32_t seedIndex;		/* 1 set with an empty mbuf */
+	uint32_t consIndex;		/* 3 consumed by the driver */
+
+	/* The queue Id is used to identify the HW Q */
+	uint16_t phys_qid;
+
+	/* The queue Index is used within the dpdk device structures */
+	uint16_t queueIndex;
+
+	uint32_t pad1;
+
+	/* separate cache line */
+	/* second cache line - fields only used in slow path */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	volatile uint32_t prodIndex;	/* 2 filled by the HW */
+
+} __rte_cache_aligned;
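+
+/* Index lifecycle (free-running 32-bit counters, masked with queueMask):
+ *   1. seedIndex - software has posted an empty mbuf at this slot
+ *   2. prodIndex - hardware has filled descriptors up to this slot
+ *   3. consIndex - software has consumed packets up to this slot
+ * The numbers on the field comments above reflect this ordering.
+ */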
+
+/* ************************************************************************* */
+
+/* MATCHES struct in UDMDefines.bsv */
+
+/* TODO move to ark_udm.h */
+struct ark_rx_meta {
+	uint64_t timestamp;
+	uint64_t userData;
+	uint8_t port;
+	uint8_t dstQueue;
+	uint16_t pktLen;
+};
+
+/* ************************************************************************* */
+
+/* TODO  pick a better function name */
+static int
+eth_ark_rx_queue_setup(struct rte_eth_dev *dev,
+	struct ark_rx_queue *queue,
+	uint16_t rx_queue_id __rte_unused, uint16_t rx_queue_idx)
+{
+	phys_addr_t queueBase;
+	phys_addr_t physAddrQBase;
+	phys_addr_t physAddrProdIndex;
+
+	queueBase = rte_malloc_virt2phy(queue);
+	physAddrProdIndex = queueBase +
+		offsetof(struct ark_rx_queue, prodIndex);
+
+	physAddrQBase = rte_malloc_virt2phy(queue->paddressQ);
+
+	/* Verify HW */
+	if (ark_mpu_verify(queue->mpu, sizeof(phys_addr_t))) {
+		PMD_DRV_LOG(ERR, "ARKP: Illegal configuration rx queue\n");
+		return -1;
+	}
+
+	/* Stop and Reset and configure MPU */
+	ark_mpu_configure(queue->mpu, physAddrQBase, queue->queueSize, 0);
+
+	ark_udm_write_addr(queue->udm, physAddrProdIndex);
+
+	/* advance the valid pointer, but don't start until the queue starts */
+	ark_mpu_reset_stats(queue->mpu);
+
+	/* The seed is the producer index for the HW */
+	ark_mpu_set_producer(queue->mpu, queue->seedIndex);
+	dev->data->rx_queue_state[rx_queue_idx] = RTE_ETH_QUEUE_STATE_STOPPED;
+
+	return 0;
+}
+
+static inline void
+eth_ark_rx_update_consIndex(struct ark_rx_queue *queue, uint32_t consIndex)
+{
+	queue->consIndex = consIndex;
+	eth_ark_rx_seed_mbufs(queue);
+	ark_mpu_set_producer(queue->mpu, queue->seedIndex);
+}
+
+/* ************************************************************************* */
+int
+eth_ark_dev_rx_queue_setup(struct rte_eth_dev *dev,
+	uint16_t queue_idx,
+	uint16_t nb_desc,
+	unsigned int socket_id,
+	const struct rte_eth_rxconf *rx_conf, struct rte_mempool *mb_pool)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	static int warning1;		/* = 0 */
+
+	struct ark_rx_queue *queue;
+	uint32_t i;
+	int status;
+
+	int port = ark_get_port_id(dev, ark);
+	int qidx = port + queue_idx;	/* TODO FIXME */
+
+	/* TODO: We may already be set up; check here if there is nothing to do */
+	/* Free memory prior to re-allocation if needed */
+	if (dev->data->rx_queues[queue_idx] != NULL) {
+		/* TODO: release any allocated queues */
+		dev->data->rx_queues[queue_idx] = NULL;
+	}
+
+	if (rx_conf != NULL && warning1 == 0) {
+		warning1 = 1;
+		PMD_DRV_LOG(INFO,
+			"ARKP: Arkville PMD ignores rte_eth_rxconf argument.\n");
+	}
+
+	if (RTE_PKTMBUF_HEADROOM < ARK_RX_META_SIZE) {
+		PMD_DRV_LOG(ERR,
+			"Error: DPDK Arkville requires head room > %d bytes (%s)\n",
+			ARK_RX_META_SIZE, __func__);
+		return -1;		/* ERROR CODE */
+	}
+
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_DRV_LOG(ERR,
+			"DPDK Arkville configuration queue size must be power of two %u (%s)\n",
+			nb_desc, __func__);
+		return -1;		/* ERROR CODE */
+	}
+
+	/* Allocate queue struct */
+	queue = rte_zmalloc_socket("ArkRXQueue", sizeof(struct ark_rx_queue),
+		64, socket_id);
+	if (queue == 0) {
+		PMD_DRV_LOG(ERR, "Failed to allocate memory in %s\n", __func__);
+		return -ENOMEM;
+	}
+
+	/* NOTE zmalloc is used, no need to 0 indexes, etc. */
+	queue->mb_pool = mb_pool;
+	queue->phys_qid = qidx;
+	queue->queueIndex = queue_idx;
+	queue->queueSize = nb_desc;
+	queue->queueMask = nb_desc - 1;
+
+	queue->reserveQ =
+		rte_zmalloc_socket("ArkRXQueue mbuf",
+			nb_desc * sizeof(struct rte_mbuf *), 64, socket_id);
+	queue->paddressQ =
+		rte_zmalloc_socket("ArkRXQueue paddr",
+			nb_desc * sizeof(phys_addr_t), 64, socket_id);
+	if (queue->reserveQ == 0 || queue->paddressQ == 0) {
+		PMD_DRV_LOG(ERR, "Failed to allocate queue memory in %s\n",
+			__func__);
+		rte_free(queue->reserveQ);
+		rte_free(queue->paddressQ);
+		rte_free(queue);
+		return -ENOMEM;
+	}
+
+	dev->data->rx_queues[queue_idx] = queue;
+	queue->udm = RTE_PTR_ADD(ark->udm.v, qidx * ARK_UDM_QOFFSET);
+	queue->mpu = RTE_PTR_ADD(ark->mpurx.v, qidx * ARK_MPU_QOFFSET);
+
+	/* populate mbuf reserve */
+	status = eth_ark_rx_seed_mbufs(queue);
+
+	/* MPU Setup */
+	if (status == 0)
+		status = eth_ark_rx_queue_setup(dev, queue, qidx, queue_idx);
+
+	if (unlikely(status != 0)) {
+		struct rte_mbuf *mbuf;
+
+		PMD_DRV_LOG(ERR, "ARKP Failed to initialize RX queue %d %s\n",
+			qidx, __func__);
+		/* Free the mbufs allocated */
+		for (i = 0; i < nb_desc; ++i) {
+			mbuf = queue->reserveQ[i];
+			if (mbuf != 0)
+				rte_pktmbuf_free(mbuf);
+		}
+		rte_free(queue->reserveQ);
+		rte_free(queue->paddressQ);
+		rte_free(queue);
+		return -1;		/* ERROR CODE */
+	}
+
+	return 0;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_recv_pkts_noop(void *rx_queue __rte_unused,
+	struct rte_mbuf **rx_pkts __rte_unused, uint16_t nb_pkts __rte_unused)
+{
+	return 0;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct ark_rx_queue *queue;
+	register uint32_t consIndex, prodIndex;
+	uint16_t nb;
+	uint64_t rx_bytes = 0;
+	struct rte_mbuf *mbuf;
+	struct ark_rx_meta *meta;
+
+	queue = (struct ark_rx_queue *) rx_queue;
+	if (unlikely(queue == 0))
+		return 0;
+	if (unlikely(nb_pkts == 0))
+		return 0;
+	prodIndex = queue->prodIndex;
+	consIndex = queue->consIndex;
+	nb = 0;
+
+	while (prodIndex != consIndex) {
+		mbuf = queue->reserveQ[consIndex & queue->queueMask];
+		/* prefetch mbuf ? */
+		rte_mbuf_prefetch_part1(mbuf);
+		rte_mbuf_prefetch_part2(mbuf);
+
+		/* META DATA buried in the buffer headroom */
+		meta = RTE_PTR_ADD(mbuf->buf_addr, ARK_RX_META_OFFSET);
+
+		mbuf->port = meta->port;
+		mbuf->pkt_len = meta->pktLen;
+		mbuf->data_len = meta->pktLen;
+		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+		mbuf->udata64 = meta->userData;
+		if (ARK_RX_DEBUG) {	/* debug use */
+			if ((meta->pktLen > (1024 * 16)) ||
+				(meta->pktLen == 0)) {
+				PMD_DRV_LOG(INFO,
+						"ARKP RX: Bad Meta Q: %u cons: %u prod: %u\n",
+						queue->phys_qid,
+						consIndex,
+						queue->prodIndex);
+
+				PMD_DRV_LOG(INFO, "       :  cons: %u prod: %u seedIndex %u\n",
+						consIndex,
+						queue->prodIndex,
+						queue->seedIndex);
+
+				PMD_DRV_LOG(INFO, "       :  UDM prod: %u  len: %u\n",
+						queue->udm->rt_cfg.prodIdx,
+						meta->pktLen);
+				ark_mpu_dump(queue->mpu,
+							 "    ",
+							 queue->phys_qid);
+
+				dump_mbuf_data(mbuf, 0, 256);
+				/* the length is clearly bogus; clamp it so the mbuf stays usable */
+				mbuf->pkt_len = 63;
+				meta->pktLen = 63;
+			}
+			mbuf->seqn = consIndex;
+		}
+
+		rx_bytes += meta->pktLen;	/* TEMP stats */
+
+		if (unlikely(meta->pktLen > ARK_RX_MAX_NOCHAIN))
+			consIndex = eth_ark_rx_jumbo
+				(queue, meta, mbuf, consIndex + 1);
+		else
+			consIndex += 1;
+
+		rx_pkts[nb] = mbuf;
+		nb++;
+		if (nb >= nb_pkts)
+			break;
+	}
+
+	if (unlikely(nb != 0))
+		/* report next free to FPGA */
+		eth_ark_rx_update_consIndex(queue, consIndex);
+
+	return nb;
+}
+
+/* ************************************************************************* */
+static uint32_t
+eth_ark_rx_jumbo(struct ark_rx_queue *queue,
+	struct ark_rx_meta *meta, struct rte_mbuf *mbuf0, uint32_t consIndex)
+{
+	struct rte_mbuf *mbuf_prev;
+	struct rte_mbuf *mbuf;
+
+	uint16_t remaining;
+	uint16_t data_len;
+	uint8_t segments;
+
+	/* first buf populated by the caller */
+	mbuf_prev = mbuf0;
+	segments = 1;
+	data_len = RTE_MIN(meta->pktLen, RTE_MBUF_DEFAULT_DATAROOM);
+	remaining = meta->pktLen - data_len;
+	mbuf0->data_len = data_len;
+
+	/* TODO check that the data does not exceed prodIndex! */
+	while (remaining != 0) {
+		data_len =
+			RTE_MIN(remaining,
+					RTE_MBUF_DEFAULT_DATAROOM +
+					RTE_PKTMBUF_HEADROOM);
+
+		remaining -= data_len;
+		segments += 1;
+
+		mbuf = queue->reserveQ[consIndex & queue->queueMask];
+		mbuf_prev->next = mbuf;
+		mbuf_prev = mbuf;
+		mbuf->data_len = data_len;
+		mbuf->data_off = 0;
+		if (ARK_RX_DEBUG)
+			mbuf->seqn = consIndex;	/* for debug only */
+
+		consIndex += 1;
+	}
+
+	mbuf0->nb_segs = segments;
+	return consIndex;
+}
+
+/* Drain the internal queue allowing hw to clear out. */
+static void
+eth_ark_rx_queue_drain(struct ark_rx_queue *queue)
+{
+	register uint32_t consIndex;
+	struct rte_mbuf *mbuf;
+
+	consIndex = queue->consIndex;
+
+	/* NOT performance optimized, since this is a one-shot call */
+	while ((consIndex ^ queue->prodIndex) & queue->queueMask) {
+		mbuf = queue->reserveQ[consIndex & queue->queueMask];
+		rte_pktmbuf_free(mbuf);
+		consIndex++;
+		eth_ark_rx_update_consIndex(queue, consIndex);
+	}
+}
+
+uint32_t
+eth_ark_dev_rx_queue_count(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+	return (queue->prodIndex - queue->consIndex);	/* mod arith */
+}
+
+/* ************************************************************************* */
+int
+eth_ark_rx_start_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+	if (queue == 0)
+		return -1;
+
+	dev->data->rx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
+
+	ark_mpu_set_producer(queue->mpu, queue->seedIndex);
+	ark_mpu_start(queue->mpu);
+
+	ark_udm_queue_enable(queue->udm, 1);
+
+	return 0;
+}
+
+/* ************************************************************************* */
+
+/* Queue can be restarted.   data remains
+ */
+int
+eth_ark_rx_stop_queue(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+	if (queue == 0)
+		return -1;
+
+	ark_udm_queue_enable(queue->udm, 0);
+
+	dev->data->rx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
+
+	return 0;
+}
+
+/* ************************************************************************* */
+static inline int
+eth_ark_rx_seed_mbufs(struct ark_rx_queue *queue)
+{
+	uint32_t limit = queue->consIndex + queue->queueSize;
+	uint32_t seedIndex = queue->seedIndex;
+
+	uint32_t count = 0;
+	uint32_t seedM = queue->seedIndex & queue->queueMask;
+
+	uint32_t nb = limit - seedIndex;
+
+	/* Handle wrap around -- remainder is filled on the next call */
+	if (unlikely(seedM + nb > queue->queueSize))
+		nb = queue->queueSize - seedM;
+
+	struct rte_mbuf **mbufs = &queue->reserveQ[seedM];
+	int status = rte_pktmbuf_alloc_bulk(queue->mb_pool, mbufs, nb);
+
+	if (unlikely(status != 0))
+		return -1;
+
+	if (ARK_RX_DEBUG) {		/* DEBUG */
+		while (count != nb) {
+			struct rte_mbuf *mbuf_init =
+				queue->reserveQ[seedM + count];
+
+			memset(mbuf_init->buf_addr, -1, 512);
+			*((uint32_t *) mbuf_init->buf_addr) = seedIndex + count;
+			*(uint16_t *) RTE_PTR_ADD(mbuf_init->buf_addr, 4) =
+				queue->phys_qid;
+			count++;
+		}
+		count = 0;
+	}
+	/* DEBUG */
+	queue->seedIndex += nb;
+
+	/* Duff's device https://en.wikipedia.org/wiki/Duff's_device */
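+	/* The switch jumps into the middle of the unrolled while loop so the
+	 * (nb % 4) leftover copies are done first; every later pass of the
+	 * loop then copies four physical addresses at a time.
+	 */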
+	switch (nb % 4) {
+	case 0:
+		while (count != nb) {
+			queue->paddressQ[seedM++] = (*mbufs++)->buf_physaddr;
+			count++;
+			/* FALLTHROUGH */
+	case 3:
+			queue->paddressQ[seedM++] = (*mbufs++)->buf_physaddr;
+			count++;
+			/* FALLTHROUGH */
+	case 2:
+			queue->paddressQ[seedM++] = (*mbufs++)->buf_physaddr;
+			count++;
+			/* FALLTHROUGH */
+	case 1:
+			queue->paddressQ[seedM++] = (*mbufs++)->buf_physaddr;
+			count++;
+		} /* while (count != nb) */
+	} /* switch */
+
+	return 0;
+}
+
+void
+eth_ark_rx_dump_queue(struct rte_eth_dev *dev, uint16_t queue_id,
+	const char *msg)
+{
+	struct ark_rx_queue *queue;
+
+	queue = dev->data->rx_queues[queue_id];
+
+	ark_ethdev_rx_dump(msg, queue);
+}
+
+/* ************************************************************************* */
+
+/* Call on device closed no user API, queue is stopped */
+void
+eth_ark_dev_rx_queue_release(void *vqueue)
+{
+	struct ark_rx_queue *queue;
+	uint32_t i;
+
+	queue = (struct ark_rx_queue *) vqueue;
+	if (queue == 0)
+		return;
+
+	ark_udm_queue_enable(queue->udm, 0);
+	/* Stop the MPU since pointer are going away */
+	ark_mpu_stop(queue->mpu);
+
+	/* Need to clear out mbufs here, dropping packets along the way */
+	eth_ark_rx_queue_drain(queue);
+
+	for (i = 0; i < queue->queueSize; ++i)
+		rte_pktmbuf_free(queue->reserveQ[i]);
+
+	rte_free(queue->reserveQ);
+	rte_free(queue->paddressQ);
+	rte_free(queue);
+}
+
+void
+eth_rx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats)
+{
+	struct ark_rx_queue *queue;
+	struct ark_udm_t *udm;
+
+	queue = vqueue;
+	if (queue == 0)
+		return;
+	udm = queue->udm;
+
+	uint64_t ibytes = ark_udm_bytes(udm);
+	uint64_t ipackets = ark_udm_packets(udm);
+	uint64_t idropped = ark_udm_dropped(udm);
+
+	stats->q_ipackets[queue->queueIndex] = ipackets;
+	stats->q_ibytes[queue->queueIndex] = ibytes;
+	stats->q_errors[queue->queueIndex] = idropped;
+	stats->ipackets += ipackets;
+	stats->ibytes += ibytes;
+	stats->imissed += idropped;
+}
+
+void
+eth_rx_queue_stats_reset(void *vqueue)
+{
+	struct ark_rx_queue *queue;
+
+	queue = vqueue;
+	if (queue == 0)
+		return;
+
+	ark_mpu_reset_stats(queue->mpu);
+	ark_udm_queue_stats_reset(queue->udm);
+}
+
+void
+eth_ark_udm_force_close(struct rte_eth_dev *dev)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	struct ark_rx_queue *queue;
+	uint32_t index;
+	uint16_t i;
+
+	if (!ark_udm_is_flushed(ark->udm.v)) {
+		/* restart the MPUs */
+		fprintf(stderr, "ARK: %s UDM not flushed\n", __func__);
+		for (i = 0; i < dev->data->nb_rx_queues; i++) {
+			queue = (struct ark_rx_queue *) dev->data->rx_queues[i];
+			if (queue == 0)
+				continue;
+
+			ark_mpu_start(queue->mpu);
+			/* Add some buffers */
+			index = 100000 + queue->seedIndex;
+			ark_mpu_set_producer(queue->mpu, index);
+		}
+		/* Wait to allow data to pass */
+		usleep(100);
+
+		ARK_DEBUG_TRACE("UDM forced flush attempt, stopped = %d\n",
+			ark_udm_is_flushed(ark->udm.v));
+	}
+	ark_udm_reset(ark->udm.v);
+
+}
+
+static void
+ark_ethdev_rx_dump(const char *name, struct ark_rx_queue *queue)
+{
+	if (queue == NULL)
+		return;
+	ARK_DEBUG_TRACE("RX QUEUE %d -- %s", queue->phys_qid, name);
+	ARK_DEBUG_TRACE(FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 "\n",
+		"queueSize", queue->queueSize,
+		"seedIndex", queue->seedIndex,
+		"prodIndex", queue->prodIndex,
+		"consIndex", queue->consIndex);
+
+	ark_mpu_dump(queue->mpu, name, queue->phys_qid);
+	ark_mpu_dump_setup(queue->mpu, queue->phys_qid);
+	ark_udm_dump(queue->udm, name);
+	ark_udm_dump_setup(queue->udm, queue->phys_qid);
+
+}
+
+static void
+dump_mbuf_data(struct rte_mbuf *mbuf, uint16_t lo, uint16_t hi)
+{
+	uint16_t i, j;
+
+	fprintf(stderr, " MBUF: %p len %d, off: %d, seq: %u\n", mbuf,
+	mbuf->pkt_len, mbuf->data_off, mbuf->seqn);
+	for (i = lo; i < hi; i += 16) {
+		uint8_t *dp = RTE_PTR_ADD(mbuf->buf_addr, i);
+
+		fprintf(stderr, "  %6d:  ", i);
+		for (j = 0; j < 16; j++)
+			fprintf(stderr, " %02x", dp[j]);
+
+		fprintf(stderr, "\n");
+	}
+}
diff --git a/drivers/net/ark/ark_ethdev_tx.c b/drivers/net/ark/ark_ethdev_tx.c
new file mode 100644
index 0000000..457057e
--- /dev/null
+++ b/drivers/net/ark/ark_ethdev_tx.c
@@ -0,0 +1,479 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_global.h"
+#include "ark_mpu.h"
+#include "ark_ddm.h"
+#include "ark_ethdev.h"
+#include "ark_debug.h"
+
+#define ARK_TX_META_SIZE   32
+#define ARK_TX_META_OFFSET (RTE_PKTMBUF_HEADROOM - ARK_TX_META_SIZE)
+#define ARK_TX_MAX_NOCHAIN (RTE_MBUF_DEFAULT_DATAROOM)
+#define ARK_TX_PAD_TO_60   1
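+/* 60 bytes is the minimum Ethernet frame length excluding the FCS; when
+ * set, short packets are zero-padded in eth_ark_xmit_pkts() before being
+ * handed to the DDM.
+ */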
+
+#ifdef RTE_LIBRTE_ARK_DEBUG_TX
+#define ARK_TX_DEBUG       1
+#define ARK_TX_DEBUG_JUMBO 1
+#else
+#define ARK_TX_DEBUG       0
+#define ARK_TX_DEBUG_JUMBO 0
+#endif
+
+/* ************************************************************************* */
+
+/* struct fixed in FPGA -- 16 bytes */
+
+/* TODO move to ark_ddm.h */
+struct ark_tx_meta {
+	uint64_t physaddr;
+	uint32_t delta_ns;
+	uint16_t data_len;		/* of this MBUF */
+#define   ARK_DDM_EOP   0x01
+#define   ARK_DDM_SOP   0x02
+	uint8_t flags;		/* bit 0 indicates last mbuf in chain. */
+	uint8_t reserved[1];
+};
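+
+/* For a chained (multi-segment) packet the per-descriptor flag sequence is
+ * SOP, 0, ..., 0, EOP; a single-segment packet carries SOP | EOP.
+ */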
+
+/* ************************************************************************* */
+struct ark_tx_queue {
+
+	struct ark_tx_meta *metaQ;
+	struct rte_mbuf **bufs;
+
+	/* handles for hw objects */
+	struct ark_mpu_t *mpu;
+	struct ark_ddm_t *ddm;
+
+	/* Stats HW tracks bytes and packets, need to count send errors */
+	uint64_t tx_errors;
+
+	uint32_t queueSize;
+	uint32_t queueMask;
+
+	/* Three indexes into the paired data rings. */
+	uint32_t prodIndex;		/* where to put the next one */
+	uint32_t freeIndex;		/* mbuf has been freed */
+
+	/* The queue Id is used to identify the HW Q */
+	uint16_t phys_qid;
+	/* The queue Index within the dpdk device structures */
+	uint16_t queueIndex;
+
+	uint32_t pad[1];
+
+	/* second cache line - fields only used in slow path */
+	MARKER cacheline1 __rte_cache_min_aligned;
+	uint32_t consIndex;		/* hw is done, can be freed */
+} __rte_cache_aligned;
+
+/* Forward declarations */
+static int eth_ark_tx_jumbo(struct ark_tx_queue *queue,
+	struct rte_mbuf *mbuf);
+static int eth_ark_tx_hw_queue_config(struct ark_tx_queue *queue);
+static void free_completed_tx(struct ark_tx_queue *queue);
+
+static inline void
+ark_tx_hw_queue_stop(struct ark_tx_queue *queue)
+{
+	ark_mpu_stop(queue->mpu);
+}
+
+/* ************************************************************************* */
+static inline void
+eth_ark_tx_meta_from_mbuf(struct ark_tx_meta *meta,
+	const struct rte_mbuf *mbuf, uint8_t flags)
+{
+	meta->physaddr = rte_mbuf_data_dma_addr(mbuf);
+	meta->delta_ns = 0;
+	meta->data_len = rte_pktmbuf_data_len(mbuf);
+	meta->flags = flags;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_xmit_pkts_noop(void *vtxq __rte_unused,
+	struct rte_mbuf **tx_pkts __rte_unused, uint16_t nb_pkts __rte_unused)
+{
+	return 0;
+}
+
+/* ************************************************************************* */
+uint16_t
+eth_ark_xmit_pkts(void *vtxq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
+{
+	struct ark_tx_queue *queue;
+	struct rte_mbuf *mbuf;
+	struct ark_tx_meta *meta;
+
+	uint32_t idx;
+	uint32_t prodIndexLimit;
+	int stat;
+	uint16_t nb;
+
+	queue = (struct ark_tx_queue *) vtxq;
+
+	/* free any packets after the HW is done with them */
+	free_completed_tx(queue);
+
+	prodIndexLimit = queue->queueSize + queue->freeIndex;
+
+	for (nb = 0;
+		 (nb < nb_pkts) && (queue->prodIndex != prodIndexLimit);
+		 ++nb) {
+		mbuf = tx_pkts[nb];
+
+		if (ARK_TX_PAD_TO_60) {
+			if (unlikely(rte_pktmbuf_pkt_len(mbuf) < 60)) {
+				/* this packet even if it is small can be split,
+				 * be sure to add to the end
+				 */
+				uint16_t toAdd = 60 - rte_pktmbuf_pkt_len(mbuf);
+				char *appended = rte_pktmbuf_append(mbuf, toAdd);
+
+				if (appended == 0) {
+					/* This packet is in error, we cannot send it so just
+					 * count it and delete it.
+					 */
+					queue->tx_errors += 1;
+					rte_pktmbuf_free(mbuf);
+					continue;
+				}
+				memset(appended, 0, toAdd);
+			}
+		}
+
+		if (unlikely(mbuf->nb_segs != 1)) {
+			stat = eth_ark_tx_jumbo(queue, mbuf);
+			if (unlikely(stat != 0))
+				break;		/* Queue is full */
+		} else {
+			idx = queue->prodIndex & queue->queueMask;
+			queue->bufs[idx] = mbuf;
+			meta = &queue->metaQ[idx];
+			eth_ark_tx_meta_from_mbuf(meta, mbuf,
+									  ARK_DDM_SOP | ARK_DDM_EOP);
+			queue->prodIndex++;
+		}
+	}
+
+	if (ARK_TX_DEBUG) {
+		if (nb != nb_pkts) {
+			PMD_DRV_LOG(ERR,
+						"ARKP TX: Failure to send: req: %u sent: %u prod: %u cons: %u free: %u\n",
+						nb_pkts, nb, queue->prodIndex, queue->consIndex,
+						queue->freeIndex);
+			ark_mpu_dump(queue->mpu, "TX Failure MPU: ", queue->phys_qid);
+		}
+	}
+
+	/* let fpga know producer index.  */
+	if (likely(nb != 0))
+		ark_mpu_set_producer(queue->mpu, queue->prodIndex);
+
+	return nb;
+}
+
+/* ************************************************************************* */
+static int
+eth_ark_tx_jumbo(struct ark_tx_queue *queue, struct rte_mbuf *mbuf)
+{
+	struct rte_mbuf *next;
+	struct ark_tx_meta *meta;
+	uint32_t freeQueueSpace;
+	uint32_t idx;
+	uint8_t flags = ARK_DDM_SOP;
+
+	freeQueueSpace = queue->queueMask - (queue->prodIndex - queue->freeIndex);
+	if (unlikely(freeQueueSpace < mbuf->nb_segs))
+		return -1;
+
+	if (ARK_TX_DEBUG_JUMBO) {
+		PMD_DRV_LOG(ERR,
+			"ARKP  JUMBO TX len: %u segs: %u prod: %u cons: %u free: %u freeSpace: %u\n",
+			mbuf->pkt_len, mbuf->nb_segs, queue->prodIndex,
+			queue->consIndex, queue->freeIndex, freeQueueSpace);
+	}
+
+	while (mbuf != NULL) {
+		next = mbuf->next;
+
+		idx = queue->prodIndex & queue->queueMask;
+		queue->bufs[idx] = mbuf;
+		meta = &queue->metaQ[idx];
+
+		flags |= (next == NULL) ? ARK_DDM_EOP : 0;
+		eth_ark_tx_meta_from_mbuf(meta, mbuf, flags);
+		queue->prodIndex++;
+
+		flags &= ~ARK_DDM_SOP;	/* drop the SOP flag after the first segment */
+		mbuf = next;
+	}
+
+	return 0;
+}
+
+/* ************************************************************************* */
+int
+eth_ark_tx_queue_setup(struct rte_eth_dev *dev,
+	uint16_t queue_idx,
+	uint16_t nb_desc,
+	unsigned int socket_id, const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+	struct ark_adapter *ark = (struct ark_adapter *) dev->data->dev_private;
+	struct ark_tx_queue *queue;
+	int status;
+
+	/* TODO: divide the Q's evenly with the Vports */
+	int port = ark_get_port_id(dev, ark);
+	int qidx = port + queue_idx;	/* FIXME for multi queue */
+
+	if (!rte_is_power_of_2(nb_desc)) {
+		PMD_DRV_LOG(ERR,
+					"DPDK Arkville configuration queue size must be power of two %u (%s)\n",
+					nb_desc, __func__);
+		return -1;
+	}
+
+	/* TODO: We may already be set up; if so, free the previously
+	 * allocated queue memory before re-allocating.
+	 */
+
+	/* Allocate queue struct */
+	queue = rte_zmalloc_socket("ArkTXQueue", sizeof(struct ark_tx_queue),
+		64, socket_id);
+	if (queue == 0) {
+		PMD_DRV_LOG(ERR, "ARKP Failed to allocate tx queue memory in %s\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	/* we use zmalloc no need to initialize fields */
+	queue->queueSize = nb_desc;
+	queue->queueMask = nb_desc - 1;
+	queue->phys_qid = qidx;
+	queue->queueIndex = queue_idx;
+	dev->data->tx_queues[queue_idx] = queue;
+
+	queue->metaQ =
+		rte_zmalloc_socket("ArkTXQueue meta",
+			nb_desc * sizeof(struct ark_tx_meta), 64, socket_id);
+	queue->bufs =
+		rte_zmalloc_socket("ArkTXQueue bufs",
+			nb_desc * sizeof(struct rte_mbuf *), 64, socket_id);
+
+	if (queue->metaQ == 0 || queue->bufs == 0) {
+		PMD_DRV_LOG(ERR, "Failed to allocate queue memory in %s\n", __func__);
+		rte_free(queue->metaQ);
+		rte_free(queue->bufs);
+		rte_free(queue);
+		return -ENOMEM;
+	}
+
+	queue->ddm = RTE_PTR_ADD(ark->ddm.v, qidx * ARK_DDM_QOFFSET);
+	queue->mpu = RTE_PTR_ADD(ark->mputx.v, qidx * ARK_MPU_QOFFSET);
+
+	status = eth_ark_tx_hw_queue_config(queue);
+
+	if (unlikely(status != 0)) {
+		rte_free(queue->metaQ);
+		rte_free(queue->bufs);
+		rte_free(queue);
+		return -1;		/* ERROR CODE */
+	}
+
+	return 0;
+}
+
+/* ************************************************************************* */
+static int
+eth_ark_tx_hw_queue_config(struct ark_tx_queue *queue)
+{
+	phys_addr_t queueBase, ringBase, prodIndexAddr;
+	uint32_t writeInterval_ns;
+
+	/* Verify HW -- MPU */
+	if (ark_mpu_verify(queue->mpu, sizeof(struct ark_tx_meta)))
+		return -1;
+
+	queueBase = rte_malloc_virt2phy(queue);
+	ringBase = rte_malloc_virt2phy(queue->metaQ);
+	prodIndexAddr = queueBase + offsetof(struct ark_tx_queue, consIndex);
+
+	ark_mpu_stop(queue->mpu);
+	ark_mpu_reset(queue->mpu);
+
+	/* Stop and Reset and configure MPU */
+	ark_mpu_configure(queue->mpu, ringBase, queue->queueSize, 1);
+
+	/* Adjust the write interval based on queue size --
+	 * increase PCIe traffic when the mbuf count is low.
+	 */
+	switch (queue->queueSize) {
+	case 128:
+		writeInterval_ns = 500;
+		break;
+	case 256:
+		writeInterval_ns = 500;
+		break;
+	case 512:
+		writeInterval_ns = 1000;
+		break;
+	default:
+		writeInterval_ns = 2000;
+		break;
+	}
+
+	/* Completion address the DDM writes back to */
+	ark_ddm_setup(queue->ddm, prodIndexAddr, writeInterval_ns);
+
+	return 0;
+}
+
+/* ************************************************************************* */
+void
+eth_ark_tx_queue_release(void *vtx_queue)
+{
+	struct ark_tx_queue *queue;
+
+	queue = (struct ark_tx_queue *) vtx_queue;
+
+	ark_tx_hw_queue_stop(queue);
+
+	queue->consIndex = queue->prodIndex;
+	free_completed_tx(queue);
+
+	rte_free(queue->metaQ);
+	rte_free(queue->bufs);
+	rte_free(queue);
+
+}
+
+/* ************************************************************************* */
+int
+eth_ark_tx_queue_stop(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_tx_queue *queue;
+	int cnt = 0;
+
+	queue = dev->data->tx_queues[queue_id];
+
+	/* Wait for DDM to send out all packets. */
+	while (queue->consIndex != queue->prodIndex) {
+		usleep(100);
+		if (cnt++ > 10000)
+			return -1;
+	}
+
+	ark_mpu_stop(queue->mpu);
+	free_completed_tx(queue);
+
+	dev->data->tx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STOPPED;
+
+	return 0;
+}
+
+int
+eth_ark_tx_queue_start(struct rte_eth_dev *dev, uint16_t queue_id)
+{
+	struct ark_tx_queue *queue;
+
+	queue = dev->data->tx_queues[queue_id];
+	if (dev->data->tx_queue_state[queue_id] == RTE_ETH_QUEUE_STATE_STARTED)
+		return 0;
+
+	ark_mpu_start(queue->mpu);
+	dev->data->tx_queue_state[queue_id] = RTE_ETH_QUEUE_STATE_STARTED;
+
+	return 0;
+}
+
+/* ************************************************************************* */
+static void
+free_completed_tx(struct ark_tx_queue *queue)
+{
+	struct rte_mbuf *mbuf;
+	struct ark_tx_meta *meta;
+	uint32_t topIndex;
+
+	topIndex = queue->consIndex;	/* read once */
+	while (queue->freeIndex != topIndex) {
+		meta = &queue->metaQ[queue->freeIndex & queue->queueMask];
+		mbuf = queue->bufs[queue->freeIndex & queue->queueMask];
+
+		if (likely((meta->flags & ARK_DDM_SOP) != 0)) {
+			/* ref count of the mbuf is checked in this call. */
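+			/* rte_pktmbuf_free() releases the entire segment
+			 * chain, so a jumbo frame is freed once via its SOP
+			 * slot and the non-SOP slots need no explicit free.
+			 */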
+			rte_pktmbuf_free(mbuf);
+		}
+		queue->freeIndex++;
+	}
+}
+
+/* ************************************************************************* */
+void
+eth_tx_queue_stats_get(void *vqueue, struct rte_eth_stats *stats)
+{
+	struct ark_tx_queue *queue;
+	struct ark_ddm_t *ddm;
+	uint64_t bytes, pkts;
+
+	queue = vqueue;
+	ddm = queue->ddm;
+
+	bytes = ark_ddm_queue_byte_count(ddm);
+	pkts = ark_ddm_queue_pkt_count(ddm);
+
+	stats->q_opackets[queue->queueIndex] = pkts;
+	stats->q_obytes[queue->queueIndex] = bytes;
+	stats->opackets += pkts;
+	stats->obytes += bytes;
+	stats->oerrors += queue->tx_errors;
+}
+
+void
+eth_tx_queue_stats_reset(void *vqueue)
+{
+	struct ark_tx_queue *queue;
+	struct ark_ddm_t *ddm;
+
+	queue = vqueue;
+	ddm = queue->ddm;
+
+	ark_ddm_queue_reset_stats(ddm);
+	queue->tx_errors = 0;
+}
diff --git a/drivers/net/ark/ark_ext.h b/drivers/net/ark/ark_ext.h
new file mode 100644
index 0000000..0b5b9ba
--- /dev/null
+++ b/drivers/net/ark/ark_ext.h
@@ -0,0 +1,71 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_EXT_H_
+#define _ARK_EXT_H_
+
+/*
+ * Called post PMD init. The implementation returns its private data,
+ * which gets passed into all other functions as user_data.
+ * The ARK extension implementation MUST implement this function.
+ */
+void *dev_init(struct rte_eth_dev *dev, void *Abar, int port_id);
+
+/* Called during device shutdown */
+void dev_uninit(struct rte_eth_dev *dev, void *user_data);
+
+/* This call is optional and allows the extension to specify the number
+ * of supported ports.
+ */
+uint8_t dev_get_port_count(struct rte_eth_dev *dev, void *user_data);
+
+/*
+ * The following functions are optional and are directly mapped from the
+ * DPDK PMD ops structure. Each function, if implemented, is called after
+ * the ARK PMD implementation executes.
+ */
+int dev_configure(struct rte_eth_dev *dev, void *user_data);
+int dev_start(struct rte_eth_dev *dev, void *user_data);
+void dev_stop(struct rte_eth_dev *dev, void *user_data);
+void dev_close(struct rte_eth_dev *dev, void *user_data);
+int link_update(struct rte_eth_dev *dev, int wait_to_complete,
+	void *user_data);
+int dev_set_link_up(struct rte_eth_dev *dev, void *user_data);
+int dev_set_link_down(struct rte_eth_dev *dev, void *user_data);
+void stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats,
+	void *user_data);
+void stats_reset(struct rte_eth_dev *dev, void *user_data);
+void mac_addr_add(struct rte_eth_dev *dev,
+	struct ether_addr *macadr, uint32_t index, uint32_t pool, void *user_data);
+void mac_addr_remove(struct rte_eth_dev *dev, uint32_t index, void *user_data);
+void mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+	void *user_data);
+
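+/*
+ * A minimal extension sketch (illustrative names, not part of this patch):
+ * dev_init() is the only mandatory hook; it returns the state the PMD
+ * later hands back as user_data.
+ *
+ *	struct my_ext { int pkts_seen; };
+ *
+ *	void *dev_init(struct rte_eth_dev *dev, void *abar, int port_id)
+ *	{
+ *		struct my_ext *e = rte_zmalloc("my_ext", sizeof(*e), 0);
+ *		return e;
+ *	}
+ *
+ *	void dev_uninit(struct rte_eth_dev *dev, void *user_data)
+ *	{
+ *		rte_free(user_data);
+ *	}
+ */
+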
+#endif
diff --git a/drivers/net/ark/ark_global.h b/drivers/net/ark/ark_global.h
new file mode 100644
index 0000000..78c61de
--- /dev/null
+++ b/drivers/net/ark/ark_global.h
@@ -0,0 +1,164 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_GLOBAL_H_
+#define _ARK_GLOBAL_H_
+
+#include <time.h>
+#include <assert.h>
+
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_memcpy.h>
+#include <rte_string_fns.h>
+#include <rte_cycles.h>
+#include <rte_kvargs.h>
+#include <rte_dev.h>
+#include <rte_version.h>
+
+#include "ark_pktdir.h"
+#include "ark_pktgen.h"
+#include "ark_pktchkr.h"
+
+#define ETH_ARK_ARG_MAXLEN	64
+#define ARK_SYSCTRL_BASE  0x0
+#define ARK_PKTGEN_BASE   0x10000
+#define ARK_MPURx_BASE    0x20000
+#define ARK_UDM_BASE      0x30000
+#define ARK_MPUTx_BASE    0x40000
+#define ARK_DDM_BASE      0x60000
+#define ARK_CMAC_BASE     0x80000
+#define ARK_PKTDIR_BASE   0xA0000
+#define ARK_PKTCHKR_BASE  0x90000
+#define ARK_RCPACING_BASE 0xB0000
+#define ARK_EXTERNAL_BASE 0x100000
+#define ARK_MPU_QOFFSET   0x00100
+#define ARK_MAX_PORTS     8
+
+#define Offset8(n)    (n)
+#define Offset16(n)   ((n) / 2)
+#define Offset32(n)   ((n) / 4)
+#define Offset64(n)   ((n) / 8)
+
+/*
+ * Structure to store private data for each PF/VF instance.
+ */
+#define DefPtr(type, name)	\
+  union type {      \
+	uint64_t *t64; \
+	uint32_t *t32; \
+	uint16_t *t16; \
+	uint8_t  *t8;  \
+	void     *v;  \
+  } name
+
+#define SetPtr(bar, ark, mem, off) {	    \
+  ark->mem.t64 = (uint64_t *)&ark->bar[off]; \
+  ark->mem.t32 = (uint32_t *)&ark->bar[off]; \
+  ark->mem.t16 = (uint16_t *)&ark->bar[off]; \
+  ark->mem.t8  = (uint8_t *)&ark->bar[off]; \
+  }
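+
+/*
+ * Example (illustrative): DefPtr(UDM, udm) declares a union of typed
+ * pointers named "udm"; SetPtr(bar0, ark, udm, ARK_UDM_BASE) then points
+ * every member of ark->udm at offset ARK_UDM_BASE within BAR 0.
+ */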
+
+struct ark_port {
+	struct rte_eth_dev *eth_dev;
+	int id;
+};
+
+struct ark_user_ext {
+	void *(*dev_init) (struct rte_eth_dev *, void *abar, int port_id);
+	void (*dev_uninit) (struct rte_eth_dev *, void *);
+	int (*dev_get_port_count) (struct rte_eth_dev *, void *);
+	int (*dev_configure) (struct rte_eth_dev *, void *);
+	int (*dev_start) (struct rte_eth_dev *, void *);
+	void (*dev_stop) (struct rte_eth_dev *, void *);
+	void (*dev_close) (struct rte_eth_dev *, void *);
+	int (*link_update) (struct rte_eth_dev *, int wait_to_complete, void *);
+	int (*dev_set_link_up) (struct rte_eth_dev *, void *);
+	int (*dev_set_link_down) (struct rte_eth_dev *, void *);
+	void (*stats_get) (struct rte_eth_dev *, struct rte_eth_stats *, void *);
+	void (*stats_reset) (struct rte_eth_dev *, void *);
+	void (*mac_addr_add) (struct rte_eth_dev *,
+	struct ether_addr *, uint32_t, uint32_t, void *);
+	void (*mac_addr_remove) (struct rte_eth_dev *, uint32_t, void *);
+	void (*mac_addr_set) (struct rte_eth_dev *, struct ether_addr *, void *);
+};
+
+struct ark_adapter {
+
+	/* User extension private data */
+	void *user_data;
+
+	/* Pointers to packet generator and checker */
+	int start_pg;
+	ArkPktGen_t pg;
+	ArkPktChkr_t pc;
+	ArkPktDir_t pd;
+
+	struct ark_port port[ARK_MAX_PORTS];
+	int num_ports;
+
+	/* Common for both PF and VF */
+	struct rte_eth_dev *eth_dev;
+
+	void *dHandle;
+	struct ark_user_ext user_ext;
+
+	/* Our Bar 0 */
+	uint8_t *bar0;
+
+	/* A Bar */
+	uint8_t *Abar;
+
+	/* Arkville demo block offsets */
+	DefPtr(SysCtrl, sysctrl);
+	DefPtr(PktGen, pktgen);
+	DefPtr(MpuRx, mpurx);
+	DefPtr(UDM, udm);
+	DefPtr(MpuTx, mputx);
+	DefPtr(DDM, ddm);
+	DefPtr(CMAC, cmac);
+	DefPtr(External, external);
+	DefPtr(PktDir, pktdir);
+	DefPtr(PktChkr, pktchkr);
+
+	int started;
+	uint16_t rxQueues;
+	uint16_t txQueues;
+
+	struct ark_rqpace_t *rqpacing;
+};
+
+typedef uint32_t *ark_t;
+
+#endif
diff --git a/drivers/net/ark/ark_mpu.c b/drivers/net/ark/ark_mpu.c
new file mode 100644
index 0000000..b206d59
--- /dev/null
+++ b/drivers/net/ark/ark_mpu.c
@@ -0,0 +1,168 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_debug.h"
+#include "ark_mpu.h"
+
+uint16_t
+ark_api_num_queues(struct ark_mpu_t *mpu)
+{
+	return mpu->hw.numQueues;
+}
+
+uint16_t
+ark_api_num_queues_per_port(struct ark_mpu_t *mpu, uint16_t ark_ports)
+{
+	return mpu->hw.numQueues / ark_ports;
+}
+
+int
+ark_mpu_verify(struct ark_mpu_t *mpu, uint32_t objSize)
+{
+	uint32_t version;
+
+	version = mpu->id.vernum & 0x0000FF00;
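+	/* idnum 0x2055504d is ASCII "MPU " read little-endian */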
+	if ((mpu->id.idnum != 0x2055504d) || (mpu->hw.objSize != objSize) ||
+		version != 0x00003100) {
+		fprintf(stderr,
+			"   MPU module not found as expected %08x \"%c%c%c%c"
+			"%c%c%c%c\"\n", mpu->id.idnum, mpu->id.id[0],
+			mpu->id.id[1], mpu->id.id[2], mpu->id.id[3],
+			mpu->id.ver[0], mpu->id.ver[1], mpu->id.ver[2],
+			mpu->id.ver[3]);
+		fprintf(stderr,
+			"   MPU HW numQueues: %u hwDepth %u, objSize: %u, objPerMRR: %u Expected size %u\n",
+			mpu->hw.numQueues, mpu->hw.hwDepth, mpu->hw.objSize,
+			mpu->hw.objPerMRR, objSize);
+		return -1;
+	}
+	return 0;
+}
+
+void
+ark_mpu_stop(struct ark_mpu_t *mpu)
+{
+	mpu->cfg.command = MPU_CMD_Stop;
+}
+
+void
+ark_mpu_start(struct ark_mpu_t *mpu)
+{
+	mpu->cfg.command = MPU_CMD_Run;	/* run state */
+}
+
+int
+ark_mpu_reset(struct ark_mpu_t *mpu)
+{
+	int cnt = 0;
+
+	mpu->cfg.command = MPU_CMD_Reset;	/* reset */
+
+	while (mpu->cfg.command != MPU_CMD_Idle) {
+		if (cnt++ > 1000)
+			break;
+		usleep(10);
+	}
+	if (mpu->cfg.command != MPU_CMD_Idle) {
+		mpu->cfg.command = MPU_CMD_ForceReset;	/* forced reset */
+		usleep(10);
+	}
+	ark_mpu_reset_stats(mpu);
+	return mpu->cfg.command != MPU_CMD_Idle;
+}
+
+void
+ark_mpu_reset_stats(struct ark_mpu_t *mpu)
+{
+	mpu->stats.pciRequest = 1;	/* reset stats */
+}
+
+int
+ark_mpu_configure(struct ark_mpu_t *mpu, phys_addr_t ring, uint32_t ringSize,
+	int isTx)
+{
+	ark_mpu_reset(mpu);
+
+	if (!rte_is_power_of_2(ringSize)) {
+		fprintf(stderr, "ARKP Invalid ring size for MPU %u\n", ringSize);
+		return -1;
+	}
+
+	mpu->cfg.ringBase = ring;
+	mpu->cfg.ringSize = ringSize;
+	mpu->cfg.ringMask = ringSize - 1;
+	mpu->cfg.minHostMove = isTx ? 1 : mpu->hw.objPerMRR;
+	mpu->cfg.minHWMove = mpu->hw.objPerMRR;
+	mpu->cfg.swProdIndex = 0;
+	mpu->cfg.hwConsIndex = 0;
+	return 0;
+}
+
+void
+ark_mpu_dump(struct ark_mpu_t *mpu, const char *code, uint16_t qid)
+{
+	/* DUMP to see that we have started */
+	ARK_DEBUG_TRACE
+		("ARKP MPU: %s Q: %3u swProd %u, hwCons: %u\n", code, qid,
+		 mpu->cfg.swProdIndex, mpu->cfg.hwConsIndex);
+	ARK_DEBUG_TRACE
+		("ARKP MPU: %s state: %d count %d, reserved %d data 0x%08x_%08x 0x%08x_%08x\n",
+		 code, mpu->debug.state, mpu->debug.count, mpu->debug.reserved,
+		 mpu->debug.peek[1], mpu->debug.peek[0], mpu->debug.peek[3],
+		 mpu->debug.peek[2]
+		 );
+	ARK_DEBUG_STATS
+		("ARKP MPU: %s Q: %3u" FMT_SU64 FMT_SU64 FMT_SU64 FMT_SU64
+		 FMT_SU64 FMT_SU64 FMT_SU64 "\n", code, qid,
+		 "PCI Request:", mpu->stats.pciRequest,
+		 "QueueEmpty", mpu->stats.qEmpty,
+		 "QueueQ1", mpu->stats.qQ1,
+		 "QueueQ2", mpu->stats.qQ2,
+		 "QueueQ3", mpu->stats.qQ3,
+		 "QueueQ4", mpu->stats.qQ4,
+		 "QueueFull", mpu->stats.qFull
+		 );
+}
+
+void
+ark_mpu_dump_setup(struct ark_mpu_t *mpu, uint16_t qId)
+{
+	ARK_DEBUG_TRACE
+		("MPU Setup Q: %u"
+		 FMT_SPTR "\n", qId,
+		 "ringBase", (void *) mpu->cfg.ringBase
+		 );
+
+}
diff --git a/drivers/net/ark/ark_mpu.h b/drivers/net/ark/ark_mpu.h
new file mode 100644
index 0000000..54b4b60
--- /dev/null
+++ b/drivers/net/ark/ark_mpu.h
@@ -0,0 +1,143 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_MPU_H_
+#define _ARK_MPU_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/*
+ * MPU hardware structures
+ */
+
+#define ARK_MPU_ID 0x00
+struct ark_mpu_id_t {
+	union {
+		char id[4];
+		uint32_t idnum;
+	};
+	union {
+		char ver[4];
+		uint32_t vernum;
+	};
+	uint32_t physId;
+	uint32_t mrrCode;
+};
+
+#define ARK_MPU_HW 0x010
+struct ark_mpu_hw_t {
+	uint16_t numQueues;
+	uint16_t reserved;
+	uint32_t hwDepth;
+	uint32_t objSize;
+	uint32_t objPerMRR;
+};
+
+#define ARK_MPU_CFG 0x040
+struct ark_mpu_cfg_t {
+	phys_addr_t ringBase;	/* phys_addr_t is a uint64_t */
+	uint32_t ringSize;
+	uint32_t ringMask;
+	uint32_t minHostMove;
+	uint32_t minHWMove;
+	volatile uint32_t swProdIndex;
+	volatile uint32_t hwConsIndex;
+	volatile uint32_t command;
+};
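+
+/* swProdIndex is written by software to publish new ring entries;
+ * hwConsIndex is written back by the hardware as entries are consumed.
+ */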
+enum ARK_MPU_COMMAND {
+	MPU_CMD_Idle = 1,
+	MPU_CMD_Run = 2,
+	MPU_CMD_Stop = 4,
+	MPU_CMD_Reset = 8,
+	MPU_CMD_ForceReset = 16,
+	MPU_COMMAND_LIMIT = 0xFFFFFFFF
+};
+
+#define ARK_MPU_STATS 0x080
+struct ark_mpu_stats_t {
+	volatile uint64_t pciRequest;
+	volatile uint64_t qEmpty;
+	volatile uint64_t qQ1;
+	volatile uint64_t qQ2;
+	volatile uint64_t qQ3;
+	volatile uint64_t qQ4;
+	volatile uint64_t qFull;
+};
+
+#define ARK_MPU_DEBUG 0x0C0
+struct ark_mpu_debug_t {
+	volatile uint32_t state;
+	uint32_t reserved;
+	volatile uint32_t count;
+	volatile uint32_t take;
+	volatile uint32_t peek[4];
+};
+
+/*  Consolidated structure */
+struct ark_mpu_t {
+	struct ark_mpu_id_t id;
+	uint8_t reserved0[(ARK_MPU_HW - ARK_MPU_ID)
+					  - sizeof(struct ark_mpu_id_t)];
+	struct ark_mpu_hw_t hw;
+	uint8_t reserved1[(ARK_MPU_CFG - ARK_MPU_HW) -
+					  sizeof(struct ark_mpu_hw_t)];
+	struct ark_mpu_cfg_t cfg;
+	uint8_t reserved2[(ARK_MPU_STATS - ARK_MPU_CFG) -
+					  sizeof(struct ark_mpu_cfg_t)];
+	struct ark_mpu_stats_t stats;
+	uint8_t reserved3[(ARK_MPU_DEBUG - ARK_MPU_STATS) -
+					  sizeof(struct ark_mpu_stats_t)];
+	struct ark_mpu_debug_t debug;
+};
+
+uint16_t ark_api_num_queues(struct ark_mpu_t *mpu);
+uint16_t ark_api_num_queues_per_port(struct ark_mpu_t *mpu,
+	uint16_t ark_ports);
+int ark_mpu_verify(struct ark_mpu_t *mpu, uint32_t objSize);
+void ark_mpu_stop(struct ark_mpu_t *mpu);
+void ark_mpu_start(struct ark_mpu_t *mpu);
+int ark_mpu_reset(struct ark_mpu_t *mpu);
+int ark_mpu_configure(struct ark_mpu_t *mpu, phys_addr_t ring,
+	uint32_t ringSize, int isTx);
+
+void ark_mpu_dump(struct ark_mpu_t *mpu, const char *msg, uint16_t idx);
+void ark_mpu_dump_setup(struct ark_mpu_t *mpu, uint16_t qid);
+void ark_mpu_reset_stats(struct ark_mpu_t *mpu);
+
+static inline void
+ark_mpu_set_producer(struct ark_mpu_t *mpu, uint32_t idx)
+{
+	mpu->cfg.swProdIndex = idx;
+}
+
+#endif
diff --git a/drivers/net/ark/ark_pktchkr.c b/drivers/net/ark/ark_pktchkr.c
new file mode 100644
index 0000000..47b75a0
--- /dev/null
+++ b/drivers/net/ark/ark_pktchkr.c
@@ -0,0 +1,445 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <getopt.h>
+#include <sys/time.h>
+#include <locale.h>
+#include <unistd.h>
+
+#include "ark_pktchkr.h"
+#include "ark_debug.h"
+
+static int setArg(char *arg, char *val);
+static int ark_pmd_pktchkr_isGenForever(ArkPktChkr_t handle);
+
+#define ARK_MAX_STR_LEN 64
+union OptV {
+	int Int;
+	int Bool;
+	uint64_t Long;
+	char Str[ARK_MAX_STR_LEN];
+};
+
+enum OPType {
+	OTInt,
+	OTLong,
+	OTBool,
+	OTString
+};
+
+struct Options {
+	char opt[ARK_MAX_STR_LEN];
+	enum OPType t;
+	union OptV v;
+};
+
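+/* Default settings; individual fields are overridden at runtime by
+ * "opt=value" strings fed through ark_pmd_pktchkr_parse().
+ */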
+static struct Options toptions[] = {
+	{{"configure"}, OTBool, {1} },
+	{{"port"}, OTInt, {0} },
+	{{"mac-dump"}, OTBool, {0} },
+	{{"dg-mode"}, OTBool, {1} },
+	{{"run"}, OTBool, {0} },
+	{{"stop"}, OTBool, {0} },
+	{{"dump"}, OTBool, {0} },
+	{{"enResync"}, OTBool, {0} },
+	{{"tuserErrVal"}, OTInt, {1} },
+	{{"genForever"}, OTBool, {0} },
+	{{"enSlavedStart"}, OTBool, {0} },
+	{{"varyLength"}, OTBool, {0} },
+	{{"incrPayload"}, OTInt, {0} },
+	{{"incrFirstByte"}, OTBool, {0} },
+	{{"insSeqNum"}, OTBool, {0} },
+	{{"insTimeStamp"}, OTBool, {1} },
+	{{"insUDPHdr"}, OTBool, {0} },
+	{{"numPkts"}, OTLong, .v.Long = 10000000000000L},
+	{{"payloadByte"}, OTInt, {0x55} },
+	{{"pktSpacing"}, OTInt, {60} },
+	{{"pktSizeMin"}, OTInt, {2005} },
+	{{"pktSizeMax"}, OTInt, {1514} },
+	{{"pktSizeIncr"}, OTInt, {1} },
+	{{"ethType"}, OTInt, {0x0800} },
+	{{"srcMACAddr"}, OTLong, .v.Long = 0xDC3CF6425060L},
+	{{"dstMACAddr"}, OTLong, .v.Long = 0x112233445566L},
+	{{"hdrDW0"}, OTInt, {0x0016e319} },
+	{{"hdrDW1"}, OTInt, {0x27150004} },
+	{{"hdrDW2"}, OTInt, {0x76967bda} },
+	{{"hdrDW3"}, OTInt, {0x08004500} },
+	{{"hdrDW4"}, OTInt, {0x005276ed} },
+	{{"hdrDW5"}, OTInt, {0x40004006} },
+	{{"hdrDW6"}, OTInt, {0x56cfc0a8} },
+	{{"startOffset"}, OTInt, {0} },
+	{{"dstIP"}, OTString, .v.Str = "169.254.10.240"},
+	{{"dstPort"}, OTInt, {65536} },
+	{{"srcPort"}, OTInt, {65536} },
+};
+
+ArkPktChkr_t
+ark_pmd_pktchkr_init(void *addr, int ord, int l2_mode)
+{
+	struct ArkPktChkrInst *inst =
+		rte_malloc("ArkPktChkrInst", sizeof(struct ArkPktChkrInst), 0);
+	inst->sregs = (struct ArkPktChkrStatRegs *) addr;
+	inst->cregs = (struct ArkPktChkrCtlRegs *) (((uint8_t *) addr) + 0x100);
+	inst->ordinal = ord;
+	inst->l2_mode = l2_mode;
+	return inst;
+}
+
+void
+ark_pmd_pktchkr_uninit(ArkPktChkr_t handle)
+{
+	rte_free(handle);
+}
+
+void
+ark_pmd_pktchkr_run(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->sregs->pktStartStop = 0;
+	inst->sregs->pktStartStop = 0x1;
+}
+
+int
+ark_pmd_pktchkr_stopped(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+	uint32_t r = inst->sregs->pktStartStop;
+
+	return (((r >> 16) & 1) == 1);
+}
+
+void
+ark_pmd_pktchkr_stop(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+	int waitCycle = 10;
+
+	inst->sregs->pktStartStop = 0;
+	while (!ark_pmd_pktchkr_stopped(handle) && (waitCycle > 0)) {
+	usleep(1000);
+	waitCycle--;
+	ARK_DEBUG_TRACE("Waiting for pktchk %d to stop...\n", inst->ordinal);
+	}
+	ARK_DEBUG_TRACE("pktchk %d stopped.\n", inst->ordinal);
+}
+
+int
+ark_pmd_pktchkr_isRunning(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+	uint32_t r = inst->sregs->pktStartStop;
+
+	return ((r & 1) == 1);
+}
+
+static void
+ark_pmd_pktchkr_setPktCtrl(ArkPktChkr_t handle, uint32_t genForever,
+	uint32_t varyLength, uint32_t incrPayload, uint32_t incrFirstByte,
+	uint32_t insSeqNum, uint32_t insUDPHdr, uint32_t enResync,
+	uint32_t tuserErrVal, uint32_t insTimeStamp)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+	uint32_t r = (tuserErrVal << 16) | (enResync << 0);
+
+	inst->sregs->pktCtrl = r;
+	if (!inst->l2_mode) {
+	insUDPHdr = 0;
+	}
+	r = (genForever << 24) | (varyLength << 16) |
+	(incrPayload << 12) | (incrFirstByte << 8) |
+	(insTimeStamp << 5) | (insSeqNum << 4) | insUDPHdr;
+	inst->cregs->pktCtrl = r;
+}
+
+static int
+ark_pmd_pktchkr_isGenForever(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+	uint32_t r = inst->cregs->pktCtrl;
+
+	return (((r >> 24) & 1) == 1);
+}
+
+int
+ark_pmd_pktchkr_waitDone(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	if (ark_pmd_pktchkr_isGenForever(handle)) {
+	ARK_DEBUG_TRACE
+		("Error: waitDone will not terminate because genForever=1\n");
+	return -1;
+	}
+	int waitCycle = 10;
+
+	while (!ark_pmd_pktchkr_stopped(handle) && (waitCycle > 0)) {
+	usleep(1000);
+	waitCycle--;
+	ARK_DEBUG_TRACE
+		("Waiting for packet checker %d's internal pktgen to finish sending...\n",
+		inst->ordinal);
+	}
+	ARK_DEBUG_TRACE("pktchk %d's pktgen done.\n", inst->ordinal);
+	return 0;
+}
+
+int
+ark_pmd_pktchkr_getPktsSent(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	return inst->cregs->pktsSent;
+}
+
+void
+ark_pmd_pktchkr_setPayloadByte(ArkPktChkr_t handle, uint32_t b)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->pktPayload = b;
+}
+
+void
+ark_pmd_pktchkr_setPktSizeMin(ArkPktChkr_t handle, uint32_t x)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->pktSizeMin = x;
+}
+
+void
+ark_pmd_pktchkr_setPktSizeMax(ArkPktChkr_t handle, uint32_t x)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->pktSizeMax = x;
+}
+
+void
+ark_pmd_pktchkr_setPktSizeIncr(ArkPktChkr_t handle, uint32_t x)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->pktSizeIncr = x;
+}
+
+void
+ark_pmd_pktchkr_setNumPkts(ArkPktChkr_t handle, uint32_t x)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->numPkts = x;
+}
+
+void
+ark_pmd_pktchkr_setSrcMACAddr(ArkPktChkr_t handle, uint64_t macAddr)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->srcMACAddrH = (macAddr >> 32) & 0xffff;
+	inst->cregs->srcMACAddrL = macAddr & 0xffffffff;
+}
+
+void
+ark_pmd_pktchkr_setDstMACAddr(ArkPktChkr_t handle, uint64_t macAddr)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->dstMACAddrH = (macAddr >> 32) & 0xffff;
+	inst->cregs->dstMACAddrL = macAddr & 0xffffffff;
+}
+
+void
+ark_pmd_pktchkr_setEthType(ArkPktChkr_t handle, uint32_t x)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	inst->cregs->ethType = x;
+}
+
+void
+ark_pmd_pktchkr_setHdrDW(ArkPktChkr_t handle, uint32_t *hdr)
+{
+	uint32_t i;
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	for (i = 0; i < 7; i++) {
+	inst->cregs->hdrDW[i] = hdr[i];
+	}
+}
+
+void
+ark_pmd_pktchkr_dump_stats(ArkPktChkr_t handle)
+{
+	struct ArkPktChkrInst *inst = (struct ArkPktChkrInst *) handle;
+
+	fprintf(stderr, "pktsRcvd      = (%'u)\n", inst->sregs->pktsRcvd);
+	fprintf(stderr, "bytesRcvd     = (%'" PRIu64 ")\n",
+	inst->sregs->bytesRcvd);
+	fprintf(stderr, "pktsOK        = (%'u)\n", inst->sregs->pktsOK);
+	fprintf(stderr, "pktsMismatch  = (%'u)\n", inst->sregs->pktsMismatch);
+	fprintf(stderr, "pktsErr       = (%'u)\n", inst->sregs->pktsErr);
+	fprintf(stderr, "firstMismatch = (%'u)\n", inst->sregs->firstMismatch);
+	fprintf(stderr, "resyncEvents  = (%'u)\n", inst->sregs->resyncEvents);
+	fprintf(stderr, "pktsMissing   = (%'u)\n", inst->sregs->pktsMissing);
+	fprintf(stderr, "minLatency    = (%'u)\n", inst->sregs->minLatency);
+	fprintf(stderr, "maxLatency    = (%'u)\n", inst->sregs->maxLatency);
+}
+
+static struct Options *
+OPTIONS(const char *id)
+{
+	unsigned i;
+
+	for (i = 0; i < sizeof(toptions) / sizeof(struct Options); i++) {
+	if (strcmp(id, toptions[i].opt) == 0) {
+		return &toptions[i];
+	}
+	}
+	PMD_DRV_LOG(ERR,
+	"pktgen: Could not find requested option !!, option = %s\n", id);
+	return NULL;
+}
+
+static int
+setArg(char *arg, char *val)
+{
+	struct Options *o = OPTIONS(arg);
+
+	if (o) {
+	switch (o->t) {
+	case OTInt:
+	case OTBool:
+		o->v.Int = atoi(val);
+		break;
+	case OTLong:
+		o->v.Long = atoll(val);
+		break;
+	case OTString:
+		strncpy(o->v.Str, val, ARK_MAX_STR_LEN - 1);
+		break;
+	}
+	return 1;
+	}
+	return 0;
+}
+
+/******
+ * Arg format = "opt0=v,optN=v ..."
+ ******/
+void
+ark_pmd_pktchkr_parse(char *args)
+{
+	char *argv, *v;
+	const char toks[] = "= \n\t\v\f\r";
+
+	argv = strtok(args, toks);
+	v = strtok(NULL, toks);
+	setArg(argv, v);
+	while (argv && v) {
+	argv = strtok(NULL, toks);
+	v = strtok(NULL, toks);
+	if (argv && v)
+		setArg(argv, v);
+	}
+}
+
+static int32_t parseIPV4string(char const *ipAddress);
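+/* dotted-quad string to host-byte-order IPv4 value; returns 0 on parse failure */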
+static int32_t
+parseIPV4string(char const *ipAddress)
+{
+	unsigned int ip[4];
+
+	if (4 != sscanf(ipAddress, "%u.%u.%u.%u", &ip[0], &ip[1], &ip[2], &ip[3]))
+	return 0;
+	return ip[3] + ip[2] * 0x100 + ip[1] * 0x10000ul + ip[0] * 0x1000000ul;
+}
+
+void
+ark_pmd_pktchkr_setup(ArkPktChkr_t handle)
+{
+	uint32_t hdr[7];
+	int32_t dstIp = parseIPV4string(OPTIONS("dstIP")->v.Str);
+
+	if (!OPTIONS("stop")->v.Bool && OPTIONS("configure")->v.Bool) {
+
+	ark_pmd_pktchkr_setPayloadByte(handle, OPTIONS("payloadByte")->v.Int);
+	ark_pmd_pktchkr_setSrcMACAddr(handle, OPTIONS("srcMACAddr")->v.Long);
+	ark_pmd_pktchkr_setDstMACAddr(handle, OPTIONS("dstMACAddr")->v.Long);
+
+	ark_pmd_pktchkr_setEthType(handle, OPTIONS("ethType")->v.Int);
+	if (OPTIONS("dg-mode")->v.Bool) {
+		hdr[0] = OPTIONS("hdrDW0")->v.Int;
+		hdr[1] = OPTIONS("hdrDW1")->v.Int;
+		hdr[2] = OPTIONS("hdrDW2")->v.Int;
+		hdr[3] = OPTIONS("hdrDW3")->v.Int;
+		hdr[4] = OPTIONS("hdrDW4")->v.Int;
+		hdr[5] = OPTIONS("hdrDW5")->v.Int;
+		hdr[6] = OPTIONS("hdrDW6")->v.Int;
+	} else {
+		hdr[0] = dstIp;
+		hdr[1] = OPTIONS("dstPort")->v.Int;
+		hdr[2] = OPTIONS("srcPort")->v.Int;
+		hdr[3] = 0;
+		hdr[4] = 0;
+		hdr[5] = 0;
+		hdr[6] = 0;
+	}
+	ark_pmd_pktchkr_setHdrDW(handle, hdr);
+	ark_pmd_pktchkr_setNumPkts(handle, OPTIONS("numPkts")->v.Long);
+	ark_pmd_pktchkr_setPktSizeMin(handle, OPTIONS("pktSizeMin")->v.Int);
+	ark_pmd_pktchkr_setPktSizeMax(handle, OPTIONS("pktSizeMax")->v.Int);
+	ark_pmd_pktchkr_setPktSizeIncr(handle, OPTIONS("pktSizeIncr")->v.Int);
+	ark_pmd_pktchkr_setPktCtrl(handle,
+		OPTIONS("genForever")->v.Bool,
+		OPTIONS("varyLength")->v.Bool,
+		OPTIONS("incrPayload")->v.Bool,
+		OPTIONS("incrFirstByte")->v.Bool,
+		OPTIONS("insSeqNum")->v.Int,
+		OPTIONS("insUDPHdr")->v.Bool,
+		OPTIONS("enResync")->v.Bool,
+		OPTIONS("tuserErrVal")->v.Int, OPTIONS("insTimeStamp")->v.Int);
+	}
+
+	if (OPTIONS("stop")->v.Bool)
+	ark_pmd_pktchkr_stop(handle);
+
+	if (OPTIONS("run")->v.Bool) {
+	ARK_DEBUG_TRACE("Starting packet checker on port %d\n",
+		OPTIONS("port")->v.Int);
+	ark_pmd_pktchkr_run(handle);
+	}
+
+}
diff --git a/drivers/net/ark/ark_pktchkr.h b/drivers/net/ark/ark_pktchkr.h
new file mode 100644
index 0000000..5c2a60d
--- /dev/null
+++ b/drivers/net/ark/ark_pktchkr.h
@@ -0,0 +1,114 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_PKTCHKR_H_
+#define _ARK_PKTCHKR_H_
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_eal.h>
+
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+
+#define ARK_PKTCHKR_BASE_ADR  0x90000
+
+typedef void *ArkPktChkr_t;
+
+struct ArkPktChkrStatRegs {
+	uint32_t r0;
+	uint32_t pktStartStop;
+	uint32_t pktCtrl;
+	uint32_t pktsRcvd;
+	uint64_t bytesRcvd;
+	uint32_t pktsOK;
+	uint32_t pktsMismatch;
+	uint32_t pktsErr;
+	uint32_t firstMismatch;
+	uint32_t resyncEvents;
+	uint32_t pktsMissing;
+	uint32_t minLatency;
+	uint32_t maxLatency;
+} __attribute__ ((packed));
+
+struct ArkPktChkrCtlRegs {
+	uint32_t pktCtrl;
+	uint32_t pktPayload;
+	uint32_t pktSizeMin;
+	uint32_t pktSizeMax;
+	uint32_t pktSizeIncr;
+	uint32_t numPkts;
+	uint32_t pktsSent;
+	uint32_t srcMACAddrL;
+	uint32_t srcMACAddrH;
+	uint32_t dstMACAddrL;
+	uint32_t dstMACAddrH;
+	uint32_t ethType;
+	uint32_t hdrDW[7];
+} __attribute__ ((packed));
+
+struct ArkPktChkrInst {
+	struct rte_eth_dev_info *dev_info;
+	volatile struct ArkPktChkrStatRegs *sregs;
+	volatile struct ArkPktChkrCtlRegs *cregs;
+	int l2_mode;
+	int ordinal;
+};
+
+/*  packet checker functions */
+ArkPktChkr_t ark_pmd_pktchkr_init(void *addr, int ord, int l2_mode);
+void ark_pmd_pktchkr_uninit(ArkPktChkr_t handle);
+void ark_pmd_pktchkr_run(ArkPktChkr_t handle);
+int ark_pmd_pktchkr_stopped(ArkPktChkr_t handle);
+void ark_pmd_pktchkr_stop(ArkPktChkr_t handle);
+int ark_pmd_pktchkr_isRunning(ArkPktChkr_t handle);
+int ark_pmd_pktchkr_getPktsSent(ArkPktChkr_t handle);
+void ark_pmd_pktchkr_setPayloadByte(ArkPktChkr_t handle, uint32_t b);
+void ark_pmd_pktchkr_setPktSizeMin(ArkPktChkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_setPktSizeMax(ArkPktChkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_setPktSizeIncr(ArkPktChkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_setNumPkts(ArkPktChkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_setSrcMACAddr(ArkPktChkr_t handle, uint64_t macAddr);
+void ark_pmd_pktchkr_setDstMACAddr(ArkPktChkr_t handle, uint64_t macAddr);
+void ark_pmd_pktchkr_setEthType(ArkPktChkr_t handle, uint32_t x);
+void ark_pmd_pktchkr_setHdrDW(ArkPktChkr_t handle, uint32_t *hdr);
+void ark_pmd_pktchkr_parse(char *args);
+void ark_pmd_pktchkr_setup(ArkPktChkr_t handle);
+void ark_pmd_pktchkr_dump_stats(ArkPktChkr_t handle);
+int ark_pmd_pktchkr_waitDone(ArkPktChkr_t handle);
+
+#endif
diff --git a/drivers/net/ark/ark_pktdir.c b/drivers/net/ark/ark_pktdir.c
new file mode 100644
index 0000000..e68ff4e
--- /dev/null
+++ b/drivers/net/ark/ark_pktdir.c
@@ -0,0 +1,79 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include "ark_global.h"
+
+ArkPktDir_t
+ark_pmd_pktdir_init(void *base)
+{
+	struct ArkPktDirInst *inst =
+	rte_malloc("ArkPktDirInst", sizeof(struct ArkPktDirInst), 0);
+	inst->regs = (struct ArkPktDirRegs *) base;
+	inst->regs->ctrl = 0x00110110;	/* POR state */
+	return inst;
+}
+
+void
+ark_pmd_pktdir_uninit(ArkPktDir_t handle)
+{
+	struct ArkPktDirInst *inst = (struct ArkPktDirInst *) handle;
+
+	rte_free(inst);
+}
+
+void
+ark_pmd_pktdir_setup(ArkPktDir_t handle, uint32_t v)
+{
+	struct ArkPktDirInst *inst = (struct ArkPktDirInst *) handle;
+
+	inst->regs->ctrl = v;
+}
+
+uint32_t
+ark_pmd_pktdir_status(ArkPktDir_t handle)
+{
+	struct ArkPktDirInst *inst = (struct ArkPktDirInst *) handle;
+
+	return inst->regs->ctrl;
+}
+
+uint32_t
+ark_pmd_pktdir_stallCnt(ArkPktDir_t handle)
+{
+	struct ArkPktDirInst *inst = (struct ArkPktDirInst *) handle;
+
+	return inst->regs->stallCnt;
+}
diff --git a/drivers/net/ark/ark_pktdir.h b/drivers/net/ark/ark_pktdir.h
new file mode 100644
index 0000000..6c0c634
--- /dev/null
+++ b/drivers/net/ark/ark_pktdir.h
@@ -0,0 +1,68 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_PKTDIR_H_
+#define _ARK_PKTDIR_H_
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_eal.h>
+
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+
+#define ARK_PKTDIR_BASE_ADR  0xA0000
+
+typedef void *ArkPktDir_t;
+
+struct ArkPktDirRegs {
+	uint32_t ctrl;
+	uint32_t status;
+	uint32_t stallCnt;
+} __attribute__ ((packed));
+
+struct ArkPktDirInst {
+	volatile struct ArkPktDirRegs *regs;
+};
+
+ArkPktDir_t ark_pmd_pktdir_init(void *base);
+void ark_pmd_pktdir_uninit(ArkPktDir_t handle);
+void ark_pmd_pktdir_setup(ArkPktDir_t handle, uint32_t v);
+uint32_t ark_pmd_pktdir_stallCnt(ArkPktDir_t handle);
+uint32_t ark_pmd_pktdir_status(ArkPktDir_t handle);
+
+#endif
diff --git a/drivers/net/ark/ark_pktgen.c b/drivers/net/ark/ark_pktgen.c
new file mode 100644
index 0000000..743aacf
--- /dev/null
+++ b/drivers/net/ark/ark_pktgen.c
@@ -0,0 +1,477 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <getopt.h>
+#include <sys/time.h>
+#include <locale.h>
+#include <unistd.h>
+
+#include "ark_pktgen.h"
+#include "ark_debug.h"
+
+#define ARK_MAX_STR_LEN 64
+union OptV {
+	int Int;
+	int Bool;
+	uint64_t Long;
+	char Str[ARK_MAX_STR_LEN];
+};
+
+enum OPType {
+	OTInt,
+	OTLong,
+	OTBool,
+	OTString
+};
+
+struct Options {
+	char opt[ARK_MAX_STR_LEN];
+	enum OPType t;
+	union OptV v;
+};
+
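+/* Default settings; individual fields are overridden at runtime by
+ * "opt=value" strings fed through ark_pmd_pktgen_parse().
+ */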
+static struct Options toptions[] = {
+	{{"configure"}, OTBool, {1} },
+	{{"dg-mode"}, OTBool, {1} },
+	{{"run"}, OTBool, {0} },
+	{{"pause"}, OTBool, {0} },
+	{{"reset"}, OTBool, {0} },
+	{{"dump"}, OTBool, {0} },
+	{{"genForever"}, OTBool, {0} },
+	{{"enSlavedStart"}, OTBool, {0} },
+	{{"varyLength"}, OTBool, {0} },
+	{{"incrPayload"}, OTBool, {0} },
+	{{"incrFirstByte"}, OTBool, {0} },
+	{{"insSeqNum"}, OTBool, {0} },
+	{{"insTimeStamp"}, OTBool, {1} },
+	{{"insUDPHdr"}, OTBool, {0} },
+	{{"numPkts"}, OTLong, .v.Long = 100000000},
+	{{"payloadByte"}, OTInt, {0x55} },
+	{{"pktSpacing"}, OTInt, {130} },
+	{{"pktSizeMin"}, OTInt, {2006} },
+	{{"pktSizeMax"}, OTInt, {1514} },
+	{{"pktSizeIncr"}, OTInt, {1} },
+	{{"ethType"}, OTInt, {0x0800} },
+	{{"srcMACAddr"}, OTLong, .v.Long = 0xDC3CF6425060L},
+	{{"dstMACAddr"}, OTLong, .v.Long = 0x112233445566L},
+	{{"hdrDW0"}, OTInt, {0x0016e319} },
+	{{"hdrDW1"}, OTInt, {0x27150004} },
+	{{"hdrDW2"}, OTInt, {0x76967bda} },
+	{{"hdrDW3"}, OTInt, {0x08004500} },
+	{{"hdrDW4"}, OTInt, {0x005276ed} },
+	{{"hdrDW5"}, OTInt, {0x40004006} },
+	{{"hdrDW6"}, OTInt, {0x56cfc0a8} },
+	{{"startOffset"}, OTInt, {0} },
+	{{"bytesPerCycle"}, OTInt, {10} },
+	{{"shaping"}, OTBool, {0} },
+	{{"dstIP"}, OTString, .v.Str = "169.254.10.240"},
+	{{"dstPort"}, OTInt, {65536} },
+	{{"srcPort"}, OTInt, {65536} },
+};
+
+ArkPktGen_t
+ark_pmd_pktgen_init(void *adr, int ord, int l2_mode)
+{
+	struct ArkPktGenInst *inst =
+		rte_malloc("ArkPktGenInstPMD", sizeof(struct ArkPktGenInst), 0);
+	inst->regs = (struct ArkPktGenRegs *) adr;
+	inst->ordinal = ord;
+	inst->l2_mode = l2_mode;
+	return inst;
+}
+
+void
+ark_pmd_pktgen_uninit(ArkPktGen_t handle)
+{
+	rte_free(handle);
+}
+
+void
+ark_pmd_pktgen_run(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->pktStartStop = 1;
+}
+
+uint32_t
+ark_pmd_pktgen_paused(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+	uint32_t r = inst->regs->pktStartStop;
+
+	return (((r >> 16) & 1) == 1);
+}
+
+void
+ark_pmd_pktgen_pause(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+	int cnt = 0;
+
+	inst->regs->pktStartStop = 0;
+
+	while (!ark_pmd_pktgen_paused(handle)) {
+		usleep(1000);
+		if (cnt++ > 100) {
+			PMD_DRV_LOG(ERR, "pktgen %d failed to pause.\n", inst->ordinal);
+			break;
+		}
+	}
+	ARK_DEBUG_TRACE("pktgen %d paused.\n", inst->ordinal);
+}
+
+void
+ark_pmd_pktgen_reset(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	if (!ark_pmd_pktgen_isRunning(handle) && !ark_pmd_pktgen_paused(handle)) {
+		ARK_DEBUG_TRACE
+			("pktgen %d is not running and is not paused. No need to reset.\n",
+			 inst->ordinal);
+		return;
+	}
+
+	if (ark_pmd_pktgen_isRunning(handle) && !ark_pmd_pktgen_paused(handle)) {
+		ARK_DEBUG_TRACE("pktgen %d is not paused. Pausing first.\n",
+						inst->ordinal);
+		ark_pmd_pktgen_pause(handle);
+	}
+
+	ARK_DEBUG_TRACE("Resetting pktgen %d.\n", inst->ordinal);
+	inst->regs->pktStartStop = (1 << 8);
+}
+
+uint32_t
+ark_pmd_pktgen_txDone(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+	uint32_t r = inst->regs->pktStartStop;
+
+	return (((r >> 24) & 1) == 1);
+}
+
+uint32_t
+ark_pmd_pktgen_isRunning(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+	uint32_t r = inst->regs->pktStartStop;
+
+	return ((r & 1) == 1);
+}
+
+uint32_t
+ark_pmd_pktgen_isGenForever(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+	uint32_t r = inst->regs->pktCtrl;
+
+	return (((r >> 24) & 1) == 1);
+}
+
+void
+ark_pmd_pktgen_waitDone(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	if (ark_pmd_pktgen_isGenForever(handle)) {
+	PMD_DRV_LOG(ERR, "waitDone will not terminate because genForever=1\n");
+	}
+	int waitCycle = 10;
+
+	while (!ark_pmd_pktgen_txDone(handle) && (waitCycle > 0)) {
+	usleep(1000);
+	waitCycle--;
+	ARK_DEBUG_TRACE("Waiting for pktgen %d to finish sending...\n",
+		inst->ordinal);
+	}
+	ARK_DEBUG_TRACE("pktgen %d done.\n", inst->ordinal);
+}
+
+uint32_t
+ark_pmd_pktgen_getPktsSent(ArkPktGen_t handle)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	return inst->regs->pktsSent;
+}
+
+void
+ark_pmd_pktgen_setPayloadByte(ArkPktGen_t handle, uint32_t b)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->pktPayload = b;
+}
+
+void
+ark_pmd_pktgen_setPktSpacing(ArkPktGen_t handle, uint32_t x)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->pktSpacing = x;
+}
+
+void
+ark_pmd_pktgen_setPktSizeMin(ArkPktGen_t handle, uint32_t x)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->pktSizeMin = x;
+}
+
+void
+ark_pmd_pktgen_setPktSizeMax(ArkPktGen_t handle, uint32_t x)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->pktSizeMax = x;
+}
+
+void
+ark_pmd_pktgen_setPktSizeIncr(ArkPktGen_t handle, uint32_t x)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->pktSizeIncr = x;
+}
+
+void
+ark_pmd_pktgen_setNumPkts(ArkPktGen_t handle, uint32_t x)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->numPkts = x;
+}
+
+void
+ark_pmd_pktgen_setSrcMACAddr(ArkPktGen_t handle, uint64_t macAddr)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->srcMACAddrH = (macAddr >> 32) & 0xffff;
+	inst->regs->srcMACAddrL = macAddr & 0xffffffff;
+}
+
+void
+ark_pmd_pktgen_setDstMACAddr(ArkPktGen_t handle, uint64_t macAddr)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->dstMACAddrH = (macAddr >> 32) & 0xffff;
+	inst->regs->dstMACAddrL = macAddr & 0xffffffff;
+}
+
+void
+ark_pmd_pktgen_setEthType(ArkPktGen_t handle, uint32_t x)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->ethType = x;
+}
+
+void
+ark_pmd_pktgen_setHdrDW(ArkPktGen_t handle, uint32_t *hdr)
+{
+	uint32_t i;
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	for (i = 0; i < 7; i++)
+		inst->regs->hdrDW[i] = hdr[i];
+}
+
+void
+ark_pmd_pktgen_setStartOffset(ArkPktGen_t handle, uint32_t x)
+{
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	inst->regs->startOffset = x;
+}
+
+static struct Options *
+OPTIONS(const char *id)
+{
+	unsigned i;
+
+	for (i = 0; i < sizeof(toptions) / sizeof(struct Options); i++) {
+		if (strcmp(id, toptions[i].opt) == 0)
+			return &toptions[i];
+	}
+
+	PMD_DRV_LOG(ERR,
+		"pktgen: Could not find requested option !!, option = %s\n", id);
+	return NULL;
+}
+
+static int pmd_setArg(char *arg, char *val);
+static int
+pmd_setArg(char *arg, char *val)
+{
+	struct Options *o = OPTIONS(arg);
+
+	if (o) {
+	switch (o->t) {
+	case OTInt:
+	case OTBool:
+		o->v.Int = atoi(val);
+		break;
+	case OTLong:
+		o->v.Long = atoll(val);
+		break;
+	case OTString:
+		strncpy(o->v.Str, val, ARK_MAX_STR_LEN - 1);
+		break;
+	}
+	return 1;
+	}
+	return 0;
+}
+
+/******
+ * Arg format = "opt0=v,optN=v ..."
+ ******/
+void
+ark_pmd_pktgen_parse(char *args)
+{
+	char *argv, *v;
+	const char toks[] = " =\n\t\v\f\r";
+
+	argv = strtok(args, toks);
+	v = strtok(NULL, toks);
+	pmd_setArg(argv, v);
+	while (argv && v) {
+	argv = strtok(NULL, toks);
+	v = strtok(NULL, toks);
+	if (argv && v)
+		pmd_setArg(argv, v);
+	}
+}
+
+static int32_t parseIPV4string(char const *ipAddress);
+static int32_t
+parseIPV4string(char const *ipAddress)
+{
+	unsigned int ip[4];
+
+	if (4 != sscanf(ipAddress, "%u.%u.%u.%u", &ip[0], &ip[1], &ip[2], &ip[3]))
+	return 0;
+	return ip[3] + ip[2] * 0x100 + ip[1] * 0x10000ul + ip[0] * 0x1000000ul;
+}
+
+static void
+ark_pmd_pktgen_setPktCtrl(ArkPktGen_t handle, uint32_t genForever,
+	uint32_t enSlavedStart, uint32_t varyLength, uint32_t incrPayload,
+	uint32_t incrFirstByte, uint32_t insSeqNum, uint32_t insUDPHdr,
+	uint32_t insTimeStamp)
+{
+	uint32_t r;
+	struct ArkPktGenInst *inst = (struct ArkPktGenInst *) handle;
+
+	if (!inst->l2_mode)
+		insUDPHdr = 0;
+
+	r = (genForever << 24) | (enSlavedStart << 20) | (varyLength << 16) |
+	(incrPayload << 12) | (incrFirstByte << 8) |
+	(insTimeStamp << 5) | (insSeqNum << 4) | insUDPHdr;
+
+	inst->regs->bytesPerCycle = OPTIONS("bytesPerCycle")->v.Int;
+	if (OPTIONS("shaping")->v.Bool)
+		r = r | (1 << 28);	/* enable shaping */
+
+	inst->regs->pktCtrl = r;
+}
+
+void
+ark_pmd_pktgen_setup(ArkPktGen_t handle)
+{
+	uint32_t hdr[7];
+	int32_t dstIp = parseIPV4string(OPTIONS("dstIP")->v.Str);
+
+	if (!OPTIONS("pause")->v.Bool && (!OPTIONS("reset")->v.Bool
+		&& (OPTIONS("configure")->v.Bool))) {
+
+	ark_pmd_pktgen_setPayloadByte(handle, OPTIONS("payloadByte")->v.Int);
+	ark_pmd_pktgen_setSrcMACAddr(handle, OPTIONS("srcMACAddr")->v.Long);
+	ark_pmd_pktgen_setDstMACAddr(handle, OPTIONS("dstMACAddr")->v.Long);
+	ark_pmd_pktgen_setEthType(handle, OPTIONS("ethType")->v.Int);
+
+	if (OPTIONS("dg-mode")->v.Bool) {
+		hdr[0] = OPTIONS("hdrDW0")->v.Int;
+		hdr[1] = OPTIONS("hdrDW1")->v.Int;
+		hdr[2] = OPTIONS("hdrDW2")->v.Int;
+		hdr[3] = OPTIONS("hdrDW3")->v.Int;
+		hdr[4] = OPTIONS("hdrDW4")->v.Int;
+		hdr[5] = OPTIONS("hdrDW5")->v.Int;
+		hdr[6] = OPTIONS("hdrDW6")->v.Int;
+	} else {
+		hdr[0] = dstIp;
+		hdr[1] = OPTIONS("dstPort")->v.Int;
+		hdr[2] = OPTIONS("srcPort")->v.Int;
+		hdr[3] = 0;
+		hdr[4] = 0;
+		hdr[5] = 0;
+		hdr[6] = 0;
+	}
+	ark_pmd_pktgen_setHdrDW(handle, hdr);
+	ark_pmd_pktgen_setNumPkts(handle, OPTIONS("numPkts")->v.Long);
+	ark_pmd_pktgen_setPktSizeMin(handle, OPTIONS("pktSizeMin")->v.Int);
+	ark_pmd_pktgen_setPktSizeMax(handle, OPTIONS("pktSizeMax")->v.Int);
+	ark_pmd_pktgen_setPktSizeIncr(handle, OPTIONS("pktSizeIncr")->v.Int);
+	ark_pmd_pktgen_setPktSpacing(handle, OPTIONS("pktSpacing")->v.Int);
+	ark_pmd_pktgen_setStartOffset(handle, OPTIONS("startOffset")->v.Int);
+	ark_pmd_pktgen_setPktCtrl(handle,
+		OPTIONS("genForever")->v.Bool,
+		OPTIONS("enSlavedStart")->v.Bool,
+		OPTIONS("varyLength")->v.Bool,
+		OPTIONS("incrPayload")->v.Bool,
+		OPTIONS("incrFirstByte")->v.Bool,
+		OPTIONS("insSeqNum")->v.Int,
+		OPTIONS("insUDPHdr")->v.Bool, OPTIONS("insTimeStamp")->v.Int);
+	}
+
+	if (OPTIONS("pause")->v.Bool)
+	ark_pmd_pktgen_pause(handle);
+
+	if (OPTIONS("reset")->v.Bool)
+	ark_pmd_pktgen_reset(handle);
+
+	if (OPTIONS("run")->v.Bool) {
+	ARK_DEBUG_TRACE("Starting packet generator on port %d\n",
+		OPTIONS("port")->v.Int);
+	ark_pmd_pktgen_run(handle);
+	}
+}
diff --git a/drivers/net/ark/ark_pktgen.h b/drivers/net/ark/ark_pktgen.h
new file mode 100644
index 0000000..3d81d60
--- /dev/null
+++ b/drivers/net/ark/ark_pktgen.h
@@ -0,0 +1,106 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_PKTGEN_H_
+#define _ARK_PKTGEN_H_
+
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_eal.h>
+
+#include <rte_ethdev.h>
+#include <rte_cycles.h>
+#include <rte_lcore.h>
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+
+#define ARK_PKTGEN_BASE_ADR  0x10000
+
+typedef void *ArkPktGen_t;
+
+struct ArkPktGenRegs {
+	uint32_t r0;
+	volatile uint32_t pktStartStop;
+	volatile uint32_t pktCtrl;
+	uint32_t pktPayload;
+	uint32_t pktSpacing;
+	uint32_t pktSizeMin;
+	uint32_t pktSizeMax;
+	uint32_t pktSizeIncr;
+	volatile uint32_t numPkts;
+	volatile uint32_t pktsSent;
+	uint32_t srcMACAddrL;
+	uint32_t srcMACAddrH;
+	uint32_t dstMACAddrL;
+	uint32_t dstMACAddrH;
+	uint32_t ethType;
+	uint32_t hdrDW[7];
+	uint32_t startOffset;
+	uint32_t bytesPerCycle;
+} __attribute__ ((packed));
+
+struct ArkPktGenInst {
+	struct rte_eth_dev_info *dev_info;
+	struct ArkPktGenRegs *regs;
+	int l2_mode;
+	int ordinal;
+};
+
+/*  packet generator functions */
+ArkPktGen_t ark_pmd_pktgen_init(void *, int ord, int l2_mode);
+void ark_pmd_pktgen_uninit(ArkPktGen_t handle);
+void ark_pmd_pktgen_run(ArkPktGen_t handle);
+void ark_pmd_pktgen_pause(ArkPktGen_t handle);
+uint32_t ark_pmd_pktgen_paused(ArkPktGen_t handle);
+uint32_t ark_pmd_pktgen_isGenForever(ArkPktGen_t handle);
+uint32_t ark_pmd_pktgen_isRunning(ArkPktGen_t handle);
+uint32_t ark_pmd_pktgen_txDone(ArkPktGen_t handle);
+void ark_pmd_pktgen_reset(ArkPktGen_t handle);
+void ark_pmd_pktgen_waitDone(ArkPktGen_t handle);
+uint32_t ark_pmd_pktgen_getPktsSent(ArkPktGen_t handle);
+void ark_pmd_pktgen_setPayloadByte(ArkPktGen_t handle, uint32_t b);
+void ark_pmd_pktgen_setPktSpacing(ArkPktGen_t handle, uint32_t x);
+void ark_pmd_pktgen_setPktSizeMin(ArkPktGen_t handle, uint32_t x);
+void ark_pmd_pktgen_setPktSizeMax(ArkPktGen_t handle, uint32_t x);
+void ark_pmd_pktgen_setPktSizeIncr(ArkPktGen_t handle, uint32_t x);
+void ark_pmd_pktgen_setNumPkts(ArkPktGen_t handle, uint32_t x);
+void ark_pmd_pktgen_setSrcMACAddr(ArkPktGen_t handle, uint64_t macAddr);
+void ark_pmd_pktgen_setDstMACAddr(ArkPktGen_t handle, uint64_t macAddr);
+void ark_pmd_pktgen_setEthType(ArkPktGen_t handle, uint32_t x);
+void ark_pmd_pktgen_setHdrDW(ArkPktGen_t handle, uint32_t *hdr);
+void ark_pmd_pktgen_setStartOffset(ArkPktGen_t handle, uint32_t x);
+void ark_pmd_pktgen_parse(char *argv);
+void ark_pmd_pktgen_setup(ArkPktGen_t handle);
+
+#endif
diff --git a/drivers/net/ark/ark_rqp.c b/drivers/net/ark/ark_rqp.c
new file mode 100644
index 0000000..468c6d4
--- /dev/null
+++ b/drivers/net/ark/ark_rqp.c
@@ -0,0 +1,93 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_rqp.h"
+#include "ark_debug.h"
+
+/* ************************************************************************* */
+void
+ark_rqp_stats_reset(struct ark_rqpace_t *rqp)
+{
+	rqp->statsClear = 1;
+	/* POR 992 */
+	/* rqp->cpld_max = 992; */
+	/* POR 64 */
+	/* rqp->cplh_max = 64; */
+}
+
+/* ************************************************************************* */
+void
+ark_rqp_dump(struct ark_rqpace_t *rqp)
+{
+	if (rqp->errCountOther != 0)
+	fprintf
+		(stderr,
+		"ARKP RQP Errors noted: ctrl: %d cplh_hmax %d cpld_max %d"
+		 FMT_SU32
+		FMT_SU32 "\n",
+		 rqp->ctrl, rqp->cplh_max, rqp->cpld_max,
+		"Error Count", rqp->errCnt,
+		 "Error General", rqp->errCountOther);
+
+	ARK_DEBUG_STATS
+		("ARKP RQP Dump: ctrl: %d cplh_hmax %d cpld_max %d" FMT_SU32
+		 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32
+		 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32
+		 FMT_SU32 FMT_SU32 FMT_SU32
+		 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 "\n",
+		 rqp->ctrl, rqp->cplh_max, rqp->cpld_max,
+		 "Error Count", rqp->errCnt,
+		 "Error General", rqp->errCountOther,
+		 "stallPS", rqp->stallPS,
+		 "stallPS Min", rqp->stallPSMin,
+		 "stallPS Max", rqp->stallPSMax,
+		 "reqPS", rqp->reqPS,
+		 "reqPS Min", rqp->reqPSMin,
+		 "reqPS Max", rqp->reqPSMax,
+		 "reqDWPS", rqp->reqDWPS,
+		 "reqDWPS Min", rqp->reqDWPSMin,
+		 "reqDWPS Max", rqp->reqDWPSMax,
+		 "cplPS", rqp->cplPS,
+		 "cplPS Min", rqp->cplPSMin,
+		 "cplPS Max", rqp->cplPSMax,
+		 "cplDWPS", rqp->cplDWPS,
+		 "cplDWPS Min", rqp->cplDWPSMin,
+		 "cplDWPS Max", rqp->cplDWPSMax,
+		 "cplh pending", rqp->cplh_pending,
+		 "cpld pending", rqp->cpld_pending,
+		 "cplh pending max", rqp->cplh_pending_max,
+		 "cpld pending max", rqp->cpld_pending_max);
+}
diff --git a/drivers/net/ark/ark_rqp.h b/drivers/net/ark/ark_rqp.h
new file mode 100644
index 0000000..1c11c41
--- /dev/null
+++ b/drivers/net/ark/ark_rqp.h
@@ -0,0 +1,75 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_RQP_H_
+#define _ARK_RQP_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/*
+ * RQ Pacing core hardware structure
+ */
+struct ark_rqpace_t {
+	volatile uint32_t ctrl;
+	volatile uint32_t statsClear;
+	volatile uint32_t cplh_max;
+	volatile uint32_t cpld_max;
+	volatile uint32_t errCnt;
+	volatile uint32_t stallPS;
+	volatile uint32_t stallPSMin;
+	volatile uint32_t stallPSMax;
+	volatile uint32_t reqPS;
+	volatile uint32_t reqPSMin;
+	volatile uint32_t reqPSMax;
+	volatile uint32_t reqDWPS;
+	volatile uint32_t reqDWPSMin;
+	volatile uint32_t reqDWPSMax;
+	volatile uint32_t cplPS;
+	volatile uint32_t cplPSMin;
+	volatile uint32_t cplPSMax;
+	volatile uint32_t cplDWPS;
+	volatile uint32_t cplDWPSMin;
+	volatile uint32_t cplDWPSMax;
+	volatile uint32_t cplh_pending;
+	volatile uint32_t cpld_pending;
+	volatile uint32_t cplh_pending_max;
+	volatile uint32_t cpld_pending_max;
+	volatile uint32_t errCountOther;
+};
+
+void ark_rqp_dump(struct ark_rqpace_t *rqp);
+void ark_rqp_stats_reset(struct ark_rqpace_t *rqp);
+
+#endif
diff --git a/drivers/net/ark/ark_udm.c b/drivers/net/ark/ark_udm.c
new file mode 100644
index 0000000..a239c54
--- /dev/null
+++ b/drivers/net/ark/ark_udm.c
@@ -0,0 +1,221 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+
+#include "ark_debug.h"
+#include "ark_udm.h"
+
+int
+ark_udm_verify(struct ark_udm_t *udm)
+{
+	if (sizeof(struct ark_udm_t) != ARK_UDM_EXPECT_SIZE) {
+		fprintf(stderr, "  UDM structure looks incorrect %#x vs %#lx\n",
+				ARK_UDM_EXPECT_SIZE, sizeof(struct ark_udm_t));
+		return -1;
+	}
+
+	if (udm->setup.const0 != ARK_UDM_CONST) {
+		fprintf(stderr, "  UDM module not found as expected 0x%08x\n",
+				udm->setup.const0);
+		return -1;
+	}
+	return 0;
+}
+
+int
+ark_udm_stop(struct ark_udm_t *udm, const int wait)
+{
+	int cnt = 0;
+
+	udm->cfg.command = 2;
+
+	while (wait && (udm->cfg.stopFlushed & 0x01) == 0) {
+	if (cnt++ > 1000)
+		return 1;
+
+	usleep(10);
+	}
+	return 0;
+}
+
+int
+ark_udm_reset(struct ark_udm_t *udm)
+{
+	int status;
+
+	status = ark_udm_stop(udm, 1);
+	if (status != 0) {
+		ARK_DEBUG_TRACE("ARKP: %s  stop failed  doing forced reset\n",
+						__func__);
+		udm->cfg.command = 4;
+		usleep(10);
+		udm->cfg.command = 3;
+		status = ark_udm_stop(udm, 0);
+		ARK_DEBUG_TRACE
+			("ARKP: %s  stop status %d post failure and forced reset\n",
+			 __func__, status);
+	} else {
+		udm->cfg.command = 3;
+	}
+
+	return status;
+}
+
+void
+ark_udm_start(struct ark_udm_t *udm)
+{
+	udm->cfg.command = 1;
+}
+
+void
+ark_udm_stats_reset(struct ark_udm_t *udm)
+{
+	udm->pcibp.pci_clear = 1;
+	udm->tlp_ps.tlp_clear = 1;
+}
+
+void
+ark_udm_configure(struct ark_udm_t *udm, uint32_t headroom, uint32_t dataroom,
+	uint32_t write_interval_ns)
+{
+	/* headroom and data room are in DWs in the UDM */
+	udm->cfg.dataroom = dataroom / 4;
+	udm->cfg.headroom = headroom / 4;
+
+	/* 4 NS period ns */
+	udm->rt_cfg.writeInterval = write_interval_ns / 4;
+}
+
+void
+ark_udm_write_addr(struct ark_udm_t *udm, phys_addr_t addr)
+{
+	udm->rt_cfg.hwProdAddr = addr;
+}
+
+int
+ark_udm_is_flushed(struct ark_udm_t *udm)
+{
+	return (udm->cfg.stopFlushed & 0x01) != 0;
+}
+
+uint64_t
+ark_udm_dropped(struct ark_udm_t *udm)
+{
+	return udm->qstats.qPktDrop;
+}
+
+uint64_t
+ark_udm_bytes(struct ark_udm_t *udm)
+{
+	return udm->qstats.qByteCount;
+}
+
+uint64_t
+ark_udm_packets(struct ark_udm_t *udm)
+{
+	return udm->qstats.qFFPacketCount;
+}
+
+void
+ark_udm_dump_stats(struct ark_udm_t *udm, const char *msg)
+{
+	ARK_DEBUG_STATS("ARKP UDM Stats: %s" FMT_SU64 FMT_SU64 FMT_SU64 FMT_SU64
+	FMT_SU64 "\n", msg, "Pkts Received", udm->stats.rxPacketCount,
+	"Pkts Finalized", udm->stats.rxSentPackets, "Pkts Dropped",
+	udm->tlp.pkt_drop, "Bytes Count", udm->stats.rxByteCount, "MBuf Count",
+	udm->stats.rxMBufCount);
+}
+
+void
+ark_udm_dump_queue_stats(struct ark_udm_t *udm, const char *msg, uint16_t qid)
+{
+	ARK_DEBUG_STATS
+		("ARKP UDM Queue %3u Stats: %s"
+		 FMT_SU64 FMT_SU64
+		 FMT_SU64 FMT_SU64
+		 FMT_SU64 "\n",
+		 qid, msg,
+		 "Pkts Received", udm->qstats.qPacketCount,
+		 "Pkts Finalized", udm->qstats.qFFPacketCount,
+		 "Pkts Dropped", udm->qstats.qPktDrop,
+		 "Bytes Count", udm->qstats.qByteCount,
+		 "MBuf Count", udm->qstats.qMbufCount);
+}
+
+void
+ark_udm_dump(struct ark_udm_t *udm, const char *msg)
+{
+	ARK_DEBUG_TRACE("ARKP UDM Dump: %s Stopped: %d\n", msg,
+	udm->cfg.stopFlushed);
+}
+
+void
+ark_udm_dump_setup(struct ark_udm_t *udm, uint16_t qId)
+{
+	ARK_DEBUG_TRACE
+		("UDM Setup Q: %u"
+		 FMT_SPTR FMT_SU32 "\n",
+		 qId,
+		 "hwProdAddr", (void *) udm->rt_cfg.hwProdAddr,
+		 "prodIdx", udm->rt_cfg.prodIdx);
+}
+
+void
+ark_udm_dump_perf(struct ark_udm_t *udm, const char *msg)
+{
+	struct ark_udm_pcibp_t *bp = &udm->pcibp;
+
+	ARK_DEBUG_STATS
+		("ARKP UDM Performance %s"
+		 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 FMT_SU32 "\n",
+		 msg,
+		 "PCI Empty", bp->pci_empty,
+		 "PCI Q1", bp->pci_q1,
+		 "PCI Q2", bp->pci_q2,
+		 "PCI Q3", bp->pci_q3,
+		 "PCI Q4", bp->pci_q4,
+		 "PCI Full", bp->pci_full);
+}
+
+void
+ark_udm_queue_stats_reset(struct ark_udm_t *udm)
+{
+	udm->qstats.qByteCount = 1;
+}
+
+void
+ark_udm_queue_enable(struct ark_udm_t *udm, int enable)
+{
+	udm->qstats.qEnable = enable ? 1 : 0;
+}
diff --git a/drivers/net/ark/ark_udm.h b/drivers/net/ark/ark_udm.h
new file mode 100644
index 0000000..89dcf8a
--- /dev/null
+++ b/drivers/net/ark/ark_udm.h
@@ -0,0 +1,175 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright (c) 2015-2017 Atomic Rules LLC
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of copyright holder nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _ARK_UDM_H_
+#define _ARK_UDM_H_
+
+#include <stdint.h>
+
+#include <rte_memory.h>
+
+/*
+ * UDM hardware structures
+ */
+
+#define ARK_RX_WRITE_TIME_NS 2500
+#define ARK_UDM_SETUP 0
+#define ARK_UDM_CONST 0xBACECACE
+struct ark_udm_setup_t {
+	uint32_t r0;
+	uint32_t r4;
+	volatile uint32_t cycleCount;
+	uint32_t const0;
+};
+
+#define ARK_UDM_CFG 0x010
+struct ark_udm_cfg_t {
+	volatile uint32_t stopFlushed;	/* RO */
+	volatile uint32_t command;
+	uint32_t dataroom;
+	uint32_t headroom;
+};
+
+typedef enum {
+	ARK_UDM_START = 0x1,
+	ARK_UDM_STOP = 0x2,
+	ARK_UDM_RESET = 0x3
+} ArkUdmCommands;
+
+#define ARK_UDM_STATS 0x020
+struct ark_udm_stats_t {
+	volatile uint64_t rxByteCount;
+	volatile uint64_t rxPacketCount;
+	volatile uint64_t rxMBufCount;
+	volatile uint64_t rxSentPackets;
+};
+
+#define ARK_UDM_PQ 0x040
+struct ark_udm_queue_stats_t {
+	volatile uint64_t qByteCount;
+	volatile uint64_t qPacketCount;	/* includes drops */
+	volatile uint64_t qMbufCount;
+	volatile uint64_t qFFPacketCount;
+	volatile uint64_t qPktDrop;
+	uint32_t qEnable;
+};
+
+#define ARK_UDM_TLP 0x0070
+struct ark_udm_tlp_t {
+	volatile uint64_t pkt_drop;	/* global */
+	volatile uint32_t tlp_q1;
+	volatile uint32_t tlp_q2;
+	volatile uint32_t tlp_q3;
+	volatile uint32_t tlp_q4;
+	volatile uint32_t tlp_full;
+};
+
+#define ARK_UDM_PCIBP 0x00a0
+struct ark_udm_pcibp_t {
+	volatile uint32_t pci_clear;
+	volatile uint32_t pci_empty;
+	volatile uint32_t pci_q1;
+	volatile uint32_t pci_q2;
+	volatile uint32_t pci_q3;
+	volatile uint32_t pci_q4;
+	volatile uint32_t pci_full;
+};
+
+#define ARK_UDM_TLP_PS 0x00bc
+struct ark_udm_tlp_ps_t {
+	volatile uint32_t tlp_clear;
+	volatile uint32_t tlp_ps_min;
+	volatile uint32_t tlp_ps_max;
+	volatile uint32_t tlp_full_ps_min;
+	volatile uint32_t tlp_full_ps_max;
+	volatile uint32_t tlp_dw_ps_min;
+	volatile uint32_t tlp_dw_ps_max;
+	volatile uint32_t tlp_pldw_ps_min;
+	volatile uint32_t tlp_pldw_ps_max;
+};
+
+#define ARK_UDM_RT_CFG 0x00E0
+struct ark_udm_rt_cfg_t {
+	phys_addr_t hwProdAddr;
+	uint32_t writeInterval;	/* 4ns cycles */
+	volatile uint32_t prodIdx;	/* RO */
+};
+
+/*  Consolidated structure */
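+/* reservedN pads keep each register block at its fixed hardware offset */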
+struct ark_udm_t {
+	struct ark_udm_setup_t setup;
+	struct ark_udm_cfg_t cfg;
+	struct ark_udm_stats_t stats;
+	struct ark_udm_queue_stats_t qstats;
+	uint8_t reserved1[(ARK_UDM_TLP - ARK_UDM_PQ) -
+					  sizeof(struct ark_udm_queue_stats_t)];
+	struct ark_udm_tlp_t tlp;
+	uint8_t reserved2[(ARK_UDM_PCIBP - ARK_UDM_TLP) -
+					  sizeof(struct ark_udm_tlp_t)];
+	struct ark_udm_pcibp_t pcibp;
+	struct ark_udm_tlp_ps_t tlp_ps;
+	struct ark_udm_rt_cfg_t rt_cfg;
+	int8_t reserved3[(0x100 - ARK_UDM_RT_CFG) -
+					  sizeof(struct ark_udm_rt_cfg_t)];
+};
+
+#define ARK_UDM_EXPECT_SIZE (0x00fc + 4)
+#define ARK_UDM_QOFFSET ARK_UDM_EXPECT_SIZE
+
+int ark_udm_verify(struct ark_udm_t *udm);
+int ark_udm_stop(struct ark_udm_t *udm, int wait);
+void ark_udm_start(struct ark_udm_t *udm);
+int ark_udm_reset(struct ark_udm_t *udm);
+void ark_udm_configure(struct ark_udm_t *udm,
+					   uint32_t headroom,
+					   uint32_t dataroom,
+					   uint32_t write_interval_ns);
+void ark_udm_write_addr(struct ark_udm_t *udm, phys_addr_t addr);
+void ark_udm_stats_reset(struct ark_udm_t *udm);
+void ark_udm_dump_stats(struct ark_udm_t *udm, const char *msg);
+void ark_udm_dump_queue_stats(struct ark_udm_t *udm, const char *msg,
+							  uint16_t qid);
+void ark_udm_dump(struct ark_udm_t *udm, const char *msg);
+void ark_udm_dump_perf(struct ark_udm_t *udm, const char *msg);
+void ark_udm_dump_setup(struct ark_udm_t *udm, uint16_t qId);
+int ark_udm_is_flushed(struct ark_udm_t *udm);
+
+/* Per queue data */
+uint64_t ark_udm_dropped(struct ark_udm_t *udm);
+uint64_t ark_udm_bytes(struct ark_udm_t *udm);
+uint64_t ark_udm_packets(struct ark_udm_t *udm);
+
+void ark_udm_queue_stats_reset(struct ark_udm_t *udm);
+void ark_udm_queue_enable(struct ark_udm_t *udm, int enable);
+
+#endif
diff --git a/drivers/net/ark/rte_pmd_ark_version.map b/drivers/net/ark/rte_pmd_ark_version.map
new file mode 100644
index 0000000..7f84780
--- /dev/null
+++ b/drivers/net/ark/rte_pmd_ark_version.map
@@ -0,0 +1,4 @@
+DPDK_2.0 {
+	 local: *;
+
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0e0b600..da23898 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -104,6 +104,7 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET)  += -lrte_pmd_af_packet
+_LDLIBS-$(CONFIG_RTE_LIBRTE_ARK_PMD)        += -lrte_pmd_ark
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD)      += -lrte_pmd_bnx2x -lz
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BNXT_PMD)       += -lrte_pmd_bnxt
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
1.9.1
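
A side note on ark_udm.h above: the reserved arrays pad each block so that
it lands at its documented ARK_UDM_* offset, with ARK_UDM_EXPECT_SIZE as the
sanity value for the whole register window. As an illustration only (these
asserts are not part of the patch, which presumably does an equivalent
run-time check in ark_udm_verify()), the layout intent could be pinned down
at compile time like this:

#include <stddef.h>

/* Hypothetical compile-time checks for the register-overlay struct;
 * the offsets come from the ARK_UDM_* defines in the patch above. */
_Static_assert(offsetof(struct ark_udm_t, stats) == ARK_UDM_STATS,
		"stats block must sit at its hardware offset");
_Static_assert(offsetof(struct ark_udm_t, pcibp) == ARK_UDM_PCIBP,
		"PCI backpressure block must sit at its hardware offset");
_Static_assert(sizeof(struct ark_udm_t) == ARK_UDM_EXPECT_SIZE,
		"overlay must match the device register window size");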

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-17  8:27  4%                 ` Christian Ehrhardt
@ 2017-03-17  9:16  4%                   ` Jan Blunck
  0 siblings, 0 replies; 200+ results
From: Jan Blunck @ 2017-03-17  9:16 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: Thomas Monjalon, dev, cjcollier, ricardo.salveti, Luca Boccassi

On Fri, Mar 17, 2017 at 9:27 AM, Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
>
> On Thu, Mar 16, 2017 at 6:19 PM, Thomas Monjalon <thomas.monjalon@6wind.com>
> wrote:
>>
>> Not sure about how it can be used in distributions, but it does not hurt
>> to provide the config option.
>> Are you going to link applications against a fixed DPDK version for
>> every library?
>
>
> > Kind of, yes - we can still update "inside" a major version, e.g. stable
> > releases, just fine.
> > That helps a lot when transitioning from one release to the next.

Exactly. It enables me to roll out a new major release without being
blocked on the (external) consumers picking up the changes. Now I can
do this one by one with reduced risk because the runtimes are clearly
separated.

> And I already heard that even other downstreams are using it to simplify
> their dependencies.
>

Downstream?! Pha! ;)

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-16 17:19  4%               ` Thomas Monjalon
@ 2017-03-17  8:27  4%                 ` Christian Ehrhardt
  2017-03-17  9:16  4%                   ` Jan Blunck
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2017-03-17  8:27 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Jan Blunck, cjcollier, ricardo.salveti, Luca Boccassi

On Thu, Mar 16, 2017 at 6:19 PM, Thomas Monjalon <thomas.monjalon@6wind.com>
wrote:

> Not sure about how it can be used in distributions, but it does not hurt
> to provide the config option.
> Are you going to link applications against a fixed DPDK version for
> every library?
>

Kind of, yes - we can still update "inside" a major version, e.g. stable
releases, just fine.
That helps a lot when transitioning from one release to the next.

I have an RFC patch for the Debian packaging making use of this function
that we will likely refresh and pick up on our next merge of a DPDK version.
And I already heard that even other downstreams are using it to simplify
their dependencies.

Applied, thanks
>

Thanks !


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back
  2017-03-17  0:08  0%               ` Wiles, Keith
@ 2017-03-17  0:15  0%                 ` O'Driscoll, Tim
  0 siblings, 0 replies; 200+ results
From: O'Driscoll, Tim @ 2017-03-17  0:15 UTC (permalink / raw)
  To: Wiles, Keith, Vincent JARDIN
  Cc: Stephen Hemminger, Legacy, Allain (Wind River),
	Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, thomas.monjalon, jerin.jacob,
	3chas3, stefanha, Markus Armbruster

> From: Wiles, Keith
> 
> 
> > On Mar 17, 2017, at 7:41 AM, Vincent JARDIN <vincent.jardin@6wind.com>
> wrote:
> >
> > Let's go back to 2014 and Qemu's thoughts on it,
> > +Stefan
> > https://lists.linuxfoundation.org/pipermail/virtualization/2014-
> June/026767.html
> >
> > and
> > +Markus
> > https://lists.linuxfoundation.org/pipermail/virtualization/2014-
> June/026713.html
> >
> >> 6. Device models belong into QEMU
> >>
> >>   Say you build an actual interface on top of ivshmem.  Then ivshmem
> in
> >>   QEMU together with the supporting host code outside QEMU (see 3.)
> and
> >>   the lower layer of the code using it in guests (kernel + user
> space)
> >>   provide something that to me very much looks like a device model.
> >>
> >>   Device models belong into QEMU.  It's what QEMU does.
> >
> >
> > On 17/03/2017 at 00:17, Stephen Hemminger wrote:
> >> On Wed, 15 Mar 2017 04:10:56 +0000
> >> "O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:
> >>
> >>> I've included a couple of specific comments inline below, and a
> general comment here.
> >>>
> >>> We have somebody proposing to add a new driver to DPDK. It's
> standalone and doesn't affect any of the core libraries. They're willing
> to maintain the driver and have included a patch to update the
> maintainers file. They've also included the relevant documentation
> changes. I haven't seen any negative comment on the patches themselves
> except for a request from John McNamara for an update to the Release
> Notes that was addressed in a later version. I think we should be
> welcoming this into DPDK rather than questioning/rejecting it.
> >>>
> >>> I'd suggest that this is a good topic for the next Tech Board
> meeting.
> >>
> >> This is a virtualization driver for supporting DPDK on a platform that
> provides an alternative
> >> virtual network driver. I see no reason it shouldn't be part of DPDK.
> Given the unstable
> >> ABI for drivers, supporting out of tree DPDK drivers is difficult.
> The DPDK should try
> >> to be inclusive and support as many environments as possible.
> 
> 
> +2!! for Stephen’s comment.

+1 (I'm only half as important as Keith :-)

I don't think there's any doubt over the benefit of virtio and the fact that it should be our preferred solution. I'm sure everybody agrees on that. The issue is whether we should block alternative solutions. I don't think we should.

> 
> >>
> >
> > On Qemu mailing list, back to 2014, I did push to build models of
> devices over ivshmem, like AVP, but folks did not want that we abuse of
> it. The Qemu community wants that we avoid unfocusing. So, by being too
> much inclusive, we abuse of the Qemu's capabilities.
> >
> > So, because of being "inclusive", we should allow any cases, it is not
> a proper way to make sure that virtio gets all the focuses it deserves.
> >
> >
> 
> Regards,
> Keith


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back
  2017-03-17  0:11  0%               ` Wiles, Keith
@ 2017-03-17  0:14  0%                 ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2017-03-17  0:14 UTC (permalink / raw)
  To: Wiles, Keith
  Cc: Vincent JARDIN, O'Driscoll, Tim, Legacy, Allain (Wind River),
	Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, thomas.monjalon, jerin.jacob,
	3chas3, stefanha, Markus Armbruster

On Fri, 17 Mar 2017 00:11:10 +0000
"Wiles, Keith" <keith.wiles@intel.com> wrote:

> > On Mar 17, 2017, at 7:41 AM, Vincent JARDIN <vincent.jardin@6wind.com> wrote:
> > 
> > Let's go back to 2014 and Qemu's thoughts on it,
> > +Stefan
> > https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026767.html
> > 
> > and
> > +Markus
> > https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026713.html
> >   
> >> 6. Device models belong into QEMU
> >> 
> >>   Say you build an actual interface on top of ivshmem.  Then ivshmem in
> >>   QEMU together with the supporting host code outside QEMU (see 3.) and
> >>   the lower layer of the code using it in guests (kernel + user space)
> >>   provide something that to me very much looks like a device model.
> >> 
> >>   Device models belong into QEMU.  It's what QEMU does.  
> > 
> > 
> > On 17/03/2017 at 00:17, Stephen Hemminger wrote:
> >> On Wed, 15 Mar 2017 04:10:56 +0000
> >> "O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:
> >>   
> >>> I've included a couple of specific comments inline below, and a general comment here.
> >>> 
> >>> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
> >>> 
> >>> I'd suggest that this is a good topic for the next Tech Board meeting.  
> >> 
> >> This is a virtualization driver for supporting DPDK on a platform that provides an alternative
> >> virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
> >> ABI for drivers, supporting out of tree DPDK drivers is difficult. The DPDK should try
> >> to be inclusive and support as many environments as possible.
> >>   
> > 
> > On the Qemu mailing list, back in 2014, I did push to build device models over ivshmem, like AVP, but folks did not want us to abuse it. The Qemu community wants us to stay focused. So, by being too inclusive, we would abuse Qemu's capabilities.
> > 
> > So, if being "inclusive" means we should allow any case, that is not a proper way to make sure that virtio gets all the focus it deserves.
> > 
> >   
> 
> Why are we bringing QEMU into the picture? It does not make a lot of sense to me. Stephen stated it well above, and I hope my comments were stating the same conclusion. I do not see your real reasons for not allowing this driver into DPDK; it seems like some other hidden agenda is at play here, but I am a paranoid person :-)

I am thinking of people already using Wind River systems. One can argue all you want
that they should be using QEMU/KVM/Virtio or 6Wind Virtual Accelerator, but it is not
the role of DPDK to be used to influence customers' architecture decisions.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back
  2017-03-16 23:41  0%             ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back Vincent JARDIN
  2017-03-17  0:08  0%               ` Wiles, Keith
@ 2017-03-17  0:11  0%               ` Wiles, Keith
  2017-03-17  0:14  0%                 ` Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Wiles, Keith @ 2017-03-17  0:11 UTC (permalink / raw)
  To: Vincent JARDIN
  Cc: Stephen Hemminger, O'Driscoll, Tim, Legacy,
	Allain (Wind River),
	Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, thomas.monjalon, jerin.jacob,
	3chas3, stefanha, Markus Armbruster


> On Mar 17, 2017, at 7:41 AM, Vincent JARDIN <vincent.jardin@6wind.com> wrote:
> 
> Let's go back to 2014 and Qemu's thoughts on it,
> +Stefan
> https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026767.html
> 
> and
> +Markus
> https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026713.html
> 
>> 6. Device models belong into QEMU
>> 
>>   Say you build an actual interface on top of ivshmem.  Then ivshmem in
>>   QEMU together with the supporting host code outside QEMU (see 3.) and
>>   the lower layer of the code using it in guests (kernel + user space)
>>   provide something that to me very much looks like a device model.
>> 
>>   Device models belong into QEMU.  It's what QEMU does.
> 
> 
> On 17/03/2017 at 00:17, Stephen Hemminger wrote:
>> On Wed, 15 Mar 2017 04:10:56 +0000
>> "O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:
>> 
>>> I've included a couple of specific comments inline below, and a general comment here.
>>> 
>>> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
>>> 
>>> I'd suggest that this is a good topic for the next Tech Board meeting.
>> 
>> This is a virtualization driver for supporting DPDK on a platform that provides an alternative
>> virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
>> ABI for drivers, supporting out of tree DPDK drivers is difficult. The DPDK should try
>> to be inclusive and support as many environments as possible.
>> 
> 
> On the Qemu mailing list, back in 2014, I did push to build device models over ivshmem, like AVP, but folks did not want us to abuse it. The Qemu community wants us to stay focused. So, by being too inclusive, we would abuse Qemu's capabilities.
> 
> So, if being "inclusive" means we should allow any case, that is not a proper way to make sure that virtio gets all the focus it deserves.
> 
> 

Why are we bringing QEMU into the picture? It does not make a lot of sense to me. Stephen stated it well above, and I hope my comments were stating the same conclusion. I do not see your real reasons for not allowing this driver into DPDK; it seems like some other hidden agenda is at play here, but I am a paranoid person :-)


Regards,
Keith


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back
  2017-03-16 23:41  0%             ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back Vincent JARDIN
@ 2017-03-17  0:08  0%               ` Wiles, Keith
  2017-03-17  0:15  0%                 ` O'Driscoll, Tim
  2017-03-17  0:11  0%               ` Wiles, Keith
  1 sibling, 1 reply; 200+ results
From: Wiles, Keith @ 2017-03-17  0:08 UTC (permalink / raw)
  To: Vincent JARDIN
  Cc: Stephen Hemminger, O'Driscoll, Tim, Legacy,
	Allain (Wind River),
	Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, thomas.monjalon, jerin.jacob,
	3chas3, stefanha, Markus Armbruster


> On Mar 17, 2017, at 7:41 AM, Vincent JARDIN <vincent.jardin@6wind.com> wrote:
> 
> Let's go back to 2014 and Qemu's thoughts on it,
> +Stefan
> https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026767.html
> 
> and
> +Markus
> https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026713.html
> 
>> 6. Device models belong into QEMU
>> 
>>   Say you build an actual interface on top of ivshmem.  Then ivshmem in
>>   QEMU together with the supporting host code outside QEMU (see 3.) and
>>   the lower layer of the code using it in guests (kernel + user space)
>>   provide something that to me very much looks like a device model.
>> 
>>   Device models belong into QEMU.  It's what QEMU does.
> 
> 
> On 17/03/2017 at 00:17, Stephen Hemminger wrote:
>> On Wed, 15 Mar 2017 04:10:56 +0000
>> "O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:
>> 
>>> I've included a couple of specific comments inline below, and a general comment here.
>>> 
>>> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
>>> 
>>> I'd suggest that this is a good topic for the next Tech Board meeting.
>> 
>> This is a virtualization driver for supporting DPDK on a platform that provides an alternative
>> virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
>> ABI for drivers, supporting out of tree DPDK drivers is difficult. The DPDK should try
>> to be inclusive and support as many environments as possible.


+2!! for Stephen’s comment.

>> 
> 
> On the Qemu mailing list, back in 2014, I did push to build device models over ivshmem, like AVP, but folks did not want us to abuse it. The Qemu community wants us to stay focused. So, by being too inclusive, we would abuse Qemu's capabilities.
> 
> So, if being "inclusive" means we should allow any case, that is not a proper way to make sure that virtio gets all the focus it deserves.
> 
> 

Regards,
Keith


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back
  2017-03-16 23:17  3%           ` Stephen Hemminger
@ 2017-03-16 23:41  0%             ` Vincent JARDIN
  2017-03-17  0:08  0%               ` Wiles, Keith
  2017-03-17  0:11  0%               ` Wiles, Keith
  0 siblings, 2 replies; 200+ results
From: Vincent JARDIN @ 2017-03-16 23:41 UTC (permalink / raw)
  To: Stephen Hemminger, O'Driscoll, Tim, Legacy, Allain (Wind River)
  Cc: Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, Wiles, Keith, thomas.monjalon,
	jerin.jacob, 3chas3, stefanha, Markus Armbruster

Let's go back to 2014 and Qemu's thoughts on it,
+Stefan
 
https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026767.html

and
+Markus
 
https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026713.html

> 6. Device models belong into QEMU
>
>    Say you build an actual interface on top of ivshmem.  Then ivshmem in
>    QEMU together with the supporting host code outside QEMU (see 3.) and
>    the lower layer of the code using it in guests (kernel + user space)
>    provide something that to me very much looks like a device model.
>
>    Device models belong into QEMU.  It's what QEMU does.


On 17/03/2017 at 00:17, Stephen Hemminger wrote:
> On Wed, 15 Mar 2017 04:10:56 +0000
> "O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:
>
>> I've included a couple of specific comments inline below, and a general comment here.
>>
>> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
>>
>> I'd suggest that this is a good topic for the next Tech Board meeting.
>
> This is a virtualization driver for supporting DPDK on a platform that provides an alternative
> virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
> ABI for drivers, supporting out of tree DPDK drivers is difficult. The DPDK should try
> to be inclusive and support as many environments as possible.
>

On the Qemu mailing list, back in 2014, I did push to build device
models over ivshmem, like AVP, but folks did not want us to abuse it.
The Qemu community wants us to stay focused. So, by being too
inclusive, we would abuse Qemu's capabilities.

So, if being "inclusive" means we should allow any case, that is not a
proper way to make sure that virtio gets all the focus it deserves.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio?
  @ 2017-03-16 23:17  3%           ` Stephen Hemminger
  2017-03-16 23:41  0%             ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back Vincent JARDIN
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2017-03-16 23:17 UTC (permalink / raw)
  To: O'Driscoll, Tim
  Cc: Vincent JARDIN, Legacy, Allain (Wind River),
	Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, Wiles, Keith, thomas.monjalon,
	jerin.jacob, 3chas3

On Wed, 15 Mar 2017 04:10:56 +0000
"O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:

> I've included a couple of specific comments inline below, and a general comment here.
> 
> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
> 
> I'd suggest that this is a good topic for the next Tech Board meeting.

This is a virtualization driver for supporting DPDK on a platform that provides an alternative
virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
ABI for drivers, supporting out of tree DPDK drivers is difficult. The DPDK should try
to be inclusive and support as many environments as possible.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-01 14:35  4%             ` Jan Blunck
@ 2017-03-16 17:19  4%               ` Thomas Monjalon
  2017-03-17  8:27  4%                 ` Christian Ehrhardt
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-03-16 17:19 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: dev, Jan Blunck, cjcollier, ricardo.salveti, Luca Boccassi

2017-03-01 15:35, Jan Blunck:
> On Wed, Mar 1, 2017 at 10:34 AM, Christian Ehrhardt
> <christian.ehrhardt@canonical.com> wrote:
> > Downstreams might want to provide different DPDK releases at the same
> > time to support multiple consumers of DPDK linked against older and newer
> > sonames.
> >
> > Also due to the interdependencies that DPDK libraries can have applications
> > might end up with an executable space in which multiple versions of a
> > library are mapped by ld.so.
> >
> > Think of LibA that got an ABI bump and LibB that did not get an ABI bump
> > but is depending on LibA.
> >
> >     Application
> >     \-> LibA.old
> >     \-> LibB.new -> LibA.new
> >
> > That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
> > If set CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
> > An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
> > libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.
[...]
> >
> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> 
> Reviewed-by: Jan Blunck <jblunck@infradead.org>
> Tested-by: Jan Blunck <jblunck@infradead.org>

Not sure about how it can be used in distributions, but it does not hurt
to provide the config option.
Are you going to link applications against a fixed DPDK version for
every library?

Applied, thanks

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file
  2017-03-13 19:16  3%       ` [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file Allain Legacy
@ 2017-03-16 14:52  0%         ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-03-16 14:52 UTC (permalink / raw)
  To: Allain Legacy
  Cc: dev, ian.jolliffe, bruce.richardson, john.mcnamara, keith.wiles,
	thomas.monjalon, vincent.jardin, jerin.jacob, stephen, 3chas3

On 3/13/2017 7:16 PM, Allain Legacy wrote:
> Adds a default ABI version file for the AVP PMD.
> 
> Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
> Signed-off-by: Matt Peters <matt.peters@windriver.com>
> ---
>  drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
>  1 file changed, 4 insertions(+)
>  create mode 100644 drivers/net/avp/rte_pmd_avp_version.map
> 
> diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
> new file mode 100644
> index 0000000..af8f3f4
> --- /dev/null
> +++ b/drivers/net/avp/rte_pmd_avp_version.map
> @@ -0,0 +1,4 @@
> +DPDK_17.05 {
> +
> +    local: *;
> +};
> 

Hi Allain,

Instead of adding files per patch, may I suggest a different ordering:
First add skeleton files in a patch, later add functional pieces one by
one, like:

Merge patches 1/17, 3/17, this patch (4/17), and 6/17 (removing SYMLINK)
into a single patch and make it the first AVP patch. This will be the
skeleton patch.

The second patch can introduce the public headers (2/17) and update the
Makefile to include them.

Third, the debug log patch (5/17).

Patch 7/17 and later can stay the same.

What do you think?

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v10 08/18] lib: add symbol versioning to distributor
  2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
  2017-03-15  6:19  1%               ` [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-15  6:19  2%               ` David Hunt
  1 sibling, 0 replies; 200+ results
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..06df13d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4
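
For reference, the versioning pattern applied above can be reduced to a
minimal single-file sketch. The function rte_foo_bar below is hypothetical;
only the rte_compat.h macros and the need for matching version-map entries
mirror the real patch:

#include <rte_compat.h>

int rte_foo_bar(int x); /* unversioned name from the public header */

/* Old behaviour, kept for binaries built against the DPDK_2.0 ABI. */
int rte_foo_bar_v20(int x);
int
rte_foo_bar_v20(int x)
{
	return x;
}
VERSION_SYMBOL(rte_foo_bar, _v20, 2.0);

/* New behaviour, the default binding from DPDK 17.05 onwards. */
int rte_foo_bar_v1705(int x);
int
rte_foo_bar_v1705(int x)
{
	return x + 1;
}
BIND_DEFAULT_SYMBOL(rte_foo_bar, _v1705, 17.05);
/* In static builds the two macros above are no-ops; this one instead
 * aliases the plain name to the default implementation. */
MAP_STATIC_SYMBOL(int rte_foo_bar(int x), rte_foo_bar_v1705);

For shared builds, rte_foo_bar must also appear in both the DPDK_2.0 and
DPDK_17.05 nodes of the library's version map, which is what the map-file
hunk above does for the distributor symbols.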

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files
  2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
@ 2017-03-15  6:19  1%               ` David Hunt
  2017-03-20 10:08  2%                 ` [dpdk-dev] [PATCH v11 0/18] distributor lib performance enhancements David Hunt
  2017-03-15  6:19  2%               ` [dpdk-dev] [PATCH v10 " David Hunt
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advised to set a tag for each mbuf before calling this
+ * function. If the user doesn't set the tag, the tag value can vary
+ * depending on the driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
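
As a usage illustration (a sketch, not part of the patch; BURST_SIZE,
quit_signal, compute_flow_tag() and do_work() are application-defined
placeholders, the tag is assumed to be carried in the mbuf user hash
field, and error handling is omitted):

    /* Distributor core: tag a received burst and hand it to workers. */
    static void
    distribute_burst(struct rte_distributor *d, uint8_t port)
    {
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t i, nb_rx;

        nb_rx = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        for (i = 0; i < nb_rx; i++)
            bufs[i]->hash.usr = compute_flow_tag(bufs[i]);
        rte_distributor_process(d, bufs, nb_rx);
    }

    /* Worker core: worker_id must be less than the num_workers value
     * passed at distributor creation time.
     */
    static int
    lcore_worker(struct rte_distributor *d, unsigned int worker_id)
    {
        struct rte_mbuf *pkt = NULL;

        while (!quit_signal) {
            /* Hand back the previous packet (if any) and wait
             * for a new one.
             */
            pkt = rte_distributor_get_pkt(d, worker_id, pkt);
            do_work(pkt);
        }
        /* Return the final packet without requesting another. */
        if (pkt != NULL)
            rte_distributor_return_pkt(d, worker_id, pkt);
        return 0;
    }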
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements
  2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-15  6:19  2%             ` David Hunt
  2017-03-15  6:19  1%               ` [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-15  6:19  2%               ` [dpdk-dev] [PATCH v10 " David Hunt
  0 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run
time, depending on the presence of the SSE2 CPU flag. On non-x86
platforms, the scalar match function is selected, which should still
give a good boost in performance over the non-burst API.
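
As an illustration of that run-time selection, the pattern boils down to
the sketch below (the names are illustrative, not the patch's own symbols;
rte_cpu_get_flag_enabled() is the standard EAL CPU-flag query):

    static void (*find_match)(struct rte_distributor *d);

    static void
    select_match_fn(void)
    {
    #ifdef RTE_ARCH_X86
        if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2)) {
            find_match = find_match_vec;    /* SSE2 implementation */
            return;
        }
    #endif
        find_match = find_match_scalar;     /* portable fallback */
    }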

v10 changes:
   * Addressed all review comments from v9 (thanks, Bruce)
   * Squashed the two patches containing distributor structs and code
   * Renamed confusing rte_distributor_v1705.h to rte_distributor_next.h
   * Added usleep in main so as to be a little more gentle with that core
   * Fixed some patch titles and improved some descriptions
   * Updated sample app guide documentation
   * Removed un-needed code limiting Tx rings and cleaned up patch
   * Inherited v9 series Ack by Bruce, except new suggested addition
     for example app documentation (17/18)

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical ordering of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split down the updates to example app more
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them down into easier to review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
     the SSE2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   For performance in matching, flow IDs are 15 bits.
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new distributor code
[04/18] lib: add SIMD flow matching to distributor
[05/18] test/distributor: extra params for autotests
[06/18] lib: switch distributor over to new API
[07/18] lib: make v20 header file private
[08/18] lib: add symbol versioning to distributor
[09/18] test: test single and burst distributor API
[10/18] test: add perf test for distributor burst mode
[11/18] examples/distributor: allow for extra stats
[12/18] examples/distributor: wait for ports to come up
[13/18] examples/distributor: add dedicated core for dist
[14/18] examples/distributor: tweaks for performance
[15/18] examples/distributor: give Rx thread a core
[16/18] doc: distributor library changes for new burst API
[17/18] doc: distributor app changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3] lpm: extend IPv6 next hop field
  @ 2017-03-14 17:17  4% ` Vladyslav Buslov
  0 siblings, 0 replies; 200+ results
From: Vladyslav Buslov @ 2017-03-14 17:17 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: bruce.richardson, dev

This patch extends the next_hop field from 8 bits to 21 bits in the
LPM library for IPv6.

Added versioning symbols to the affected functions, and updated the
library and applications that depend on the LPM library.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---

Fixed compilation error in l3fwd_lpm.h
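
As a usage sketch (mirroring the new unit test; the lpm handle comes
from rte_lpm6_create() and error handling is omitted), the widened API
now accepts and returns 21-bit next hops:

    uint8_t ip[16] = { 0 };
    uint8_t depth = 16;
    uint32_t next_hop_add = 0x001FFFFF; /* maximum 21-bit value */
    uint32_t next_hop_ret = 0;

    rte_lpm6_add(lpm, ip, depth, next_hop_add);
    if (rte_lpm6_lookup(lpm, ip, &next_hop_ret) == 0)
        assert(next_hop_ret == next_hop_add); /* needs <assert.h> */
    rte_lpm6_delete(lpm, ip, depth);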

 doc/guides/prog_guide/lpm6_lib.rst              |   2 +-
 doc/guides/rel_notes/release_17_05.rst          |   5 +
 examples/ip_fragmentation/main.c                |  17 +--
 examples/ip_reassembly/main.c                   |  17 +--
 examples/ipsec-secgw/ipsec-secgw.c              |   2 +-
 examples/l3fwd/l3fwd_lpm.h                      |   2 +-
 examples/l3fwd/l3fwd_lpm_sse.h                  |  24 ++---
 examples/performance-thread/l3fwd-thread/main.c |  11 +-
 lib/librte_lpm/rte_lpm6.c                       | 134 +++++++++++++++++++++---
 lib/librte_lpm/rte_lpm6.h                       |  32 +++++-
 lib/librte_lpm/rte_lpm_version.map              |  10 ++
 lib/librte_table/rte_table_lpm_ipv6.c           |   9 +-
 test/test/test_lpm6.c                           | 115 ++++++++++++++------
 test/test/test_lpm6_perf.c                      |   4 +-
 14 files changed, 293 insertions(+), 91 deletions(-)

diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index 0aea5c5..f791507 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary depending on the case.
 An LPM prefix is represented by a pair of parameters (128-bit key, depth), with depth in the range of 1 to 128.
 An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
 The prefix serves as the unique identifier for the LPM rule.
-In this implementation, the user data is 1-byte long and is called "next hop",
+In this implementation, the user data is 21 bits long and is called "next hop",
 which corresponds to its main use of storing the ID of the next hop in a routing table entry.
 
 The main methods exported for the LPM component are:
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4b90036..918f483 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Increased number of next hops for LPM IPv6 to 2^21.**
+
+  The next_hop field is extended from 8 bits to 21 bits for IPv6.
 
 * **Added powerpc support in pci probing for vfio-pci devices.**
 
@@ -114,6 +117,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
+  while keeping ABI compatibility.
 
 ABI Changes
 -----------
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 9e9ecae..1b005b5 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		uint8_t queueid, uint8_t port_in)
 {
 	struct rx_queue *rxq;
-	uint32_t i, len, next_hop_ipv4;
-	uint8_t next_hop_ipv6, port_out, ipv6;
+	uint32_t i, len, next_hop;
+	uint8_t port_out, ipv6;
 	int32_t len2;
 
 	ipv6 = 0;
@@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			port_out = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
@@ -326,9 +326,10 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *);
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			port_out = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index e62674c..b641576 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -346,8 +346,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	struct rte_ip_frag_death_row *dr;
 	struct rx_queue *rxq;
 	void *d_addr_bytes;
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6, dst_port;
+	uint32_t next_hop;
+	uint8_t dst_port;
 
 	rxq = &qconf->rx_queue_list[queue];
 
@@ -390,9 +390,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			dst_port = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
@@ -427,9 +427,10 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			dst_port = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv6);
diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index 685feec..d3c229a 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -618,7 +618,7 @@ route4_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 static inline void
 route6_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 {
-	int16_t hop[MAX_PKT_BURST * 2];
+	int32_t hop[MAX_PKT_BURST * 2];
 	uint8_t dst_ip[MAX_PKT_BURST * 2][16];
 	uint8_t *ip6_dst;
 	uint16_t i, offset;
diff --git a/examples/l3fwd/l3fwd_lpm.h b/examples/l3fwd/l3fwd_lpm.h
index a43c507..258a82f 100644
--- a/examples/l3fwd/l3fwd_lpm.h
+++ b/examples/l3fwd/l3fwd_lpm.h
@@ -49,7 +49,7 @@ lpm_get_ipv4_dst_port(void *ipv4_hdr,  uint8_t portid, void *lookup_struct)
 static inline uint8_t
 lpm_get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid, void *lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 	struct rte_lpm6 *ipv6_l3fwd_lookup_struct =
 		(struct rte_lpm6 *)lookup_struct;
 
diff --git a/examples/l3fwd/l3fwd_lpm_sse.h b/examples/l3fwd/l3fwd_lpm_sse.h
index 538fe3d..aa06b6d 100644
--- a/examples/l3fwd/l3fwd_lpm_sse.h
+++ b/examples/l3fwd/l3fwd_lpm_sse.h
@@ -40,8 +40,7 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ipv4_hdr *ipv4_hdr;
 	struct ether_hdr *eth_hdr;
@@ -51,9 +50,11 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
 
-		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct,
-				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop_ipv4) == 0) ?
-						next_hop_ipv4 : portid);
+		return (uint16_t) (
+			(rte_lpm_lookup(qconf->ipv4_lookup_struct,
+					rte_be_to_cpu_32(ipv4_hdr->dst_addr),
+					&next_hop) == 0) ?
+						next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -61,8 +62,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
@@ -78,14 +79,13 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
-			&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+			&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -93,8 +93,8 @@ lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
diff --git a/examples/performance-thread/l3fwd-thread/main.c b/examples/performance-thread/l3fwd-thread/main.c
index 6845e28..bf92582 100644
--- a/examples/performance-thread/l3fwd-thread/main.c
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -909,7 +909,7 @@ static inline uint8_t
 get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid,
 		lookup6_struct_t *ipv6_l3fwd_lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 
 	return (uint8_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct,
 			((struct ipv6_hdr *)ipv6_hdr)->dst_addr, &next_hop) == 0) ?
@@ -1396,15 +1396,14 @@ rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 static inline __attribute__((always_inline)) uint16_t
 get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv4_lookup_struct, dst_ipv4,
-				&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+				&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -1413,8 +1412,8 @@ get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 
 		return (uint16_t) ((rte_lpm6_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0) ? next_hop_ipv6 :
-						portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0) ?
+				next_hop : portid);
 
 	}
 
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 32fdba0..9cc7be7 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -97,7 +97,7 @@ struct rte_lpm6_tbl_entry {
 /** Rules tbl entry structure. */
 struct rte_lpm6_rule {
 	uint8_t ip[RTE_LPM6_IPV6_ADDR_SIZE]; /**< Rule IP address. */
-	uint8_t next_hop; /**< Rule next hop. */
+	uint32_t next_hop; /**< Rule next hop. */
 	uint8_t depth; /**< Rule depth. */
 };
 
@@ -297,7 +297,7 @@ rte_lpm6_free(struct rte_lpm6 *lpm)
  * the nexthop if so. Otherwise it adds a new rule if enough space is available.
  */
 static inline int32_t
-rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
+rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint32_t next_hop, uint8_t depth)
 {
 	uint32_t rule_index;
 
@@ -340,7 +340,7 @@ rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
  */
 static void
 expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
-		uint8_t next_hop)
+		uint32_t next_hop)
 {
 	uint32_t tbl8_group_end, tbl8_gindex_next, j;
 
@@ -377,7 +377,7 @@ expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
 static inline int
 add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
 		struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip, uint8_t bytes,
-		uint8_t first_byte, uint8_t depth, uint8_t next_hop)
+		uint8_t first_byte, uint8_t depth, uint32_t next_hop)
 {
 	uint32_t tbl_index, tbl_range, tbl8_group_start, tbl8_group_end, i;
 	int32_t tbl8_gindex;
@@ -507,9 +507,17 @@ add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
  * Add a route
  */
 int
-rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop)
 {
+	return rte_lpm6_add_v1705(lpm, ip, depth, next_hop);
+}
+VERSION_SYMBOL(rte_lpm6_add, _v20, 2.0);
+
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop)
+{
 	struct rte_lpm6_tbl_entry *tbl;
 	struct rte_lpm6_tbl_entry *tbl_next;
 	int32_t rule_index;
@@ -560,6 +568,10 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_add, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip,
+				uint8_t depth, uint32_t next_hop),
+		rte_lpm6_add_v1705);
 
 /*
  * Takes a pointer to a table entry and inspect one level.
@@ -569,7 +581,7 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 static inline int
 lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		const struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip,
-		uint8_t first_byte, uint8_t *next_hop)
+		uint8_t first_byte, uint32_t *next_hop)
 {
 	uint32_t tbl8_index, tbl_entry;
 
@@ -589,7 +601,7 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		return 1;
 	} else {
 		/* If not extended then we can have a match. */
-		*next_hop = (uint8_t)tbl_entry;
+		*next_hop = ((uint32_t)tbl_entry & RTE_LPM6_TBL8_BITMASK);
 		return (tbl_entry & RTE_LPM6_LOOKUP_SUCCESS) ? 0 : -ENOENT;
 	}
 }
@@ -598,7 +610,26 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
  * Looks up an IP
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_lookup_v1705(lpm, ip, &next_hop32);
+	if (status == 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+}
+VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
+
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop)
 {
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
@@ -625,20 +656,23 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
+				uint32_t *next_hop), rte_lpm6_lookup_v1705);
 
 /*
  * Looks up a group of IP addresses
  */
 int
-rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n)
 {
 	unsigned i;
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
-	uint32_t tbl24_index;
-	uint8_t first_byte, next_hop;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
 	int status;
 
 	/* DEBUG: Check user input arguments. */
@@ -664,11 +698,59 @@ rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		if (status < 0)
 			next_hops[i] = -1;
 		else
-			next_hops[i] = next_hop;
+			next_hops[i] = (int16_t)next_hop;
+	}
+
+	return 0;
+}
+VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
+
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n)
+{
+	unsigned int i;
+	const struct rte_lpm6_tbl_entry *tbl;
+	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
+	int status;
+
+	/* DEBUG: Check user input arguments. */
+	if ((lpm == NULL) || (ips == NULL) || (next_hops == NULL))
+		return -EINVAL;
+
+	for (i = 0; i < n; i++) {
+		first_byte = LOOKUP_FIRST_BYTE;
+		tbl24_index = (ips[i][0] << BYTES2_SIZE) |
+				(ips[i][1] << BYTE_SIZE) | ips[i][2];
+
+		/* Calculate pointer to the first entry to be inspected */
+		tbl = &lpm->tbl24[tbl24_index];
+
+		do {
+			/* Continue inspecting following levels
+			 * until success or failure
+			 */
+			status = lookup_step(lpm, tbl, &tbl_next, ips[i],
+					first_byte++, &next_hop);
+			tbl = tbl_next;
+		} while (status == 1);
+
+		if (status < 0)
+			next_hops[i] = -1;
+		else
+			next_hops[i] = (int32_t)next_hop;
 	}
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup_bulk_func, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+				uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+				int32_t *next_hops, unsigned int n),
+		rte_lpm6_lookup_bulk_func_v1705);
 
 /*
  * Finds a rule in rule table.
@@ -698,8 +780,28 @@ rule_find(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
  * Look for a rule in the high-level rules table
  */
 int
-rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop)
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_is_rule_present_v1705(lpm, ip, depth, &next_hop32);
+	if (status > 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+
+}
+VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
+
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop)
 {
 	uint8_t ip_masked[RTE_LPM6_IPV6_ADDR_SIZE];
 	int32_t rule_index;
@@ -724,6 +826,10 @@ uint8_t *next_hop)
 	/* If rule is not found return 0. */
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_is_rule_present, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_is_rule_present(struct rte_lpm6 *lpm,
+				uint8_t *ip, uint8_t depth, uint32_t *next_hop),
+		rte_lpm6_is_rule_present_v1705);
 
 /*
  * Delete a rule from the rule table.
diff --git a/lib/librte_lpm/rte_lpm6.h b/lib/librte_lpm/rte_lpm6.h
index 13d027f..3a3342d 100644
--- a/lib/librte_lpm/rte_lpm6.h
+++ b/lib/librte_lpm/rte_lpm6.h
@@ -39,6 +39,7 @@
  */
 
 #include <stdint.h>
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -123,7 +124,13 @@ rte_lpm6_free(struct rte_lpm6 *lpm);
  */
 int
 rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
+int
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop);
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
 
 /**
  * Check if a rule is present in the LPM table,
@@ -142,7 +149,13 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
  */
 int
 rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop);
+		uint32_t *next_hop);
+int
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop);
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop);
 
 /**
  * Delete a rule from the LPM table.
@@ -199,7 +212,12 @@ rte_lpm6_delete_all(struct rte_lpm6 *lpm);
  *   -EINVAL for incorrect arguments, -ENOENT on lookup miss, 0 on lookup hit
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
+int
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop);
 
 /**
  * Lookup multiple IP addresses in an LPM table.
@@ -220,7 +238,15 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
 int
 rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
-		int16_t * next_hops, unsigned n);
+		int32_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int16_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 239b371..90beac8 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -34,3 +34,13 @@ DPDK_16.04 {
 	rte_lpm_delete_all;
 
 } DPDK_2.0;
+
+DPDK_17.05 {
+	global:
+
+	rte_lpm6_add;
+	rte_lpm6_is_rule_present;
+	rte_lpm6_lookup;
+	rte_lpm6_lookup_bulk_func;
+
+} DPDK_16.04;
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index 836f4cf..1e1a173 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -211,9 +211,8 @@ rte_table_lpm_ipv6_entry_add(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint32_t nht_pos, nht_pos0_valid;
+	uint32_t nht_pos, nht_pos0, nht_pos0_valid;
 	int status;
-	uint8_t nht_pos0;
 
 	/* Check input parameters */
 	if (lpm == NULL) {
@@ -256,7 +255,7 @@ rte_table_lpm_ipv6_entry_add(
 
 	/* Add rule to low level LPM table */
 	if (rte_lpm6_add(lpm->lpm, ip_prefix->ip, ip_prefix->depth,
-		(uint8_t) nht_pos) < 0) {
+		nht_pos) < 0) {
 		RTE_LOG(ERR, TABLE, "%s: LPM IPv6 rule add failed\n", __func__);
 		return -1;
 	}
@@ -280,7 +279,7 @@ rte_table_lpm_ipv6_entry_delete(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint8_t nht_pos;
+	uint32_t nht_pos;
 	int status;
 
 	/* Check input parameters */
@@ -356,7 +355,7 @@ rte_table_lpm_ipv6_lookup(
 			uint8_t *ip = RTE_MBUF_METADATA_UINT8_PTR(pkt,
 				lpm->offset);
 			int status;
-			uint8_t nht_pos;
+			uint32_t nht_pos;
 
 			status = rte_lpm6_lookup(lpm->lpm, ip, &nht_pos);
 			if (status == 0) {
diff --git a/test/test/test_lpm6.c b/test/test/test_lpm6.c
index 61134f7..e0e7bf0 100644
--- a/test/test/test_lpm6.c
+++ b/test/test/test_lpm6.c
@@ -79,6 +79,7 @@ static int32_t test24(void);
 static int32_t test25(void);
 static int32_t test26(void);
 static int32_t test27(void);
+static int32_t test28(void);
 
 rte_lpm6_test tests6[] = {
 /* Test Cases */
@@ -110,6 +111,7 @@ rte_lpm6_test tests6[] = {
 	test25,
 	test26,
 	test27,
+	test28,
 };
 
 #define NUM_LPM6_TESTS                (sizeof(tests6)/sizeof(tests6[0]))
@@ -354,7 +356,7 @@ test6(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -392,7 +394,7 @@ test7(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[10][16];
-	int16_t next_hop_return[10];
+	int32_t next_hop_return[10];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -469,7 +471,8 @@ test9(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 16, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 	uint8_t i;
 
@@ -513,7 +516,8 @@ test10(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -557,7 +561,8 @@ test11(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -617,7 +622,8 @@ test12(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -655,7 +661,8 @@ test13(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = 2;
@@ -702,7 +709,8 @@ test14(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 25, next_hop_add = 100;
+	uint8_t depth = 25;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -748,7 +756,8 @@ test15(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 24, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 24;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -784,7 +793,8 @@ test16(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {12,12,1,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 128, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 128;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -828,7 +838,8 @@ test17(void)
 	uint8_t ip1[] = {127,255,255,255,255,255,255,255,255,
 			255,255,255,255,255,255,255};
 	uint8_t ip2[] = {128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -857,7 +868,7 @@ test17(void)
 
 	/* Loop with rte_lpm6_delete. */
 	for (depth = 16; depth >= 1; depth--) {
-		next_hop_add = (uint8_t) (depth - 1);
+		next_hop_add = (depth - 1);
 
 		status = rte_lpm6_delete(lpm, ip2, depth);
 		TEST_LPM_ASSERT(status == 0);
@@ -893,8 +904,9 @@ test18(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16], ip_1[16], ip_2[16];
-	uint8_t depth, depth_1, depth_2, next_hop_add, next_hop_add_1,
-		next_hop_add_2, next_hop_return;
+	uint8_t depth, depth_1, depth_2;
+	uint32_t next_hop_add, next_hop_add_1,
+			next_hop_add_2, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1055,7 +1067,8 @@ test19(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1253,7 +1266,8 @@ test20(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1320,8 +1334,9 @@ test21(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[4][16];
-	uint8_t depth, next_hop_add;
-	int16_t next_hop_return[4];
+	uint8_t depth;
+	uint32_t next_hop_add;
+	int32_t next_hop_return[4];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1378,8 +1393,9 @@ test22(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[5][16];
-	uint8_t depth[5], next_hop_add;
-	int16_t next_hop_return[5];
+	uint8_t depth[5];
+	uint32_t next_hop_add;
+	int32_t next_hop_return[5];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1495,7 +1511,8 @@ test23(void)
 	struct rte_lpm6_config config;
 	uint32_t i;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1579,7 +1596,8 @@ test25(void)
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
 	uint32_t i;
-	uint8_t depth, next_hop_add, next_hop_return, next_hop_expected;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return, next_hop_expected;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1632,10 +1650,10 @@ test26(void)
 	uint8_t d_ip_10_32 = 32;
 	uint8_t	d_ip_10_24 = 24;
 	uint8_t	d_ip_20_25 = 25;
-	uint8_t next_hop_ip_10_32 = 100;
-	uint8_t	next_hop_ip_10_24 = 105;
-	uint8_t	next_hop_ip_20_25 = 111;
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_ip_10_32 = 100;
+	uint32_t next_hop_ip_10_24 = 105;
+	uint32_t next_hop_ip_20_25 = 111;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1650,7 +1668,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_32, &next_hop_return);
-	uint8_t test_hop_10_32 = next_hop_return;
+	uint32_t test_hop_10_32 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_32);
 
@@ -1659,7 +1677,7 @@ test26(void)
 			return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_24, &next_hop_return);
-	uint8_t test_hop_10_24 = next_hop_return;
+	uint32_t test_hop_10_24 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_24);
 
@@ -1668,7 +1686,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_20_25, &next_hop_return);
-	uint8_t test_hop_20_25 = next_hop_return;
+	uint32_t test_hop_20_25 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_20_25);
 
@@ -1707,7 +1725,8 @@ test27(void)
 		struct rte_lpm6 *lpm = NULL;
 		struct rte_lpm6_config config;
 		uint8_t ip[] = {128,128,128,128,128,128,128,128,128,128,128,128,128,128,0,0};
-		uint8_t depth = 128, next_hop_add = 100, next_hop_return;
+		uint8_t depth = 128;
+		uint32_t next_hop_add = 100, next_hop_return;
 		int32_t status = 0;
 		int i, j;
 
@@ -1746,6 +1765,42 @@ test27(void)
 }
 
 /*
+ * Call add, lookup and delete for a single rule with the maximum 21-bit
+ * next_hop value.
+ * Check that the next_hop returned from lookup equals the provisioned value.
+ * Delete the rule and check that the same lookup returns a miss.
+ */
+int32_t
+test28(void)
+{
+	struct rte_lpm6 *lpm = NULL;
+	struct rte_lpm6_config config;
+	uint8_t ip[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 0x001FFFFF, next_hop_return = 0;
+	int32_t status = 0;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	status = rte_lpm6_add(lpm, ip, depth, next_hop_add);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm6_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));
+
+	status = rte_lpm6_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	rte_lpm6_free(lpm);
+
+	return PASS;
+}
+
+/*
  * Do all unit tests.
  */
 static int
diff --git a/test/test/test_lpm6_perf.c b/test/test/test_lpm6_perf.c
index 0723081..30be430 100644
--- a/test/test/test_lpm6_perf.c
+++ b/test/test/test_lpm6_perf.c
@@ -86,7 +86,7 @@ test_lpm6_perf(void)
 	struct rte_lpm6_config config;
 	uint64_t begin, total_time;
 	unsigned i, j;
-	uint8_t next_hop_add = 0xAA, next_hop_return = 0;
+	uint32_t next_hop_add = 0xAA, next_hop_return = 0;
 	int status = 0;
 	int64_t count = 0;
 
@@ -148,7 +148,7 @@ test_lpm6_perf(void)
 	count = 0;
 
 	uint8_t ip_batch[NUM_IPS_ENTRIES][16];
-	int16_t next_hops[NUM_IPS_ENTRIES];
+	int32_t next_hops[NUM_IPS_ENTRIES];
 
 	for (i = 0; i < NUM_IPS_ENTRIES; i++)
 		memcpy(ip_batch[i], large_ips_table[i].ip, 16);
-- 
2.1.4

^ permalink raw reply	[relevance 4%]
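
For readers unfamiliar with the VERSION_SYMBOL, BIND_DEFAULT_SYMBOL and
MAP_STATIC_SYMBOL calls used in the patch above: they implement the
rte_compat.h symbol versioning scheme. A minimal sketch of the pattern
(rte_foo and its version suffixes are hypothetical):

    /* Old binary interface, kept for apps linked against DPDK 2.0. */
    int
    rte_foo_v20(uint8_t arg)
    {
        return rte_foo_v1705(arg);
    }
    VERSION_SYMBOL(rte_foo, _v20, 2.0);

    /* New implementation, made the default from DPDK 17.05 onwards. */
    int
    rte_foo_v1705(uint32_t arg)
    {
        /* ... */
        return 0;
    }
    BIND_DEFAULT_SYMBOL(rte_foo, _v1705, 17.05);
    /* Static builds link directly against the new implementation. */
    MAP_STATIC_SYMBOL(int rte_foo(uint32_t arg), rte_foo_v1705);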

* [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file
  @ 2017-03-13 19:16  3%       ` Allain Legacy
  2017-03-16 14:52  0%         ` Ferruh Yigit
    1 sibling, 1 reply; 200+ results
From: Allain Legacy @ 2017-03-13 19:16 UTC (permalink / raw)
  To: ferruh.yigit
  Cc: dev, ian.jolliffe, bruce.richardson, john.mcnamara, keith.wiles,
	thomas.monjalon, vincent.jardin, jerin.jacob, stephen, 3chas3

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]
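
For context (an illustrative sketch, not part of the patch): once the
PMD starts exporting public symbols, the map file would list them in a
global section, following the same convention as the LPM map above:

    DPDK_17.05 {
        global:

        rte_pmd_avp_hypothetical_fn;

        local: *;
    };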

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-13 11:01  0%                 ` Van Haaren, Harry
@ 2017-03-13 11:02  0%                   ` Hunt, David
  0 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-03-13 11:02 UTC (permalink / raw)
  To: Van Haaren, Harry, Richardson, Bruce; +Cc: dev


On 13/3/2017 11:01 AM, Van Haaren, Harry wrote:
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hunt, David
>> Subject: Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
>>
>> On 10/3/2017 4:22 PM, Bruce Richardson wrote:
>>> On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
>>>> Also bumped up the ABI version number in the Makefile
>>>>
>>>> Signed-off-by: David Hunt <david.hunt@intel.com>
>>>> ---
>>>>    lib/librte_distributor/Makefile                    |  2 +-
>>>>    lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>>>>    lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
>>> A file named rte_distributor_v1705.h was added in patch 4, then deleted
>>> in patch 7, and now added again here. Seems a lot of churn.
>>>
>>> /Bruce
>>>
>> The first introduction of this file is what will become the public
>> header. For successful compilation,
>> this cannot be called rte_distributor.h until the symbol versioning
>> patch, at which stage I will
>> rename the file, and introduce the symbol versioned header at the same
>> time. In the next patch
>> I'll rename this version of the files as rte_distributor_public.h to
>> make this clearer.
>
> Suggestion to use rte_distributor_next.h instead of public?
> Public doesn't indicate if it's old or new, while next would make that clearer IMO :)

Good call, will use "_next". Its clearer.
Thanks,
Dave.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-13 10:28  0%               ` Hunt, David
@ 2017-03-13 11:01  0%                 ` Van Haaren, Harry
  2017-03-13 11:02  0%                   ` Hunt, David
  0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2017-03-13 11:01 UTC (permalink / raw)
  To: Hunt, David, Richardson, Bruce; +Cc: dev

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hunt, David
> Subject: Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
> 
> On 10/3/2017 4:22 PM, Bruce Richardson wrote:
> > On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> >> Also bumped up the ABI version number in the Makefile
> >>
> >> Signed-off-by: David Hunt <david.hunt@intel.com>
> >> ---
> >>   lib/librte_distributor/Makefile                    |  2 +-
> >>   lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
> >>   lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
> > A file named rte_distributor_v1705.h was added in patch 4, then deleted
> > in patch 7, and now added again here. Seems a lot of churn.
> >
> > /Bruce
> >
> 
> The first introduction of this file is what will become the public
> header. For successful compilation,
> this cannot be called rte_distributor.h until the symbol versioning
> patch, at which stage I will
> rename the file, and introduce the symbol versioned header at the same
> time. In the next patch
> I'll rename this version of the files as rte_distributor_public.h to
> make this clearer.


Suggestion to use rte_distributor_next.h instead of public?
Public doesn't indicate if it's old or new, while next would make that clearer IMO :)

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-10 16:22  0%             ` Bruce Richardson
  2017-03-13 10:17  0%               ` Hunt, David
@ 2017-03-13 10:28  0%               ` Hunt, David
  2017-03-13 11:01  0%                 ` Van Haaren, Harry
  1 sibling, 1 reply; 200+ results
From: Hunt, David @ 2017-03-13 10:28 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev


On 10/3/2017 4:22 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
>> Also bumped up the ABI version number in the Makefile
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/Makefile                    |  2 +-
>>   lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>>   lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
> A file named rte_distributor_v1705.h was added in patch 4, then deleted
> in patch 7, and now added again here. Seems a lot of churn.
>
> /Bruce
>

The first introduction of this file is what will become the public 
header. For successful compilation,
this cannot be called rte_distributor.h until the symbol versioning 
patch, at which stage I will
rename the file, and introduce the symbol versioned header at the same 
time. In the next patch
I'll rename this version of the files as rte_distributor_public.h to 
make this clearer.

Regards,
Dave.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-10 16:22  0%             ` Bruce Richardson
@ 2017-03-13 10:17  0%               ` Hunt, David
  2017-03-13 10:28  0%               ` Hunt, David
  1 sibling, 0 replies; 200+ results
From: Hunt, David @ 2017-03-13 10:17 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev



-----Original Message-----
From: Richardson, Bruce 
Sent: Friday, 10 March, 2017 4:22 PM
To: Hunt, David <david.hunt@intel.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH v9 09/18] lib: add symbol versioning to distributor

On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++

A file named rte_distributor_v1705.h was added in patch 4, then deleted in patch 7, and now added again here. Seems a lot of churn.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
@ 2017-03-10 16:22  0%             ` Bruce Richardson
  2017-03-13 10:17  0%               ` Hunt, David
  2017-03-13 10:28  0%               ` Hunt, David
  0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2017-03-10 16:22 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++

A file named rte_distributor_v1705.h was added in patch 4, then deleted
in patch 7, and now added again here. Seems a lot of churn.

/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v11 3/7] lib: add bitrate statistics library
    2017-03-09 16:25  1% ` [dpdk-dev] [PATCH v11 1/7] lib: add information metrics library Remy Horton
@ 2017-03-09 16:25  2% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-03-09 16:25 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a library that calculates peak and average data-rate
statistics for Ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/prog_guide/metrics_lib.rst              |  65 ++++++++++
 doc/guides/rel_notes/release_17_02.rst             |   1 +
 doc/guides/rel_notes/release_17_05.rst             |   5 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 ++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 141 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 ++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 13 files changed, 367 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 66478f3..8abf4fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -639,6 +639,10 @@ Metrics
 M: Remy Horton <remy.horton@intel.com>
 F: lib/librte_metrics/
 
+Bit-rate statistics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_bitratestats/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index cea055f..d700ee0 100644
--- a/config/common_base
+++ b/config/common_base
@@ -630,3 +630,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the crypto performance application
 #
 CONFIG_RTE_APP_CRYPTO_PERF=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 26a26b7..8492bce 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -157,4 +157,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [device metrics]     (@ref rte_metrics.h),
+  [bitrate statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fbbcf8e..c4b3b68 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -36,6 +36,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_eal/common/include \
                           lib/librte_eal/common/include/generic \
                           lib/librte_acl \
+                          lib/librte_bitratestats \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
index 87f806d..1c2a28f 100644
--- a/doc/guides/prog_guide/metrics_lib.rst
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -178,3 +178,68 @@ print out all metrics for a given port:
         free(metrics);
         free(names);
     }
+
+
+Bit-rate statistics library
+---------------------------
+
+The bit-rate library calculates the mean, exponentially-weighted moving
+average (EWMA) and peak bit-rates for each active port (i.e. network device).
+These statistics are reported via the metrics library using the
+following names:
+
+    - ``mean_bits_in``: Average inbound bit-rate
+    - ``mean_bits_out``:  Average outbound bit-rate
+    - ``ewma_bits_in``: Average inbound bit-rate (EWMA smoothed)
+    - ``ewma_bits_out``:  Average outbound bit-rate (EWMA smoothed)
+    - ``peak_bits_in``:  Peak inbound bit-rate
+    - ``peak_bits_out``:  Peak outbound bit-rate
+
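+The EWMA figures use the standard exponentially-weighted moving-average
+recurrence, shown here for illustration (the smoothing factor applied by
+the library is an implementation detail)::
+
+    ewma = alpha * sample + (1 - alpha) * ewma
+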
+Once initialised and clocked at the appropriate frequency, these
+statistics can be obtained by querying the metrics library.
+
+Initialization
+~~~~~~~~~~~~~~
+
+Before it is used, the bit-rate statistics library has to be initialised
+by calling ``rte_stats_bitrate_create()``, which will return a bit-rate
+calculation object. Since the bit-rate library uses the metrics library
+to report the calculated statistics, the bit-rate library then needs to
+register the calculated statistics with the metrics library. This is
+done using the helper function ``rte_stats_bitrate_reg()``.
+
+.. code-block:: c
+
+    struct rte_stats_bitrates *bitrate_data;
+
+    bitrate_data = rte_stats_bitrate_create();
+    if (bitrate_data == NULL)
+        rte_exit(EXIT_FAILURE, "Could not allocate bit-rate data.\n");
+    rte_stats_bitrate_reg(bitrate_data);
+
+Controlling the sampling rate
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since the library works by periodic sampling but does not use an
+internal thread, the application has to periodically call
+``rte_stats_bitrate_calc()``. The frequency at which this function
+is called should be the intended sampling rate required for the
+calculated statistics. For instance, if per-second statistics are
+desired, this function should be called once a second.
+
+.. code-block:: c
+
+    tics_datum = rte_rdtsc();
+    tics_per_1sec = rte_get_timer_hz();
+
+    while (1) {
+        /* ... */
+        tics_current = rte_rdtsc();
+        if (tics_current - tics_datum >= tics_per_1sec) {
+            /* Periodic bit-rate calculation */
+            for (idx_port = 0; idx_port < cnt_ports; idx_port++)
+                rte_stats_bitrate_calc(bitrate_data, idx_port);
+            tics_datum = tics_current;
+        }
+        /* ... */
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 8bd706f..63786df 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -353,6 +353,7 @@ The libraries prepended with a plus sign were incremented in this version.
 .. code-block:: diff
 
      librte_acl.so.2
+   + librte_bitratestats.so.1
      librte_cfgfile.so.2
      librte_cmdline.so.2
      librte_cryptodev.so.2
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 3ed809e..83c83b2 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -69,6 +69,11 @@ Resolved Issues
   reporting mechanism that is independent of other libraries such
   as ethdev.
 
+* **Added bit-rate calculation library.**
+
+  A library that can be used to calculate device bit-rates. Calculated
+  bit-rates are reported using the metrics library.
+
 
 EAL
 ~~~
diff --git a/lib/Makefile b/lib/Makefile
index 29f6a81..ecc54c0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -50,6 +50,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..743b62c
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..3252598
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,141 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t mean_ibits;
+	uint64_t mean_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates {
+	struct rte_stats_bitrate port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+struct rte_stats_bitrates *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates),
+		RTE_CACHE_LINE_SIZE);
+}
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data)
+{
+	const char * const names[] = {
+		"ewma_bits_in", "ewma_bits_out",
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_names(&names[0], 6);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[6];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming bitrate. This is an iteratively calculated EWMA
+	 * (Exponentially Weighted Moving Average) that uses a
+	 * weighting factor of alpha_percent. An unsmoothed mean
+	 * for just the current time delta is also calculated for the
+	 * benefit of people who don't understand signal processing.
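+	 * The recurrence used below is ewma += (sample - ewma) * alpha,
+	 * with alpha = alpha_percent / 100.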
+	 */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +-50 fixes integer rounding during division */
+	if (delta > 0)
+		delta = (delta * alpha_percent + 50) / 100;
+	else
+		delta = (delta * alpha_percent - 50) / 100;
+	port_data->ewma_ibits += delta;
+	port_data->mean_ibits = cnt_bits;
+
+	/* Outgoing bitrate (also EWMA) */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	/* The +-50 fixes integer rounding during division (as above) */
+	if (delta > 0)
+		delta = (delta * alpha_percent + 50) / 100;
+	else
+		delta = (delta * alpha_percent - 50) / 100;
+	port_data->ewma_obits += delta;
+	port_data->mean_obits = cnt_bits;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->mean_ibits;
+	values[3] = port_data->mean_obits;
+	values[4] = port_data->peak_ibits;
+	values[5] = port_data->peak_obits;
+	rte_metrics_update_values(port_id, bitrate_data->id_stats_set,
+		values, 6);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..564e4f7
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_BITRATE_H_
+#define _RTE_BITRATE_H_
+
+#include <stdint.h>
+
+/**
+ *  Bitrate statistics data structure.
+ *  This data structure is intentionally opaque.
+ */
+struct rte_stats_bitrates;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics with the metric library.
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period with which
+ * this function is called should be the intended sampling window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id);
+
+#endif /* _RTE_BITRATE_H_ */
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..fe74544
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_17.05 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 98eb052..39c988a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -100,6 +100,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v11 1/7] lib: add information metrics library
  @ 2017-03-09 16:25  1% ` Remy Horton
  2017-03-09 16:25  2% ` [dpdk-dev] [PATCH v11 3/7] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-03-09 16:25 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a new information metrics library. This Metrics
library implements a mechanism by which producers can publish
numeric information for later querying by consumers. Metrics
themselves are statistics that are not generated by PMDs, and
hence are not reported via ethdev extended statistics.

Metric information is populated using a push model, where
producers update the values contained within the metric
library by calling an update function on the relevant metrics.
Consumers receive metric information by querying the central
metric data, which is held in shared memory.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   4 +
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/prog_guide/index.rst            |   1 +
 doc/guides/prog_guide/metrics_lib.rst      | 180 +++++++++++++++++
 doc/guides/rel_notes/release_17_02.rst     |   1 +
 doc/guides/rel_notes/release_17_05.rst     |   8 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 299 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 240 +++++++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 14 files changed, 807 insertions(+)
 create mode 100644 doc/guides/prog_guide/metrics_lib.rst
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5030c1c..66478f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -635,6 +635,10 @@ F: lib/librte_jobstats/
 F: examples/l2fwd-jobstats/
 F: doc/guides/sample_app_ug/l2_forward_job_stats.rst
 
+Metrics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_metrics/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index aeee13e..cea055f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -501,6 +501,11 @@ CONFIG_RTE_LIBRTE_EFD=y
 CONFIG_RTE_LIBRTE_JOBSTATS=y
 
 #
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index eb39f69..26a26b7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -156,4 +156,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [device metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fdcf13c..fbbcf8e 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -52,6 +52,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_mbuf \
                           lib/librte_mempool \
                           lib/librte_meter \
+                          lib/librte_metrics \
                           lib/librte_net \
                           lib/librte_pdump \
                           lib/librte_pipeline \
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 77f427e..2a69844 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -62,6 +62,7 @@ Programmer's Guide
     packet_classif_access_ctrl
     packet_framework
     vhost_lib
+    metrics_lib
     port_hotplug_framework
     source_org
     dev_kit_build_system
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
new file mode 100644
index 0000000..87f806d
--- /dev/null
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -0,0 +1,180 @@
+..  BSD LICENSE
+    Copyright(c) 2017 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Metrics_Library:
+
+Metrics Library
+===============
+
+The Metrics library implements a mechanism by which *producers* can
+publish numeric information for later querying by *consumers*. In
+practice producers will typically be other libraries or primary
+processes, whereas consumers will typically be applications.
+
+Metrics themselves are statistics that are not generated by PMDs. Metric
+information is populated using a push model, where producers update the
+values contained within the metric library by calling an update function
+on the relevant metrics. Consumers receive metric information by querying
+the central metric data, which is held in shared memory.
+
+For each metric, a separate value is maintained for each port id, and
+when publishing metric values the producers need to specify which port is
+being updated. In addition there is a special id ``RTE_METRICS_GLOBAL``
+that is intended for global statistics that are not associated with any
+individual device. Since the metrics library is self-contained, the only
+restriction on port numbers is that they are less than
+``RTE_MAX_ETHPORTS``; there is no requirement for the ports to
+actually exist.
+
+Initialising the library
+------------------------
+
+Before the library can be used, it has to be initialized by calling
+``rte_metrics_init()`` which sets up the metric store in shared memory.
+This is where producers will publish metric information to, and where
+consumers will query it from.
+
+.. code-block:: c
+
+    rte_metrics_init(rte_socket_id());
+
+This function **must** be called from a primary process, but otherwise
+producers and consumers can be in either primary or secondary processes.
+
+Registering metrics
+-------------------
+
+Metrics must first be *registered*, which is the way producers declare
+the names of the metrics they will be publishing. Registration can either
+be done individually, or a set of metrics can be registered as a group.
+Individual registration is done using ``rte_metrics_reg_name()``:
+
+.. code-block:: c
+
+    id_1 = rte_metrics_reg_name("mean_bits_in");
+    id_2 = rte_metrics_reg_name("mean_bits_out");
+    id_3 = rte_metrics_reg_name("peak_bits_in");
+    id_4 = rte_metrics_reg_name("peak_bits_out");
+
+or alternatively, a set of metrics can be registered together using
+``rte_metrics_reg_names()``:
+
+.. code-block:: c
+
+    const char * const names[] = {
+        "mean_bits_in", "mean_bits_out",
+        "peak_bits_in", "peak_bits_out",
+    };
+    id_set = rte_metrics_reg_names(&names[0], 4);
+
+If the return value is negative, it means registration failed. Otherwise
+the return value is the *key* for the metric, which is used when updating
+values. A table mapping together these key values and the metrics' names
+can be obtained using ``rte_metrics_get_names()``.
+
+Updating metric values
+----------------------
+
+Once registered, producers can update the metric for a given port using
+the ``rte_metrics_update_value()`` function. This uses the metric key
+that is returned when registering the metric, and can also be looked up
+using ``rte_metrics_get_names()``.
+
+.. code-block:: c
+
+    rte_metrics_update_value(port_id, id_1, values[0]);
+    rte_metrics_update_value(port_id, id_2, values[1]);
+    rte_metrics_update_value(port_id, id_3, values[2]);
+    rte_metrics_update_value(port_id, id_4, values[3]);
+
+If metrics were registered as a single set, they can either be updated
+individually using ``rte_metrics_update_value()``, or updated together
+using the ``rte_metrics_update_values()`` function:
+
+.. code-block:: c
+
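+    /* Update each metric in the set individually ... */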
+    rte_metrics_update_value(port_id, id_set, values[0]);
+    rte_metrics_update_value(port_id, id_set + 1, values[1]);
+    rte_metrics_update_value(port_id, id_set + 2, values[2]);
+    rte_metrics_update_value(port_id, id_set + 3, values[3]);
+
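+    /* ... or update the whole set with a single call */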
+    rte_metrics_update_values(port_id, id_set, values, 4);
+
+Note that ``rte_metrics_update_values()`` cannot be used to update
+metric values from *multiple* *sets*, as there is no guarantee two
+sets registered one after the other have contiguous id values.
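+
+For instance, assuming two sets registered one after the other (a
+hypothetical sketch; ``names_a``, ``names_b`` and ``values`` are
+placeholders):
+
+.. code-block:: c
+
+    id_set_a = rte_metrics_reg_names(names_a, 2);
+    id_set_b = rte_metrics_reg_names(names_b, 2);
+
+    /* Fine: the update stays within set A */
+    rte_metrics_update_values(port_id, id_set_a, values, 2);
+
+    /* Error: the update would cross from set A into set B (-ERANGE) */
+    rte_metrics_update_values(port_id, id_set_a, values, 4);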
+
+Querying metrics
+----------------
+
+Consumers can obtain metric values by querying the metrics library using
+the ``rte_metrics_get_values()`` function that return an array of
+``struct rte_metric_value``. Each entry within this array contains a metric
+value and its associated key. A key-name mapping can be obtained using the
+``rte_metrics_get_names()`` function that returns an array of
+``struct rte_metric_name`` that is indexed by the key. The following will
+print out all metrics for a given port:
+
+.. code-block:: c
+
+    void print_metrics(int port_id) {
+        struct rte_metric_value *metrics;
+        struct rte_metric_name *names;
+        int len, ret, i;
+
+        len = rte_metrics_get_names(NULL, 0);
+        if (len < 0) {
+            printf("Cannot get metrics count\n");
+            return;
+        }
+        if (len == 0) {
+            printf("No metrics to display (none have been registered)\n");
+            return;
+        }
+        metrics = malloc(sizeof(struct rte_metric_value) * len);
+        names = malloc(sizeof(struct rte_metric_name) * len);
+        if (metrics == NULL || names == NULL) {
+            printf("Cannot allocate memory\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        ret = rte_metrics_get_values(port_id, metrics, len);
+        if (ret < 0 || ret > len) {
+            printf("Cannot get metrics values\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        printf("Metrics for port %i:\n", port_id);
+        for (i = 0; i < len; i++)
+            printf("  %s: %"PRIu64"\n",
+                names[metrics[i].key].name, metrics[i].value);
+        free(metrics);
+        free(names);
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 357965a..8bd706f 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -368,6 +368,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_mbuf.so.2
      librte_mempool.so.2
      librte_meter.so.1
+   + librte_metrics.so.1
      librte_net.so.1
      librte_pdump.so.1
      librte_pipeline.so.3
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..3ed809e 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -61,6 +61,14 @@ Resolved Issues
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Added information metric library.**
+
+  A library that allows information metrics to be added and updated
+  by producers, typically other libraries, for later retrieval by
+  consumers such as applications. It is intended to provide a
+  reporting mechanism that is independent of other libraries such
+  as ethdev.
+
 
 EAL
 ~~~
diff --git a/lib/Makefile b/lib/Makefile
index 4178325..29f6a81 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..aa9ec50
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,299 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ */
+struct rte_metrics_meta_s {
+	/** Name of metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	/** Current value for metric */
+	uint64_t value[RTE_MAX_ETHPORTS];
+	/** Used for global metrics */
+	uint64_t global_value;
+	/** Index of next root element (zero for none) */
+	uint16_t idx_next_set;
+	/** Index of next metric in set (zero for none) */
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	/**   Index of last metadata entry with valid data.
+	 * This value is not valid if cnt_stats is zero.
+	 */
+	uint16_t idx_last_set;
+	/**   Number of metrics. */
+	uint16_t cnt_stats;
+	/** Metric data memory block. */
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	/** Metric data access lock */
+	rte_spinlock_t lock;
+};
+
+void
+rte_metrics_init(int socket_id)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), socket_id, 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+int
+rte_metrics_reg_name(const char *name)
+{
+	const char * const list_names[] = {name};
+
+	return rte_metrics_reg_names(list_names, 1);
+}
+
+int
+rte_metrics_reg_names(const char * const *names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually the first set */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
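+	/* Chain each entry to the next; the final entry of the set is
+	 * terminated (zeroed) just after this loop.
+	 */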
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+int
+rte_metrics_update_value(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_values(port_id, key, &value, 1);
+}
+
+int
+rte_metrics_update_values(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id >= RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_GLOBAL)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].global_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id >= RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		if (port_id == RTE_METRICS_GLOBAL)
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->global_value;
+			}
+		else
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->value[port_id];
+			}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..7458328
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,240 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * DPDK Metrics module
+ *
+ * Metrics are statistics that are not generated by PMDs, and hence
+ * are better reported through a mechanism that is independent from
+ * the ethdev-based extended statistics. Providers will typically
+ * be other libraries and consumers will typically be applications.
+ *
+ * Metric information is populated using a push model, where producers
+ * update the values contained within the metric library by calling
+ * an update function on the relevant metrics. Consumers receive
+ * metric information by querying the central metric data, which is
+ * held in shared memory. Currently only bulk querying of metrics
+ * by consumers is supported.
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+#include <stdint.h>
+
+/** Maximum length of metric name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/**
+ * Global metric special id.
+ *
+ * When used as the port_id parameter when calling
+ * rte_metrics_update_value() or rte_metrics_update_values(),
+ * the global metrics, which are not associated with any specific
+ * port (i.e. device), are updated.
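+ *
+ * For example (sketch), rte_metrics_update_value(RTE_METRICS_GLOBAL,
+ * key, value) updates the global copy of the metric identified by key.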
+ */
+#define RTE_METRICS_GLOBAL -1
+
+
+/**
+ * A name-key lookup for metrics.
+ *
+ * An array of this structure is returned by rte_metrics_get_names().
+ * The struct rte_metric_value references these names via their array index.
+ */
+struct rte_metric_name {
+	/** String describing metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Metric value structure.
+ *
+ * This structure is used by rte_metrics_get_values() to return metrics,
+ * which are statistics that are not generated by PMDs. It maps a name key,
+ * which corresponds to an index in the array returned by
+ * rte_metrics_get_names().
+ */
+struct rte_metric_value {
+	/** Numeric identifier of metric. */
+	uint16_t key;
+	/** Value for metric */
+	uint64_t value;
+};
+
+
+/**
+ * Initializes metric module. This function must be called from
+ * a primary process before metrics are used.
+ *
+ * @param socket_id
+ *   Socket to use for shared memory allocation.
+ */
+void rte_metrics_init(int socket_id);
+
+/**
+ * Register a metric, making it available as a reporting parameter.
+ *
+ * Registering a metric is the way producers declare a parameter
+ * that they wish to be reported. Once registered, the associated
+ * numeric key can be obtained via rte_metrics_get_names(), which
+ * is required for updating said metric's value.
+ *
+ * @param name
+ *   Metric name
+ *
+ * @return
+ *  - Zero or positive: Success (index key of new metric)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_name(const char *name);
+
+/**
+ * Register a set of metrics.
+ *
+ * This is a bulk version of rte_metrics_reg_name() and aside from
+ * handling multiple keys at once is functionally identical.
+ *
+ * @param names
+ *   List of metric names
+ *
+ * @param cnt_names
+ *   Number of metrics in set
+ *
+ * @return
+ *  - Zero or positive: Success (index key of start of set)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_names(const char * const *names, uint16_t cnt_names);
+
+/**
+ * Get metric name-key lookup table.
+ *
+ * @param names
+ *   A struct rte_metric_name array of at least *capacity* in size to
+ *   receive key names. If this is NULL, function returns the required
+ *   number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_name array.
+ *   Disregarded if names is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *names* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Get metric value table.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   A struct rte_metric_value array of at least *capacity* in size to
+ *   receive metric ids and values. If this is NULL, function returns
+ *   the required number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_value array.
+ *   Disregarded if values is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *values* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a metric
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Id of metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_value(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Base id of metrics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds metric set size
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_values(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..4c5234c
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_17.05 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_name;
+	rte_metrics_reg_names;
+	rte_metrics_update_value;
+	rte_metrics_update_values;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index d46a33e..98eb052 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
-- 
2.5.5

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
  2017-03-09 10:01  0%       ` Ferruh Yigit
@ 2017-03-09 10:43  0%         ` Xing, Beilei
  0 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2017-03-09 10:43 UTC (permalink / raw)
  To: Yigit, Ferruh, Wu, Jingjing
  Cc: Zhang, Helin, dev, Iremonger, Bernard, Stroe, Laura



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Thursday, March 9, 2017 6:02 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
> <laura.stroe@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
> 
> On 3/9/2017 5:59 AM, Xing, Beilei wrote:
> >
> >
> >> -----Original Message-----
> >> From: Yigit, Ferruh
> >> Sent: Wednesday, March 8, 2017 11:50 PM
> >> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> >> <jingjing.wu@intel.com>
> >> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
> >> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
> >> <laura.stroe@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter
> >> type
> >>
> >> On 3/3/2017 9:31 AM, Beilei Xing wrote:
> >>> Add new admin queue function and extended fields in DCR 288:
> >>>  - Add admin queue function for Replace filter
> >>>    command (Opcode: 0x025F)
> >>>  - Add General fields for Add/Remove Cloud filters
> >>>    command
> >>>
> >>> This patch will be removed to base driver in future.
> >>>
> >>> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> >>> Signed-off-by: Stroe Laura <laura.stroe@intel.com>
> >>> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> >>> Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> >>> ---
> >>>  drivers/net/i40e/i40e_ethdev.h | 106
> ++++++++++++++++++++++++++++
> >>>  drivers/net/i40e/i40e_flow.c   | 152
> >> +++++++++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 258 insertions(+)
> >>>
> >>> diff --git a/drivers/net/i40e/i40e_ethdev.h
> >>> b/drivers/net/i40e/i40e_ethdev.h index f545850..3a49865 100644
> >>> --- a/drivers/net/i40e/i40e_ethdev.h
> >>> +++ b/drivers/net/i40e/i40e_ethdev.h
> >>> @@ -729,6 +729,100 @@ struct i40e_valid_pattern {
> >>>  	parse_filter_t parse_filter;
> >>>  };
> >>>
> >>> +/* Support replace filter */
> >>> +
> >>> +/* i40e_aqc_add_remove_cloud_filters_element_big_data is used
> when
> >>> + * I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER flag is set. refer to
> >>> + * DCR288
> >>
> >> Please do not refer to DCR, unless you can provide a public link for it.
> > OK, got it.
> >
> >>
> >>> + */
> >>> +struct i40e_aqc_add_remove_cloud_filters_element_big_data {
> >>> +	struct i40e_aqc_add_remove_cloud_filters_element_data element;
> >>
> >> What is the difference between
> >> "i40e_aqc_add_remove_cloud_filters_element_big_data" and
> >> "i40e_aqc_add_remove_cloud_filters_element_data", why need
> big_data
> >> one?
> >
> > As ' Add/Remove Cloud filters -command buffer ' is changed in the DCR288,
> 'general fields' exists only when big_buffer is set.
> 
> What does it mean having "big_buffer" set? What changes functionally being
> big_buffer set or not?

According to DCR288, "Add/Remove Cloud Filter Command" should add 'Big Buffer' in byte20, but we can't change ' struct i40e_aqc_add_remove_cloud_filters ' in base code,
struct i40e_aqc_add_remove_cloud_filters {
        u8      num_filters;
        u8      reserved;
        __le16  seid;
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT   0
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_MASK    (0x3FF << \
                                        I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT)
        u8      reserved2[4];
        __le32  addr_high;
        __le32  addr_low;
};

So we use reserverd[0] for 'Big Buffer' here, in the patch for ND, we changed above structure with following:

struct i40e_aqc_add_remove_cloud_filters {
        u8      num_filters;
        u8      reserved;
        __le16  seid;
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT   0
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_MASK    (0x3FF << \
                                        I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT)
        u8      big_buffer;
        u8      reserved2[3];
        __le32  addr_high;
        __le32  addr_low;
};


> 
> > But we don't want to change the  "
> i40e_aqc_add_remove_cloud_filters_element_data " as it will cause ABI/API
> change in kernel driver.
> >
> <...>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
  2017-03-09  5:59  3%     ` Xing, Beilei
@ 2017-03-09 10:01  0%       ` Ferruh Yigit
  2017-03-09 10:43  0%         ` Xing, Beilei
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-03-09 10:01 UTC (permalink / raw)
  To: Xing, Beilei, Wu, Jingjing
  Cc: Zhang, Helin, dev, Iremonger, Bernard, Stroe, Laura

On 3/9/2017 5:59 AM, Xing, Beilei wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Wednesday, March 8, 2017 11:50 PM
>> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
>> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
>> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
>> <laura.stroe@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
>>
>> On 3/3/2017 9:31 AM, Beilei Xing wrote:
>>> Add new admin queue function and extended fields in DCR 288:
>>>  - Add admin queue function for Replace filter
>>>    command (Opcode: 0x025F)
>>>  - Add General fields for Add/Remove Cloud filters
>>>    command
>>>
>>> This patch will be removed to base driver in future.
>>>
>>> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
>>> Signed-off-by: Stroe Laura <laura.stroe@intel.com>
>>> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
>>> Signed-off-by: Beilei Xing <beilei.xing@intel.com>
>>> ---
>>>  drivers/net/i40e/i40e_ethdev.h | 106 ++++++++++++++++++++++++++++
>>>  drivers/net/i40e/i40e_flow.c   | 152
>> +++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 258 insertions(+)
>>>
>>> diff --git a/drivers/net/i40e/i40e_ethdev.h
>>> b/drivers/net/i40e/i40e_ethdev.h index f545850..3a49865 100644
>>> --- a/drivers/net/i40e/i40e_ethdev.h
>>> +++ b/drivers/net/i40e/i40e_ethdev.h
>>> @@ -729,6 +729,100 @@ struct i40e_valid_pattern {
>>>  	parse_filter_t parse_filter;
>>>  };
>>>
>>> +/* Support replace filter */
>>> +
>>> +/* i40e_aqc_add_remove_cloud_filters_element_big_data is used when
>>> + * I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER flag is set. refer to
>>> + * DCR288
>>
>> Please do not refer to DCR, unless you can provide a public link for it.
> OK, got it.
> 
>>
>>> + */
>>> +struct i40e_aqc_add_remove_cloud_filters_element_big_data {
>>> +	struct i40e_aqc_add_remove_cloud_filters_element_data element;
>>
>> What is the difference between
>> "i40e_aqc_add_remove_cloud_filters_element_big_data" and
>> "i40e_aqc_add_remove_cloud_filters_element_data", why need big_data
>> one?
> 
> As ' Add/Remove Cloud filters -command buffer ' is changed in the DCR288, 'general fields' exists only when big_buffer is set.

What does it mean having "big_buffer" set? What changes functionally
being big_buffer set or not?

> But we don't want to change the  " i40e_aqc_add_remove_cloud_filters_element_data " as it will cause ABI/API change in kernel driver.
> 
<...>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 4/4] net/i40e: refine consistent tunnel filter
  @ 2017-03-09  6:11  3%     ` Xing, Beilei
  0 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2017-03-09  6:11 UTC (permalink / raw)
  To: Yigit, Ferruh, Wu, Jingjing; +Cc: Zhang, Helin, dev



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, March 8, 2017 11:51 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 4/4] net/i40e: refine consistent tunnel filter
> 
> On 3/3/2017 9:31 AM, Beilei Xing wrote:
> > Add i40e_tunnel_type enumeration type to refine consistent tunnel
> > filter, it will be esay to add new tunnel type for
> 
> s/esay/easy
> 
> > i40e.
> >
> > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> 
> <...>
> 
> >  /**
> > + * Tunnel type.
> > + */
> > +enum i40e_tunnel_type {
> > +	I40E_TUNNEL_TYPE_NONE = 0,
> > +	I40E_TUNNEL_TYPE_VXLAN,
> > +	I40E_TUNNEL_TYPE_GENEVE,
> > +	I40E_TUNNEL_TYPE_TEREDO,
> > +	I40E_TUNNEL_TYPE_NVGRE,
> > +	I40E_TUNNEL_TYPE_IP_IN_GRE,
> > +	I40E_L2_TUNNEL_TYPE_E_TAG,
> > +	I40E_TUNNEL_TYPE_MAX,
> > +};
> 
> Same question here, there is already "rte_eth_tunnel_type", why driver is
> duplicating the structure?
> 

Same with "struct i40e_tunnel_filter_conf": to avoid an ABI change, we create it in the PMD so that new tunnel types, like MPLS, can be added easily.

> <...>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
  @ 2017-03-09  5:59  3%     ` Xing, Beilei
  2017-03-09 10:01  0%       ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Xing, Beilei @ 2017-03-09  5:59 UTC (permalink / raw)
  To: Yigit, Ferruh, Wu, Jingjing
  Cc: Zhang, Helin, dev, Iremonger, Bernard, Stroe, Laura



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, March 8, 2017 11:50 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
> <laura.stroe@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
> 
> On 3/3/2017 9:31 AM, Beilei Xing wrote:
> > Add new admin queue function and extended fields in DCR 288:
> >  - Add admin queue function for Replace filter
> >    command (Opcode: 0x025F)
> >  - Add General fields for Add/Remove Cloud filters
> >    command
> >
> > This patch will be removed to base driver in future.
> >
> > Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> > Signed-off-by: Stroe Laura <laura.stroe@intel.com>
> > Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> > ---
> >  drivers/net/i40e/i40e_ethdev.h | 106 ++++++++++++++++++++++++++++
> >  drivers/net/i40e/i40e_flow.c   | 152
> +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 258 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_ethdev.h
> > b/drivers/net/i40e/i40e_ethdev.h index f545850..3a49865 100644
> > --- a/drivers/net/i40e/i40e_ethdev.h
> > +++ b/drivers/net/i40e/i40e_ethdev.h
> > @@ -729,6 +729,100 @@ struct i40e_valid_pattern {
> >  	parse_filter_t parse_filter;
> >  };
> >
> > +/* Support replace filter */
> > +
> > +/* i40e_aqc_add_remove_cloud_filters_element_big_data is used when
> > + * I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER flag is set. refer to
> > + * DCR288
> 
> Please do not refer to DCR, unless you can provide a public link for it.
OK, got it.

> 
> > + */
> > +struct i40e_aqc_add_remove_cloud_filters_element_big_data {
> > +	struct i40e_aqc_add_remove_cloud_filters_element_data element;
> 
> What is the difference between
> "i40e_aqc_add_remove_cloud_filters_element_big_data" and
> "i40e_aqc_add_remove_cloud_filters_element_data", why need big_data
> one?

As ' Add/Remove Cloud filters -command buffer ' is changed in the DCR288, 'general fields' exists only when big_buffer is set.
But we don't want to change the  " i40e_aqc_add_remove_cloud_filters_element_data " as it will cause ABI/API change in kernel driver.

> 
> > +	uint16_t     general_fields[32];
> 
> Not very useful variable name.

It's the name from DCR.

> 
> <...>
> 
> > +/* Replace filter Command 0x025F
> > + * uses the i40e_aqc_replace_cloud_filters,
> > + * and the generic indirect completion structure  */ struct
> > +i40e_filter_data {
> > +	uint8_t filter_type;
> > +	uint8_t input[3];
> > +};
> > +
> > +struct i40e_aqc_replace_cloud_filters_cmd {
> 
> Is replace does something different than remove old and add new cloud
> filter?

It's just like removing an old filter and adding a new one.
It can replace both l1 filter and cloud filter.

> 
> <...>
> 
> > +enum i40e_status_code i40e_aq_add_cloud_filters_big_buffer(struct
> i40e_hw *hw,
> > +	   uint16_t seid,
> > +	   struct i40e_aqc_add_remove_cloud_filters_element_big_data
> *filters,
> > +	   uint8_t filter_count);
> > +enum i40e_status_code i40e_aq_remove_cloud_filters_big_buffer(
> > +	struct i40e_hw *hw, uint16_t seid,
> > +	struct i40e_aqc_add_remove_cloud_filters_element_big_data
> *filters,
> > +	uint8_t filter_count);
> > +enum i40e_status_code i40e_aq_replace_cloud_filters(struct i40e_hw
> *hw,
> > +		    struct i40e_aqc_replace_cloud_filters_cmd *filters,
> > +		    struct i40e_aqc_replace_cloud_filters_cmd_buf
> *cmd_buf);
> > +
> 
> Do you need these function declarations?
We can remove the declarations if we define the functions as "static".

> 
> >  #define I40E_DEV_TO_PCI(eth_dev) \
> >  	RTE_DEV_TO_PCI((eth_dev)->device)
> >
> > diff --git a/drivers/net/i40e/i40e_flow.c
> > b/drivers/net/i40e/i40e_flow.c index f163ce5..3c49228 100644
> > --- a/drivers/net/i40e/i40e_flow.c
> > +++ b/drivers/net/i40e/i40e_flow.c
> > @@ -1874,3 +1874,155 @@ i40e_flow_flush_tunnel_filter(struct i40e_pf
> > *pf)
> >
> >  	return ret;
> >  }
> > +
> > +#define i40e_aqc_opc_replace_cloud_filters 0x025F #define
> > +I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER 1
> > +/**
> > + * i40e_aq_add_cloud_filters_big_buffer
> > + * @hw: pointer to the hardware structure
> > + * @seid: VSI seid to add cloud filters from
> > + * @filters: Buffer which contains the filters in big buffer to be
> > +added
> > + * @filter_count: number of filters contained in the buffer
> > + *
> > + * Set the cloud filters for a given VSI.  The contents of the
> > + * i40e_aqc_add_remove_cloud_filters_element_big_data are filled
> > + * in by the caller of the function.
> > + *
> > + **/
> > +enum i40e_status_code i40e_aq_add_cloud_filters_big_buffer(
> 
> There are already non-big_buffer versions of these functions, like
> "i40e_aq_add_cloud_filters()"; why is a big_data version required, and what
> does it do differently?

The parameters are different.
We add i40e_aq_add_cloud_filters_big_buffer to handle the structure
"i40e_aqc_add_remove_cloud_filters_element_big_data", which includes
general_fields.

> 
> And is there a reason that these functions are not static? (For this patch
> they are not used at all and will cause a build error, but my question
> stands for after they start to be used.)

No. Same as with the patch for Pipeline Personalization Profile: these are designed according to the base code style.

> 
> <...>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] Issues with ixgbe  and rte_flow
  @ 2017-03-08 15:41  3%     ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2017-03-08 15:41 UTC (permalink / raw)
  To: Le Scouarnec Nicolas
  Cc: Lu, Wenzhuo, dev, users, Jan Medala, Evgeny Schemeilin,
	Stephen Hurd, Jerin Jacob, Rahul Lakkireddy, John Daley,
	Matej Vido, Helin Zhang, Konstantin Ananyev, Jingjing Wu,
	Jing Chen, Alejandro Lucero, Harish Patil, Rasesh Mody,
	Andrew Rybchenko, Nelio Laranjeiro, Vasily Philipov,
	Pascal Mazon, Thomas Monjalon

CC'ing users@dpdk.org since this issue primarily affects rte_flow users, and
several PMD maintainers to get their opinion on the matter, see below.

On Wed, Mar 08, 2017 at 09:24:26AM +0000, Le Scouarnec Nicolas wrote:
> My response is inline bellow, and further comment on the code excerpt also
> 
> 
> From: Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Sent: Wednesday, March 8, 2017 4:16 AM
> To: Le Scouarnec Nicolas; dev@dpdk.org; Adrien Mazarguil (adrien.mazarguil@6wind.com)
> Cc: Yigit, Ferruh
> Subject: RE: Issues with ixgbe and rte_flow
>     
> >> I have been using the new API rte_flow to program filtering on an X540 (ixgbe)
> >> NIC. My goal is to send packets from different VLANs to different queues
> >> (filtering which should be supported by flow director as far as I understand). I
> >> enclosed the setup code at the bottom of this email.
> >> For reference, here is the setup code I use
> >>
> >>       vlan_spec.tci = vlan_be;
> >>       vlan_spec.tpid = 0;
> >>
> >>       vlan_mask.tci = rte_cpu_to_be_16(0x0fff);
> >>       vlan_mask.tpid =  0;
> 
> >To my opinion, this setting is not right. As we know, vlan tag is inserted between MAC source address and Ether type.
> >So if we have a MAC+VLAN+IPv4 packet, the vlan_spec.tpid should be 0x8100, the eth_spec.type should be 0x0800.
> >+ Adrien, the author. He can correct me if I'm wrong.

That's right; however, the confusion is understandable, and perhaps the
documentation should be clearer. It currently states the following without
describing the reason:

 /**
  * RTE_FLOW_ITEM_TYPE_VLAN
  *
  * Matches an 802.1Q/ad VLAN tag.
  *
  * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
  * RTE_FLOW_ITEM_TYPE_VLAN.
  */

> Ok, I apologize, you're right. Being more used to the software side than to the hardware side, I misunderstood struct rte_flow_item_vlan and thought it was the "equivalent" of struct vlan_hdr, in which case it would contain the type of the encapsulated frame.
> 
> (  /**
>  * Ethernet VLAN Header.
>  * Contains the 16-bit VLAN Tag Control Identifier and the Ethernet type
>  * of the encapsulated frame.
>  */
> struct vlan_hdr {
> 	uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
> 	uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
> } __attribute__((__packed__));        )

Indeed, struct vlan_hdr and struct rte_flow_item_vlan are not mapped at the
same offset; the former includes EtherType of the inner packet (eth_proto),
while the latter describes the inserted VLAN header itself starting with
TPID.

This approach was chosen for rte_flow for consistency with the fact that
each pattern item describes exactly one protocol header, even though in the
case of VLAN and other layer 2.5 protocols, some headers happen to be
embedded within another.
IPv4/IPv6 options will be provided as separate items in a similar fashion.

It also allows adding/removing VLAN tags to an existing rule without
modifying the EtherType of the inner frame.
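
To make the offset difference concrete, here is a side-by-side sketch
(simplified; comments are mine and field types are as I recall them at the
time of writing, so treat this as illustrative only):

 /* Wire format: | dst MAC | src MAC | TPID | TCI | inner EtherType | ... */

 struct vlan_hdr {                 /* starts after the TPID */
         uint16_t vlan_tci;        /* PCP(3) + CFI(1) + VID(12) */
         uint16_t eth_proto;       /* EtherType of the encapsulated frame */
 };

 struct rte_flow_item_vlan {       /* starts at the TPID itself */
         uint16_t tpid;            /* tag protocol identifier, e.g. 0x8100 */
         uint16_t tci;             /* tag control information */
 };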

Now assuming you're not the only one facing that issue, if the current
definition does not make sense, perhaps we can update the API before it's
too late. I'll attempt to summarize it with an example below.

In any case, matching nonspecific VLAN-tagged and QinQ UDPv4 packets in
testpmd is written as:

 flow create 0 pattern eth / vlan / ipv4 / udp / end actions queue 1 / end
 flow create 0 pattern eth / vlan / vlan / ipv4 / udp / end actions queue 1 / end

However, with the current API described above, specifying inner/outer
EtherTypes for the above packets yields (as a reminder, 0x8100 stands for
VLAN, 0x0800 for IPv4 and 0x88A8 for QinQ):

#1

 flow create 0 pattern eth type is 0x0800 / vlan tpid is 0x8100 / ipv4 / udp / end actions queue 1 / end
 flow create 0 pattern eth type is 0x0800 / vlan tpid is 0x88A8 / vlan tpid is 0x8100 / ipv4 / udp / end actions queue 1 / end

Instead of the arguably more accurate (renaming "tpid" to "inner_type" for
clarity):

#2

 flow create 0 pattern eth type is 0x8100 / vlan inner_type is 0x0800 / ipv4 / udp / end actions queue 1 / end
 flow create 0 pattern eth type is 0x88A8 / vlan inner_type is 0x8100 / vlan inner_type is 0x0800 / ipv4 / udp / end actions queue 1 / end

So, should the VLAN item be updated to behave as described in #2?
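
For reference, option #2 could translate into something like the following
(a hypothetical sketch, not a committed definition):

 struct rte_flow_item_vlan {
         uint16_t tci;        /* tag control information */
         uint16_t inner_type; /* EtherType of the encapsulated frame */
 };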

Note: doing so will cause a serious API/ABI breakage. I know that was not
supposed to happen according to the rte_flow sales pitch, but hey.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 09/14] ring: allow dequeue fns to return remaining entry count
                       ` (5 preceding siblings ...)
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
    7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the number of objects remaining in
the ring. This information can be used by applications in a number of
ways; for instance, with single-consumer queues, it provides a maximum
dequeue size which is guaranteed to work.
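
A minimal usage sketch of the new parameter (hypothetical application
code, not part of this patch; 'r' and the burst size are illustrative):

 unsigned int remaining;
 void *objs[32];
 unsigned int n = rte_ring_sc_dequeue_burst(r, objs, 32, &remaining);
 /* 'n' objects were dequeued; 'remaining' entries are still in the
  * ring, so with a single consumer a follow-up dequeue of up to
  * 'remaining' objects is guaranteed to succeed. Pass NULL when the
  * count is not needed. */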

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/pdump/main.c                                   |  2 +-
 doc/guides/rel_notes/release_17_05.rst             |  8 ++
 drivers/crypto/null/null_crypto_pmd.c              |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |  3 +-
 drivers/net/ring/rte_eth_ring.c                    |  2 +-
 examples/distributor/main.c                        |  2 +-
 examples/load_balancer/runtime.c                   |  6 +-
 .../client_server_mp/mp_client/client.c            |  3 +-
 examples/packet_ordering/main.c                    |  6 +-
 examples/qos_sched/app_thread.c                    |  6 +-
 examples/quota_watermark/qw/main.c                 |  5 +-
 examples/server_node_efd/node/node.c               |  2 +-
 lib/librte_hash/rte_cuckoo_hash.c                  |  3 +-
 lib/librte_mempool/rte_mempool_ring.c              |  4 +-
 lib/librte_port/rte_port_frag.c                    |  3 +-
 lib/librte_port/rte_port_ring.c                    |  6 +-
 lib/librte_ring/rte_ring.h                         | 90 +++++++++++-----------
 test/test-pipeline/runtime.c                       |  6 +-
 test/test/test_link_bonding_mode4.c                |  3 +-
 test/test/test_pmd_ring_perf.c                     |  7 +-
 test/test/test_ring.c                              | 54 ++++++-------
 test/test/test_ring_perf.c                         | 20 +++--
 test/test/test_table_acl.c                         |  2 +-
 test/test/test_table_pipeline.c                    |  2 +-
 test/test/test_table_ports.c                       |  8 +-
 test/test/virtual_pmd.c                            |  4 +-
 26 files changed, 145 insertions(+), 114 deletions(-)

diff --git a/app/pdump/main.c b/app/pdump/main.c
index b88090d..3b13753 100644
--- a/app/pdump/main.c
+++ b/app/pdump/main.c
@@ -496,7 +496,7 @@ pdump_rxtx(struct rte_ring *ring, uint8_t vdev_id, struct pdump_stats *stats)
 
 	/* first dequeue packets from ring of primary process */
 	const uint16_t nb_in_deq = rte_ring_dequeue_burst(ring,
-			(void *)rxtx_bufs, BURST_SIZE);
+			(void *)rxtx_bufs, BURST_SIZE, NULL);
 	stats->dequeue_pkts += nb_in_deq;
 
 	if (nb_in_deq) {
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 249ad6e..563a74c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -123,6 +123,8 @@ API Changes
   * added an extra parameter to the burst/bulk enqueue functions to
     return the number of free spaces in the ring after enqueue. This can
     be used by an application to implement its own watermark functionality.
+  * added an extra parameter to the burst/bulk dequeue functions to return
+    the number of elements remaining in the ring after dequeue.
   * changed the return value of the enqueue and dequeue bulk functions to
     match that of the burst equivalents. In all cases, ring functions which
     operate on multiple packets now return the number of elements enqueued
@@ -135,6 +137,12 @@ API Changes
     - ``rte_ring_sc_dequeue_bulk``
     - ``rte_ring_dequeue_bulk``
 
+    NOTE: the above functions all have different parameters as well as
+    different return values, due to the other listed changes above. This
+    means that all instances of the functions in existing code will be
+    flagged by the compiler. The return value usage should be checked
+    while fixing the compiler error due to the extra parameter.
+
 ABI Changes
 -----------
 
diff --git a/drivers/crypto/null/null_crypto_pmd.c b/drivers/crypto/null/null_crypto_pmd.c
index ed5a9fc..f68ec8d 100644
--- a/drivers/crypto/null/null_crypto_pmd.c
+++ b/drivers/crypto/null/null_crypto_pmd.c
@@ -155,7 +155,7 @@ null_crypto_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	unsigned nb_dequeued;
 
 	nb_dequeued = rte_ring_dequeue_burst(qp->processed_pkts,
-			(void **)ops, nb_ops);
+			(void **)ops, nb_ops, NULL);
 	qp->qp_stats.dequeued_count += nb_dequeued;
 
 	return nb_dequeued;
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index f3ac9e2..96638af 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1008,7 +1008,8 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs,
 		struct port *port = &mode_8023ad_ports[slaves[i]];
 
 		slave_slow_nb_pkts[i] = rte_ring_dequeue_burst(port->tx_ring,
-				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS);
+				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS,
+				NULL);
 		slave_nb_pkts[i] = slave_slow_nb_pkts[i];
 
 		for (j = 0; j < slave_slow_nb_pkts[i]; j++)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index adbf478..77ef3a1 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -88,7 +88,7 @@ eth_ring_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	void **ptrs = (void *)&bufs[0];
 	struct ring_queue *r = q;
 	const uint16_t nb_rx = (uint16_t)rte_ring_dequeue_burst(r->rng,
-			ptrs, nb_bufs);
+			ptrs, nb_bufs, NULL);
 	if (r->rng->flags & RING_F_SC_DEQ)
 		r->rx_pkts.cnt += nb_rx;
 	else
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cfd360b..5cb6185 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -330,7 +330,7 @@ lcore_tx(struct rte_ring *in_r)
 
 			struct rte_mbuf *bufs[BURST_SIZE];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE, NULL);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 1645994..8192c08 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -349,7 +349,8 @@ app_lcore_io_tx(
 			ret = rte_ring_sc_dequeue_bulk(
 				ring,
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
-				bsz_rd);
+				bsz_rd,
+				NULL);
 
 			if (unlikely(ret == 0))
 				continue;
@@ -504,7 +505,8 @@ app_lcore_worker(
 		ret = rte_ring_sc_dequeue_bulk(
 			ring_in,
 			(void **) lp->mbuf_in.array,
-			bsz_rd);
+			bsz_rd,
+			NULL);
 
 		if (unlikely(ret == 0))
 			continue;
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index dca9eb9..01b535c 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -279,7 +279,8 @@ main(int argc, char *argv[])
 		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts,
+				PKT_READ_SIZE, NULL);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index d268350..7719dad 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -462,7 +462,7 @@ worker_thread(void *args_ptr)
 
 		/* dequeue the mbufs from rx_to_workers ring */
 		burst_size = rte_ring_dequeue_burst(ring_in,
-				(void *)burst_buffer, MAX_PKTS_BURST);
+				(void *)burst_buffer, MAX_PKTS_BURST, NULL);
 		if (unlikely(burst_size == 0))
 			continue;
 
@@ -510,7 +510,7 @@ send_thread(struct send_thread_args *args)
 
 		/* deque the mbufs from workers_to_tx ring */
 		nb_dq_mbufs = rte_ring_dequeue_burst(args->ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(nb_dq_mbufs == 0))
 			continue;
@@ -595,7 +595,7 @@ tx_thread(struct rte_ring *ring_in)
 
 		/* deque the mbufs from workers_to_tx ring */
 		dqnum = rte_ring_dequeue_burst(ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(dqnum == 0))
 			continue;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 0c81a15..15f117f 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -179,7 +179,7 @@ app_tx_thread(struct thread_conf **confs)
 
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
-					burst_conf.qos_dequeue);
+					burst_conf.qos_dequeue, NULL);
 		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
@@ -218,7 +218,7 @@ app_worker_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
@@ -254,7 +254,7 @@ app_mixed_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
diff --git a/examples/quota_watermark/qw/main.c b/examples/quota_watermark/qw/main.c
index 57df8ef..2dcddea 100644
--- a/examples/quota_watermark/qw/main.c
+++ b/examples/quota_watermark/qw/main.c
@@ -247,7 +247,8 @@ pipeline_stage(__attribute__((unused)) void *args)
 			}
 
 			/* Dequeue up to quota mbuf from rx */
-			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota);
+			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts,
+					*quota, NULL);
 			if (unlikely(nb_dq_pkts < 0))
 				continue;
 
@@ -305,7 +306,7 @@ send_stage(__attribute__((unused)) void *args)
 
 			/* Dequeue packets from tx and send them */
 			nb_dq_pkts = (uint16_t) rte_ring_dequeue_burst(tx,
-					(void *) tx_pkts, *quota);
+					(void *) tx_pkts, *quota, NULL);
 			rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
 
 			/* TODO: Check if nb_dq_pkts == nb_tx_pkts? */
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index 9ec6a05..f780b92 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) == 0))
+					rx_pkts, NULL) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 6552199..645c0cf 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -536,7 +536,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
 			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
-					cached_free_slots->objs, LCORE_CACHE_SIZE);
+					cached_free_slots->objs,
+					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0)
 				return -ENOSPC;
 
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index 9b8fd2b..5c132bf 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -58,14 +58,14 @@ static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_mc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_sc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_port/rte_port_frag.c b/lib/librte_port/rte_port_frag.c
index 0fcace9..320407e 100644
--- a/lib/librte_port/rte_port_frag.c
+++ b/lib/librte_port/rte_port_frag.c
@@ -186,7 +186,8 @@ rte_port_ring_reader_frag_rx(void *port,
 		/* If "pkts" buffer is empty, read packet burst from ring */
 		if (p->n_pkts == 0) {
 			p->n_pkts = rte_ring_sc_dequeue_burst(p->ring,
-				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX);
+				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX,
+				NULL);
 			RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(p, p->n_pkts);
 			if (p->n_pkts == 0)
 				return n_pkts_out;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 9fadac7..492b0e7 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -111,7 +111,8 @@ rte_port_ring_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
@@ -124,7 +125,8 @@ rte_port_ring_multi_reader_rx(void *port, struct rte_mbuf **pkts,
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 73b1c26..ca25dd7 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -491,7 +491,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -500,11 +501,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	unsigned int i;
 	uint32_t mask = r->mask;
 
-	/* Avoid the unnecessary cmpset operation below, which is also
-	 * potentially harmful when n equals 0. */
-	if (n == 0)
-		return 0;
-
 	/* move cons.head atomically */
 	do {
 		/* Restore n as it may change every loop */
@@ -519,15 +515,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		entries = (prod_tail - cons_head);
 
 		/* Set the actual entries for dequeue */
-		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED)
-				return 0;
-			else {
-				if (unlikely(entries == 0))
-					return 0;
-				n = entries;
-			}
-		}
+		if (n > entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+		if (unlikely(n == 0))
+			goto end;
 
 		cons_next = cons_head + n;
 		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
@@ -546,7 +538,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		rte_pause();
 
 	r->cons.tail = cons_next;
-
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -570,7 +564,8 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  */
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -585,15 +580,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * and size(ring)-1. */
 	entries = prod_tail - cons_head;
 
-	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED)
-			return 0;
-		else {
-			if (unlikely(entries == 0))
-				return 0;
-			n = entries;
-		}
-	}
+	if (n > entries)
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+	if (unlikely(n == 0))
+		goto end;
 
 	cons_next = cons_head + n;
 	r->cons.head = cons_next;
@@ -603,6 +594,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -749,9 +743,11 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -768,9 +764,11 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -790,12 +788,13 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
+		unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
 }
 
 /**
@@ -816,7 +815,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -834,7 +833,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -856,7 +855,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -1046,9 +1045,11 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1066,9 +1067,11 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1088,12 +1091,13 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - Number of objects dequeued
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
 }
 
 #ifdef __cplusplus
diff --git a/test/test-pipeline/runtime.c b/test/test-pipeline/runtime.c
index c06ff54..8970e1c 100644
--- a/test/test-pipeline/runtime.c
+++ b/test/test-pipeline/runtime.c
@@ -121,7 +121,8 @@ app_main_loop_worker(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_rx[i],
 			(void **) worker_mbuf->array,
-			app.burst_size_worker_read);
+			app.burst_size_worker_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
@@ -151,7 +152,8 @@ app_main_loop_tx(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_tx[i],
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
-			app.burst_size_tx_read);
+			app.burst_size_tx_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
diff --git a/test/test/test_link_bonding_mode4.c b/test/test/test_link_bonding_mode4.c
index 8df28b4..15091b1 100644
--- a/test/test/test_link_bonding_mode4.c
+++ b/test/test/test_link_bonding_mode4.c
@@ -193,7 +193,8 @@ static uint8_t lacpdu_rx_count[RTE_MAX_ETHPORTS] = {0, };
 static int
 slave_get_pkts(struct slave_conf *slave, struct rte_mbuf **buf, uint16_t size)
 {
-	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf, size);
+	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf,
+			size, NULL);
 }
 
 /*
diff --git a/test/test/test_pmd_ring_perf.c b/test/test/test_pmd_ring_perf.c
index 045a7f2..004882a 100644
--- a/test/test/test_pmd_ring_perf.c
+++ b/test/test/test_pmd_ring_perf.c
@@ -67,7 +67,7 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t eth_start = rte_rdtsc();
@@ -99,7 +99,7 @@ test_single_enqueue_dequeue(void)
 	rte_compiler_barrier();
 	for (i = 0; i < iterations; i++) {
 		rte_ring_enqueue_bulk(r, &burst, 1, NULL);
-		rte_ring_dequeue_bulk(r, &burst, 1);
+		rte_ring_dequeue_bulk(r, &burst, 1, NULL);
 	}
 	const uint64_t sc_end = rte_rdtsc_precise();
 	rte_compiler_barrier();
@@ -133,7 +133,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, (void *)burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, (void *)burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, (void *)burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index b0ca88b..858ebc1 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -119,7 +119,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		    __func__, i, rand);
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
 				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
+				NULL) == rand);
 
 		/* fill the ring */
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
@@ -129,7 +130,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
+				NULL) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -186,19 +188,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -232,19 +234,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -265,7 +267,7 @@ test_ring_basic(void)
 		cur_src += MAX_BULK;
 		if (ret == 0)
 			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if (ret == 0)
 			goto fail;
@@ -303,13 +305,13 @@ test_ring_basic(void)
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue2\n");
@@ -390,19 +392,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1) ;
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -451,19 +453,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -505,19 +507,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -539,7 +541,7 @@ test_ring_burst_basic(void)
 		cur_src += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
@@ -578,19 +580,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -613,7 +615,7 @@ test_ring_burst_basic(void)
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret != 2)
 		goto fail;
@@ -753,7 +755,7 @@ test_ring_basic_ex(void)
 		goto fail_test;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2);
+	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
 	if (ret != 2) {
 		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
 		goto fail_test;
diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index f95a8e9..ed89896 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -152,12 +152,12 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t mc_end = rte_rdtsc();
 
 	printf("SC empty dequeue: %.2F\n",
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
@@ -325,7 +325,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -333,7 +334,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
@@ -361,7 +363,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -369,7 +372,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
diff --git a/test/test/test_table_acl.c b/test/test/test_table_acl.c
index b3bfda4..4d43be7 100644
--- a/test/test/test_table_acl.c
+++ b/test/test/test_table_acl.c
@@ -713,7 +713,7 @@ test_pipeline_single_filter(int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0) {
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/test/test/test_table_pipeline.c b/test/test/test_table_pipeline.c
index 36bfeda..b58aa5d 100644
--- a/test/test/test_table_pipeline.c
+++ b/test/test/test_table_pipeline.c
@@ -494,7 +494,7 @@ test_pipeline_single_filter(int test_type, int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0)
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/test/test/test_table_ports.c b/test/test/test_table_ports.c
index 395f4f3..39592ce 100644
--- a/test/test/test_table_ports.c
+++ b/test/test/test_table_ports.c
@@ -163,7 +163,7 @@ test_port_ring_writer(void)
 	rte_port_ring_writer_ops.f_flush(port);
 	expected_pkts = 1;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -7;
@@ -178,7 +178,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -193,7 +193,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -208,7 +208,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -9;
diff --git a/test/test/virtual_pmd.c b/test/test/virtual_pmd.c
index 39e070c..b209355 100644
--- a/test/test/virtual_pmd.c
+++ b/test/test/virtual_pmd.c
@@ -342,7 +342,7 @@ virtual_ethdev_rx_burst_success(void *queue __rte_unused,
 	dev_private = vrtl_eth_dev->data->dev_private;
 
 	rx_count = rte_ring_dequeue_burst(dev_private->rx_queue, (void **) bufs,
-			nb_pkts);
+			nb_pkts, NULL);
 
 	/* increments ipackets count */
 	dev_private->eth_stats.ipackets += rx_count;
@@ -508,7 +508,7 @@ virtual_ethdev_get_mbufs_from_tx_queue(uint8_t port_id,
 
 	dev_private = vrtl_eth_dev->data->dev_private;
 	return rte_ring_dequeue_burst(dev_private->tx_queue, (void **)pkt_burst,
-		burst_length);
+		burst_length, NULL);
 }
 
 static uint8_t
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent
                       ` (4 preceding siblings ...)
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
    7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

The bulk functions for rings return 0 when all elements are enqueued and
a negative errno when there is no space. Change that to make them
consistent with the burst functions, returning the number of elements
enqueued/dequeued, i.e. 0 or N. This change also allows the return value
from enqueue/dequeue to be used directly, without a branch for error
checking.
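
A sketch of caller code before and after this patch (hypothetical
application code; drop_pkts() and 'stats' are illustrative):

 /* Before: 0 on success, -ENOBUFS when the ring is full. */
 if (rte_ring_enqueue_bulk(r, objs, n) != 0)
         drop_pkts(objs, n);

 /* After: number of objects enqueued (0 or n), usable directly. */
 unsigned int sent = rte_ring_enqueue_bulk(r, objs, n);
 stats.tx += sent;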

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/rel_notes/release_17_05.rst             |  11 +++
 doc/guides/sample_app_ug/server_node_efd.rst       |   2 +-
 examples/load_balancer/runtime.c                   |  16 ++-
 .../client_server_mp/mp_client/client.c            |   8 +-
 .../client_server_mp/mp_server/main.c              |   2 +-
 examples/qos_sched/app_thread.c                    |   8 +-
 examples/server_node_efd/node/node.c               |   2 +-
 examples/server_node_efd/server/main.c             |   2 +-
 lib/librte_mempool/rte_mempool_ring.c              |  12 ++-
 lib/librte_ring/rte_ring.h                         | 109 +++++++--------------
 test/test-pipeline/pipeline_hash.c                 |   2 +-
 test/test-pipeline/runtime.c                       |   8 +-
 test/test/test_ring.c                              |  46 +++++----
 test/test/test_ring_perf.c                         |   8 +-
 14 files changed, 106 insertions(+), 130 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4e748dc..2b11765 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -120,6 +120,17 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
   * removed the function ``rte_ring_set_water_mark`` as part of a general
     removal of watermarks support in the library.
+  * changed the return value of the enqueue and dequeue bulk functions to
+    match that of the burst equivalents. In all cases, ring functions which
+    operate on multiple packets now return the number of elements enqueued
+    or dequeued, as appropriate. The updated functions are:
+
+    - ``rte_ring_mp_enqueue_bulk``
+    - ``rte_ring_sp_enqueue_bulk``
+    - ``rte_ring_enqueue_bulk``
+    - ``rte_ring_mc_dequeue_bulk``
+    - ``rte_ring_sc_dequeue_bulk``
+    - ``rte_ring_dequeue_bulk``
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/server_node_efd.rst b/doc/guides/sample_app_ug/server_node_efd.rst
index 9b69cfe..e3a63c8 100644
--- a/doc/guides/sample_app_ug/server_node_efd.rst
+++ b/doc/guides/sample_app_ug/server_node_efd.rst
@@ -286,7 +286,7 @@ repeated infinitely.
 
         cl = &nodes[node];
         if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-                cl_rx_buf[node].count) != 0){
+                cl_rx_buf[node].count) != cl_rx_buf[node].count){
             for (j = 0; j < cl_rx_buf[node].count; j++)
                 rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
             cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 6944325..82b10bc 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -146,7 +146,7 @@ app_lcore_io_rx_buffer_to_send (
 		(void **) lp->rx.mbuf_out[worker].array,
 		bsz);
 
-	if (unlikely(ret == -ENOBUFS)) {
+	if (unlikely(ret == 0)) {
 		uint32_t k;
 		for (k = 0; k < bsz; k ++) {
 			struct rte_mbuf *m = lp->rx.mbuf_out[worker].array[k];
@@ -312,7 +312,7 @@ app_lcore_io_rx_flush(struct app_lcore_params_io *lp, uint32_t n_workers)
 			(void **) lp->rx.mbuf_out[worker].array,
 			lp->rx.mbuf_out[worker].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->rx.mbuf_out[worker].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->rx.mbuf_out[worker].array[k];
@@ -349,9 +349,8 @@ app_lcore_io_tx(
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
 				bsz_rd);
 
-			if (unlikely(ret == -ENOENT)) {
+			if (unlikely(ret == 0))
 				continue;
-			}
 
 			n_mbufs += bsz_rd;
 
@@ -505,9 +504,8 @@ app_lcore_worker(
 			(void **) lp->mbuf_in.array,
 			bsz_rd);
 
-		if (unlikely(ret == -ENOENT)) {
+		if (unlikely(ret == 0))
 			continue;
-		}
 
 #if APP_WORKER_DROP_ALL_PACKETS
 		for (j = 0; j < bsz_rd; j ++) {
@@ -559,7 +557,7 @@ app_lcore_worker(
 
 #if APP_STATS
 			lp->rings_out_iters[port] ++;
-			if (ret == 0) {
+			if (ret > 0) {
 				lp->rings_out_count[port] += 1;
 			}
 			if (lp->rings_out_iters[port] == APP_STATS){
@@ -572,7 +570,7 @@ app_lcore_worker(
 			}
 #endif
 
-			if (unlikely(ret == -ENOBUFS)) {
+			if (unlikely(ret == 0)) {
 				uint32_t k;
 				for (k = 0; k < bsz_wr; k ++) {
 					struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
@@ -609,7 +607,7 @@ app_lcore_worker_flush(struct app_lcore_params_worker *lp)
 			(void **) lp->mbuf_out[port].array,
 			lp->mbuf_out[port].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->mbuf_out[port].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index d4f9ca3..dca9eb9 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -276,14 +276,10 @@ main(int argc, char *argv[])
 	printf("[Press Ctrl-C to quit ...]\n");
 
 	for (;;) {
-		uint16_t i, rx_pkts = PKT_READ_SIZE;
+		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		/* try dequeuing max possible packets first, if that fails, get the
-		 * most we can. Loop body should only execute once, maximum */
-		while (rx_pkts > 0 &&
-				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts, rx_pkts) != 0))
-			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring), PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/multi_process/client_server_mp/mp_server/main.c b/examples/multi_process/client_server_mp/mp_server/main.c
index a6dc12d..19c95b2 100644
--- a/examples/multi_process/client_server_mp/mp_server/main.c
+++ b/examples/multi_process/client_server_mp/mp_server/main.c
@@ -227,7 +227,7 @@ flush_rx_queue(uint16_t client)
 
 	cl = &clients[client];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[client].buffer,
-			cl_rx_buf[client].count) != 0){
+			cl_rx_buf[client].count) == 0){
 		for (j = 0; j < cl_rx_buf[client].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[client].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[client].count;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 70fdcdb..dab4594 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -107,7 +107,7 @@ app_rx_thread(struct thread_conf **confs)
 			}
 
 			if (unlikely(rte_ring_sp_enqueue_bulk(conf->rx_ring,
-								(void **)rx_mbufs, nb_rx) != 0)) {
+					(void **)rx_mbufs, nb_rx) == 0)) {
 				for(i = 0; i < nb_rx; i++) {
 					rte_pktmbuf_free(rx_mbufs[i]);
 
@@ -180,7 +180,7 @@ app_tx_thread(struct thread_conf **confs)
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
 					burst_conf.qos_dequeue);
-		if (likely(retval == 0)) {
+		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
 			conf->counter = 0; /* reset empty read loop counter */
@@ -230,7 +230,9 @@ app_worker_thread(struct thread_conf **confs)
 		nb_pkt = rte_sched_port_dequeue(conf->sched_port, mbufs,
 					burst_conf.qos_dequeue);
 		if (likely(nb_pkt > 0))
-			while (rte_ring_sp_enqueue_bulk(conf->tx_ring, (void **)mbufs, nb_pkt) != 0);
+			while (rte_ring_sp_enqueue_bulk(conf->tx_ring,
+					(void **)mbufs, nb_pkt) == 0)
+				; /* empty body */
 
 		conf_idx++;
 		if (confs[conf_idx] == NULL)
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index a6c0c70..9ec6a05 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) != 0))
+					rx_pkts) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/examples/server_node_efd/server/main.c b/examples/server_node_efd/server/main.c
index 1a54d1b..3eb7fac 100644
--- a/examples/server_node_efd/server/main.c
+++ b/examples/server_node_efd/server/main.c
@@ -247,7 +247,7 @@ flush_rx_queue(uint16_t node)
 
 	cl = &nodes[node];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-			cl_rx_buf[node].count) != 0){
+			cl_rx_buf[node].count) != cl_rx_buf[node].count){
 		for (j = 0; j < cl_rx_buf[node].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index b9aa64d..409b860 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -42,26 +42,30 @@ static int
 common_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_mp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_sp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_mc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_sc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e7061be..5f6589f 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -352,14 +352,10 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -391,7 +387,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOBUFS;
+				return 0;
 			else {
 				/* No free entry available */
 				if (unlikely(free_entries == 0))
@@ -417,7 +413,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -433,14 +429,10 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -460,7 +452,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOBUFS;
+			return 0;
 		else {
 			/* No free entry available */
 			if (unlikely(free_entries == 0))
@@ -477,7 +469,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -498,16 +490,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
 
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -539,7 +526,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOENT;
+				return 0;
 			else {
 				if (unlikely(entries == 0))
 					return 0;
@@ -565,7 +552,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	r->cons.tail = cons_next;
 
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -583,15 +570,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -610,7 +592,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	if (n > entries) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOENT;
+			return 0;
 		else {
 			if (unlikely(entries == 0))
 				return 0;
@@ -626,7 +608,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -642,10 +624,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -662,10 +643,9 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -686,10 +666,9 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned n)
 {
@@ -716,7 +695,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static inline int __attribute__((always_inline))
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -733,7 +712,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -754,10 +733,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue(r, obj);
-	else
-		return rte_ring_mp_enqueue(r, obj);
+	return rte_ring_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -773,11 +749,9 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -794,11 +768,9 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects to dequeue from the ring to the obj_table,
  *   must be strictly positive.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -818,11 +790,9 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	if (r->cons.sc_dequeue)
@@ -849,7 +819,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOENT;
 }
 
 /**
@@ -867,7 +837,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOENT;
 }
 
 /**
@@ -889,10 +859,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue(r, obj_p);
-	else
-		return rte_ring_mc_dequeue(r, obj_p);
+	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOENT;
 }
 
 /**
diff --git a/test/test-pipeline/pipeline_hash.c b/test/test-pipeline/pipeline_hash.c
index 10d2869..1ac0aa8 100644
--- a/test/test-pipeline/pipeline_hash.c
+++ b/test/test-pipeline/pipeline_hash.c
@@ -547,6 +547,6 @@ app_main_loop_rx_metadata(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
diff --git a/test/test-pipeline/runtime.c b/test/test-pipeline/runtime.c
index 42a6142..4e20669 100644
--- a/test/test-pipeline/runtime.c
+++ b/test/test-pipeline/runtime.c
@@ -98,7 +98,7 @@ app_main_loop_rx(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -123,7 +123,7 @@ app_main_loop_worker(void) {
 			(void **) worker_mbuf->array,
 			app.burst_size_worker_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		do {
@@ -131,7 +131,7 @@ app_main_loop_worker(void) {
 				app.rings_tx[i ^ 1],
 				(void **) worker_mbuf->array,
 				app.burst_size_worker_write);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -152,7 +152,7 @@ app_main_loop_tx(void) {
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
 			app.burst_size_tx_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		n_mbufs += app.burst_size_tx_read;
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 666a451..112433b 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -117,20 +117,18 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
 		printf("%s: iteration %u, random shift: %u;\n",
 		    __func__, i, rand);
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rand));
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rand));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand) != 0);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
 
 		/* fill the ring */
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rsz));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz) != 0);
 		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
 		TEST_RING_VERIFY(rsz == rte_ring_count(r));
 		TEST_RING_VERIFY(rte_ring_full(r));
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rsz));
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -171,37 +169,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -217,37 +215,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -264,11 +262,11 @@ test_ring_basic(void)
 	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
 		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 		cur_src += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 		cur_dst += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 	}
 
@@ -294,25 +292,25 @@ test_ring_basic(void)
 
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue2\n");
 		goto fail;
 	}
diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index 320c20c..8ccbdef 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -195,13 +195,13 @@ enqueue_bulk(void *p)
 
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_sp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_mp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mp_end = rte_rdtsc();
 
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
-- 
2.9.3

^ permalink raw reply	[relevance 2%]
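
A caller-side sketch of the convention change made in the patch above,
for the fixed-size bulk API (the helper name is assumed for
illustration; only the failure test changes, from a negative errno to
a zero count):

    #include <rte_ring.h>

    /* All-or-nothing enqueue retry loop. Before this patch the failure
     * test was "ret < 0"; with the count-based return (0 or n for the
     * fixed bulk calls) it becomes "ret == 0". */
    static void
    enqueue_retry(struct rte_ring *r, void * const *objs, unsigned int n)
    {
            while (rte_ring_enqueue_bulk(r, objs, n) == 0)
                    ;       /* ring full: spin; a real app might back off */
    }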

* [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields
    2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting Bruce Richardson
@ 2017-03-07 11:32  3%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting Bruce Richardson
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of those structures and into
the top-level rte_ring structure so that they are not duplicated.
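
For out-of-tree code that read these fields directly, the change is a
mechanical rename; a minimal sketch of the after-state (the helper name
is illustrative only — the public accessors such as rte_ring_count()
are updated by this patch itself):

    #include <rte_ring.h>

    /* Equivalent of rte_ring_count() against the new layout: size and
     * mask now live once in struct rte_ring rather than in both the
     * prod and cons sub-structures (r->prod.mask becomes r->mask). */
    static inline unsigned int
    ring_used_entries(const struct rte_ring *r)
    {
            return (r->prod.tail - r->cons.tail) & r->mask;
    }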

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_ring/rte_ring.c | 20 ++++++++++----------
 lib/librte_ring/rte_ring.h | 32 ++++++++++++++++----------------
 test/test/test_ring.c      |  6 +++---
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4bc6da1..80fc356 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -144,11 +144,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.watermark = count;
+	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-	r->prod.size = r->cons.size = count;
-	r->prod.mask = r->cons.mask = count-1;
+	r->size = count;
+	r->mask = count - 1;
 	r->prod.head = r->cons.head = 0;
 	r->prod.tail = r->cons.tail = 0;
 
@@ -269,14 +269,14 @@ rte_ring_free(struct rte_ring *r)
 int
 rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
-	if (count >= r->prod.size)
+	if (count >= r->size)
 		return -EINVAL;
 
 	/* if count is 0, disable the watermarking */
 	if (count == 0)
-		count = r->prod.size;
+		count = r->size;
 
-	r->prod.watermark = count;
+	r->watermark = count;
 	return 0;
 }
 
@@ -291,17 +291,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->prod.watermark == r->prod.size)
+	if (r->watermark == r->size)
 		fprintf(f, "  watermark=0\n");
 	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->prod.watermark);
+		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_RING_DEBUG
@@ -318,7 +318,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
 		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
 	}
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
 	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
 	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 659c6d0..61c0982 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -151,13 +151,10 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 struct rte_ring_headtail {
 	volatile uint32_t head;  /**< Prod/consumer head. */
 	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
 	union {
 		uint32_t sp_enqueue; /**< True, if single producer. */
 		uint32_t sc_dequeue; /**< True, if single consumer. */
 	};
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 };
 
 /**
@@ -177,9 +174,12 @@ struct rte_ring {
 	 * next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
+	int flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_headtail prod __rte_aligned(PROD_ALIGN);
@@ -358,7 +358,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * Placed here since identical code needed in both
  * single and multi producer enqueue functions */
 #define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
+	const uint32_t size = r->size; \
 	uint32_t idx = prod_head & mask; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
@@ -385,7 +385,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * single and multi consumer dequeue functions */
 #define DEQUEUE_PTRS() do { \
 	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
+	const uint32_t size = r->size; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
 			obj_table[i] = r->ring[idx]; \
@@ -440,7 +440,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -488,7 +488,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -547,7 +547,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	prod_head = r->prod.head;
@@ -583,7 +583,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -633,7 +633,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -730,7 +730,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
 	prod_tail = r->prod.tail;
@@ -1059,7 +1059,7 @@ rte_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
 }
 
 /**
@@ -1092,7 +1092,7 @@ rte_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
+	return (prod_tail - cons_tail) & r->mask;
 }
 
 /**
@@ -1108,7 +1108,7 @@ rte_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
+	return (cons_tail - prod_tail - 1) & r->mask;
 }
 
 /**
@@ -1122,7 +1122,7 @@ rte_ring_free_count(const struct rte_ring *r)
 static inline unsigned int
 rte_ring_get_size(const struct rte_ring *r)
 {
-	return r->prod.size;
+	return r->size;
 }
 
 /**
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index ebcb896..5f09097 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -148,7 +148,7 @@ check_live_watermark_change(__attribute__((unused)) void *dummy)
 		}
 
 		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->prod.watermark;
+		watermark = r->watermark;
 		if (watermark != watermark_old &&
 		    (watermark_old != 16 || watermark != 32)) {
 			printf("Bad watermark change %u -> %u\n", watermark_old,
@@ -213,7 +213,7 @@ test_set_watermark( void ){
 		printf( " ring lookup failed\n" );
 		goto error;
 	}
-	count = r->prod.size*2;
+	count = r->size * 2;
 	setwm = rte_ring_set_water_mark(r, count);
 	if (setwm != -EINVAL){
 		printf("Test failed to detect invalid watermark count value\n");
@@ -222,7 +222,7 @@ test_set_watermark( void ){
 
 	count = 0;
 	rte_ring_set_water_mark(r, count);
-	if (r->prod.watermark != r->prod.size) {
+	if (r->watermark != r->size) {
 		printf("Test failed to detect invalid watermark count value\n");
 		goto error;
 	}
-- 
2.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support
                       ` (3 preceding siblings ...)
  2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

Remove the watermark support. A future commit will add support for
having the enqueue functions return the amount of free space in the
ring, which will allow applications to implement their own watermark
checks while also being more generally useful to the application.
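
Until that lands, an application can keep an equivalent check at the
call site; a minimal sketch using the existing occupancy API (the
threshold value and helper name are assumptions for illustration):

    #include <rte_ring.h>

    #define APP_RING_HIGH_WM 96     /* illustrative threshold */

    /* Returns -1 if nothing was enqueued, 1 if the enqueue succeeded
     * but occupancy is above the application's watermark, 0 otherwise.
     * Note: at this point in the series the bulk enqueue still returns
     * 0 on success and -ENOBUFS on failure. */
    static int
    app_enqueue_with_wm(struct rte_ring *r, void * const *objs,
                    unsigned int n)
    {
            if (rte_ring_enqueue_bulk(r, objs, n) != 0)
                    return -1;
            return rte_ring_count(r) > APP_RING_HIGH_WM;
    }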

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
V2: fix missed references to watermarks in v1

---
 doc/guides/prog_guide/ring_lib.rst     |   8 --
 doc/guides/rel_notes/release_17_05.rst |   2 +
 examples/Makefile                      |   2 +-
 lib/librte_ring/rte_ring.c             |  23 -----
 lib/librte_ring/rte_ring.h             |  58 +------------
 test/test/autotest_test_funcs.py       |   7 --
 test/test/commands.c                   |  52 ------------
 test/test/test_ring.c                  | 149 +--------------------------------
 8 files changed, 8 insertions(+), 293 deletions(-)

diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index d4ab502..b31ab7a 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -102,14 +102,6 @@ Name
 A ring is identified by a unique name.
 It is not possible to create two rings with the same name (rte_ring_create() returns NULL if this is attempted).
 
-Water Marking
-~~~~~~~~~~~~~
-
-The ring can have a high water mark (threshold).
-Once an enqueue operation reaches the high water mark, the producer is notified, if the water mark is configured.
-
-This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index c69ca8f..4e748dc 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -118,6 +118,8 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
+  * removed the function ``rte_ring_set_water_mark`` as part of a general
+    removal of watermarks support in the library.
 
 ABI Changes
 -----------
diff --git a/examples/Makefile b/examples/Makefile
index da2bfdd..19cd5ad 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -81,7 +81,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += packet_ordering
 DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched
-DIRS-y += quota_watermark
+#DIRS-y += quota_watermark
 DIRS-$(CONFIG_RTE_ETHDEV_RXTX_CALLBACKS) += rxtx_callbacks
 DIRS-y += skeleton
 ifeq ($(CONFIG_RTE_LIBRTE_HASH),y)
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 90ee63f..18fb644 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -138,7 +138,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
 	r->size = count;
@@ -256,24 +255,6 @@ rte_ring_free(struct rte_ring *r)
 	rte_free(te);
 }
 
-/*
- * change the high water mark. If *count* is 0, water marking is
- * disabled
- */
-int
-rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
-{
-	if (count >= r->size)
-		return -EINVAL;
-
-	/* if count is 0, disable the watermarking */
-	if (count == 0)
-		count = r->size;
-
-	r->watermark = count;
-	return 0;
-}
-
 /* dump the status of the ring on the console */
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
@@ -287,10 +268,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->watermark == r->size)
-		fprintf(f, "  watermark=0\n");
-	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2177954..e7061be 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -156,7 +156,6 @@ struct rte_ring {
 			/**< Memzone, if any, containing the rte_ring */
 	uint32_t size;           /**< Size of ring. */
 	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_headtail prod __rte_aligned(PROD_ALIGN);
@@ -171,7 +170,6 @@ struct rte_ring {
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
 #define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
@@ -277,26 +275,6 @@ struct rte_ring *rte_ring_create(const char *name, unsigned count,
 void rte_ring_free(struct rte_ring *r);
 
 /**
- * Change the high water mark.
- *
- * If *count* is 0, water marking is disabled. Otherwise, it is set to the
- * *count* value. The *count* value must be greater than 0 and less
- * than the ring size.
- *
- * This function can be called at any time (not necessarily at
- * initialization).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param count
- *   The new water mark value.
- * @return
- *   - 0: Success; water mark changed.
- *   - -EINVAL: Invalid water mark value.
- */
-int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
-
-/**
  * Dump the status of the ring to a file.
  *
  * @param f
@@ -377,8 +355,6 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -393,7 +369,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	int success;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -434,13 +409,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-				(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	/*
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
@@ -449,7 +417,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -468,8 +436,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -482,7 +448,6 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_next, free_entries;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	prod_head = r->prod.head;
 	cons_tail = r->cons.tail;
@@ -511,15 +476,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-			(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -685,8 +643,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -707,8 +663,6 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -733,8 +687,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -759,8 +711,6 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -778,8 +728,6 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -801,8 +749,6 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
diff --git a/test/test/autotest_test_funcs.py b/test/test/autotest_test_funcs.py
index 1c5f390..8da8fcd 100644
--- a/test/test/autotest_test_funcs.py
+++ b/test/test/autotest_test_funcs.py
@@ -292,11 +292,4 @@ def ring_autotest(child, test_name):
     elif index == 2:
         return -1, "Fail [Timeout]"
 
-    child.sendline("set_watermark test 100")
-    child.sendline("dump_ring test")
-    index = child.expect(["  watermark=100",
-                          pexpect.TIMEOUT], timeout=1)
-    if index != 0:
-        return -1, "Fail [Bad watermark]"
-
     return 0, "Success"
diff --git a/test/test/commands.c b/test/test/commands.c
index 2df46b0..551c81d 100644
--- a/test/test/commands.c
+++ b/test/test/commands.c
@@ -228,57 +228,6 @@ cmdline_parse_inst_t cmd_dump_one = {
 
 /****************/
 
-struct cmd_set_ring_result {
-	cmdline_fixed_string_t set;
-	cmdline_fixed_string_t name;
-	uint32_t value;
-};
-
-static void cmd_set_ring_parsed(void *parsed_result, struct cmdline *cl,
-				__attribute__((unused)) void *data)
-{
-	struct cmd_set_ring_result *res = parsed_result;
-	struct rte_ring *r;
-	int ret;
-
-	r = rte_ring_lookup(res->name);
-	if (r == NULL) {
-		cmdline_printf(cl, "Cannot find ring\n");
-		return;
-	}
-
-	if (!strcmp(res->set, "set_watermark")) {
-		ret = rte_ring_set_water_mark(r, res->value);
-		if (ret != 0)
-			cmdline_printf(cl, "Cannot set water mark\n");
-	}
-}
-
-cmdline_parse_token_string_t cmd_set_ring_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, set,
-				 "set_watermark");
-
-cmdline_parse_token_string_t cmd_set_ring_name =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, name, NULL);
-
-cmdline_parse_token_num_t cmd_set_ring_value =
-	TOKEN_NUM_INITIALIZER(struct cmd_set_ring_result, value, UINT32);
-
-cmdline_parse_inst_t cmd_set_ring = {
-	.f = cmd_set_ring_parsed,  /* function to call */
-	.data = NULL,      /* 2nd arg of func */
-	.help_str = "set watermark: "
-			"set_watermark <ring_name> <value>",
-	.tokens = {        /* token list, NULL terminated */
-		(void *)&cmd_set_ring_set,
-		(void *)&cmd_set_ring_name,
-		(void *)&cmd_set_ring_value,
-		NULL,
-	},
-};
-
-/****************/
-
 struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
@@ -419,7 +368,6 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_autotest,
 	(cmdline_parse_inst_t *)&cmd_dump,
 	(cmdline_parse_inst_t *)&cmd_dump_one,
-	(cmdline_parse_inst_t *)&cmd_set_ring,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx_anchor,
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 3891f5d..666a451 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -78,21 +78,6 @@
  *      - Dequeue one object, two objects, MAX_BULK objects
  *      - Check that dequeued pointers are correct
  *
- *    - Test watermark and default bulk enqueue/dequeue:
- *
- *      - Set watermark
- *      - Set default bulk value
- *      - Enqueue objects, check that -EDQUOT is returned when
- *        watermark is exceeded
- *      - Check that dequeued pointers are correct
- *
- * #. Check live watermark change
- *
- *    - Start a loop on another lcore that will enqueue and dequeue
- *      objects in a ring. It will monitor the value of watermark.
- *    - At the same time, change the watermark on the master lcore.
- *    - The slave lcore will check that watermark changes from 16 to 32.
- *
  * #. Performance tests.
  *
  * Tests done in test_ring_perf.c
@@ -115,123 +100,6 @@ static struct rte_ring *r;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-static int
-check_live_watermark_change(__attribute__((unused)) void *dummy)
-{
-	uint64_t hz = rte_get_timer_hz();
-	void *obj_table[MAX_BULK];
-	unsigned watermark, watermark_old = 16;
-	uint64_t cur_time, end_time;
-	int64_t diff = 0;
-	int i, ret;
-	unsigned count = 4;
-
-	/* init the object table */
-	memset(obj_table, 0, sizeof(obj_table));
-	end_time = rte_get_timer_cycles() + (hz / 4);
-
-	/* check that bulk and watermark are 4 and 32 (respectively) */
-	while (diff >= 0) {
-
-		/* add in ring until we reach watermark */
-		ret = 0;
-		for (i = 0; i < 16; i ++) {
-			if (ret != 0)
-				break;
-			ret = rte_ring_enqueue_bulk(r, obj_table, count);
-		}
-
-		if (ret != -EDQUOT) {
-			printf("Cannot enqueue objects, or watermark not "
-			       "reached (ret=%d)\n", ret);
-			return -1;
-		}
-
-		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->watermark;
-		if (watermark != watermark_old &&
-		    (watermark_old != 16 || watermark != 32)) {
-			printf("Bad watermark change %u -> %u\n", watermark_old,
-			       watermark);
-			return -1;
-		}
-		watermark_old = watermark;
-
-		/* dequeue objects from ring */
-		while (i--) {
-			ret = rte_ring_dequeue_bulk(r, obj_table, count);
-			if (ret != 0) {
-				printf("Cannot dequeue (ret=%d)\n", ret);
-				return -1;
-			}
-		}
-
-		cur_time = rte_get_timer_cycles();
-		diff = end_time - cur_time;
-	}
-
-	if (watermark_old != 32 ) {
-		printf(" watermark was not updated (wm=%u)\n",
-		       watermark_old);
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-test_live_watermark_change(void)
-{
-	unsigned lcore_id = rte_lcore_id();
-	unsigned lcore_id2 = rte_get_next_lcore(lcore_id, 0, 1);
-
-	printf("Test watermark live modification\n");
-	rte_ring_set_water_mark(r, 16);
-
-	/* launch a thread that will enqueue and dequeue, checking
-	 * watermark and quota */
-	rte_eal_remote_launch(check_live_watermark_change, NULL, lcore_id2);
-
-	rte_delay_ms(100);
-	rte_ring_set_water_mark(r, 32);
-	rte_delay_ms(100);
-
-	if (rte_eal_wait_lcore(lcore_id2) < 0)
-		return -1;
-
-	return 0;
-}
-
-/* Test for catch on invalid watermark values */
-static int
-test_set_watermark( void ){
-	unsigned count;
-	int setwm;
-
-	struct rte_ring *r = rte_ring_lookup("test_ring_basic_ex");
-	if(r == NULL){
-		printf( " ring lookup failed\n" );
-		goto error;
-	}
-	count = r->size * 2;
-	setwm = rte_ring_set_water_mark(r, count);
-	if (setwm != -EINVAL){
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-
-	count = 0;
-	rte_ring_set_water_mark(r, count);
-	if (r->watermark != r->size) {
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-	return 0;
-
-error:
-	return -1;
-}
-
 /*
  * helper routine for test_ring_basic
  */
@@ -418,8 +286,7 @@ test_ring_basic(void)
 	cur_src = src;
 	cur_dst = dst;
 
-	printf("test watermark and default bulk enqueue / dequeue\n");
-	rte_ring_set_water_mark(r, 20);
+	printf("test default bulk enqueue / dequeue\n");
 	num_elems = 16;
 
 	cur_src = src;
@@ -433,8 +300,8 @@ test_ring_basic(void)
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != -EDQUOT) {
-		printf("Watermark not exceeded\n");
+	if (ret != 0) {
+		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
@@ -930,16 +797,6 @@ test_ring(void)
 		return -1;
 
 	/* basic operations */
-	if (test_live_watermark_change() < 0)
-		return -1;
-
-	if ( test_set_watermark() < 0){
-		printf ("Test failed to detect invalid parameter\n");
-		return -1;
-	}
-	else
-		printf ( "Test detected forced bad watermark values\n");
-
 	if ( test_create_count_odd() < 0){
 			printf ("Test failed to detect odd count\n");
 			return -1;
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting
  @ 2017-03-07 11:32  4%   ` Bruce Richardson
  2017-03-07 11:32  3%   ` [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

Users compiling DPDK should not need to know or care about the
arrangement of cachelines in the rte_ring structure. Therefore just
remove the build option and make the structures always split. On
platforms with 64B cachelines, use 128B rather than 64B alignment for
improved performance, since it prevents the producer and consumer data
from sitting on adjacent cachelines.
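
As a standalone illustration of the alignment rule (the demo structure
and names are hypothetical, not part of the patch):

    #include <stdint.h>
    #include <rte_common.h>    /* __rte_aligned */
    #include <rte_memory.h>    /* RTE_CACHE_LINE_SIZE */

    /* On targets with cachelines narrower than 128B, pad the producer
     * and consumer blocks to two cachelines so adjacent-line hardware
     * prefetchers cannot cause false sharing between them. */
    #if RTE_CACHE_LINE_SIZE < 128
    #define DEMO_ALIGN (RTE_CACHE_LINE_SIZE * 2)
    #else
    #define DEMO_ALIGN RTE_CACHE_LINE_SIZE
    #endif

    struct demo_ring {
            struct { volatile uint32_t head, tail; } prod
                    __rte_aligned(DEMO_ALIGN);
            struct { volatile uint32_t head, tail; } cons
                    __rte_aligned(DEMO_ALIGN);
    };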

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---

V2: Limit the cacheline * 2 alignment to platforms with < 128B line size
---
 config/common_base                     |  1 -
 doc/guides/rel_notes/release_17_05.rst |  6 ++++++
 lib/librte_ring/rte_ring.c             |  2 --
 lib/librte_ring/rte_ring.h             | 16 ++++++++++------
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..099ffda 100644
--- a/config/common_base
+++ b/config/common_base
@@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 #
 CONFIG_RTE_LIBRTE_RING=y
 CONFIG_RTE_LIBRTE_RING_DEBUG=n
-CONFIG_RTE_RING_SPLIT_PROD_CONS=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..ea45e0c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -110,6 +110,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Reworked rte_ring library**
+
+  The rte_ring library has been reworked and updated. The following changes
+  have been made to it:
+
+  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index ca0a108..4bc6da1 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	/* compilation-time checks */
 	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_RING_SPLIT_PROD_CONS
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_LIBRTE_RING_DEBUG
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 72ccca5..399ae3b 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -139,6 +139,14 @@ struct rte_ring_debug_stats {
 
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
+#if RTE_CACHE_LINE_SIZE < 128
+#define PROD_ALIGN (RTE_CACHE_LINE_SIZE * 2)
+#define CONS_ALIGN (RTE_CACHE_LINE_SIZE * 2)
+#else
+#define PROD_ALIGN RTE_CACHE_LINE_SIZE
+#define CONS_ALIGN RTE_CACHE_LINE_SIZE
+#endif
+
 /**
  * An RTE ring structure.
  *
@@ -168,7 +176,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Producer head. */
 		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
+	} prod __rte_aligned(PROD_ALIGN);
 
 	/** Ring consumer status. */
 	struct cons {
@@ -177,11 +185,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Consumer head. */
 		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
+	} cons __rte_aligned(CONS_ALIGN);
 
 #ifdef RTE_LIBRTE_RING_DEBUG
 	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update
                       ` (2 preceding siblings ...)
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting Bruce Richardson
@ 2017-03-07 11:32  4%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support Bruce Richardson
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

There was a compile-time setting to make a ring yield when it entered
a loop in mp or mc mode waiting for the tail pointer update. Build-time
settings are not recommended for enabling/disabling features, and since
this was off by default, remove it completely. If needed, a
runtime-enabled equivalent can be implemented by the application.
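
A sketch of such a runtime equivalent at the call site (the retry
budget is an assumption; this is not an exact replacement for the
in-library yield, but it addresses the same preemption concern):

    #include <sched.h>
    #include <rte_ring.h>

    /* Spin on an empty ring, yielding the CPU after a bounded number of
     * failed attempts so a preempted peer thread can make progress. */
    static void
    dequeue_yielding(struct rte_ring *r, void **objs, unsigned int n)
    {
            unsigned int rep = 0;

            while (rte_ring_dequeue_bulk(r, objs, n) != 0) {
                    if (++rep == 1024) {
                            rep = 0;
                            sched_yield();
                    }
            }
    }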

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                              |  1 -
 doc/guides/prog_guide/env_abstraction_layer.rst |  5 ----
 doc/guides/rel_notes/release_17_05.rst          |  1 +
 lib/librte_ring/rte_ring.h                      | 35 +++++--------------------
 4 files changed, 7 insertions(+), 35 deletions(-)

diff --git a/config/common_base b/config/common_base
index b3d8272..d5beadd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
 # Compile librte_mempool
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 10a10a8..7c39cd2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -352,11 +352,6 @@ Known Issues
 
   3. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
 
-  ``RTE_RING_PAUSE_REP_COUNT`` is defined for rte_ring to reduce contention. It's mainly for case 2, a yield is issued after number of times pause repeat.
-
-  It adds a sched_yield() syscall if the thread spins for too long while waiting on the other thread to finish its operations on the ring.
-  This gives the preempted thread a chance to proceed and finish with the ring enqueue/dequeue operation.
-
 + rte_timer
 
   Running  ``rte_timer_manager()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e0ebd71..c69ca8f 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -117,6 +117,7 @@ API Changes
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
+  * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index af7b7d4..2177954 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -114,11 +114,6 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-#ifndef RTE_RING_PAUSE_REP_COUNT
-#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-                                    *   if RTE_RING_PAUSE_REP not defined. */
-#endif
-
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
 #if RTE_CACHE_LINE_SIZE < 128
@@ -396,7 +391,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t cons_tail, free_entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -450,18 +445,9 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->prod.tail != prod_head)) {
+	while (unlikely(r->prod.tail != prod_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->prod.tail = prod_next;
 	return ret;
 }
@@ -494,7 +480,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 {
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -571,7 +557,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_next, entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -616,18 +602,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * If there are other dequeues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->cons.tail != cons_head)) {
+	while (unlikely(r->cons.tail != cons_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -662,7 +639,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting
    2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting Bruce Richardson
  2017-03-07 11:32  3%   ` [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
  2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

The debug option only provided statistics to the user, most of
which could be tracked by the application itself. Remove it, both as
a compile-time option and as a feature, simplifying the code.
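
For applications that relied on these counters, a per-lcore equivalent
is straightforward to keep in the application itself; a minimal sketch
(the structure, wrapper and counter set are illustrative assumptions):

    #include <stdint.h>
    #include <rte_ring.h>
    #include <rte_lcore.h>
    #include <rte_memory.h>

    struct app_ring_stats {
            uint64_t enq_success_bulk;
            uint64_t enq_success_objs;
            uint64_t enq_fail_bulk;
    } __rte_cache_aligned;

    static struct app_ring_stats app_stats[RTE_MAX_LCORE];

    /* Wrapper counting outcomes per lcore; at this point in the series
     * the bulk enqueue still returns 0 on success. */
    static int
    app_enqueue_bulk(struct rte_ring *r, void * const *objs,
                    unsigned int n)
    {
            int ret = rte_ring_enqueue_bulk(r, objs, n);
            struct app_ring_stats *s = &app_stats[rte_lcore_id()];

            if (ret == 0) {
                    s->enq_success_bulk++;
                    s->enq_success_objs += n;
            } else {
                    s->enq_fail_bulk++;
            }
            return ret;
    }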

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                     |   1 -
 doc/guides/prog_guide/ring_lib.rst     |   7 -
 doc/guides/rel_notes/release_17_05.rst |   1 +
 lib/librte_ring/rte_ring.c             |  41 ----
 lib/librte_ring/rte_ring.h             |  97 +-------
 test/test/test_ring.c                  | 410 ---------------------------------
 6 files changed, 13 insertions(+), 544 deletions(-)

diff --git a/config/common_base b/config/common_base
index 099ffda..b3d8272 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_LIBRTE_RING_DEBUG=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 9f69753..d4ab502 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -110,13 +110,6 @@ Once an enqueue operation reaches the high water mark, the producer is notified,
 
 This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
 
-Debug
-~~~~~
-
-When debug is enabled (CONFIG_RTE_LIBRTE_RING_DEBUG is set),
-the library stores some per-ring statistic counters about the number of enqueues/dequeues.
-These statistics are per-core to avoid concurrent accesses or atomic operations.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index ea45e0c..e0ebd71 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -116,6 +116,7 @@ API Changes
   have been made to it:
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
+  * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 80fc356..90ee63f 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -131,12 +131,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 			  RTE_CACHE_LINE_MASK) != 0);
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_LIBRTE_RING_DEBUG
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring_debug_stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
@@ -284,11 +278,6 @@ rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
 {
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats sum;
-	unsigned lcore_id;
-#endif
-
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
@@ -302,36 +291,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		fprintf(f, "  watermark=0\n");
 	else
 		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
-
-	/* sum and dump statistics */
-#ifdef RTE_LIBRTE_RING_DEBUG
-	memset(&sum, 0, sizeof(sum));
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		sum.enq_success_bulk += r->stats[lcore_id].enq_success_bulk;
-		sum.enq_success_objs += r->stats[lcore_id].enq_success_objs;
-		sum.enq_quota_bulk += r->stats[lcore_id].enq_quota_bulk;
-		sum.enq_quota_objs += r->stats[lcore_id].enq_quota_objs;
-		sum.enq_fail_bulk += r->stats[lcore_id].enq_fail_bulk;
-		sum.enq_fail_objs += r->stats[lcore_id].enq_fail_objs;
-		sum.deq_success_bulk += r->stats[lcore_id].deq_success_bulk;
-		sum.deq_success_objs += r->stats[lcore_id].deq_success_objs;
-		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
-		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
-	}
-	fprintf(f, "  size=%"PRIu32"\n", r->size);
-	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
-	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
-	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
-	fprintf(f, "  enq_quota_objs=%"PRIu64"\n", sum.enq_quota_objs);
-	fprintf(f, "  enq_fail_bulk=%"PRIu64"\n", sum.enq_fail_bulk);
-	fprintf(f, "  enq_fail_objs=%"PRIu64"\n", sum.enq_fail_objs);
-	fprintf(f, "  deq_success_bulk=%"PRIu64"\n", sum.deq_success_bulk);
-	fprintf(f, "  deq_success_objs=%"PRIu64"\n", sum.deq_success_objs);
-	fprintf(f, "  deq_fail_bulk=%"PRIu64"\n", sum.deq_fail_bulk);
-	fprintf(f, "  deq_fail_objs=%"PRIu64"\n", sum.deq_fail_objs);
-#else
-	fprintf(f, "  no statistics available\n");
-#endif
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 61c0982..af7b7d4 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -109,24 +109,6 @@ enum rte_ring_queue_behavior {
 	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
 };
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-/**
- * A structure that stores the ring statistics (per-lcore).
- */
-struct rte_ring_debug_stats {
-	uint64_t enq_success_bulk; /**< Successful enqueues number. */
-	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
-	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
-	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
-	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
-	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
-	uint64_t deq_success_bulk; /**< Successful dequeues number. */
-	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
-	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
-	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
-#endif
-
 #define RTE_RING_MZ_PREFIX "RG_"
 /**< The maximum length of a ring name. */
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
@@ -187,10 +169,6 @@ struct rte_ring {
 	/** Ring consumer status. */
 	struct rte_ring_headtail cons __rte_aligned(CONS_ALIGN);
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
-
 	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
 	                                     * not volatile so need to be careful
 	                                     * about compiler re-ordering */
@@ -202,27 +180,6 @@ struct rte_ring {
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
- * @internal When debug is enabled, store ring statistics.
- * @param r
- *   A pointer to the ring.
- * @param name
- *   The name of the statistics field to increment in the ring.
- * @param n
- *   The number to add to the object-oriented statistics.
- */
-#ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {                        \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			r->stats[__lcore_id].name##_objs += n;  \
-			r->stats[__lcore_id].name##_bulk += 1;  \
-		}                                               \
-	} while(0)
-#else
-#define __RING_STAT_ADD(r, name, n) do {} while(0)
-#endif
-
-/**
  * Calculate the memory size needed for a ring
  *
  * This function returns the number of bytes needed for a ring, given
@@ -463,17 +420,12 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOBUFS;
-			}
 			else {
 				/* No free entry available */
-				if (unlikely(free_entries == 0)) {
-					__RING_STAT_ADD(r, enq_fail, n);
+				if (unlikely(free_entries == 0))
 					return 0;
-				}
-
 				n = free_entries;
 			}
 		}
@@ -488,15 +440,11 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	/*
 	 * If there are other enqueues in progress that preceded us,
@@ -560,17 +508,12 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, enq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOBUFS;
-		}
 		else {
 			/* No free entry available */
-			if (unlikely(free_entries == 0)) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (unlikely(free_entries == 0))
 				return 0;
-			}
-
 			n = free_entries;
 		}
 	}
@@ -583,15 +526,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	r->prod.tail = prod_next;
 	return ret;
@@ -655,16 +594,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOENT;
-			}
 			else {
-				if (unlikely(entries == 0)){
-					__RING_STAT_ADD(r, deq_fail, n);
+				if (unlikely(entries == 0))
 					return 0;
-				}
-
 				n = entries;
 			}
 		}
@@ -694,7 +628,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 			sched_yield();
 		}
 	}
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -741,16 +674,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	entries = prod_tail - cons_head;
 
 	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, deq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOENT;
-		}
 		else {
-			if (unlikely(entries == 0)){
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (unlikely(entries == 0))
 				return 0;
-			}
-
 			n = entries;
 		}
 	}
@@ -762,7 +690,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	DEQUEUE_PTRS();
 	rte_smp_rmb();
 
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
 }
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 5f09097..3891f5d 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -763,412 +763,6 @@ test_ring_burst_basic(void)
 	return -1;
 }
 
-static int
-test_ring_stats(void)
-{
-
-#ifndef RTE_LIBRTE_RING_DEBUG
-	printf("Enable RTE_LIBRTE_RING_DEBUG to test ring stats.\n");
-	return 0;
-#else
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i;
-	unsigned num_items            = 0;
-	unsigned failed_enqueue_ops   = 0;
-	unsigned failed_enqueue_items = 0;
-	unsigned failed_dequeue_ops   = 0;
-	unsigned failed_dequeue_items = 0;
-	unsigned last_enqueue_ops     = 0;
-	unsigned last_enqueue_items   = 0;
-	unsigned last_quota_ops       = 0;
-	unsigned last_quota_items     = 0;
-	unsigned lcore_id = rte_lcore_id();
-	struct rte_ring_debug_stats *ring_stats = &r->stats[lcore_id];
-
-	printf("Test the ring stats.\n");
-
-	/* Reset the watermark in case it was set in another test. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Allocate some dummy object pointers. */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-
-	/* Allocate some memory for copied objects. */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-
-	/* Set the head and tail pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	/* Do Enqueue tests. */
-	printf("Test the dequeue stats.\n");
-
-	/* Fill the ring up to RING_SIZE -1. */
-	printf("Fill the ring.\n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK); i++) {
-		rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK);
-		cur_src += MAX_BULK;
-	}
-
-	/* Adjust for final enqueue = MAX_BULK -1. */
-	cur_src--;
-
-	printf("Verify that the ring is full.\n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-
-	printf("Verify the enqueue success stats.\n");
-	/* Stats should match above enqueue operations to fill the ring. */
-	if (ring_stats->enq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Current max objects is RING_SIZE -1. */
-	if (ring_stats->enq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any failures yet. */
-	if (ring_stats->enq_fail_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_fail_objs != 0)
-		goto fail;
-
-
-	printf("Test stats for SP burst enqueue to a full ring.\n");
-	num_items = 2;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for SP bulk enqueue to a full ring.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP burst enqueue to a full ring.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP bulk enqueue to a full ring.\n");
-	num_items = 16;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	/* Do Dequeue tests. */
-	printf("Test the dequeue stats.\n");
-
-	printf("Empty the ring.\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* There was only RING_SIZE -1 objects to dequeue. */
-	cur_dst++;
-
-	printf("Verify ring is empty.\n");
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	printf("Verify the dequeue success stats.\n");
-	/* Stats should match above dequeue operations. */
-	if (ring_stats->deq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Objects dequeued is RING_SIZE -1. */
-	if (ring_stats->deq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any dequeue failure stats yet. */
-	if (ring_stats->deq_fail_bulk != 0)
-		goto fail;
-
-	printf("Test stats for SC burst dequeue with an empty ring.\n");
-	num_items = 2;
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for SC bulk dequeue with an empty ring.\n");
-	num_items = 4;
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC burst dequeue with an empty ring.\n");
-	num_items = 8;
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC bulk dequeue with an empty ring.\n");
-	num_items = 16;
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test total enqueue/dequeue stats.\n");
-	/* At this point the enqueue and dequeue stats should be the same. */
-	if (ring_stats->enq_success_bulk != ring_stats->deq_success_bulk)
-		goto fail;
-	if (ring_stats->enq_success_objs != ring_stats->deq_success_objs)
-		goto fail;
-	if (ring_stats->enq_fail_bulk    != ring_stats->deq_fail_bulk)
-		goto fail;
-	if (ring_stats->enq_fail_objs    != ring_stats->deq_fail_objs)
-		goto fail;
-
-
-	/* Watermark Tests. */
-	printf("Test the watermark/quota stats.\n");
-
-	printf("Verify the initial watermark stats.\n");
-	/* Watermark stats should be 0 since there is no watermark. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Set a watermark. */
-	rte_ring_set_water_mark(r, 16);
-
-	/* Reset pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue below watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should still be 0. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Success stats should have increased. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops + 1)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items + num_items)
-		goto fail;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue at watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != 1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP burst enqueue above watermark.\n");
-	num_items = 1;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP burst enqueue above watermark.\n");
-	num_items = 2;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP bulk enqueue above watermark.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP bulk enqueue above watermark.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	printf("Test watermark success stats.\n");
-	/* Success stats should be same as last non-watermarked enqueue. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items)
-		goto fail;
-
-
-	/* Cleanup. */
-
-	/* Empty the ring. */
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* Reset the watermark. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
-	return 0;
-
-fail:
-	free(src);
-	free(dst);
-	return -1;
-#endif
-}
-
 /*
  * it will always fail to create ring with a wrong ring size number in this function
  */
@@ -1335,10 +929,6 @@ test_ring(void)
 	if (test_ring_basic() < 0)
 		return -1;
 
-	/* ring stats */
-	if (test_ring_stats() < 0)
-		return -1;
-
 	/* basic operations */
 	if (test_live_watermark_change() < 0)
 		return -1;
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3 1/2] ethdev: add capability control API
  @ 2017-03-06 20:41  3%       ` Wiles, Keith
  0 siblings, 0 replies; 200+ results
From: Wiles, Keith @ 2017-03-06 20:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Dumitrescu, Cristian, DPDK, jerin.jacob,
	balasubramanian.manoharan, hemant.agrawal, shreyansh.jain,
	Richardson, Bruce


> On Mar 6, 2017, at 2:21 PM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
>> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
>>> 2017-03-06 16:35, Dumitrescu, Cristian:
>>>>>> +int rte_eth_dev_capability_ops_get(uint8_t port_id,
>>>>>> +	enum rte_eth_capability cap, void *arg);
>>>>> 
>>>>> What is the benefit of getting different kind of capabilities with
>>>>> the same function?
>>>>> enum + void* = ioctl
>>>>> A self-explanatory API should have a dedicated function for each kind
>>>>> of features with different argument types.
>>>> 
>>>> The advantage is providing a standard interface to query the capabilities of
>>> the device rather than having each capability provide its own mechanism in a
>>> slightly different way.
>>>> 
>>>> IMO this mechanism is of great help to guide the developers of future
>>> ethdev features on the clean path to add new features in a modular way,
>>> extending the ethdev functionality while doing so in a separate name space
>>> and file (that's why I tend to call this a plugin-like mechanism), as opposed to
>>> the current monolithic approach for ethdev, where we have 100+ API
>>> functions in a single name space that are split into functional groups just
>>> by blank lines in the header file. It is simply the generalization of the
>>> mechanism introduced by rte_flow in release 17.02 (so all the credit should
>>> go to Adrien and not me).
>>>> 
>>>> IMO, having a standard function as above is cleaner than having a separate
>>> and slightly different function per feature. People can quickly see the set of
>>> standard ethdev capabilities and which ones are supported by a specific
>>> device. Between A) and B) below, I definitely prefer A):
>>>> A) status = rte_eth_dev_capability_ops_get(port_id,
>>> RTE_ETH_CABABILITY_TM, &tm_ops);
>>>> B) status = rte_eth_dev_tm_ops_get(port_id, &tm_ops);
>>> 
>>> I prefer B because instead of tm_ops, you can use some specific tm
>>> arguments,
>>> show their types and properly document each parameter.
>> 
>> Note that rte_flow already returns the flow ops as a void * with no strong argument type checking (approach A from above). Are you saying this is wrong?
>> 
>> 	rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, void *eth_flow_ops);
>> 
>> Personally, I am in favour of allowing the standard interface at the expense of strong build-time type checking, especially since this API function is between ethdev and the drivers, as opposed to between app and ethdev.
> 
> rte_eth_dev_filter_ctrl is going to be specialized for rte_flow operations.
> I agree with you on having independent API blocks in ethdev like rte_flow.
> But this function rte_eth_dev_capability_ops_get that you propose would be
> cross-blocks. I don't see the benefit.
> I especially don't think there is a sense in the enum
> 	enum rte_eth_capability {
> 		RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
> 		RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
> 		RTE_ETH_CAPABILITY_MAX
> 	}
> 
> I won't debate more on this. We have to read opinions of other reviewers.

The benefit is providing a generic API that we do not need to alter in the future (altering it would cause ABI breakage). The PMD can add a capability to the list if it is not already present and then provide an API structure for the feature.

Being able to add features without having to change DPDK may be a strong feature for companies that have special needs for their applications. They just need to add an rte_eth_capability enum value in a range that they want to control (which does not mean they need to change the above structure), and they can provide private features to the application, especially features that are very specific to some hardware. I do not like private features, but I also do not want to stick just any old API in DPDK for any given special feature.

Today the structure is just APIs, but it could also provide some special or specific information to the application in that structure or via an API call.
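
To make the contrast concrete, here is a rough sketch of the two styles under discussion; the signatures are illustrative only, not committed ethdev API:

#include <stdint.h>

enum rte_eth_capability {
	RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
	RTE_ETH_CAPABILITY_TM,       /**< Traffic Manager */
	RTE_ETH_CAPABILITY_MAX
};

/* A) one generic entry point; the ops argument is weakly typed */
int rte_eth_dev_capability_ops_get(uint8_t port_id,
		enum rte_eth_capability cap, void *ops);

/* B) one entry point per feature; the ops argument is strongly typed */
struct rte_eth_tm_ops; /* hypothetical per-feature ops table */
int rte_eth_dev_tm_ops_get(uint8_t port_id,
		const struct rte_eth_tm_ops **ops);

With A), a new capability only adds an enum value and an ops structure; with B), each new feature adds a new exported function to the library.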

Regards,
Keith

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
  2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-06  9:10  2%           ` David Hunt
  2017-03-10 16:22  0%             ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile
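
For reference, the versioning pattern applied throughout this patch is
roughly the following; rte_foo is an illustrative name, not a real
distributor symbol:

#include <rte_compat.h>

/* old behaviour, still bound to binaries linked against DPDK 2.0 */
int
rte_foo_v20(int x)
{
	return x;
}
VERSION_SYMBOL(rte_foo, _v20, 2.0);

/* new behaviour, the default for newly linked binaries */
int
rte_foo_v1705(int x)
{
	return 2 * x;
}
BIND_DEFAULT_SYMBOL(rte_foo, _v1705, 17.05);
MAP_STATIC_SYMBOL(int rte_foo(int x), rte_foo_v1705);

Each versioned symbol also needs an entry in the library's version.map
file so that the linker exposes both versions.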

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..c4128a0 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files
  2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
@ 2017-03-06  9:10  1%           ` David Hunt
  2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
  2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move the files out of the way so that we can replace them with
new versions of the distributor library. The files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.
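
After the rename, the original rte_distributor.h is reduced to a thin
wrapper so that existing applications keep compiling unchanged:

#ifndef _RTE_DISTRIBUTE_H_
#define _RTE_DISTRIBUTE_H_

#include <rte_distributor_v20.h>

#endif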

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advised to set a tag for each mbuf before calling this function.
+ * If the user does not set the tag, its value can vary depending on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
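
Putting rte_distributor_get_pkt() and rte_distributor_return_pkt()
together, a typical blocking worker loop could be sketched as below;
quit_signal and do_work() are hypothetical application names.

	struct rte_mbuf *pkt = NULL;

	while (!quit_signal) {
		/* Hands back the previous packet (if any) and blocks
		 * until a new one is available. */
		pkt = rte_distributor_get_pkt(d, worker_id, pkt);
		do_work(pkt);
	}
	/* On shutdown, return the last packet without asking for more. */
	rte_distributor_return_pkt(d, worker_id, pkt);
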
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
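
For reference, the non-blocking request/poll pair above might be used as
sketched below, with other_work() as a hypothetical placeholder for useful
work done while waiting.

	struct rte_mbuf *pkt;

	rte_distributor_request_pkt(d, worker_id, oldpkt);
	while ((pkt = rte_distributor_poll_pkt(d, worker_id)) == NULL)
		other_work();
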
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements
  2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-06  9:10  2%         ` David Hunt
  2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
  0 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch set aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical order of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split down the updates to example app more
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them down into easier to review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   For performance in matching, Flow IDs are 15 bits.
   If 32-bit Flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new burst oriented distributor structs
[04/18] lib: add new distributor code
[05/18] lib: add SIMD flow matching to distributor
[06/18] test/distributor: extra params for autotests
[07/18] lib: switch distributor over to new API
[08/18] lib: make v20 header file private
[09/18] lib: add symbol versioning to distributor
[10/18] test: test single and burst distributor API
[11/18] test: add perf test for distributor burst mode
[12/18] examples/distributor: allow for extra stats
[13/18] sample: distributor: wait for ports to come up
[14/18] examples/distributor: give distributor a core
[15/18] examples/distributor: limit number of Tx rings
[16/18] examples/distributor: give Rx thread a core
[17/18] doc: distributor library changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  @ 2017-03-04  1:10  1% ` Cristian Dumitrescu
    1 sibling, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2017-03-04  1:10 UTC (permalink / raw)
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

This patch introduces the generic ethdev API for the traffic manager
capability, which includes: hierarchical scheduling, traffic shaping,
congestion management, packet marking.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow approach)
- Capability query API per port, per hierarchy level and per hierarchy node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ),
  Weighted Round Robin (WRR)
- Traffic shaping: single/dual rate, private (per node) and shared (by multiple
  nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Changes in v3:
- Implemented feedback from Jerin [5]
- Changed naming convention: scheddev -> tm
- Improvements on the capability API:
	- Specification of marking capabilities per color
	- WFQ/WRR groups: sp_n_children_max -> wfq_wrr_n_children_per_group_max,
	  added wfq_wrr_n_groups_max, improved description of both, improved
	  description of wfq_wrr_weight_max
	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent update
- Enforced/documented restrictions for root node (node_add() and update())
- Enforced/documented shaper profile restrictions on PIR: PIR != 0, PIR >= CIR
- Turned repetitive code in rte_tm.c into macro
- Removed dependency on rte_red.h file (added RED params to rte_tm.h)
- Color: removed "e_" from color names enum
- Fixed small Doxygen style issues

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, its parent, its role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API functions
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception, see the long list below, hopefully
  nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated object
      IDs. IMO the choice to have application-generated object IDs adds marginal
      complexity to the driver (search ID function required), but it provides
      huge simplification for the application. The app does not need to worry
      about building & managing tree-like structure for storing driver-generated
      object IDs, the app can use its own convention for node IDs depending on
      the specific hierarchy that it needs. Trivial example: identify all
      level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
      on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
      310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
      112, 113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log for
      the other related simplification that was implemented: leaf nodes now have
      predefined IDs that are the same as their Ethernet TX queue ID (
      therefore no translation is required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the shaper
  profile as part of node API (no shaper ID needed for private shapers), while
  the shared shapers are configured outside of the node API using shaper profile
  and communicated to the node using shared shaper ID. So there is no
  configuration overhead for shared shapers if the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet TX
  queue ID (therefore no translation is required for leaf nodes). This is also
  used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause (same
  as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
  based on IP packet bytes)

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
[5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_tm.c              |  436 ++++++++++
 lib/librte_ether/rte_tm.h              | 1466 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  365 ++++++++
 6 files changed, 2305 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5030c1c..7893ac6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -247,6 +247,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+Traffic Manager API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_tm*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 1d095a9..82faa67 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_tm.c
 
 #
 # Export include files
@@ -54,6 +55,8 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_tm.h
+SYMLINK-y-include += rte_tm_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 637317c..42ad3fb 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -159,5 +159,35 @@ DPDK_17.05 {
 	global:
 
 	rte_eth_dev_capability_ops_get;
+	rte_tm_get_leaf_nodes;
+	rte_tm_node_type_get;
+	rte_tm_capabilities_get;
+	rte_tm_level_capabilities_get;
+	rte_tm_node_capabilities_get;
+	rte_tm_wred_profile_add;
+	rte_tm_wred_profile_delete;
+	rte_tm_shared_wred_context_add_update;
+	rte_tm_shared_wred_context_delete;
+	rte_tm_shaper_profile_add;
+	rte_tm_shaper_profile_delete;
+	rte_tm_shared_shaper_add_update;
+	rte_tm_shared_shaper_delete;
+	rte_tm_node_add;
+	rte_tm_node_delete;
+	rte_tm_node_suspend;
+	rte_tm_node_resume;
+	rte_tm_hierarchy_set;
+	rte_tm_node_parent_update;
+	rte_tm_node_shaper_update;
+	rte_tm_node_shared_shaper_update;
+	rte_tm_node_stats_update;
+	rte_tm_node_scheduling_mode_update;
+	rte_tm_node_cman_update;
+	rte_tm_node_wred_context_update;
+	rte_tm_node_shared_wred_context_update;
+	rte_tm_node_stats_read;
+	rte_tm_mark_vlan_dei;
+	rte_tm_mark_ip_ecn;
+	rte_tm_mark_ip_dscp;
 
 } DPDK_17.02;
diff --git a/lib/librte_ether/rte_tm.c b/lib/librte_ether/rte_tm.c
new file mode 100644
index 0000000..f8bd491
--- /dev/null
+++ b/lib/librte_ether/rte_tm.c
@@ -0,0 +1,436 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm_driver.h"
+#include "rte_tm.h"
+
+/* Get generic traffic manager operations structure from a port. */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_tm_error_set(error,
+			ENODEV,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->cap_ops_get == NULL) ||
+		(dev->dev_ops->cap_ops_get(dev, RTE_ETH_CAPABILITY_TM,
+		&ops) != 0) || (ops == NULL)) {
+		rte_tm_error_set(error,
+			ENOSYS,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+#define RTE_TM_FUNC(port_id, func)				\
+({								\
+	const struct rte_tm_ops *ops =			\
+		rte_tm_ops_get(port_id, error);		\
+	if (ops == NULL)						\
+		return -rte_errno;				\
+								\
+	if (ops->func == NULL)					\
+		return -rte_tm_error_set(error,		\
+			ENOSYS,					\
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
+			NULL,					\
+			rte_strerror(ENOSYS));			\
+								\
+	ops->func;						\
+})
+
+/* Get number of leaf nodes */
+int
+rte_tm_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops =
+		rte_tm_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_tm_error_set(error,
+			EINVAL,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node ID type (leaf or non-leaf) */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_type_get)(dev,
+		node_id, is_leaf, error);
+}
+
+/* Get capabilities */
+int rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, capabilities_get)(dev,
+		cap, error);
+}
+
+/* Get level capabilities */
+int rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, level_capabilities_get)(dev,
+		level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_capabilities_get)(dev,
+		node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_add)(dev,
+		wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_delete)(dev,
+		wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_add_update)(dev,
+		shared_wred_context_id, wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_delete)(dev,
+		shared_wred_context_id, error);
+}
+
+/* Add shaper profile */
+int rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_add)(dev,
+		shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_delete)(dev,
+		shaper_profile_id, error);
+}
+
+/* Add/update shared shaper */
+int rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_add_update)(dev,
+		shared_shaper_id, shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_delete)(dev,
+		shared_shaper_id, error);
+}
+
+/* Add node to port traffic manager hierarchy */
+int rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_add)(dev,
+		node_id, parent_node_id, priority, weight, params, error);
+}
+
+/* Delete node from traffic manager hierarchy */
+int rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_delete)(dev,
+		node_id, error);
+}
+
+/* Suspend node */
+int rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_suspend)(dev,
+		node_id, error);
+}
+
+/* Resume node */
+int rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_resume)(dev,
+		node_id, error);
+}
+
+/* Set the initial port traffic manager hierarchy */
+int rte_tm_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, hierarchy_set)(dev,
+		clear_on_fail, error);
+}
+
+/* Update node parent  */
+int rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_parent_update)(dev,
+		node_id, parent_node_id, priority, weight, error);
+}
+
+/* Update node private shaper */
+int rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shaper_update)(dev,
+		node_id, shaper_profile_id, error);
+}
+
+/* Update node shared shapers */
+int rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_shaper_update)(dev,
+		node_id, shared_shaper_id, add, error);
+}
+
+/* Update node stats */
+int rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_update)(dev,
+		node_id, stats_mask, error);
+}
+
+/* Update scheduling mode */
+int rte_tm_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_scheduling_mode_update)(dev,
+		node_id, scheduling_mode_per_priority, n_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_cman_update)(dev,
+		node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wred_context_update)(dev,
+		node_id, wred_profile_id, error);
+}
+
+/* Update node shared WRED context */
+int rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_wred_context_update)(dev,
+		node_id, shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_read)(dev,
+		node_id, stats, stats_mask, clear, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_vlan_dei)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_ecn)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_dscp)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
diff --git a/lib/librte_ether/rte_tm.h b/lib/librte_ether/rte_tm.h
new file mode 100644
index 0000000..64ef5dd
--- /dev/null
+++ b/lib/librte_ether/rte_tm.h
@@ -0,0 +1,1466 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_H__
+#define __INCLUDE_RTE_TM_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API
+ *
+ * This interface provides the ability to configure the traffic manager in a
+ * generic way. It includes features such as: hierarchical scheduling,
+ * traffic shaping, congestion management, packet marking, etc.
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Ethernet framing overhead
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
+ * is generated and added at the end of the Ethernet frame on TX side without
+ * any SW intervention.
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**< Invalid WRED profile ID */
+#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**< Invalid shaper profile ID */
+#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**< Node ID for the parent of the root node */
+#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_tm_color {
+	RTE_TM_GREEN = 0, /**< Green */
+	RTE_TM_YELLOW, /**< Yellow */
+	RTE_TM_RED, /**< Red */
+	RTE_TM_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_tm_stats_type {
+	/**< Number of packets scheduled from current node. */
+	RTE_TM_STATS_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_TM_STATS_N_BYTES = 1 << 1,
+
+	/**< Number of green packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/**< Number of yellow packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/**< Number of red packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/**< Number of green bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/**< Number of yellow bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/**< Number of red bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_tm_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[RTE_TM_COLORS];
+
+		/**< Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[RTE_TM_COLORS];
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Traffic manager dynamic updates
+ */
+enum rte_tm_dynamic_update_type {
+	/**< Dynamic parent node update. The new parent node is located on the
+	 * same hierarchy level as the former parent node. Consequently, the
+	 * node whose parent is changed preserves its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
+
+	/**< Dynamic parent node update. The new parent node is located on a
+	 * different hierarchy level than the former parent node. Consequently,
+	 * the node whose parent is changed also changes its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
+
+	/**< Dynamic node add/delete. */
+	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
+
+	/**< Suspend/resume nodes. */
+	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
+
+	/**< Dynamic switch between WFQ and WRR per node SP priority level. */
+	RTE_TM_UPDATE_NODE_SCHEDULING_MODE = 1 << 4,
+
+	/**< Dynamic update of the set of enabled stats counter types. */
+	RTE_TM_UPDATE_NODE_STATS = 1 << 5,
+
+	/**< Dynamic update of congestion management mode for leaf nodes. */
+	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
+};
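
For example, an application could probe this mask before attempting run-time
topology changes. A minimal sketch, assuming port_id is a valid configured
port and using the capability API declared later in this file:

	struct rte_tm_capabilities cap;
	struct rte_tm_error err;
	int dyn_add_del = 0;

	if (rte_tm_capabilities_get(port_id, &cap, &err) == 0)
		dyn_add_del = !!(cap.dynamic_update_mask &
				RTE_TM_UPDATE_NODE_ADD_DELETE);
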
+
+/**
+ * Traffic manager node capabilities
+ */
+struct rte_tm_node_capabilities {
+	/**< Private shaper support. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper. Valid only when
+	 * private shaper is supported.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of supported shared shapers. The value of zero
+	 * indicates that shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Mask of supported statistics counter types. */
+	uint64_t stats_mask;
+
+	union {
+		/**< Items valid only for non-leaf nodes. */
+		struct {
+			/**< Maximum number of children nodes. */
+			uint32_t n_children_max;
+
+			/**< Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sp_n_priorities_max;
+
+			/**< Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size
+			 * of the WFQ/WRR sibling node group. The value of zero
+			 * is invalid. The value of 1 indicates that WFQ/WRR
+			 * algorithms are not supported. The maximum value is
+			 * *n_children_max*.
+			 */
+			uint32_t wfq_wrr_n_children_per_group_max;
+
+			/**< Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ/WRR sibling node groups that
+			 * have two or more members. The value of zero states
+			 * that WFQ/WRR algorithms are not supported. The value
+			 * of 1 indicates that (*sp_n_priorities_max* - 1)
+			 * priority levels have at most one child node, so
+			 * there can be only one priority level with two or
+			 * more sibling nodes making up a WFQ/WRR group. The
+			 * maximum value is: min(floor(*n_children_max* / 2),
+			 * *sp_n_priorities_max*).
+			 */
+			uint32_t wfq_wrr_n_groups_max;
+
+			/**< WFQ algorithm support. */
+			int wfq_supported;
+
+			/**< WRR algorithm support. */
+			int wrr_supported;
+
+			/**< Maximum WFQ/WRR weight. The value of 1 indicates
+			 * that all sibling nodes with same priority have the
+			 * same WFQ/WRR weight, so WFQ/WRR is reduced to FQ/RR.
+			 */
+			uint32_t wfq_wrr_weight_max;
+		} nonleaf;
+
+		/**< Items valid only for leaf nodes. */
+		struct {
+			/**< Head drop algorithm support. */
+			int cman_head_drop_supported;
+
+			/**< Private WRED context support. */
+			int cman_wred_context_private_supported;
+
+			/**< Maximum number of shared WRED contexts supported.
+			 * The value of zero indicates that shared WRED
+			 * contexts are not supported.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+};
+
+/**
+ * Traffic manager level capabilities
+ */
+struct rte_tm_level_capabilities {
+	/**< Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that current level only supports leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/**< Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that current level only supports non-leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/**< Summary of node-level capabilities across all the non-leaf nodes
+	 * of the current hierarchy level. Valid only when
+	 * *n_nodes_nonleaf_max* is greater than 0.
+	 */
+	struct rte_tm_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all the leaf nodes of
+	 * the current hierarchy level. Valid only when *n_nodes_leaf_max* is
+	 * greater than 0.
+	 */
+	struct rte_tm_node_capabilities leaf;
+};
+
+/**
+ * Traffic manager capabilities
+ */
+struct rte_tm_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resource between private and
+	 * shared shapers, it is typically equal to the sum between
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have the private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for shared shapers.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of WRED contexts. */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have the private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero
+	 * indicates that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_max;
+
+	/**< Support for VLAN DEI packet marking (per color). */
+	int mark_vlan_dei_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
+	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
+	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 DSCP packet marking (per color). */
+	int mark_ip_dscp_supported[RTE_TM_COLORS];
+
+	/**< Set of supported dynamic update operations
+	 * (see enum rte_tm_dynamic_update_type).
+	 */
+	uint64_t dynamic_update_mask;
+
+	/**< Summary of node-level capabilities across all non-leaf nodes. */
+	struct rte_tm_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all leaf nodes. */
+	struct rte_tm_node_capabilities leaf;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. On request of writing a new packet
+ * into the current queue while the queue is full, the *tail drop* algorithm
+ * drops the new packet while leaving the queue unmodified, as opposed to the
+ * *head drop* algorithm, which drops the packet at the head of the queue (the
+ * oldest packet waiting in the queue) and admits the new packet at the tail
+ * of the queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_tm_cman_mode {
+	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * Random Early Detection (RED) profile
+ */
+struct rte_tm_red_params {
+	/**< Minimum queue threshold */
+	uint16_t min_th;
+
+	/**< Maximum queue threshold */
+	uint16_t max_th;
+
+	/**< Inverse of packet marking probability maximum value (maxp), i.e.
+	 * maxp_inv = 1 / maxp
+	 */
+	uint16_t maxp_inv;
+
+	/**< Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
+	uint16_t wq_log2;
+};
+
+/**
+ * Weighted RED (WRED) profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_tm_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_tm_red_params red_params[RTE_TM_COLORS];
+};
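
A WRED profile could be filled in as sketched below and then registered with
rte_tm_wred_profile_add() (declared later in this file); the threshold values
are arbitrary illustrations, not recommendations.

	struct rte_tm_wred_params wp;
	enum rte_tm_color c;

	for (c = RTE_TM_GREEN; c < RTE_TM_COLORS; c++) {
		wp.red_params[c].min_th = 32;
		wp.red_params[c].max_th = 128;
		wp.red_params[c].maxp_inv = 10; /* max drop probability 1/10 */
		wp.red_params[c].wq_log2 = 9;   /* queue weight 1/512 */
	}
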
+
+/**
+ * Token bucket
+ */
+struct rte_tm_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the peak bucket has to be bigger than zero, as well as greater than
+ * or equal to the rate of the committed bucket.
+ */
+struct rte_tm_shaper_params {
+	/**< Committed token bucket */
+	struct rte_tm_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_tm_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
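
For instance, a single rate shaper limiting a node to roughly 1 Gbps could be
described as sketched below; the committed bucket is disabled via a zero rate
and the values shown are illustrative assumptions only.

	struct rte_tm_shaper_params sp = {
		.committed = { .rate = 0, .size = 0 },
		.peak = { .rate = 1000000000 / 8, /* bytes per second */
			  .size = 16 * 1024 },    /* 16 KB max burst */
		.pkt_length_adjust = RTE_TM_ETH_FRAMING_OVERHEAD_FCS,
	};
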
+
+/**
+ * Node parameters
+ *
+ * Each hierarchy node has multiple inputs (children nodes of the current
+ * parent node) and a single output (which is input to its parent node). The
+ * current node arbitrates its inputs using Strict Priority (SP), Weighted Fair
+ * Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to schedule input
+ * packets on its output while observing its shaping (rate limiting)
+ * constraints.
+ *
+ * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc. are considered
+ * approximations of ideal WFQ and are assimilated to WFQ, although an
+ * associated implementation-dependent trade-off on accuracy, performance and
+ * resource usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP
+ * algorithm, based on their priority, with zero (0) as the highest priority.
+ * Children with same priority are scheduled using the WFQ or WRR algorithm,
+ * based on their weight, which is relative to the sum of the weights of all
+ * siblings with same priority, with one (1) as the lowest weight.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
+ * where N is the number of TX queues configured for the current Ethernet port.
+ * The non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_tm_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	/**< Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 */
+	uint64_t stats_mask;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< For each priority, indicates whether the children
+			 * nodes sharing the same priority are to be scheduled
+			 * by WFQ or by WRR. When NULL, it indicates that WFQ
+			 * is to be used for all priorities. When non-NULL, it
+			 * points to a pre-allocated array of *n_priorities*
+			 * elements, with a non-zero value element indicating
+			 * WFQ and a zero value element for WRR.
+			 */
+			int *scheduling_mode_per_priority;
+
+			/**< Number of priorities. */
+			uint32_t n_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_tm_cman_mode cman;
+
+			/**< WRED parameters (valid when *cman* is WRED). */
+			struct {
+				/**< WRED profile for private WRED context. */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of shared WRED
+				 * context IDs. The absence of a private WRED
+				 * context for current leaf node is indicated
+				 * by value RTE_TM_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of shared WRED context IDs in the
+				 * *shared_wred_context_id* array.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+};
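
Based on the conventions above, a two-level hierarchy (one root node feeding
the leaf node that sits on TX queue 0) could be sketched as below. Root node
ID 100 is purely an application convention, fields not shown are left
zero-initialized, and error checking is omitted.

	struct rte_tm_node_params np = {
		.shaper_profile_id = RTE_TM_SHAPER_PROFILE_ID_NONE,
		.stats_mask = RTE_TM_STATS_N_PKTS | RTE_TM_STATS_N_BYTES,
	};
	struct rte_tm_error err;

	/* Root node: its parent is RTE_TM_NODE_ID_NULL. */
	rte_tm_node_add(port_id, 100, RTE_TM_NODE_ID_NULL, 0, 1, &np, &err);
	/* Leaf node 0 (TX queue 0): priority 0, weight 1. */
	rte_tm_node_add(port_id, 0, 100, 0, 1, &np, &err);
	/* Commit, clearing the hierarchy on failure. */
	rte_tm_hierarchy_set(port_id, 1, &err);
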
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_tm_error::cause.
+ */
+enum rte_tm_error_type {
+	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
+	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_TM_ERROR_TYPE_CAPABILITIES,
+	RTE_TM_ERROR_TYPE_LEVEL_ID,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
+	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_PRIORITIES,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_TM_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_tm_error {
+	enum rte_tm_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
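
A typical caller pattern might be sketched as follows (the node_delete call
is an arbitrary example):

	struct rte_tm_error err = { 0 };

	if (rte_tm_node_delete(port_id, node_id, &err) != 0)
		printf("TM error %d (%s)\n", (int)err.type,
			err.message ? err.message : "no details");
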
+
+/**
+ * Traffic manager get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port)
+ * and their IDs are 0 .. (N-1).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param n_leaf_nodes
+ *   Number of leaf nodes for the current port.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
+ * the number of TX queues of the current Ethernet port. The non-leaf nodes
+ * have their IDs generated by the application outside of the above range,
+ * which is reserved for leaf nodes.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID value. Needs to be valid.
+ * @param is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Traffic manager capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager level capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param level_id
+ *   The hierarchy level identifier. The value of 0 identifies the level of the
+ *   root node.
+ * @param cap
+ *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param cap
+ *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is
+ * currently at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several hierarchy leaf nodes
+ * configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
+ * context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
+ * no longer using the shaper profile previously assigned to it and is updated
+ * to use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. hierarchy node) of this shared shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node add
+ *
+ * Create new node and connect it as child of an existing node. The new node is
+ * further identified by *node_id*, which needs to be unused by any of the
+ * existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ/WRR *weight* to schedule its
+ * new child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that
+ * are subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
+ * can only be one node with this parent ID (i.e. the root node). Further
+ * restrictions for root node: needs to be non-leaf, its private shaper profile
+ * needs to be valid and single rate, cannot use any shared shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
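
A minimal usage sketch (node IDs, priorities and weights are
illustrative; the rte_tm_node_params fields are defined earlier in this
header and are simply zeroed here):

    struct rte_tm_error err;
    struct rte_tm_node_params np;

    memset(&np, 0, sizeof(np));
    /* First node added becomes the root: parent is RTE_TM_NODE_ID_NULL
     * and its ID must lie outside the 0 .. N-1 leaf range. */
    rte_tm_node_add(port_id, 1000, RTE_TM_NODE_ID_NULL, 0, 1, &np, &err);
    /* Leaf node 0 (TX queue 0), attached under the root. */
    rte_tm_node_add(port_id, 0, 1000, 0, 1, &np, &err);

Note that a real root node would also need a valid single-rate private
shaper profile set in its params, per the restrictions above.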
+
+/**
+ * Traffic manager node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has
+ * at least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node suspend
+ *
+ * Suspend an existing node.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node resume
+ *
+ * Resume an existing node that was previously suspended.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the start-up hierarchy.
+ *
+ * This function fails when the currently configured hierarchy is not supported
+ * by the Ethernet port, in which case the user can abort or try out another
+ * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can
+ * be built from scratch (when *clear_on_fail* is enabled) or by modifying the
+ * existing hierarchy configuration (when *clear_on_fail* is disabled).
+ *
+ * Note that, even when the configured hierarchy is supported (so this function
+ * is successful), the Ethernet port start might still fail due to e.g. not
+ * enough memory being available in the system, etc.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error);
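
A short sketch of the intended call order at port initialization, with
*clear_on_fail* enabled so that a rejected hierarchy is wiped before a
simpler one is attempted:

    struct rte_tm_error err;

    if (rte_tm_hierarchy_set(port_id, 1 /* clear_on_fail */, &err) != 0) {
        /* Hierarchy rejected and cleared: rebuild it with fewer
         * leaf nodes and call rte_tm_hierarchy_set() again. */
    }
    rte_eth_dev_start(port_id);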
+
+/**
+ * Traffic manager node parent update
+ *
+ * Restriction for root node: its parent cannot be changed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the
+ *   WFQ/WRR algorithm running on the parent of the current node for scheduling
+ *   this child node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private shaper update
+ *
+ * Restriction for root node: its private shaper profile needs to be valid and
+ * single rate.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either a valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared shapers update
+ *
+ * Restriction for root node: cannot use any shared shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node enabled statistics counters update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to
+ *   be disabled for the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node scheduling mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param scheduling_mode_per_priority
+ *   For each priority, indicates whether the child nodes sharing the same
+ *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates
+ *   that WFQ is to be used for all priorities. When non-NULL, it points to a
+ *   pre-allocated array of *n_priorities* elements, with a non-zero value
+ *   element indicating WFQ and a zero value element for WRR.
+ * @param n_priorities
+ *   Number of priorities.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error);
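
For illustration, assuming the children of *node_id* use two priority
levels, the following selects WFQ for priority 0 and WRR for priority 1:

    struct rte_tm_error err;
    int mode[2] = { 1 /* priority 0: WFQ */, 0 /* priority 1: WRR */ };

    rte_tm_node_scheduling_mode_update(port_id, node_id, mode, 2, &err);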
+
+/**
+ * Traffic manager node congestion management mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either a valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with
+ *   the latter disabling the private WRED context of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared WRED context to current node or
+ *   to zero to delete this shared WRED context from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters
+ *   retrieved with the *stats* structure are valid.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read,
+ *   otherwise the statistics counters are left untouched.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
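
A sketch of a read-and-clear polling step for one node (the
rte_tm_node_stats layout is defined earlier in this header):

    struct rte_tm_error err;
    struct rte_tm_node_stats stats;
    uint64_t mask;

    if (rte_tm_node_stats_read(port_id, node_id, &stats, &mask,
                               1 /* clear */, &err) == 0) {
        /* Only the counters whose type bits are set in mask carry
         * valid values in stats. */
    }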
+
+/**
+ * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
+ * Notification (ECN) field (2 bits). The DSCP field is typically used to
+ * encode the traffic class and/or drop priority (RFC 2597), while the ECN
+ * field is used by RFC 3168 to implement a congestion notification mechanism
+ * to be leveraged by transport layer protocols such as TCP and SCTP that have
+ * congestion control mechanisms.
+ *
+ * When congestion is experienced, as alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10
+ * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
+ * that congestion is experienced). The destination endpoint can use the
+ * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
+ * source endpoint, which acknowledges it back to the destination endpoint with
+ * the Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color, otherwise the ECN field is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2,
+ * as well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
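
As an illustration, enabling both DSCP and VLAN DEI marking for yellow
and red packets only, leaving green packets untouched:

    struct rte_tm_error err;

    rte_tm_mark_ip_dscp(port_id, 0, 1, 1, &err);
    rte_tm_mark_vlan_dei(port_id, 0, 1, 1, &err);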
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_H__ */
diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
new file mode 100644
index 0000000..b3c9c15
--- /dev/null
+++ b/lib/librte_ether/rte_tm_driver.h
@@ -0,0 +1,365 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_DRIVER_H__
+#define __INCLUDE_RTE_TM_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs; they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node type get */
+
+typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager capabilities get */
+
+typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager level capabilities get */
+
+typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node capabilities get */
+
+typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile add */
+
+typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile delete */
+
+typedef int (*rte_tm_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context add/update */
+
+typedef int (*rte_tm_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context delete */
+
+typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile add */
+
+typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile delete */
+
+typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper add/update */
+
+typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper delete */
+
+typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node add */
+
+typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node delete */
+
+typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node suspend */
+
+typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node resume */
+
+typedef int (*rte_tm_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager hierarchy set */
+
+typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node parent update */
+
+typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shaper update */
+
+typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared shaper update */
+
+typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node stats update */
+
+typedef int (*rte_tm_node_scheduling_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node scheduling mode update */
+
+typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node congestion management mode update */
+
+typedef int (*rte_tm_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node WRED context update */
+
+typedef int (*rte_tm_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared WRED context update */
+
+typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager read stats counters for specific node */
+
+typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - VLAN DEI */
+
+typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
+
+struct rte_tm_ops {
+	/** Traffic manager node type get */
+	rte_tm_node_type_get_t node_type_get;
+
+	/** Traffic manager capabilities get */
+	rte_tm_capabilities_get_t capabilities_get;
+	/** Traffic manager level capabilities get */
+	rte_tm_level_capabilities_get_t level_capabilities_get;
+	/** Traffic manager node capabilities get */
+	rte_tm_node_capabilities_get_t node_capabilities_get;
+
+	/** Traffic manager WRED profile add */
+	rte_tm_wred_profile_add_t wred_profile_add;
+	/** Traffic manager WRED profile delete */
+	rte_tm_wred_profile_delete_t wred_profile_delete;
+	/** Traffic manager shared WRED context add/update */
+	rte_tm_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Traffic manager shared WRED context delete */
+	rte_tm_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Traffic manager shaper profile add */
+	rte_tm_shaper_profile_add_t shaper_profile_add;
+	/** Traffic manager shaper profile delete */
+	rte_tm_shaper_profile_delete_t shaper_profile_delete;
+	/** Traffic manager shared shaper add/update */
+	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Traffic manager shared shaper delete */
+	rte_tm_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Traffic manager node add */
+	rte_tm_node_add_t node_add;
+	/** Traffic manager node delete */
+	rte_tm_node_delete_t node_delete;
+	/** Traffic manager node suspend */
+	rte_tm_node_suspend_t node_suspend;
+	/** Traffic manager node resume */
+	rte_tm_node_resume_t node_resume;
+	/** Traffic manager hierarchy set */
+	rte_tm_hierarchy_set_t hierarchy_set;
+
+	/** Traffic manager node parent update */
+	rte_tm_node_parent_update_t node_parent_update;
+	/** Traffic manager node shaper update */
+	rte_tm_node_shaper_update_t node_shaper_update;
+	/** Traffic manager node shared shaper update */
+	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Traffic manager node stats update */
+	rte_tm_node_stats_update_t node_stats_update;
+	/** Traffic manager node scheduling mode update */
+	rte_tm_node_scheduling_mode_update_t node_scheduling_mode_update;
+	/** Traffic manager node congestion management mode update */
+	rte_tm_node_cman_update_t node_cman_update;
+	/** Traffic manager node WRED context update */
+	rte_tm_node_wred_context_update_t node_wred_context_update;
+	/** Traffic manager node shared WRED context update */
+	rte_tm_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Traffic manager read statistics counters for current node */
+	rte_tm_node_stats_read_t node_stats_read;
+
+	/** Traffic manager packet marking - VLAN DEI */
+	rte_tm_mark_vlan_dei_t mark_vlan_dei;
+	/** Traffic manager packet marking - IPv4/IPv6 ECN */
+	rte_tm_mark_ip_ecn_t mark_ip_ecn;
+	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
+	rte_tm_mark_ip_dscp_t mark_ip_dscp;
+};
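
A PMD would typically expose a static instance of this structure,
leaving unimplemented callbacks NULL; a sketch with hypothetical driver
functions:

    static const struct rte_tm_ops my_pmd_tm_ops = {
        .node_type_get = my_pmd_node_type_get,
        .capabilities_get = my_pmd_capabilities_get,
        .node_add = my_pmd_node_add,
        .node_delete = my_pmd_node_delete,
        .hierarchy_set = my_pmd_hierarchy_set,
        /* all other callbacks default to NULL (not supported) */
    };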
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param error
+ *   Pointer to error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error type.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_tm_error_set(struct rte_tm_error *error,
+		   int code,
+		   enum rte_tm_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_tm_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
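
Inside a driver callback, an unsupported operation could then be
reported as follows (a sketch: the function name is hypothetical and
RTE_TM_ERROR_TYPE_UNSPECIFIED is assumed from the error-type enum in
rte_tm.h):

    static int
    my_pmd_node_suspend(struct rte_eth_dev *dev __rte_unused,
                        uint32_t node_id __rte_unused,
                        struct rte_tm_error *error)
    {
        return rte_tm_error_set(error, ENOTSUP,
                                RTE_TM_ERROR_TYPE_UNSPECIFIED,
                                NULL, "node suspend not supported");
    }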
+
+/**
+ * Get generic traffic manager operations structure from a port
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param error
+ *   Error details
+ *
+ * @return
+ *   The traffic manager operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
-- 
2.5.0

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character
  2017-03-03 12:17  0%             ` Legacy, Allain
@ 2017-03-03 13:10  0%               ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-03 13:10 UTC (permalink / raw)
  To: Legacy, Allain
  Cc: Dumitrescu, Cristian, Yuanhan Liu, dev, Jolliffe, Ian (Wind River)

On Fri, Mar 03, 2017 at 12:17:47PM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
>  > Also, for a single parameter like a comment char, I don't think we need to go
> > creating a separate structure. The current flags parameter is unused, so just
> > replace it with the comment char one. With using the structure, any additions
> In my earlier patch, I proposed using a "global" flag to indicate that an unnamed section exists, so the flags argument would still be needed.

Ok, good point, I missed that.

> 
> > to the struct would be an ABI change anyway, so I see little point in using it,
> > unless we already know of additional parameters we will be adding in future.
> We already have 2 parameters in mind - flags and comment char.  I don't feel that combining the two in a single enum is particularly good since it would be better to allow the application the freedom to set an arbitrary comment character and not be locked into any static list that we choose (see my previous email response).
>
I also agree on not using enums and not limiting comment chars.

I don't particularly like config structs, and would prefer individual
flags and comment char parameters - given it's not a huge list of
params, just 2 - but no big deal either way.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character
  2017-03-03 12:10  4%           ` Bruce Richardson
@ 2017-03-03 12:17  0%             ` Legacy, Allain
  2017-03-03 13:10  0%               ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Legacy, Allain @ 2017-03-03 12:17 UTC (permalink / raw)
  To: RICHARDSON, BRUCE
  Cc: DUMITRESCU, CRISTIAN FLORIN, Yuanhan Liu, dev, Jolliffe, Ian

> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
 > Also, for a single parameter like a comment char, I don't think we need to go
> creating a separate structure. The current flags parameter is unused, so just
> replace it with the comment char one. With using the structure, any additions
In my earlier patch, I proposed using a "global" flag to indicate that an unnamed section exists, so the flags argument would still be needed.

> to the struct would be an ABI change anyway, so I see little point in using it,
> unless we already know of additional parameters we will be adding in future.
We already have 2 parameters in mind - flags and comment char.  I don't feel that combining the two in a single enum is particularly good since it would be better to allow the application the freedom to set an arbitrary comment character and not be locked into any static list that we choose (see my previous email response).

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character
  @ 2017-03-03 12:10  4%           ` Bruce Richardson
  2017-03-03 12:17  0%             ` Legacy, Allain
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-03-03 12:10 UTC (permalink / raw)
  To: Legacy, Allain
  Cc: Dumitrescu, Cristian, Yuanhan Liu, dev, Jolliffe, Ian (Wind River)

On Fri, Mar 03, 2017 at 11:31:11AM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Dumitrescu, Cristian [mailto:cristian.dumitrescu@intel.com]
> > Possible options that I see:
> > 1. Add a new parameters argument to the load functions (e.g. struct
> > cfgfile_params *p), whit the comment char as one (and currently only) field
> > of this struct. Drawbacks: API change that might have to be announced one
> > release before the actual API change.
> 
> I would prefer this option as it provides more flexibility.  We can leave the existing API as is and add a wrapper that accepts additional parameters.   Something like the following (with implementations in the .c obviously, rather than inline in the header like I have it here).  There are several examples of this pattern already in the dpdk (e.g., ring APIs, mempool APIs) where we use a common function invoked by higher-level functions that pass in additional parameters to customize behavior.
> 
> struct rte_cfgfile *_rte_cfgfile_load(const char *filename,
>                                           const struct rte_cfgfile_params *params);
> 
> struct rte_cfgfile *rte_cfgfile_load(const char *filename, int flags)
> {
>         struct rte_cfgfile_params params;
> 
>         rte_cfgfile_set_default_params(&params);
>         params.flags |= flags;
>         return _rte_cfgfile_load(filename, &params);
> }
> 
> struct rte_cfgfile *rte_cfgfile_load_with_params(const char *filename,
>                                                     const struct rte_cfgfile_params *params)
> {
>         return _rte_cfgfile_load(filename, params);
> }
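
For illustration, a call site under that proposal, with a hypothetical
comment_character field, might look like:

    struct rte_cfgfile *cfg;
    struct rte_cfgfile_params params;

    rte_cfgfile_set_default_params(&params);
    params.comment_character = ';';
    cfg = rte_cfgfile_load_with_params("/etc/app.cfg", &params);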

No need for a new API. Just add the extra parameter to the existing load
parameter and use function versioning for ABI compatibility. Since it's
only one function, I don't think using versioning is a big deal, and
that's what it is there for.

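The pattern is roughly the following (a sketch using the rte_compat.h
versioning macros; the suffixes and version numbers are illustrative):

    struct rte_cfgfile *
    rte_cfgfile_load_v1604(const char *filename, int flags);
    VERSION_SYMBOL(rte_cfgfile_load, _v1604, 16.04);

    struct rte_cfgfile *
    rte_cfgfile_load_v1705(const char *filename, char comment_char);
    BIND_DEFAULT_SYMBOL(rte_cfgfile_load, _v1705, 17.05);
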
Also, for a single parameter like a comment char, I don't think we need
to go creating a separate structure. The current flags parameter is
unused, so just replace it with the comment char one. With using the
structure, any additions to the struct would be an ABI change anyway, so
I see little point in using it, unless we already know of additional
parameters we will be adding in future. [It's an ABI change even when
adding to the end, since the struct is allocated in the app itself, not
the library.]

/Bruce

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 16/17] vhost: rename header file
  2017-03-03  9:51  4% [dpdk-dev] [PATCH 00/17] vhost: generic vhost API Yuanhan Liu
@ 2017-03-03  9:51  3% ` Yuanhan Liu
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
  1 sibling, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-03  9:51 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 doc/guides/rel_notes/deprecation.rst   |   9 --
 drivers/net/vhost/rte_eth_vhost.c      |   2 +-
 drivers/net/vhost/rte_eth_vhost.h      |   2 +-
 examples/tep_termination/main.c        |   2 +-
 examples/tep_termination/vxlan_setup.c |   2 +-
 examples/vhost/main.c                  |   2 +-
 lib/librte_vhost/Makefile              |   2 +-
 lib/librte_vhost/rte_vhost.h           | 259 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/rte_virtio_net.h      | 259 ---------------------------------
 lib/librte_vhost/vhost.c               |   2 +-
 lib/librte_vhost/vhost.h               |   2 +-
 lib/librte_vhost/vhost_user.h          |   2 +-
 lib/librte_vhost/virtio_net.c          |   2 +-
 13 files changed, 269 insertions(+), 278 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..84c8b9d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -104,15 +104,6 @@ Deprecation Notices
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
 
-* vhost: API/ABI changes are planned for 17.05, for making DPDK vhost library
-  generic enough so that applications can build different vhost-user drivers
-  (instead of vhost-user net only) on top of that.
-  Specifically, ``virtio_net_device_ops`` will be renamed to ``vhost_device_ops``.
-  Correspondingly, some API's parameter need be changed. Few more functions also
-  need be reworked to let it be device aware. For example, different virtio device
-  has different feature set, meaning functions like ``rte_vhost_feature_disable``
-  need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
-
 * kni: Remove :ref:`kni_vhost_backend-label` feature (KNI_VHOST) in 17.05 release.
   :doc:`Vhost Library </prog_guide/vhost_lib>` is currently preferred method for
   guest - host communication. Just for clarification, this is not to remove KNI
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index df1e386..f7c370e 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -40,7 +40,7 @@
 #include <rte_memcpy.h>
 #include <rte_vdev.h>
 #include <rte_kvargs.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_spinlock.h>
 
 #include "rte_eth_vhost.h"
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index ea4bce4..39ca771 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -41,7 +41,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 /*
  * Event description.
diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
index fa1c7a4..63a5dd3 100644
--- a/examples/tep_termination/main.c
+++ b/examples/tep_termination/main.c
@@ -49,7 +49,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 #include "main.h"
 #include "vxlan.h"
diff --git a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c
index 8f1f15b..87de74d 100644
--- a/examples/tep_termination/vxlan_setup.c
+++ b/examples/tep_termination/vxlan_setup.c
@@ -49,7 +49,7 @@
 #include <rte_tcp.h>
 
 #include "main.h"
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 #include "vxlan.h"
 #include "vxlan_setup.h"
 
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 080c60b..a9b5352 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -49,7 +49,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_ip.h>
 #include <rte_tcp.h>
 
diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 5cf4e93..4847069 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -51,7 +51,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c socket.c vhost.c vhost_user.c \
 				   virtio_net.c
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
 
 # dependencies
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
new file mode 100644
index 0000000..cfb3507
--- /dev/null
+++ b/lib/librte_vhost/rte_vhost.h
@@ -0,0 +1,259 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_VHOST_H_
+#define _RTE_VHOST_H_
+
+/**
+ * @file
+ * Interface to vhost-user
+ */
+
+#include <stdint.h>
+#include <linux/vhost.h>
+#include <linux/virtio_ring.h>
+#include <sys/eventfd.h>
+
+#include <rte_memory.h>
+#include <rte_mempool.h>
+
+#define RTE_VHOST_USER_CLIENT		(1ULL << 0)
+#define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
+#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
+
+/**
+ * Information relating to memory regions including offsets to
+ * addresses in QEMUs memory file.
+ */
+struct rte_vhost_mem_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
+};
+
+/**
+ * Memory structure includes region and mapping information.
+ */
+struct rte_vhost_memory {
+	uint32_t nregions;
+	struct rte_vhost_mem_region regions[0];
+};
+
+struct rte_vhost_vring {
+	struct vring_desc	*desc;
+	struct vring_avail	*avail;
+	struct vring_used	*used;
+	uint64_t		log_guest_addr;
+
+	int			callfd;
+	int			kickfd;
+	uint16_t		size;
+};
+
+/**
+ * Device and vring operations.
+ */
+struct vhost_device_ops {
+	int (*new_device)(int vid);		/**< Add device. */
+	void (*destroy_device)(int vid);	/**< Remove device. */
+
+	int (*vring_state_changed)(int vid, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
+
+	void *reserved[5]; /**< Reserved for future extension */
+};
+
+/**
+ * Convert guest physical address to host virtual address
+ */
+static inline uint64_t __attribute__((always_inline))
+rte_vhost_gpa_to_vva(struct rte_vhost_memory *mem, uint64_t gpa)
+{
+	struct rte_vhost_mem_region *reg;
+	uint32_t i;
+
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		if (gpa >= reg->guest_phys_addr &&
+		    gpa <  reg->guest_phys_addr + reg->size) {
+			return gpa - reg->guest_phys_addr +
+			       reg->host_user_addr;
+		}
+	}
+
+	return 0;
+}
+
+int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable);
+
+/**
+ * Register vhost driver. path could be different for multiple
+ * instance support.
+ */
+int rte_vhost_driver_register(const char *path, uint64_t flags);
+
+/* Unregister vhost driver. This is only meaningful to vhost user. */
+int rte_vhost_driver_unregister(const char *path);
+
+/**
+ * Set feature bits the vhost driver supports.
+ */
+int rte_vhost_driver_set_features(const char *path, uint64_t features);
+uint64_t rte_vhost_driver_get_features(const char *path);
+
+int rte_vhost_driver_enable_features(const char *path, uint64_t features);
+int rte_vhost_driver_disable_features(const char *path, uint64_t features);
+
+/* Register callbacks. */
+int rte_vhost_driver_callback_register(const char *path,
+	struct vhost_device_ops const * const);
+/* Start vhost driver session blocking loop. */
+int rte_vhost_driver_session_start(void);
+
+/**
+ * Get the numa node from which the virtio net device's memory
+ * is allocated.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The numa node, -1 on failure
+ */
+int rte_vhost_get_numa_node(int vid);
+
+/**
+ * @deprecated
+ * Get the number of queues the device supports.
+ *
+ * Note this function is deprecated, as it returns a queue pair number,
+ * which is vhost specific. Instead, rte_vhost_get_vring_num should
+ * be used.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The number of queues, 0 on failure
+ */
+__rte_deprecated
+uint32_t rte_vhost_get_queue_num(int vid);
+
+/**
+ * Get the number of vrings the device supports.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The number of vrings, 0 on failure
+ */
+uint16_t rte_vhost_get_vring_num(int vid);
+
+/**
+ * Get the virtio net device's ifname, which is the vhost-user socket
+ * file path.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param buf
+ *  The buffer to store the queried ifname
+ * @param len
+ *  The length of buf
+ *
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_get_ifname(int vid, char *buf, size_t len);
+
+/**
+ * Get how many avail entries are left in the queue
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index
+ *
+ * @return
+ *  num of avail entires left
+ */
+uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id);
+
+/**
+ * This function adds buffers to the virtio device's RX virtqueue. Buffers can
+ * be received from the physical port or from another virtual device. A packet
+ * count is returned to indicate the number of packets that were successfully
+ * added to the RX queue.
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param pkts
+ *  array to contain packets to be enqueued
+ * @param count
+ *  packets num to be enqueued
+ * @return
+ *  num of packets enqueued
+ */
+uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
+	struct rte_mbuf **pkts, uint16_t count);
+
+/**
+ * This function gets guest buffers from the virtio device TX virtqueue,
+ * constructs host mbufs, copies guest buffer content to host mbufs and
+ * stores them in pkts to be processed.
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param mbuf_pool
+ *  mbuf_pool where host mbuf is allocated.
+ * @param pkts
+ *  array to contain packets to be dequeued
+ * @param count
+ *  packets num to be dequeued
+ * @return
+ *  num of packets dequeued
+ */
+uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
+
+int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
+uint64_t rte_vhost_get_negotiated_features(int vid);
+int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
+			      struct rte_vhost_vring *vring);
+
+#endif /* _RTE_VHOST_H_ */
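
For illustration, the datapath side is untouched by the rename; on a
single-queue device a forwarding core still does roughly the following
(vring 1 is the guest's TX ring, vring 0 its RX ring):

    struct rte_mbuf *pkts[32];
    uint16_t n;

    n = rte_vhost_dequeue_burst(vid, 1, mbuf_pool, pkts, 32);
    /* ... process or forward the packets ... */
    rte_vhost_enqueue_burst(vid, 0, pkts, n);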
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
deleted file mode 100644
index 2f761da..0000000
--- a/lib/librte_vhost/rte_virtio_net.h
+++ /dev/null
@@ -1,259 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_NET_H_
-#define _VIRTIO_NET_H_
-
-/**
- * @file
- * Interface to vhost net
- */
-
-#include <stdint.h>
-#include <linux/vhost.h>
-#include <linux/virtio_ring.h>
-#include <sys/eventfd.h>
-
-#include <rte_memory.h>
-#include <rte_mempool.h>
-
-#define RTE_VHOST_USER_CLIENT		(1ULL << 0)
-#define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
-#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
-
-/**
- * Information relating to memory regions including offsets to
- * addresses in QEMUs memory file.
- */
-struct rte_vhost_mem_region {
-	uint64_t guest_phys_addr;
-	uint64_t guest_user_addr;
-	uint64_t host_user_addr;
-	uint64_t size;
-	void	 *mmap_addr;
-	uint64_t mmap_size;
-	int fd;
-};
-
-/**
- * Memory structure includes region and mapping information.
- */
-struct rte_vhost_memory {
-	uint32_t nregions;
-	struct rte_vhost_mem_region regions[0];
-};
-
-struct rte_vhost_vring {
-	struct vring_desc	*desc;
-	struct vring_avail	*avail;
-	struct vring_used	*used;
-	uint64_t		log_guest_addr;
-
-	int			callfd;
-	int			kickfd;
-	uint16_t		size;
-};
-
-/**
- * Device and vring operations.
- */
-struct vhost_device_ops {
-	int (*new_device)(int vid);		/**< Add device. */
-	void (*destroy_device)(int vid);	/**< Remove device. */
-
-	int (*vring_state_changed)(int vid, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
-
-	void *reserved[5]; /**< Reserved for future extension */
-};
-
-/**
- * Convert guest physical Address to host virtual address
- */
-static inline uint64_t __attribute__((always_inline))
-rte_vhost_gpa_to_vva(struct rte_vhost_memory *mem, uint64_t gpa)
-{
-	struct rte_vhost_mem_region *reg;
-	uint32_t i;
-
-	for (i = 0; i < mem->nregions; i++) {
-		reg = &mem->regions[i];
-		if (gpa >= reg->guest_phys_addr &&
-		    gpa <  reg->guest_phys_addr + reg->size) {
-			return gpa - reg->guest_phys_addr +
-			       reg->host_user_addr;
-		}
-	}
-
-	return 0;
-}
-
-int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable);
-
-/**
- * Register vhost driver. path could be different for multiple
- * instance support.
- */
-int rte_vhost_driver_register(const char *path, uint64_t flags);
-
-/* Unregister vhost driver. This is only meaningful to vhost user. */
-int rte_vhost_driver_unregister(const char *path);
-
-/**
- * Set feature bits the vhost driver supports.
- */
-int rte_vhost_driver_set_features(const char *path, uint64_t features);
-uint64_t rte_vhost_driver_get_features(const char *path);
-
-int rte_vhost_driver_enable_features(const char *path, uint64_t features);
-int rte_vhost_driver_disable_features(const char *path, uint64_t features);
-
-/* Register callbacks. */
-int rte_vhost_driver_callback_register(const char *path,
-	struct vhost_device_ops const * const);
-/* Start vhost driver session blocking loop. */
-int rte_vhost_driver_session_start(void);
-
-/**
- * Get the numa node from which the virtio net device's memory
- * is allocated.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The numa node, -1 on failure
- */
-int rte_vhost_get_numa_node(int vid);
-
-/**
- * @deprecated
- * Get the number of queues the device supports.
- *
- * Note this function is deprecated, as it returns a queue pair number,
- * which is vhost specific. Instead, rte_vhost_get_vring_num should
- * be used.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The number of queues, 0 on failure
- */
-__rte_deprecated
-uint32_t rte_vhost_get_queue_num(int vid);
-
-/**
- * Get the number of vrings the device supports.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The number of vrings, 0 on failure
- */
-uint16_t rte_vhost_get_vring_num(int vid);
-
-/**
- * Get the virtio net device's ifname, which is the vhost-user socket
- * file path.
- *
- * @param vid
- *  vhost device ID
- * @param buf
- *  The buffer to stored the queried ifname
- * @param len
- *  The length of buf
- *
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_get_ifname(int vid, char *buf, size_t len);
-
-/**
- * Get how many avail entries are left in the queue
- *
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index
- *
- * @return
- *  num of avail entires left
- */
-uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id);
-
-/**
- * This function adds buffers to the virtio devices RX virtqueue. Buffers can
- * be received from the physical port or from another virtual device. A packet
- * count is returned to indicate the number of packets that were succesfully
- * added to the RX queue.
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index in mq case
- * @param pkts
- *  array to contain packets to be enqueued
- * @param count
- *  packets num to be enqueued
- * @return
- *  num of packets enqueued
- */
-uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
-	struct rte_mbuf **pkts, uint16_t count);
-
-/**
- * This function gets guest buffers from the virtio device TX virtqueue,
- * construct host mbufs, copies guest buffer content to host mbufs and
- * store them in pkts to be processed.
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index in mq case
- * @param mbuf_pool
- *  mbuf_pool where host mbuf is allocated.
- * @param pkts
- *  array to contain packets to be dequeued
- * @param count
- *  packets num to be dequeued
- * @return
- *  num of packets dequeued
- */
-uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
-	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
-
-int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
-uint64_t rte_vhost_get_negotiated_features(int vid);
-int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
-			      struct rte_vhost_vring *vring);
-
-#endif /* _VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0a27888..e0548fe 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -45,7 +45,7 @@
 #include <rte_string_fns.h>
 #include <rte_memory.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 #include "vhost.h"
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index fc9e431..29132f3 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -46,7 +46,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 179e441..f1a7823 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -37,7 +37,7 @@
 #include <stdint.h>
 #include <linux/vhost.h>
 
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 
 /* refer to hw/virtio/vhost-user.c */
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8ed2b93..6287c7a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -39,7 +39,7 @@
 #include <rte_memcpy.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_tcp.h>
 #include <rte_udp.h>
 #include <rte_sctp.h>
-- 
1.9.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 00/17] vhost: generic vhost API
@ 2017-03-03  9:51  4% Yuanhan Liu
  2017-03-03  9:51  3% ` [dpdk-dev] [PATCH 16/17] vhost: rename header file Yuanhan Liu
  2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
  0 siblings, 2 replies; 200+ results
From: Yuanhan Liu @ 2017-03-03  9:51 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

This is a first attempt to make the DPDK vhost library generic enough
that users can build their own vhost-user drivers on top of it. For
example, SPDK (Storage Performance Development Kit) is trying to enable
vhost-user SCSI.

The basic idea is to let DPDK vhost act as a vhost-user agent. It stores
all the info about the virtio device (i.e. vring addresses, negotiated
features, etc.) and lets the specific vhost-user driver fetch it via the
APIs provided by the DPDK vhost lib. With that info available, the
vhost-user driver can then get/put vring entries and thus exchange data
between the guest and the host.

The last patch demonstrates how to use these new APIs to implement a
very simple vhost-user net driver, without any fancy features enabled.


API/ABI Changes summary
=======================

- some renames
  * "struct virtio_net_device_ops" ==> "struct vhost_device_ops"
  * "rte_virtio_net.h"  ==> "rte_vhost.h"

- driver-related APIs are bound to the socket file
  * rte_vhost_driver_set_features(socket_file, features);
  * rte_vhost_driver_get_features(socket_file, features);
  * rte_vhost_driver_enable_features(socket_file, features)
  * rte_vhost_driver_disable_features(socket_file, features)
  * rte_vhost_driver_callback_register(socket_file, notify_ops);

- new APIs to fetch guest and vring info
  * rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
  * rte_vhost_get_negotiated_features(int vid);
  * rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
			      struct rte_vhost_vring *vring);

- new exported structures 
  * struct rte_vhost_vring
  * struct rte_vhost_mem_region
  * struct rte_vhost_memory
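
As a quick illustration, a vhost-user driver built on these APIs would be
set up roughly as below (a minimal sketch; the callback bodies and the
start_vhost() wrapper are illustrative only):

    #include <rte_vhost.h>

    static int  new_device(int vid)     { /* set up device "vid" */ return 0; }
    static void destroy_device(int vid) { /* tear down device "vid" */ }

    static const struct vhost_device_ops ops = {
        .new_device     = new_device,
        .destroy_device = destroy_device,
    };

    static int
    start_vhost(const char *path)   /* path: vhost-user socket file */
    {
        if (rte_vhost_driver_register(path, 0) < 0)
            return -1;
        rte_vhost_driver_callback_register(path, &ops);
        return rte_vhost_driver_session_start();    /* blocking loop */
    }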


Some design choices
===================

While making this patchset, I faced quite a few design choices; here are
two of them, each with the underlying issue and the reasoning behind the
choice I made. Please let me know if you have any comments (or better ideas).

Export public structures or not
-------------------------------

I made an ABI refactor last time (v16.07): all the structures were moved
internally, and applications use a "vid" to reference the internal
struct. With that, I hoped we would never again have to worry about the
annoying ABI issues.

It has worked great (and as expected) since then, as long as we only
support virtio-net and handle all the descs inside the vhost lib. It
becomes problematic when a user wants to implement a vhost-user driver
elsewhere. For example, such a driver needs to do the GPA to VVA
translation. Without any structs exported, functions like gpa_to_vva()
can't be inlined, and calling them would be costly, especially since this
is a function we have to invoke for every vring desc we process.

For that reason, the guest memory regions are exported. With that,
gpa_to_vva() can be inlined.
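
For example, with the regions exported, the per-desc translation can stay
inline on the driver side (a sketch; "desc" is assumed to be a struct
vring_desc pointer taken from the guest's vring):

    struct rte_vhost_memory *mem;
    uint64_t vva;

    if (rte_vhost_get_vhost_memory(vid, &mem) < 0)
        return -1;

    /* hot path: inlined, no call into the vhost lib per desc */
    vva = rte_vhost_gpa_to_vva(mem, desc->addr);
    if (vva == 0)
        return -1;  /* gpa not covered by any guest region */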

  
Add helper functions to fetch/update descs or not
-------------------------------------------------

I intended to do it this way: introduce one function to get @count
descs from a specific vring and another one to update the used descs.
Something like:
    rte_vhost_vring_get_descs(vid, vring_idx, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vid, vring_idx, count, offset, descs);

With that, the vhost-user driver programmer's task would be easier, as
he/she would no longer have to parse the descs (e.g. to handle indirect descs).

But given that virtio 1.1 has just emerged, proposes a completely new
ring layout and, most importantly, also changes the vring desc structure,
I'd like to hold off on introducing those two functions. Otherwise, it's
very likely both will become invalid when virtio 1.1 is out. That said, I
think it could be addressed with a careful design, something like making
the IOV generic enough:

	struct rte_vhost_iov {
		uint64_t	gpa;
		uint64_t	vva;
		uint64_t	len;
	};

Instead, I went the other way: introduce a few APIs to export all the vring
info (vring size, vring addr, callfd, etc.), and let the vhost-user driver
read and update the descs itself. That info could be passed to the
vhost-user driver by introducing one API per field, but to save a few APIs
and spare the programmer a few calls, I packed the key fields into a new
structure, so that they can be fetched with one call:
        struct rte_vhost_vring {
                struct vring_desc       *desc;
                struct vring_avail      *avail;
                struct vring_used       *used;
                uint64_t                log_guest_addr;
       
                int                     callfd;
                int                     kickfd;
                uint16_t                size;
        };

When virtio 1.1 comes out, a simple change like the following would
likely just work:
        struct rte_vhost_vring {
		union {
			struct {
                		struct vring_desc       *desc;
                		struct vring_avail      *avail;
                		struct vring_used       *used;
                		uint64_t                log_guest_addr;
			};
			struct desc	*desc_1_1;	/* vring addr for virtio 1.1 */
		};
       
                int                     callfd;
                int                     kickfd;
                uint16_t                size;
        };

AFAIK, that is not an ABI breakage. Even if it were, we could introduce a
new API to get the virtio 1.1 ring address.

Those fields are the minimum set I arrived at for a specific vring, chosen
to minimize the chance of an ABI break on future extension. If we need
more info, we can introduce a new API.

OTOH, to get the best performance, the two functions would also have to be
inlined (the "vid + vring_idx" combo is replaced with "vring"):
    rte_vhost_vring_get_descs(vring, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vring, count, offset, descs);

That said, one way or another, we have to export the rte_vhost_vring
struct. For this reason, I didn't rush into introducing the two APIs.
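
So the resulting flow on the driver side looks roughly like this (a sketch;
"last_avail" is the driver's own bookkeeping and error handling is omitted):

    struct rte_vhost_vring vring;
    struct vring_desc *desc;
    uint16_t idx;

    if (rte_vhost_get_vhost_vring(vid, vring_idx, &vring) < 0)
        return -1;

    /* the driver parses the ring itself */
    idx = vring.avail->ring[last_avail & (vring.size - 1)];
    desc = &vring.desc[idx];
    /* translate desc->addr with rte_vhost_gpa_to_vva(), consume the
     * buffer, fill vring.used, then signal the guest via vring.callfd */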


TODOs
=====

This series still got few small items to finish, and they are:
- update release note
- fill API comments
- set protocol features


	--yliu

---
Yuanhan Liu (17):
  vhost: introduce driver features related APIs
  net/vhost: remove feature related APIs
  vhost: use new APIs to handle features
  vhost: make notify ops per vhost driver
  vhost: export guest memory regions
  vhost: introduce API to fetch negotiated features
  vhost: export vhost vring info
  vhost: export API to translate gpa to vva
  vhost: turn queue pair to vring
  vhost: export the number of vrings
  vhost: move the device ready check at proper place
  vhost: drop the Rx and Tx queue macro
  vhost: do not include net specific headers
  vhost: rename device ops struct
  vhost: rename virtio-net to vhost
  vhost: rename header file
  examples/vhost: demonstrate the new generic vhost APIs

 doc/guides/rel_notes/deprecation.rst        |   9 -
 drivers/net/vhost/rte_eth_vhost.c           |  51 ++--
 drivers/net/vhost/rte_eth_vhost.h           |  32 +--
 drivers/net/vhost/rte_pmd_vhost_version.map |   3 -
 examples/tep_termination/main.c             |  11 +-
 examples/tep_termination/main.h             |   2 +
 examples/tep_termination/vxlan_setup.c      |   2 +-
 examples/vhost/Makefile                     |   2 +-
 examples/vhost/main.c                       |  88 ++++--
 examples/vhost/main.h                       |  33 ++-
 examples/vhost/virtio_net.c                 | 405 ++++++++++++++++++++++++++++
 lib/librte_vhost/Makefile                   |   4 +-
 lib/librte_vhost/rte_vhost.h                | 259 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map      |  18 +-
 lib/librte_vhost/rte_virtio_net.h           | 193 -------------
 lib/librte_vhost/socket.c                   | 143 ++++++++++
 lib/librte_vhost/vhost.c                    | 209 +++++++-------
 lib/librte_vhost/vhost.h                    |  82 +++---
 lib/librte_vhost/vhost_user.c               |  91 +++----
 lib/librte_vhost/vhost_user.h               |   2 +-
 lib/librte_vhost/virtio_net.c               |  35 +--
 21 files changed, 1140 insertions(+), 534 deletions(-)
 create mode 100644 examples/vhost/virtio_net.c
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h

-- 
1.9.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 5/6] prgdev: add ABI control info
  2017-03-02  4:03  3% [dpdk-dev] [PATCH 0/6] introduce prgdev abstraction library Chen Jing D(Mark)
@ 2017-03-02  4:03  4% ` Chen Jing D(Mark)
  0 siblings, 0 replies; 200+ results
From: Chen Jing D(Mark) @ 2017-03-02  4:03 UTC (permalink / raw)
  To: dev
  Cc: cunming.liang, gerald.rogers, keith.wiles, bruce.richardson,
	Chen Jing D(Mark)

Add rte_prgdev_version.map file.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
---
 lib/librte_prgdev/rte_prgdev_version.map |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_prgdev/rte_prgdev_version.map

diff --git a/lib/librte_prgdev/rte_prgdev_version.map b/lib/librte_prgdev/rte_prgdev_version.map
new file mode 100644
index 0000000..51dc15a
--- /dev/null
+++ b/lib/librte_prgdev/rte_prgdev_version.map
@@ -0,0 +1,19 @@
+DPDK_17.05 {
+	global:
+
+	rte_prgdev_pci_probe;
+	rte_prgdev_pci_remove;
+	rte_prgdev_allocate;
+	rte_prgdev_release;
+	rte_prgdev_info_get;
+	rte_prgdev_is_valid_dev;
+	rte_prgdev_open;
+	rte_prgdev_img_download;
+	rte_prgdev_img_upload;
+	rte_prgdev_check_stat;
+	rte_prgdev_close;
+	rte_prgdev_bind;
+	rte_prgdev_unbind;
+
+	local: *;
+};
-- 
1.7.7.6

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 0/6] introduce prgdev abstraction library
@ 2017-03-02  4:03  3% Chen Jing D(Mark)
  2017-03-02  4:03  4% ` [dpdk-dev] [PATCH 5/6] prgdev: add ABI control info Chen Jing D(Mark)
  0 siblings, 1 reply; 200+ results
From: Chen Jing D(Mark) @ 2017-03-02  4:03 UTC (permalink / raw)
  To: dev
  Cc: cunming.liang, gerald.rogers, keith.wiles, bruce.richardson,
	Chen Jing D(Mark)

This patch set intends to introduce a generic DPDK programming device layer,
called prgdev, to provide abstract, generic APIs for applications to
program hardware without knowing the details of the programmable devices.
From the drivers' perspective, they adapt their functions to the abstract
APIs defined in prgdev.

The major purpose of prgdev is to help DPDK users dynamically load/upgrade
RTL images for FPGA devices, or upgrade firmware for programmable NICs,
without interrupting running DPDK applications.
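
To give an idea of the intended call flow (a sketch only: the symbol names
below come from the version map in patch 5/6, while the signatures and
return-value semantics shown here are assumptions, not actual prototypes):

    /* download a new image, wait until the device reports it applied */
    if (rte_prgdev_open(dev_id) == 0 &&
        rte_prgdev_img_download(dev_id, image, image_len) == 0) {
        while (rte_prgdev_check_stat(dev_id) != 0) /* assumed semantics */
            rte_delay_ms(10);
    }
    rte_prgdev_close(dev_id);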


Chen Jing D(Mark) (5):
  prgdev: introduce new library
  prgdev: add debug macro for prgdev
  prgdev: add bus probe and remove functions
  prgdev: add prgdev API exposed to application
  prgdev: add ABI control info

Chen, Jing D (1):
  doc: introduction to prgdev

 config/common_base                       |    7 +
 doc/guides/prog_guide/index.rst          |    1 +
 doc/guides/prog_guide/prgdev_lib.rst     |  465 ++++++++++++++++++++++++++++++
 lib/Makefile                             |    1 +
 lib/librte_eal/common/include/rte_log.h  |    1 +
 lib/librte_prgdev/Makefile               |   57 ++++
 lib/librte_prgdev/rte_prgdev.c           |  459 +++++++++++++++++++++++++++++
 lib/librte_prgdev/rte_prgdev.h           |  401 ++++++++++++++++++++++++++
 lib/librte_prgdev/rte_prgdev_version.map |   19 ++
 mk/rte.app.mk                            |    1 +
 10 files changed, 1412 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/prog_guide/prgdev_lib.rst
 create mode 100644 lib/librte_prgdev/Makefile
 create mode 100644 lib/librte_prgdev/rte_prgdev.c
 create mode 100644 lib/librte_prgdev/rte_prgdev.h
 create mode 100644 lib/librte_prgdev/rte_prgdev_version.map

-- 
1.7.7.6

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 04/16] net/avp: add PMD version map file
  @ 2017-03-02  0:19  3%     ` Allain Legacy
    1 sibling, 0 replies; 200+ results
From: Allain Legacy @ 2017-03-02  0:19 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: ian.jolliffe, jerin.jacob, stephen, thomas.monjalon, dev

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 09/18] lib: add symbol versioning to distributor
  2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
@ 2017-03-01 14:50  0%         ` Hunt, David
  0 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-03-01 14:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

ERROR:SPACING: space prohibited before that ',' (ctx:WxW)
#84: FILE: lib/librte_distributor/rte_distributor.c:172:
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
                                               ^

FYI, checkpatch does not like this regardless of whether there's
a space there or not. It complains either way. :)

Regards,
Dave.



On 1/3/2017 7:47 AM, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>   lib/librte_distributor/Makefile                    |  2 +-
>   lib/librte_distributor/rte_distributor.c           |  8 ++++++++
>   lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
>   lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
>   4 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 2b28eff..2f05cf3 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
>   
>   EXPORT_MAP := rte_distributor_version.map
>   
> -LIBABIVER := 1
> +LIBABIVER := 2
>   
>   # all source are stored in SRCS-y
>   SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
> diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
> index 6e1debf..2c5511d 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -36,6 +36,7 @@
>   #include <rte_mbuf.h>
>   #include <rte_memory.h>
>   #include <rte_cycles.h>
> +#include <rte_compat.h>
>   #include <rte_memzone.h>
>   #include <rte_errno.h>
>   #include <rte_string_fns.h>
> @@ -168,6 +169,7 @@ rte_distributor_get_pkt(struct rte_distributor *d,
>   	}
>   	return count;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
>   
>   int
>   rte_distributor_return_pkt(struct rte_distributor *d,
> @@ -197,6 +199,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
>   
>   	return 0;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, , 17.05);
>   
>   /**** APIs called on distributor core ***/
>   
> @@ -476,6 +479,7 @@ rte_distributor_process(struct rte_distributor *d,
>   
>   	return num_mbufs;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_process, , 17.05);
>   
>   /* return to the caller, packets returned from workers */
>   int
> @@ -504,6 +508,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
>   
>   	return retval;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, , 17.05);
>   
>   /*
>    * Return the number of packets in-flight in a distributor, i.e. packets
> @@ -549,6 +554,7 @@ rte_distributor_flush(struct rte_distributor *d)
>   
>   	return flushed;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_flush, , 17.05);
>   
>   /* clears the internal returns array in the distributor */
>   void
> @@ -565,6 +571,7 @@ rte_distributor_clear_returns(struct rte_distributor *d)
>   	for (wkr = 0; wkr < d->num_workers; wkr++)
>   		d->bufs[wkr].retptr64[0] = 0;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, , 17.05);
>   
>   /* creates a distributor instance */
>   struct rte_distributor *
> @@ -638,3 +645,4 @@ rte_distributor_create(const char *name,
>   
>   	return d;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_create, , 17.05);
> diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
> index 1f406c5..bb6c5d7 100644
> --- a/lib/librte_distributor/rte_distributor_v20.c
> +++ b/lib/librte_distributor/rte_distributor_v20.c
> @@ -38,6 +38,7 @@
>   #include <rte_memory.h>
>   #include <rte_memzone.h>
>   #include <rte_errno.h>
> +#include <rte_compat.h>
>   #include <rte_string_fns.h>
>   #include <rte_eal_memconfig.h>
>   #include "rte_distributor_v20.h"
> @@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
>   		rte_pause();
>   	buf->bufptr64 = req;
>   }
> +VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
>   
>   struct rte_mbuf *
>   rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
> @@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
>   	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
>   	return (struct rte_mbuf *)((uintptr_t)ret);
>   }
> +VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
>   
>   struct rte_mbuf *
>   rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
> @@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
>   		rte_pause();
>   	return ret;
>   }
> +VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
>   
>   int
>   rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
> @@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
>   	buf->bufptr64 = req;
>   	return 0;
>   }
> +VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
>   
>   /**** APIs called on distributor core ***/
>   
> @@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
>   	d->returns.count = ret_count;
>   	return num_mbufs;
>   }
> +VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
>   
>   /* return to the caller, packets returned from workers */
>   int
> @@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
>   
>   	return retval;
>   }
> +VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
>   
>   /* return the number of packets in-flight in a distributor, i.e. packets
>    * being workered on or queued up in a backlog. */
> @@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
>   
>   	return flushed;
>   }
> +VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
>   
>   /* clears the internal returns array in the distributor */
>   void
> @@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
>   	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
>   #endif
>   }
> +VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
>   
>   /* creates a distributor instance */
>   struct rte_distributor_v20 *
> @@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
>   
>   	return d;
>   }
> +VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
> diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
> index 73fdc43..3a285b3 100644
> --- a/lib/librte_distributor/rte_distributor_version.map
> +++ b/lib/librte_distributor/rte_distributor_version.map
> @@ -13,3 +13,17 @@ DPDK_2.0 {
>   
>   	local: *;
>   };
> +
> +DPDK_17.05 {
> +	global:
> +
> +	rte_distributor_clear_returns;
> +	rte_distributor_create;
> +	rte_distributor_flush;
> +	rte_distributor_get_pkt;
> +	rte_distributor_poll_pkt;
> +	rte_distributor_process;
> +	rte_distributor_request_pkt;
> +	rte_distributor_return_pkt;
> +	rte_distributor_returned_pkts;
> +} DPDK_2.0;

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v8 09/18] lib: add symbol versioning to distributor
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-01  7:47  3%       ` David Hunt
  2017-03-01 14:50  0%         ` Hunt, David
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           |  8 ++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..2c5511d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -168,6 +169,7 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
 
 int
 rte_distributor_return_pkt(struct rte_distributor *d,
@@ -197,6 +199,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, , 17.05);
 
 /**** APIs called on distributor core ***/
 
@@ -476,6 +479,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, , 17.05);
 
 /* return to the caller, packets returned from workers */
 int
@@ -504,6 +508,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, , 17.05);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -549,6 +554,7 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, , 17.05);
 
 /* clears the internal returns array in the distributor */
 void
@@ -565,6 +571,7 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, , 17.05);
 
 /* creates a distributor instance */
 struct rte_distributor *
@@ -638,3 +645,4 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, , 17.05);
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-21 10:27  0%     ` Hunt, David
  2017-02-24 14:03  0%     ` Bruce Richardson
@ 2017-03-01  7:47  2%     ` David Hunt
  2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
  2 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch set aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate version at run time,
depending on the presence of the SSE2 CPU flag. On non-x86 platforms,
the scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
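
A worker loop under the new burst API then looks roughly as follows (a
sketch assuming the signatures from this series; "quit" and "worker_id"
are the application's own):

    struct rte_mbuf *bufs[8];
    unsigned int num = 0;

    while (!quit) {
        /* hand back the previous burst, receive the next one */
        num = rte_distributor_get_pkt(d, worker_id, bufs, bufs, num);
        /* process bufs[0..num-1] here */
    }
    rte_distributor_return_pkt(d, worker_id, bufs, num);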

v8 changes:
   * Changed the patch set to have a more logical order of the changes,
     but the end result is basically the same.
   * Fixed broken shared library build.
   * Split the updates to the example app down further
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised the patches so there's a more natural progression in the
     changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by calling the rte_distributor_create API with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker
   at a time.
   For matching performance, flow IDs are 15 bits.
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
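
   For example, selecting the legacy behaviour (a sketch; this assumes the
   flag is passed as an extra create-time argument, per the v7 notes above):
       d = rte_distributor_create("dist", rte_socket_id(), num_workers,
                                  RTE_DISTRIBUTOR_SINGLE);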

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICs to 2 x 40Gbps traffic generator channels, 64B packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new burst oriented distributor structs
[04/18] lib: add new distributor code
[05/18] lib: add SIMD flow matching to distributor
[06/18] test/distributor: extra params for autotests
[07/18] lib: switch distributor over to new API
[08/18] lib: make v20 header file private
[09/18] lib: add symbol versioning to distributor
[10/18] test: test single and burst distributor API
[11/18] test: add perf test for distributor burst mode
[12/18] examples/distributor: allow for extra stats
[13/18] sample: distributor: wait for ports to come up
[14/18] examples/distributor: give distributor a core
[15/18] examples/distributor: limit number of Tx rings
[16/18] examples/distributor: give Rx thread a core
[17/18] doc: distributor library changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
@ 2017-03-01  7:47  1%       ` David Hunt
  2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
  2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply	[relevance 1%]
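
For a usage illustration of the worker-side and distributor-side calls
documented in the header above, a minimal sketch follows. It is not part of
the patchset: struct worker_params, do_work(), do_other_work() and
quit_signal are invented placeholders, and error handling is omitted.

#include <rte_distributor.h>
#include <rte_mbuf.h>

extern void do_work(struct rte_mbuf *pkt);	/* application processing */
extern void do_other_work(void);		/* unrelated per-lcore work */

static volatile int quit_signal;		/* set by the main lcore */

struct worker_params {		/* hypothetical per-worker argument */
	struct rte_distributor *d;
	unsigned int id;	/* must be < num_workers given at create time */
};

/* Blocking worker loop: hand back the previous packet, wait for a new one. */
static int
worker_thread(void *arg)
{
	struct worker_params *p = arg;
	struct rte_mbuf *pkt = NULL;

	while (!quit_signal) {
		pkt = rte_distributor_get_pkt(p->d, p->id, pkt);
		do_work(pkt);
	}
	if (pkt != NULL)	/* return the last packet on shutdown */
		rte_distributor_return_pkt(p->d, p->id, pkt);
	return 0;
}

/* Non-blocking variant using the request/poll pair, allowing the worker
 * to interleave other work while the distributor finds it a packet. */
static int
worker_thread_nonblocking(void *arg)
{
	struct worker_params *p = arg;
	struct rte_mbuf *pkt = NULL;

	while (!quit_signal) {
		rte_distributor_request_pkt(p->d, p->id, pkt);
		do {
			do_other_work();
			pkt = rte_distributor_poll_pkt(p->d, p->id);
		} while (pkt == NULL && !quit_signal);
		if (pkt != NULL)
			do_work(pkt);
	}
	if (pkt != NULL)
		rte_distributor_return_pkt(p->d, p->id, pkt);
	return 0;
}

/* On the distributor lcore, at shutdown: complete in-flight packets and
 * collect whatever the workers handed back via return_pkt(). */
static void
drain_distributor(struct rte_distributor *d)
{
	struct rte_mbuf *mbufs[64];
	int n;

	rte_distributor_flush(d);
	n = rte_distributor_returned_pkts(d, mbufs, 64);
	while (n-- > 0)
		rte_pktmbuf_free(mbufs[n]);
	rte_distributor_clear_returns(d);
}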

* Re: [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-01  9:34 20%           ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
@ 2017-03-01 14:35  4%             ` Jan Blunck
  2017-03-16 17:19  4%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2017-03-01 14:35 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: dev, cjcollier @ linuxfoundation . org, ricardo.salveti, Luca Boccassi

On Wed, Mar 1, 2017 at 10:34 AM, Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
> Downstreams might want to provide different DPDK releases at the same
> time to support multiple consumers of DPDK linked against older and newer
> sonames.
>
> Also, due to the interdependencies that DPDK libraries can have,
> applications might end up with an executable space in which multiple
> versions of a library are mapped by ld.so.
>
> Think of LibA that got an ABI bump and LibB that did not get an ABI bump
> but depends on LibA.
>
>     Application
>     \-> LibA.old
>     \-> LibB.new -> LibA.new
>
> That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
> If set CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
> An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
> libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.
>
> We need to cut arbitrarily long strings after the .so now, and this would
> work for any ABI version in LIBABIVER:
>   $(Q)ln -s -f $< $(patsubst %.$(LIBABIVER),%,$@)
> But using the following instead additionally allows us to simplify the
> Makefile for the CONFIG_RTE_NEXT_ABI case.
>   $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
>
> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> ---
>  config/common_base                     |  5 +++++
>  doc/guides/contributing/versioning.rst | 25 +++++++++++++++++++++++++
>  mk/rte.lib.mk                          | 14 +++++++++-----
>  3 files changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/config/common_base b/config/common_base
> index aeee13e..37aa1e1 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -75,6 +75,11 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
>  CONFIG_RTE_NEXT_ABI=y
>
>  #
> +# Major ABI to overwrite library specific LIBABIVER
> +#
> +CONFIG_RTE_MAJOR_ABI=
> +
> +#
>  # Machine's cache line size
>  #
>  CONFIG_RTE_CACHE_LINE_SIZE=64
> diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
> index fbc44a7..8aaf370 100644
> --- a/doc/guides/contributing/versioning.rst
> +++ b/doc/guides/contributing/versioning.rst
> @@ -133,6 +133,31 @@ The macros exported are:
>    fully qualified function ``p``, so that if a symbol becomes versioned, it
>    can still be mapped back to the public symbol name.
>
> +Setting a Major ABI version
> +---------------------------
> +
> +Downstreams might want to provide different DPDK releases at the same time to
> +support multiple consumers of DPDK linked against older and newer sonames.
> +
> +Also, due to the interdependencies that DPDK libraries can have, applications
> +might end up with an executable space in which multiple versions of a library
> +are mapped by ld.so.
> +
> +Think of LibA that got an ABI bump and LibB that did not get an ABI bump but
> +depends on LibA.
> +
> +.. note::
> +
> +    Application
> +    \-> LibA.old
> +    \-> LibB.new -> LibA.new
> +
> +That is a conflict which can be avoided by setting ``CONFIG_RTE_MAJOR_ABI``.
> +If set, the value of ``CONFIG_RTE_MAJOR_ABI`` overwrites all per-library
> +versions otherwise defined in the libraries' ``LIBABIVER``.
> +An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all libraries
> +``librte<?>.so.16.11`` instead of ``librte<?>.so.<LIBABIVER>``.
> +
>  Examples of ABI Macro use
>  -------------------------
>
> diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> index 33a5f5a..1ffbf42 100644
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -40,12 +40,20 @@ EXTLIB_BUILD ?= n
>  # VPATH contains at least SRCDIR
>  VPATH += $(SRCDIR)
>
> +ifneq ($(CONFIG_RTE_MAJOR_ABI),)
> +ifneq ($(LIBABIVER),)
> +LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
> +endif
> +endif
> +
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
>  ifeq ($(EXTLIB_BUILD),n)
> +ifeq ($(CONFIG_RTE_MAJOR_ABI),)
>  ifeq ($(CONFIG_RTE_NEXT_ABI),y)
>  LIB := $(LIB).1
>  endif
> +endif
>  CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
>  endif
>  endif
> @@ -156,11 +164,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
>         @[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
>         $(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> -ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
> -       $(Q)ln -s -f $< $(basename $(basename $@))
> -else
> -       $(Q)ln -s -f $< $(basename $@)
> -endif
> +       $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
>  endif
>
>  #
> --
> 2.7.4
>

Reviewed-by: Jan Blunck <jblunck@infradead.org>
Tested-by: Jan Blunck <jblunck@infradead.org>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 0/1] net/mlx5: add TSO support
  @ 2017-03-01 11:11  3% ` Shahaf Shuler
  0 siblings, 0 replies; 200+ results
From: Shahaf Shuler @ 2017-03-01 11:11 UTC (permalink / raw)
  To: nelio.laranjeiro, adrien.mazarguil; +Cc: dev

on v2:
* Dropped patches:
  [PATCH 1/4] ethdev: add Tx offload limitations.
  [PATCH 2/4] ethdev: add TSO disable flag.
  [PATCH 3/4] app/testpmd: add TSO disable to test options.
* The changes introduced above conflict with the tx_prepare API and break the ABI.
  A proposal to disable optional offloads by default, and a way to reflect HW offload
  limitations to the application, will be addressed in a different commit.
* TSO support modification 

[PATCH v2 1/1] net/mlx5: add hardware TSO support

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-24 14:03  0%     ` Bruce Richardson
@ 2017-03-01  9:55  0%       ` Hunt, David
  0 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-03-01  9:55 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 24/2/2017 2:03 PM, Bruce Richardson wrote:
> On Tue, Feb 21, 2017 at 03:17:37AM +0000, David Hunt wrote:
>> Move files out of the way so that we can replace them with new
>> versions of the distributor library. Files are named in
>> such a way as to match the symbol versioning that we will
>> apply for backward ABI compatibility.
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   app/test/test_distributor.c                  |   2 +-
>>   app/test/test_distributor_perf.c             |   2 +-
>>   examples/distributor/main.c                  |   2 +-
>>   lib/librte_distributor/Makefile              |   4 +-
>>   lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
>>   lib/librte_distributor/rte_distributor.h     | 247 --------------
>>   lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
>>   lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++
> Rather than changing the unit tests and example applications, I think
> this patch would be better with a new rte_distributor.h file which
> simply does "#include  <rte_distributor_v20.h>". Alternatively, I
> recently upstreamed a patch, which went into 17.02, to allow symlinks in
> the folder so you could create a symlink to the renamed file.
>
> /Bruce

Thanks for the review, Bruce. I've just finished reworking the patchset
based on your review comments (including later emails) and will post soon.

Regards,
Dave.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 17:54  0%           ` Jerin Jacob
@ 2017-03-01  9:47  0%             ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-01  9:47 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 11:24:25PM +0530, Jerin Jacob wrote:
> On Tue, Feb 28, 2017 at 01:52:26PM +0000, Bruce Richardson wrote:
> > On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> > > On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> > > > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > > > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > > > > Users compiling DPDK should not need to know or care about the arrangement
> > > > > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > > > > option and set the structures to be always split. For improved
> > > > > > performance use 128B rather than 64B alignment since it stops the producer
> > > > > > and consumer data being on adjacent cachelines.
> > > > > > 
> > > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > > > ---
> > > > > >  config/common_base                     | 1 -
> > > > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > > > > >  lib/librte_ring/rte_ring.c             | 2 --
> > > > > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > > > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > > > 
> > > > > > diff --git a/config/common_base b/config/common_base
> > > > > > index aeee13e..099ffda 100644
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > > > >  #
> > > > > >  CONFIG_RTE_LIBRTE_RING=y
> > > > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > > > >  
> > > > > >  #
> > > > > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > > > > index e25ea9f..ea45e0c 100644
> > > > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > > > @@ -110,6 +110,12 @@ API Changes
> > > > > >     Also, make sure to start the actual text at the margin.
> > > > > >     =========================================================
> > > > > >  
> > > > > > +* **Reworked rte_ring library**
> > > > > > +
> > > > > > +  The rte_ring library has been reworked and updated. The following changes
> > > > > > +  have been made to it:
> > > > > > +
> > > > > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > > > >  
> > > > > >  ABI Changes
> > > > > >  -----------
> > > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > > > index ca0a108..4bc6da1 100644
> > > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > > > >  	/* compilation-time checks */
> > > > > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > > -#endif
> > > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > index 72ccca5..04fe667 100644
> > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > > > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > > > > >  		volatile uint32_t head;  /**< Producer head. */
> > > > > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > > > > -	} prod __rte_cache_aligned;
> > > > > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > > > 
> > > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets have a
> > > > > cache line size of 128B.
> > > > > 
> > > > Sure.
> > > > 
> > > > However, can you perhaps try a performance test and check to see if
> > > > there is a performance difference between the two values before I change
> > > > it? In my tests I see improved performance by having an extra blank
> > > > cache-line between the producer and consumer data.
> > > 
> > > Sure. Which test are you running to measure the performance difference?
> > > Is it app/test/test_ring_perf.c?
> > > 
> > > > 
> > Yep, just the basic ring perf test. I look mostly at the core-to-core
> > numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
> > would be far less common use cases IMHO.
> 
> Performance test results show a regression with the RTE_CACHE_LINE_MIN_SIZE
> scheme in some use cases and higher performance in others (testing using
> two physical cores).
> 
> 
> # base code
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 84
> MP/MC single enq/dequeue: 301
> SP/SC burst enq/dequeue (size: 8): 20
> MP/MC burst enq/dequeue (size: 8): 46
> SP/SC burst enq/dequeue (size: 32): 12
> MP/MC burst enq/dequeue (size: 32): 18
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 7.11
> MC empty dequeue: 12.15
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 19.08
> MP/MC bulk enq/dequeue (size: 8): 46.28
> SP/SC bulk enq/dequeue (size: 32): 11.89
> MP/MC bulk enq/dequeue (size: 32): 18.84
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 37.42
> MP/MC bulk enq/dequeue (size: 8): 73.32
> SP/SC bulk enq/dequeue (size: 32): 18.69
> MP/MC bulk enq/dequeue (size: 32): 24.59
> Test OK
> 
> # with ring rework patch
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 84
> MP/MC single enq/dequeue: 301
> SP/SC burst enq/dequeue (size: 8): 19
> MP/MC burst enq/dequeue (size: 8): 45
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 18
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 7.10
> MC empty dequeue: 12.15
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 18.59
> MP/MC bulk enq/dequeue (size: 8): 45.49
> SP/SC bulk enq/dequeue (size: 32): 11.67
> MP/MC bulk enq/dequeue (size: 32): 18.65
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 37.41
> MP/MC bulk enq/dequeue (size: 8): 72.98
> SP/SC bulk enq/dequeue (size: 32): 18.69
> MP/MC bulk enq/dequeue (size: 32): 24.59
> Test OK
> RTE>>
> 
> # with ring rework patch + cache-line size change to one on 128BCL target
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 90
> MP/MC single enq/dequeue: 317
> SP/SC burst enq/dequeue (size: 8): 20
> MP/MC burst enq/dequeue (size: 8): 48
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 18
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 8.10
> MC empty dequeue: 11.15
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 20.24
> MP/MC bulk enq/dequeue (size: 8): 48.43
> SP/SC bulk enq/dequeue (size: 32): 11.01
> MP/MC bulk enq/dequeue (size: 32): 18.43
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 25.92
> MP/MC bulk enq/dequeue (size: 8): 69.76
> SP/SC bulk enq/dequeue (size: 32): 14.27
> MP/MC bulk enq/dequeue (size: 32): 22.94
> Test OK
> RTE>>

So, given that there is not much difference here, is MIN_SIZE, i.e. the
forced 64B, your preference rather than the actual cache line size?

/Bruce

^ permalink raw reply	[relevance 0%]
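
To make the layout being measured above concrete: the point of the
__rte_aligned(RTE_CACHE_LINE_SIZE * 2) change is that the producer and
consumer head/tail pairs not only sit on different cache lines, but also
have a full empty line between them, so the adjacent-cache-line hardware
prefetcher cannot couple the two. A standalone C11 sketch of the same
technique (invented names, not the actual rte_ring definition):

#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64	/* assumed x86 value; 128B on PPC/ThunderX1 */

struct toy_ring {
	alignas(CACHE_LINE_SIZE * 2) struct {
		uint32_t size;
		uint32_t mask;
		volatile uint32_t head;
		volatile uint32_t tail;
	} prod;
	alignas(CACHE_LINE_SIZE * 2) struct {
		uint32_t size;
		uint32_t mask;
		volatile uint32_t head;
		volatile uint32_t tail;
	} cons;
};

/* Equivalents of the RTE_BUILD_BUG_ON() checks in rte_ring_init(). */
static_assert(offsetof(struct toy_ring, prod) % CACHE_LINE_SIZE == 0,
	      "prod is not cache-line aligned");
static_assert(offsetof(struct toy_ring, cons) % CACHE_LINE_SIZE == 0,
	      "cons is not cache-line aligned");
/* The 2x alignment leaves one full spacer line between prod and cons. */
static_assert(offsetof(struct toy_ring, cons) == 2 * CACHE_LINE_SIZE,
	      "no spacer cache line between prod and cons");

On a 128B cache line target the same construction doubles the padding
cost, which is part of the trade-off discussed in the thread above.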

* [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-01  9:31  4%         ` Christian Ehrhardt
@ 2017-03-01  9:34 20%           ` Christian Ehrhardt
  2017-03-01 14:35  4%             ` Jan Blunck
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2017-03-01  9:34 UTC (permalink / raw)
  To: dev
  Cc: Christian Ehrhardt, cjcollier @ linuxfoundation . org,
	ricardo.salveti, Luca Boccassi

Downstreams might want to provide different DPDK releases at the same
time to support multiple consumers of DPDK linked against older and newer
sonames.

Also, due to the interdependencies that DPDK libraries can have,
applications might end up with an executable space in which multiple
versions of a library are mapped by ld.so.

Think of LibA that got an ABI bump and LibB that did not get an ABI bump
but depends on LibA.

    Application
    \-> LibA.old
    \-> LibB.new -> LibA.new

That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
If set CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.

We need to cut arbitrarily long strings after the .so now, and this would
work for any ABI version in LIBABIVER:
  $(Q)ln -s -f $< $(patsubst %.$(LIBABIVER),%,$@)
But using the following instead additionally allows us to simplify the
Makefile for the CONFIG_RTE_NEXT_ABI case.
  $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 config/common_base                     |  5 +++++
 doc/guides/contributing/versioning.rst | 25 +++++++++++++++++++++++++
 mk/rte.lib.mk                          | 14 +++++++++-----
 3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..37aa1e1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -75,6 +75,11 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
 CONFIG_RTE_NEXT_ABI=y
 
 #
+# Major ABI to overwrite library specific LIBABIVER
+#
+CONFIG_RTE_MAJOR_ABI=
+
+#
 # Machine's cache line size
 #
 CONFIG_RTE_CACHE_LINE_SIZE=64
diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
index fbc44a7..8aaf370 100644
--- a/doc/guides/contributing/versioning.rst
+++ b/doc/guides/contributing/versioning.rst
@@ -133,6 +133,31 @@ The macros exported are:
   fully qualified function ``p``, so that if a symbol becomes versioned, it
   can still be mapped back to the public symbol name.
 
+Setting a Major ABI version
+---------------------------
+
+Downstreams might want to provide different DPDK releases at the same time to
+support multiple consumers of DPDK linked against older and newer sonames.
+
+Also, due to the interdependencies that DPDK libraries can have, applications
+might end up with an executable space in which multiple versions of a library
+are mapped by ld.so.
+
+Think of LibA that got an ABI bump and LibB that did not get an ABI bump but
+depends on LibA.
+
+.. note::
+
+    Application
+    \-> LibA.old
+    \-> LibB.new -> LibA.new
+
+That is a conflict which can be avoided by setting ``CONFIG_RTE_MAJOR_ABI``.
+If set, the value of ``CONFIG_RTE_MAJOR_ABI`` overwrites all per-library
+versions otherwise defined in the libraries' ``LIBABIVER``.
+An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all libraries
+``librte<?>.so.16.11`` instead of ``librte<?>.so.<LIBABIVER>``.
+
 Examples of ABI Macro use
 -------------------------
 
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 33a5f5a..1ffbf42 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -40,12 +40,20 @@ EXTLIB_BUILD ?= n
 # VPATH contains at least SRCDIR
 VPATH += $(SRCDIR)
 
+ifneq ($(CONFIG_RTE_MAJOR_ABI),)
+ifneq ($(LIBABIVER),)
+LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
+endif
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
 ifeq ($(EXTLIB_BUILD),n)
+ifeq ($(CONFIG_RTE_MAJOR_ABI),)
 ifeq ($(CONFIG_RTE_NEXT_ABI),y)
 LIB := $(LIB).1
 endif
+endif
 CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
 endif
 endif
@@ -156,11 +164,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
 	@[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
 	$(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
-ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
-	$(Q)ln -s -f $< $(basename $(basename $@))
-else
-	$(Q)ln -s -f $< $(basename $@)
-endif
+	$(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
 endif
 
 #
-- 
2.7.4

^ permalink raw reply	[relevance 20%]

* Re: [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version
  2017-02-28  8:34  4%       ` Jan Blunck
@ 2017-03-01  9:31  4%         ` Christian Ehrhardt
  2017-03-01  9:34 20%           ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2017-03-01  9:31 UTC (permalink / raw)
  To: Jan Blunck
  Cc: dev, cjcollier @ linuxfoundation . org, ricardo.salveti, Luca Boccassi

On Tue, Feb 28, 2017 at 9:34 AM, Jan Blunck <jblunck@infradead.org> wrote:

> In case CONFIG_RTE_NEXT_ABI=y is set this is actually generating
> shared objects with suffix:
>
>   .so.$(CONFIG_RTE_MAJOR_ABI).1
>
> I don't think that this is the intention.
>

You are right, thanks for the catch Jan!
The fix is a trivial extra ifeq - a V2 is on the way soon.


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 13:52  0%         ` Bruce Richardson
@ 2017-02-28 17:54  0%           ` Jerin Jacob
  2017-03-01  9:47  0%             ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-28 17:54 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 01:52:26PM +0000, Bruce Richardson wrote:
> On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> > On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> > > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > > > Users compiling DPDK should not need to know or care about the arrangement
> > > > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > > > option and set the structures to be always split. For improved
> > > > > performance use 128B rather than 64B alignment since it stops the producer
> > > > > and consumer data being on adjacent cachelines.
> > > > > 
> > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > > ---
> > > > >  config/common_base                     | 1 -
> > > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > > > >  lib/librte_ring/rte_ring.c             | 2 --
> > > > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > > 
> > > > > diff --git a/config/common_base b/config/common_base
> > > > > index aeee13e..099ffda 100644
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > > >  #
> > > > >  CONFIG_RTE_LIBRTE_RING=y
> > > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > > >  
> > > > >  #
> > > > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > > > index e25ea9f..ea45e0c 100644
> > > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > > @@ -110,6 +110,12 @@ API Changes
> > > > >     Also, make sure to start the actual text at the margin.
> > > > >     =========================================================
> > > > >  
> > > > > +* **Reworked rte_ring library**
> > > > > +
> > > > > +  The rte_ring library has been reworked and updated. The following changes
> > > > > +  have been made to it:
> > > > > +
> > > > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > > >  
> > > > >  ABI Changes
> > > > >  -----------
> > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > > index ca0a108..4bc6da1 100644
> > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > > >  	/* compilation-time checks */
> > > > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > -#endif
> > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > index 72ccca5..04fe667 100644
> > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > > > >  		volatile uint32_t head;  /**< Producer head. */
> > > > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > > > -	} prod __rte_cache_aligned;
> > > > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > > 
> > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets have a
> > > > cache line size of 128B.
> > > > 
> > > Sure.
> > > 
> > > However, can you perhaps try a performance test and check to see if
> > > there is a performance difference between the two values before I change
> > > it? In my tests I see improved performance by having an extra blank
> > > cache-line between the producer and consumer data.
> > 
> > Sure. Which test are you running to measure the performance difference?
> > Is it app/test/test_ring_perf.c?
> > 
> > > 
> Yep, just the basic ring perf test. I look mostly at the core-to-core
> numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
> would be far less common use cases IMHO.

Performance test results show a regression with the RTE_CACHE_LINE_MIN_SIZE
scheme in some use cases and higher performance in others (testing using
two physical cores).


# base code
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 84
MP/MC single enq/dequeue: 301
SP/SC burst enq/dequeue (size: 8): 20
MP/MC burst enq/dequeue (size: 8): 46
SP/SC burst enq/dequeue (size: 32): 12
MP/MC burst enq/dequeue (size: 32): 18

### Testing empty dequeue ###
SC empty dequeue: 7.11
MC empty dequeue: 12.15

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 19.08
MP/MC bulk enq/dequeue (size: 8): 46.28
SP/SC bulk enq/dequeue (size: 32): 11.89
MP/MC bulk enq/dequeue (size: 32): 18.84

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 37.42
MP/MC bulk enq/dequeue (size: 8): 73.32
SP/SC bulk enq/dequeue (size: 32): 18.69
MP/MC bulk enq/dequeue (size: 32): 24.59
Test OK

# with ring rework patch
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 84
MP/MC single enq/dequeue: 301
SP/SC burst enq/dequeue (size: 8): 19
MP/MC burst enq/dequeue (size: 8): 45
SP/SC burst enq/dequeue (size: 32): 11
MP/MC burst enq/dequeue (size: 32): 18

### Testing empty dequeue ###
SC empty dequeue: 7.10
MC empty dequeue: 12.15

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 18.59
MP/MC bulk enq/dequeue (size: 8): 45.49
SP/SC bulk enq/dequeue (size: 32): 11.67
MP/MC bulk enq/dequeue (size: 32): 18.65

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 37.41
MP/MC bulk enq/dequeue (size: 8): 72.98
SP/SC bulk enq/dequeue (size: 32): 18.69
MP/MC bulk enq/dequeue (size: 32): 24.59
Test OK
RTE>>

# with ring rework patch + cache-line size change to one on 128BCL target
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 90
MP/MC single enq/dequeue: 317
SP/SC burst enq/dequeue (size: 8): 20
MP/MC burst enq/dequeue (size: 8): 48
SP/SC burst enq/dequeue (size: 32): 11
MP/MC burst enq/dequeue (size: 32): 18

### Testing empty dequeue ###
SC empty dequeue: 8.10
MC empty dequeue: 11.15

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 20.24
MP/MC bulk enq/dequeue (size: 8): 48.43
SP/SC bulk enq/dequeue (size: 32): 11.01
MP/MC bulk enq/dequeue (size: 32): 18.43

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 25.92
MP/MC bulk enq/dequeue (size: 8): 69.76
SP/SC bulk enq/dequeue (size: 32): 14.27
MP/MC bulk enq/dequeue (size: 32): 22.94
Test OK
RTE>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 12:08  0%       ` Jerin Jacob
@ 2017-02-28 13:52  0%         ` Bruce Richardson
  2017-02-28 17:54  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-28 13:52 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > > Users compiling DPDK should not need to know or care about the arrangement
> > > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > > option and set the structures to be always split. For improved
> > > > performance use 128B rather than 64B alignment since it stops the producer
> > > > and consumer data being on adjacent cachelines.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > >  config/common_base                     | 1 -
> > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > > >  lib/librte_ring/rte_ring.c             | 2 --
> > > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/config/common_base b/config/common_base
> > > > index aeee13e..099ffda 100644
> > > > --- a/config/common_base
> > > > +++ b/config/common_base
> > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > >  #
> > > >  CONFIG_RTE_LIBRTE_RING=y
> > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > >  
> > > >  #
> > > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > > index e25ea9f..ea45e0c 100644
> > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > @@ -110,6 +110,12 @@ API Changes
> > > >     Also, make sure to start the actual text at the margin.
> > > >     =========================================================
> > > >  
> > > > +* **Reworked rte_ring library**
> > > > +
> > > > +  The rte_ring library has been reworked and updated. The following changes
> > > > +  have been made to it:
> > > > +
> > > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > >  
> > > >  ABI Changes
> > > >  -----------
> > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > index ca0a108..4bc6da1 100644
> > > > --- a/lib/librte_ring/rte_ring.c
> > > > +++ b/lib/librte_ring/rte_ring.c
> > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > >  	/* compilation-time checks */
> > > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > -#endif
> > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > index 72ccca5..04fe667 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > > >  		volatile uint32_t head;  /**< Producer head. */
> > > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > > -	} prod __rte_cache_aligned;
> > > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > 
> > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets have a
> > > cache line size of 128B.
> > > 
> > Sure.
> > 
> > However, can you perhaps try a performance test and check to see if
> > there is a performance difference between the two values before I change
> > it? In my tests I see improved performance by having an extra blank
> > cache-line between the producer and consumer data.
> 
> Sure. Which test are you running to measure the performance difference?
> Is it app/test/test_ring_perf.c?
> 
> > 
Yep, just the basic ring perf test. I look mostly at the core-to-core
numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
would be far less common use cases IMHO.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 11:57  0%     ` Bruce Richardson
@ 2017-02-28 12:08  0%       ` Jerin Jacob
  2017-02-28 13:52  0%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-28 12:08 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > Users compiling DPDK should not need to know or care about the arrangement
> > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > option and set the structures to be always split. For improved
> > > performance use 128B rather than 64B alignment since it stops the producer
> > > and consumer data being on adjacent cachelines.
> > > 
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > >  config/common_base                     | 1 -
> > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > >  lib/librte_ring/rte_ring.c             | 2 --
> > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/config/common_base b/config/common_base
> > > index aeee13e..099ffda 100644
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > >  #
> > >  CONFIG_RTE_LIBRTE_RING=y
> > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > >  
> > >  #
> > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > index e25ea9f..ea45e0c 100644
> > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > @@ -110,6 +110,12 @@ API Changes
> > >     Also, make sure to start the actual text at the margin.
> > >     =========================================================
> > >  
> > > +* **Reworked rte_ring library**
> > > +
> > > +  The rte_ring library has been reworked and updated. The following changes
> > > +  have been made to it:
> > > +
> > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > >  
> > >  ABI Changes
> > >  -----------
> > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > index ca0a108..4bc6da1 100644
> > > --- a/lib/librte_ring/rte_ring.c
> > > +++ b/lib/librte_ring/rte_ring.c
> > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > >  	/* compilation-time checks */
> > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > -#endif
> > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > >  			  RTE_CACHE_LINE_MASK) != 0);
> > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 72ccca5..04fe667 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -168,7 +168,7 @@ struct rte_ring {
> > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > >  		volatile uint32_t head;  /**< Producer head. */
> > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > -	} prod __rte_cache_aligned;
> > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > 
> > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets have a
> > cache line size of 128B.
> > 
> Sure.
> 
> However, can you perhaps try a performance test and check to see if
> there is a performance difference between the two values before I change
> it? In my tests I see improved performance by having an extra blank
> cache-line between the producer and consumer data.

Sure. Which test are you running to measure the performance difference?
Is it app/test/test_ring_perf.c?

> 
> /Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 11:35  0%   ` Jerin Jacob
@ 2017-02-28 11:57  0%     ` Bruce Richardson
  2017-02-28 12:08  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-28 11:57 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > Users compiling DPDK should not need to know or care about the arrangement
> > of cachelines in the rte_ring structure. Therefore just remove the build
> > option and set the structures to be always split. For improved
> > performance use 128B rather than 64B alignment since it stops the producer
> > and consumer data being on adjacent cachelines.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  config/common_base                     | 1 -
> >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> >  lib/librte_ring/rte_ring.c             | 2 --
> >  lib/librte_ring/rte_ring.h             | 8 ++------
> >  4 files changed, 8 insertions(+), 9 deletions(-)
> > 
> > diff --git a/config/common_base b/config/common_base
> > index aeee13e..099ffda 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> >  #
> >  CONFIG_RTE_LIBRTE_RING=y
> >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> >  
> >  #
> > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > index e25ea9f..ea45e0c 100644
> > --- a/doc/guides/rel_notes/release_17_05.rst
> > +++ b/doc/guides/rel_notes/release_17_05.rst
> > @@ -110,6 +110,12 @@ API Changes
> >     Also, make sure to start the actual text at the margin.
> >     =========================================================
> >  
> > +* **Reworked rte_ring library**
> > +
> > +  The rte_ring library has been reworked and updated. The following changes
> > +  have been made to it:
> > +
> > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> >  
> >  ABI Changes
> >  -----------
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index ca0a108..4bc6da1 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> >  	/* compilation-time checks */
> >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> > -#ifdef RTE_RING_SPLIT_PROD_CONS
> >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> > -#endif
> >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> >  #ifdef RTE_LIBRTE_RING_DEBUG
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 72ccca5..04fe667 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -168,7 +168,7 @@ struct rte_ring {
> >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> >  		volatile uint32_t head;  /**< Producer head. */
> >  		volatile uint32_t tail;  /**< Producer tail. */
> > -	} prod __rte_cache_aligned;
> > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> 
> I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets have a
> cache line size of 128B.
> 
Sure.

However, can you perhaps try a performance test and check to see if
there is a performance difference between the two values before I change
it? In my tests I see improved performance by having an extra blank
cache-line between the producer and consumer data.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
@ 2017-02-28 11:35  0%   ` Jerin Jacob
  2017-02-28 11:57  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-28 11:35 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: olivier.matz, dev

On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> Users compiling DPDK should not need to know or care about the arrangement
> of cachelines in the rte_ring structure. Therefore just remove the build
> option and set the structures to be always split. For improved
> performance use 128B rather than 64B alignment since it stops the producer
> and consumer data being on adjacent cachelines.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  config/common_base                     | 1 -
>  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
>  lib/librte_ring/rte_ring.c             | 2 --
>  lib/librte_ring/rte_ring.h             | 8 ++------
>  4 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/config/common_base b/config/common_base
> index aeee13e..099ffda 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
>  #
>  CONFIG_RTE_LIBRTE_RING=y
>  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
>  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
>  
>  #
> diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> index e25ea9f..ea45e0c 100644
> --- a/doc/guides/rel_notes/release_17_05.rst
> +++ b/doc/guides/rel_notes/release_17_05.rst
> @@ -110,6 +110,12 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =========================================================
>  
> +* **Reworked rte_ring library**
> +
> +  The rte_ring library has been reworked and updated. The following changes
> +  have been made to it:
> +
> +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
>  
>  ABI Changes
>  -----------
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index ca0a108..4bc6da1 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
>  	/* compilation-time checks */
>  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
>  			  RTE_CACHE_LINE_MASK) != 0);
> -#ifdef RTE_RING_SPLIT_PROD_CONS
>  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
>  			  RTE_CACHE_LINE_MASK) != 0);
> -#endif
>  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
>  			  RTE_CACHE_LINE_MASK) != 0);
>  #ifdef RTE_LIBRTE_RING_DEBUG
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 72ccca5..04fe667 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -168,7 +168,7 @@ struct rte_ring {
>  		uint32_t mask;           /**< Mask (size-1) of ring. */
>  		volatile uint32_t head;  /**< Producer head. */
>  		volatile uint32_t tail;  /**< Producer tail. */
> -	} prod __rte_cache_aligned;
> +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);

I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets have a
cache line size of 128B.

> +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);


>  
>  	/** Ring consumer status. */
>  	struct cons {
> @@ -177,11 +177,7 @@ struct rte_ring {
>  		uint32_t mask;           /**< Mask (size-1) of ring. */
>  		volatile uint32_t head;  /**< Consumer head. */
>  		volatile uint32_t tail;  /**< Consumer tail. */
> -#ifdef RTE_RING_SPLIT_PROD_CONS
> -	} cons __rte_cache_aligned;
> -#else
> -	} cons;
> -#endif
> +	} cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
>  
>  #ifdef RTE_LIBRTE_RING_DEBUG
>  	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
> -- 
> 2.9.3
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version
  2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
@ 2017-02-28  8:34  4%       ` Jan Blunck
  2017-03-01  9:31  4%         ` Christian Ehrhardt
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-28  8:34 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: dev, cjcollier @ linuxfoundation . org, ricardo.salveti, Luca Boccassi

On Wed, Feb 22, 2017 at 2:24 PM, Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -40,6 +40,12 @@ EXTLIB_BUILD ?= n
>  # VPATH contains at least SRCDIR
>  VPATH += $(SRCDIR)
>
> +ifneq ($(CONFIG_RTE_MAJOR_ABI),)
> +ifneq ($(LIBABIVER),)
> +LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
> +endif
> +endif
> +
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
>  ifeq ($(EXTLIB_BUILD),n)
> @@ -156,11 +162,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
>         @[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
>         $(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> -ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
> -       $(Q)ln -s -f $< $(basename $(basename $@))
> -else
> -       $(Q)ln -s -f $< $(basename $@)
> -endif
> +       $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
>  endif
>

In case CONFIG_RTE_NEXT_ABI=y is set this is actually generating
shared objects with suffix:

  .so.$(CONFIG_RTE_MAJOR_ABI).1

I don't think that this is the intention.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 04/15] net/avp: add PMD version map file
  @ 2017-02-26 19:08  3%   ` Allain Legacy
    1 sibling, 0 replies; 200+ results
From: Allain Legacy @ 2017-02-26 19:08 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 04/16] net/avp: add PMD version map file
  @ 2017-02-25  1:23  3% ` Allain Legacy
    1 sibling, 0 replies; 200+ results
From: Allain Legacy @ 2017-02-25  1:23 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 2/2] ethdev: add hierarchical scheduler API
  @ 2017-02-24 16:28  1% ` Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2017-02-24 16:28 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, jerin.jacob, hemant.agrawal

This patch introduces the generic ethdev API for the hierarchical scheduler
capability.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow approach)
- Capability query API per port, per hierarchy level and per hierarchy node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ),
  Weighted Round Robin (WRR)
- Traffic shaping: single/dual rate, private (per node) and shared (by multiple
  nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, its parent, its role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API functions
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception (see the long list below); hopefully
  nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated object
      IDs. IMO the choice to have application-generated object IDs adds marginal
      complexity to the driver (search ID function required), but it provides
      huge simplification for the application. The app does not need to worry
      about building & managing tree-like structure for storing driver-generated
      object IDs, the app can use its own convention for node IDs depending on
      the specific hierarchy that it needs. Trivial example: identify all
      level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
      on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
      310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
      112, 113, 114, …, 121, 122, 123, 124, … (a sketch of this convention
      follows the change log below). Moreover, see the change log for the
      other related simplification that was implemented: leaf nodes now have
      predefined IDs that are the same as their Ethernet TX queue ID
      (therefore no translation is required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the shaper
  profile as part of node API (no shaper ID needed for private shapers), while
  the shared shapers are configured outside of the node API using shaper profile
  and communicated to the node using shared shaper ID. So there is no
  configuration overhead for shared shapers if the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet TX
  queue ID (therefore no translation is required for leaf nodes). This is also
  used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause (same
  as done by rte_flow)
- Packet marking API
- Optional packet length adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
  based on IP packet bytes)
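
To make the application-generated node ID convention from the change log
above concrete, here is a hedged sketch of what an application could do
(the helper names are invented and are not part of the proposed API, which
only requires IDs to be unique, with leaf node IDs equal to the Ethernet
TX queue IDs):

#include <stdint.h>

/* Hypothetical application-side ID scheme following the example above:
 * level-2 nodes: 100, 200, 300, ...; level-3: 110, 120, ..., 210, ...;
 * level-4: 111, 112, ..., 121, ... */
static inline uint32_t
level2_node_id(uint32_t n)			/* n = 1, 2, 3, ... */
{
	return n * 100;
}

static inline uint32_t
level3_node_id(uint32_t parent_id, uint32_t n)	/* n = 1 .. 9 */
{
	return parent_id + n * 10;
}

static inline uint32_t
level4_node_id(uint32_t parent_id, uint32_t n)	/* n = 1 .. 9 */
{
	return parent_id + n;
}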

Next steps:
- SW fallback based on librte_sched library (to be later introduced by
  standalone patch set)

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_scheddev.c        |  781 ++++++++++++++++++
 lib/librte_ether/rte_scheddev.h        | 1416 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_scheddev_driver.h |  365 ++++++++
 6 files changed, 2600 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_scheddev.c
 create mode 100644 lib/librte_ether/rte_scheddev.h
 create mode 100644 lib/librte_ether/rte_scheddev_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 24e0eff..8a8719f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -247,6 +247,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+SchedDev API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_scheddev*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 1d095a9..7e0527f 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_scheddev.c
 
 #
 # Export include files
@@ -54,6 +55,8 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_scheddev.h
+SYMLINK-y-include += rte_scheddev_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 637317c..4d67eee 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -159,5 +159,35 @@ DPDK_17.05 {
 	global:
 
 	rte_eth_dev_capability_ops_get;
+	rte_scheddev_get_leaf_nodes;
+	rte_scheddev_node_type_get;
+	rte_scheddev_capabilities_get;
+	rte_scheddev_level_capabilities_get;
+	rte_scheddev_node_capabilities_get;
+	rte_scheddev_wred_profile_add;
+	rte_scheddev_wred_profile_delete;
+	rte_scheddev_shared_wred_context_add_update;
+	rte_scheddev_shared_wred_context_delete;
+	rte_scheddev_shaper_profile_add;
+	rte_scheddev_shaper_profile_delete;
+	rte_scheddev_shared_shaper_add_update;
+	rte_scheddev_shared_shaper_delete;
+	rte_scheddev_node_add;
+	rte_scheddev_node_delete;
+	rte_scheddev_node_suspend;
+	rte_scheddev_node_resume;
+	rte_scheddev_hierarchy_set;
+	rte_scheddev_node_parent_update;
+	rte_scheddev_node_shaper_update;
+	rte_scheddev_node_shared_shaper_update;
+	rte_scheddev_node_stats_update;
+	rte_scheddev_node_scheduling_mode_update;
+	rte_scheddev_node_cman_update;
+	rte_scheddev_node_wred_context_update;
+	rte_scheddev_node_shared_wred_context_update;
+	rte_scheddev_node_stats_read;
+	rte_scheddev_mark_vlan_dei;
+	rte_scheddev_mark_ip_ecn;
+	rte_scheddev_mark_ip_dscp;
 
 } DPDK_17.02;
diff --git a/lib/librte_ether/rte_scheddev.c b/lib/librte_ether/rte_scheddev.c
new file mode 100644
index 0000000..d9c7dfe
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.c
@@ -0,0 +1,781 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev_driver.h"
+#include "rte_scheddev.h"
+
+/* Get generic scheduler operations structure from a port. */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_scheddev_error_set(error,
+			ENODEV,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->cap_ops_get == NULL) ||
+		(dev->dev_ops->cap_ops_get(dev, RTE_ETH_CAPABILITY_SCHED,
+		&ops) != 0) || (ops == NULL)) {
+		rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+/* Get number of leaf nodes */
+int
+rte_scheddev_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_scheddev_error_set(error,
+			EINVAL,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node ID type (leaf or non-leaf) */
+int
+rte_scheddev_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_type_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_type_get(dev, node_id, is_leaf, error);
+}
+
+/* Get capabilities */
+int rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->capabilities_get(dev, cap, error);
+}
+
+/* Get level capabilities */
+int rte_scheddev_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_scheddev_level_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->level_capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->level_capabilities_get(dev, level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_capabilities_get(dev, node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_add(dev, wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_delete(dev, wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_add_update(dev, shared_wred_context_id,
+		wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_delete(dev, shared_wred_context_id,
+		error);
+}
+
+/* Add shaper profile */
+int rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_add(dev, shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_delete(dev, shaper_profile_id, error);
+}
+
+/* Add shared shaper */
+int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_add_update(dev, shared_shaper_id,
+		shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_delete(dev, shared_shaper_id, error);
+}
+
+/* Add node to port scheduler hierarchy */
+int rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_add(dev, node_id, parent_node_id, priority, weight,
+		params, error);
+}
+
+/* Delete node from scheduler hierarchy */
+int rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_delete(dev, node_id, error);
+}
+
+/* Suspend node */
+int rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_suspend == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_suspend(dev, node_id, error);
+}
+
+/* Resume node */
+int rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_resume == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_resume(dev, node_id, error);
+}
+
+/* Set the initial port scheduler hierarchy */
+int rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->hierarchy_set == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->hierarchy_set(dev, clear_on_fail, error);
+}
+
+/* Update node parent  */
+int rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_parent_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_parent_update(dev, node_id, parent_node_id, priority,
+		weight, error);
+}
+
+/* Update node private shaper */
+int rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shaper_update(dev, node_id, shaper_profile_id,
+		error);
+}
+
+/* Update node shared shapers */
+int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_shaper_update(dev, node_id, shared_shaper_id,
+		add, error);
+}
+
+/* Update node stats */
+int rte_scheddev_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_update(dev, node_id, stats_mask, error);
+}
+
+/* Update scheduling mode */
+int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_scheduling_mode_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_scheduling_mode_update(dev, node_id,
+		scheduling_mode_per_priority, n_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_cman_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_cman_update(dev, node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_wred_context_update(dev, node_id, wred_profile_id,
+		error);
+}
+
+/* Update node shared WRED context */
+int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_wred_context_update(dev, node_id,
+		shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_read == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_read(dev, node_id, stats, stats_mask, clear,
+		error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_vlan_dei == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_vlan_dei(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_ecn == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_ecn(dev, mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_dscp == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_dscp(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
diff --git a/lib/librte_ether/rte_scheddev.h b/lib/librte_ether/rte_scheddev.h
new file mode 100644
index 0000000..1741f7a
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.h
@@ -0,0 +1,1416 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_H__
+#define __INCLUDE_RTE_SCHEDDEV_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API
+ *
+ * This interface provides the ability to configure the hierarchical scheduler
+ * feature in a generic way.
+ */
+
+#include <stdint.h>
+
+#include <rte_red.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Ethernet framing overhead
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
+ * is generated and added at the end of the Ethernet frame on TX side without
+ * any SW intervention.
+ */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**< Invalid WRED profile ID */
+#define RTE_SCHEDDEV_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**< Invalid shaper profile ID */
+#define RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**< Node ID for the parent of the root node */
+#define RTE_SCHEDDEV_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_scheddev_color {
+	e_RTE_SCHEDDEV_GREEN = 0, /**< Green */
+	e_RTE_SCHEDDEV_YELLOW, /**< Yellow */
+	e_RTE_SCHEDDEV_RED, /**< Red */
+	e_RTE_SCHEDDEV_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_scheddev_stats_type {
+	/**< Number of packets scheduled from current node. */
+	RTE_SCHEDDEV_STATS_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_SCHEDDEV_STATS_N_BYTES = 1 << 1,
+
+	/**< Number of green packets dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/**< Number of yellow packets dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/**< Number of red packets dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/**< Number of green bytes dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/**< Number of yellow bytes dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/**< Number of red bytes dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_scheddev_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[e_RTE_SCHEDDEV_COLORS];
+
+		/**< Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[e_RTE_SCHEDDEV_COLORS];
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Scheduler dynamic updates
+ */
+enum rte_scheddev_dynamic_update_type {
+	/**< Dynamic parent node update. */
+	RTE_SCHEDDEV_UPDATE_NODE_PARENT = 1 << 0,
+
+	/**< Dynamic node add/delete. */
+	RTE_SCHEDDEV_UPDATE_NODE_ADD_DELETE = 1 << 1,
+
+	/**< Suspend/resume nodes. */
+	RTE_SCHEDDEV_UPDATE_NODE_SUSPEND_RESUME = 1 << 2,
+
+	/**< Dynamic switch between WFQ and WRR per node SP priority level. */
+	RTE_SCHEDDEV_UPDATE_NODE_SCHEDULING_MODE = 1 << 3,
+
+	/**< Dynamic update of the set of enabled stats counter types. */
+	RTE_SCHEDDEV_UPDATE_NODE_STATS = 1 << 4,
+
+	/**< Dynamic update of congestion management mode for leaf nodes. */
+	RTE_SCHEDDEV_UPDATE_NODE_CMAN = 1 << 5,
+};
+
+/**
+ * Scheduler node capabilities
+ */
+struct rte_scheddev_node_capabilities {
+	/**< Private shaper support. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper. Valid only when
+	 * private shaper is supported.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of supported shared shapers. The value of zero
+	 * indicates that shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Mask of supported statistics counter types. */
+	uint64_t stats_mask;
+
+	union {
+		/**< Items valid only for non-leaf nodes. */
+		struct {
+			/**< Maximum number of children nodes. */
+			uint32_t n_children_max;
+
+			/**< Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sp_n_priorities_max;
+
+			/**< Maximum number of sibling nodes that can have the
+			 * same priority at any given time. The value of zero is
+			 * invalid. The value of 1 indicates that WFQ/WRR
+			 * algorithms are not supported. The maximum value is
+			 * *n_children_max*.
+			 */
+			uint32_t sp_n_children_max;
+
+			/**< WFQ algorithm support. */
+			int wfq_supported;
+
+			/**< WRR algorithm support. */
+			int wrr_supported;
+
+			/**< Maximum WFQ/WRR weight. */
+			uint32_t wfq_wrr_weight_max;
+		} nonleaf;
+
+		/**< Items valid only for leaf nodes. */
+		struct {
+			/**< Head drop algorithm support. */
+			int cman_head_drop_supported;
+
+			/**< Private WRED context support. */
+			int cman_wred_context_private_supported;
+
+			/**< Maximum number of shared WRED contexts supported.
+			 * The value of zero indicates that shared WRED contexts
+			 * are not supported.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+};
+
+/**
+ * Scheduler level capabilities
+ */
+struct rte_scheddev_level_capabilities {
+	/**< Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that the current level only supports leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/**< Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that the current level only supports non-leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/**< Summary of node-level capabilities across all the non-leaf nodes
+	 * of the current hierarchy level. Valid only when *n_nodes_nonleaf_max*
+	 * is greater than 0.
+	 */
+	struct rte_scheddev_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all the leaf nodes of the
+	 * current hierarchy level. Valid only when *n_nodes_leaf_max* is
+	 * greater than 0.
+	 */
+	struct rte_scheddev_node_capabilities leaf;
+};
+
+/**
+ * Scheduler capabilities
+ */
+struct rte_scheddev_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resource between private and
+	 * shared shapers, it is typically equal to the sum between
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have the private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper. Only
+	 * valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for shared
+	 * shapers. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of WRED contexts. */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have the private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero indicates
+	 * that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_max;
+
+	/**< Support for VLAN DEI packet marking. */
+	int mark_vlan_dei_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets. */
+	int mark_ip_ecn_tcp_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets. */
+	int mark_ip_ecn_sctp_supported;
+
+	/**< Support for IPv4/IPv6 DSCP packet marking. */
+	int mark_ip_dscp_supported;
+
+	/**< Set of supported dynamic update operations
+	 * (see enum rte_scheddev_dynamic_update_type).
+	 */
+	uint64_t dynamic_update_mask;
+
+	/**< Summary of node-level capabilities across all non-leaf nodes. */
+	struct rte_scheddev_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all leaf nodes. */
+	struct rte_scheddev_node_capabilities leaf;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. When a new packet is written to a
+ * queue that is already full, the *tail drop* algorithm drops the new packet
+ * while leaving the queue unmodified, as opposed to the *head drop* algorithm,
+ * which drops the packet at the head of the queue (the oldest
+ * packet waiting in the queue) and admits the new packet at the tail of the
+ * queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_scheddev_cman_mode {
+	RTE_SCHEDDEV_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_SCHEDDEV_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_SCHEDDEV_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * WRED profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_scheddev_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_red_params red_params[e_RTE_SCHEDDEV_COLORS];
+};
+
+/**
+ * Token bucket
+ */
+struct rte_scheddev_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the committed bucket has to be less than or equal to the rate of the
+ * peak bucket.
+ */
+struct rte_scheddev_shaper_params {
+	/**< Committed token bucket */
+	struct rte_scheddev_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_scheddev_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
+
+/**
+ * Node parameters
+ *
+ * Each scheduler hierarchy node has multiple inputs (children nodes of the
+ * current parent node) and a single output (which is input to its parent
+ * node). The current node arbitrates its inputs using Strict Priority (SP),
+ * Weighted Fair Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to
+ * schedule input packets on its output while observing its shaping (rate
+ * limiting) constraints.
+ *
+ * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc. are considered
+ * approximations of ideal WFQ and are treated as WFQ, although an
+ * implementation-dependent trade-off on accuracy, performance and resource
+ * usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP
+ * algorithm, based on their priority, with zero (0) as the highest priority.
+ * Children with same priority are scheduled using the WFQ or WRR algorithm,
+ * based on their weight, which is relative to the sum of the weights of all
+ * siblings with same priority, with one (1) as the lowest weight.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
+ * where N is the number of TX queues configured for the current Ethernet port.
+ * The non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_scheddev_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	/**< Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 */
+	uint64_t stats_mask;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< For each priority, indicates whether the children
+			 * nodes sharing the same priority are to be scheduled
+			 * by WFQ or by WRR. When NULL, it indicates that WFQ
+			 * is to be used for all priorities. When non-NULL, it
+			 * points to a pre-allocated array of *n_priority*
+			 * elements, with a non-zero value element indicating
+			 * WFQ and a zero value element for WRR.
+			 */
+			int *scheduling_mode_per_priority;
+
+			/**< Number of priorities. */
+			uint32_t n_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_scheddev_cman_mode cman;
+
+			/**< WRED parameters (valid when *cman* is WRED). */
+			struct {
+				/**< WRED profile for the private WRED context.
+				 * The absence of the private WRED context for
+				 * the current leaf node is indicated by value
+				 * RTE_SCHEDDEV_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of valid shared WRED
+				 * context IDs.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of shared WRED context IDs in the
+				 * *shared_wred_context_id* array.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+};
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_scheddev_error::cause.
+ */
+enum rte_scheddev_error_type {
+	RTE_SCHEDDEV_ERROR_TYPE_NONE, /**< No error. */
+	RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_SCHEDDEV_ERROR_TYPE_CAPABILITIES,
+	RTE_SCHEDDEV_ERROR_TYPE_LEVEL_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PRIORITY,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_WEIGHT,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_N_PRIORITIES,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_scheddev_error {
+	enum rte_scheddev_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Scheduler get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port) and
+ * their IDs are 0 .. (N-1).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param n_leaf_nodes
+ *   Number of leaf nodes for the current port.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is the
+ * number of TX queues of the current Ethernet port. The non-leaf nodes have
+ * their IDs generated by the application outside of the above range, which is
+ * reserved for leaf nodes.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID value. Needs to be valid.
+ * @param is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Scheduler capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler level capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param level_id
+ *   The scheduler hierarchy level identifier. The value of 0 identifies the
+ *   level of the root node.
+ * @param cap
+ *   Scheduler level capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_scheddev_level_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param cap
+ *   Scheduler node capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is currently
+ * at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several scheduler hierarchy
+ * leaf nodes configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy leaf node) of this
+ * shared WRED context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is no
+ * longer using the shaper profile previously assigned to it and is updated to
+ * use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy node) of this shared
+ * shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node add
+ *
+ * Create a new node and connect it as a child of an existing node. The new
+ * node is further identified by *node_id*, which needs to be unused by any of
+ * the existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ/WRR *weight* to schedule its
+ * new child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that are
+ * subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_SCHEDDEV_NODE_ID_NULL and
+ * there can only be one node with this parent ID (i.e. the root node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has at
+ * least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node suspend
+ *
+ * Suspend an existing node.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node resume
+ *
+ * Resume an existing node that was previously suspended.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the scheduler start-up hierarchy.
+ *
+ * This function fails when the currently configured scheduler hierarchy is not
+ * supported by the Ethernet port, in which case the user can abort or try out
+ * another hierarchy configuration (e.g. a hierarchy with less leaf nodes),
+ * which can be built from scratch (when *clear_on_fail* is enabled) or by
+ * modifying the existing hierarchy configuration (when *clear_on_fail* is
+ * disabled).
+ *
+ * Note that, even when the configured scheduler hierarchy is supported (so this
+ * function is successful), the Ethernet port start might still fail, e.g. due
+ * to not enough memory being available in the system.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node parent update
+ *
+ * The parent of the root node cannot be changed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private shaper update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared shapers update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node enabled statistics counters update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to be
+ *   disabled for the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node scheduling mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param scheduling_mode_per_priority
+ *   For each priority, indicates whether the children nodes sharing the same
+ *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates that
+ *   WFQ is to be used for all priorities. When non-NULL, it points to a
+ *   pre-allocated array of *n_priorities* elements, with a non-zero value element
+ *   indicating WFQ and a zero value element for WRR.
+ * @param n_priorities
+ *   Number of priorities.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node congestion management mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_SCHEDDEV_WRED_PROFILE_ID_NONE, with
+ *   the latter disabling the private WRED context of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared WRED context to current node or to
+ *   zero to delete this shared WRED context from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters retrieved
+ *   with the *stats* structure are valid.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read, otherwise
+ *   the statistics counters are left untouched.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion Notification
+ * (ECN) field (2 bits). The DSCP field is typically used to encode the traffic
+ * class and/or drop priority (RFC 2597), while the ECN field is used by RFC
+ * 3168 to implement a congestion notification mechanism to be leveraged by
+ * transport layer protocols such as TCP and SCTP that have congestion control
+ * mechanisms.
+ *
+ * When congestion is experienced, as alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10 (values
+ * indicating that source endpoint is ECN-capable) to 2'b11 (meaning that
+ * congestion is experienced). The destination endpoint can use the ECN-Echo
+ * (ECE) TCP flag to relay the congestion indication back to the source
+ * endpoint, which acknowledges it back to the destination endpoint with the
+ * Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color, otherwise the ECN field is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2, as
+ * well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_H__ */
diff --git a/lib/librte_ether/rte_scheddev_driver.h b/lib/librte_ether/rte_scheddev_driver.h
new file mode 100644
index 0000000..d245aea
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev_driver.h
@@ -0,0 +1,365 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+#define __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs; they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_scheddev_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node type get */
+
+typedef int (*rte_scheddev_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler capabilities get */
+
+typedef int (*rte_scheddev_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_scheddev_level_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler level capabilities get */
+
+typedef int (*rte_scheddev_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node capabilities get */
+
+typedef int (*rte_scheddev_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile add */
+
+typedef int (*rte_scheddev_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile delete */
+
+typedef int (*rte_scheddev_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context add */
+
+typedef int (*rte_scheddev_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context delete */
+
+typedef int (*rte_scheddev_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile add */
+
+typedef int (*rte_scheddev_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile delete */
+
+typedef int (*rte_scheddev_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper add/update */
+
+typedef int (*rte_scheddev_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper delete */
+
+typedef int (*rte_scheddev_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node add */
+
+typedef int (*rte_scheddev_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node delete */
+
+typedef int (*rte_scheddev_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node suspend */
+
+typedef int (*rte_scheddev_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node resume */
+
+typedef int (*rte_scheddev_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler hierarchy set */
+
+typedef int (*rte_scheddev_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node parent update */
+
+typedef int (*rte_scheddev_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shaper update */
+
+typedef int (*rte_scheddev_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared shaper update */
+
+typedef int (*rte_scheddev_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node stats update */
+
+typedef int (*rte_scheddev_node_scheduling_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node scheduling mode update */
+
+typedef int (*rte_scheddev_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node congestion management mode update */
+
+typedef int (*rte_scheddev_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node WRED context update */
+
+typedef int (*rte_scheddev_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared WRED context update */
+
+typedef int (*rte_scheddev_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler read stats counters for specific node */
+
+typedef int (*rte_scheddev_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - VLAN DEI */
+
+typedef int (*rte_scheddev_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_scheddev_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 DSCP */
+
+struct rte_scheddev_ops {
+	/** Scheduler node type get */
+	rte_scheddev_node_type_get_t node_type_get;
+
+	/** Scheduler capabilities get */
+	rte_scheddev_capabilities_get_t capabilities_get;
+	/** Scheduler level capabilities get */
+	rte_scheddev_level_capabilities_get_t level_capabilities_get;
+	/** Scheduler node capabilities get */
+	rte_scheddev_node_capabilities_get_t node_capabilities_get;
+
+	/** Scheduler WRED profile add */
+	rte_scheddev_wred_profile_add_t wred_profile_add;
+	/** Scheduler WRED profile delete */
+	rte_scheddev_wred_profile_delete_t wred_profile_delete;
+	/** Scheduler shared WRED context add/update */
+	rte_scheddev_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Scheduler shared WRED context delete */
+	rte_scheddev_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Scheduler shaper profile add */
+	rte_scheddev_shaper_profile_add_t shaper_profile_add;
+	/** Scheduler shaper profile delete */
+	rte_scheddev_shaper_profile_delete_t shaper_profile_delete;
+	/** Scheduler shared shaper add/update */
+	rte_scheddev_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Scheduler shared shaper delete */
+	rte_scheddev_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Scheduler node add */
+	rte_scheddev_node_add_t node_add;
+	/** Scheduler node delete */
+	rte_scheddev_node_delete_t node_delete;
+	/** Scheduler node suspend */
+	rte_scheddev_node_suspend_t node_suspend;
+	/** Scheduler node resume */
+	rte_scheddev_node_resume_t node_resume;
+	/** Scheduler hierarchy set */
+	rte_scheddev_hierarchy_set_t hierarchy_set;
+
+	/** Scheduler node parent update */
+	rte_scheddev_node_parent_update_t node_parent_update;
+	/** Scheduler node shaper update */
+	rte_scheddev_node_shaper_update_t node_shaper_update;
+	/** Scheduler node shared shaper update */
+	rte_scheddev_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Scheduler node stats update */
+	rte_scheddev_node_stats_update_t node_stats_update;
+	/** Scheduler node scheduling mode update */
+	rte_scheddev_node_scheduling_mode_update_t node_scheduling_mode_update;
+	/** Scheduler node congestion management mode update */
+	rte_scheddev_node_cman_update_t node_cman_update;
+	/** Scheduler node WRED context update */
+	rte_scheddev_node_wred_context_update_t node_wred_context_update;
+	/** Scheduler node shared WRED context update */
+	rte_scheddev_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Scheduler read statistics counters for current node */
+	rte_scheddev_node_stats_read_t node_stats_read;
+
+	/** Scheduler packet marking - VLAN DEI */
+	rte_scheddev_mark_vlan_dei_t mark_vlan_dei;
+	/** Scheduler packet marking - IPv4/IPv6 ECN */
+	rte_scheddev_mark_ip_ecn_t mark_ip_ecn;
+	/** Scheduler packet marking - IPv4/IPv6 DSCP */
+	rte_scheddev_mark_ip_dscp_t mark_ip_dscp;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param error
+ *   Pointer to error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error type.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_scheddev_error_set(struct rte_scheddev_error *error,
+		   int code,
+		   enum rte_scheddev_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_scheddev_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic hierarchical scheduler operations structure from a port
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param error
+ *   Error details
+ *
+ * @return
+ *   The hierarchical scheduler operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_DRIVER_H__ */
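
As a quick usage sketch (not part of the patch itself; the helper name
setup_flat_hierarchy() and its error handling are invented for illustration),
the intended call flow of the API declared above could look like:

	#include <rte_scheddev.h>

	/* Create a root node, configure one leaf per TX queue as its
	 * children, then freeze the start-up hierarchy. */
	static int
	setup_flat_hierarchy(uint8_t port_id, uint32_t n_txq)
	{
		struct rte_scheddev_error err;
		struct rte_scheddev_node_params np = {0};
		uint32_t q;

		/* The first node added becomes the root; its parent must be
		 * RTE_SCHEDDEV_NODE_ID_NULL. Leaf IDs are 0 .. n_txq - 1 by
		 * definition, so n_txq is a free ID for the root. */
		if (rte_scheddev_node_add(port_id, n_txq,
				RTE_SCHEDDEV_NODE_ID_NULL, 0, 1, &np, &err))
			return -1;

		/* Leaf nodes are predefined, so this configures and connects
		 * them rather than creating them. */
		for (q = 0; q < n_txq; q++)
			if (rte_scheddev_node_add(port_id, q, n_txq,
					0, 1, &np, &err))
				return -1;

		/* Freeze; clear the hierarchy again if the port rejects it. */
		return rte_scheddev_hierarchy_set(port_id, 1, &err);
	}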
-- 
2.5.0

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-21 10:27  0%     ` Hunt, David
@ 2017-02-24 14:03  0%     ` Bruce Richardson
  2017-03-01  9:55  0%       ` Hunt, David
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-24 14:03 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:37AM +0000, David Hunt wrote:
> Move files out of the way so that we can replace them with new
> versions of the distributor library. Files are named in
> such a way as to match the symbol versioning that we will
> apply for backward ABI compatibility.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  app/test/test_distributor.c                  |   2 +-
>  app/test/test_distributor_perf.c             |   2 +-
>  examples/distributor/main.c                  |   2 +-
>  lib/librte_distributor/Makefile              |   4 +-
>  lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
>  lib/librte_distributor/rte_distributor.h     | 247 --------------
>  lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++

Rather than changing the unit tests and example applications, I think
this patch would be better with a new rte_distributor.h file which
simply does "#include  <rte_distributor_v20.h>". Alternatively, I
recently upstreamed a patch, which went into 17.02, to allow symlinks in
the folder so you could create a symlink to the renamed file.
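
For illustration, such a compatibility header would be trivial - a sketch,
assuming the rename done in this patch (the guard name is invented):

	/* rte_distributor.h - compatibility wrapper (sketch) */
	#ifndef _RTE_DISTRIBUTOR_WRAP_H_
	#define _RTE_DISTRIBUTOR_WRAP_H_

	#include <rte_distributor_v20.h>

	#endif /* _RTE_DISTRIBUTOR_WRAP_H_ */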

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements
  2017-02-21  3:17  3% ` [dpdk-dev] [PATCH v7 0/17] distributor library " David Hunt
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
@ 2017-02-24 14:01  0%   ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-24 14:01 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:36AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
> 
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
> 
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
> 
> The Flow Match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 cpu flag. On non-x86 platforms,
> the scalar match function is selected, which should still give a good boost
> in performance over the non-burst API.
> 
> v2 changes:
>   * Created a common distributor_priv.h header file with common
>     definitions and structures.
>   * Added a scalar version so it can be built and used on machines without
>     sse2 instruction set
>   * Added unit autotests
>   * Added perf autotest

For future reference, I think it's better to put the list of deltas from
each version in reverse order, so that the latest changes are on top,
and save scrolling for those of us who have been tracking the set.

> 
> v3 changes:
>   * Addressed mailing list review comments
>   * Test code removal
>   * Split out SSE match into separate file to facilitate NEON addition
>   * Cleaned up conditional compilation flags for SSE2
>   * Addressed c99 style compilation errors
>   * rebased on latest head (Jan 2 2017, Happy New Year to all)
> 
> v4 changes:
>    * fixed issue building shared libraries
> 
> v5 changes:
>    * Removed some un-needed code around retries in worker API calls
>    * Cleanup due to review comments on mailing list
>    * Cleanup of non-x86 platform compilation, fallback to scalar match
> 
> v6 changes:
>    * Fixed intermittent segfault where num pkts not divisible
>      by BURST_SIZE
>    * Cleanup due to review comments on mailing list
>    * Renamed _priv.h to _private.h.
> 
> v7 changes:
>    * Reorganised patch so there's a more natural progression in the
>      changes, and divided them down into easier to review chunks.
>    * Previous versions of this patch set were effectively two APIs.
>      We now have a single API. Legacy functionality can
>      be used by using the rte_distributor_create API call with the
>      RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
>    * Added symbol versioning for old API so that ABI is preserved.
> 
The merging to a single API is great to see, making it so much easier
for app developers. Thanks for that.

/Bruce
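
For readers following the symbol versioning item in the v7 notes above: with
DPDK's rte_compat.h this is typically expressed along these lines (a hedged
sketch only - the exact function signatures in the patch set may differ):

	#include <rte_compat.h>

	/* Old implementation stays reachable for binaries linked
	 * against the DPDK_2.0 version node. */
	struct rte_distributor *
	rte_distributor_create_v20(const char *name, unsigned int socket_id,
			unsigned int num_workers);
	VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);

	/* New implementation becomes the default for newly linked code. */
	struct rte_distributor *
	rte_distributor_create_v1705(const char *name, unsigned int socket_id,
			unsigned int num_workers, unsigned int alg_type);
	BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);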

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
@ 2017-02-24  7:32  8%       ` Christian Ehrhardt
  0 siblings, 0 replies; 200+ results
From: Christian Ehrhardt @ 2017-02-24  7:32 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Jan Blunck, dev, cjcollier, ricardo.salveti, Luca Boccassi

On Thu, Feb 23, 2017 at 7:48 PM, Ferruh Yigit <ferruh.yigit@intel.com>
wrote:

> Can you please describe this option more?
>

Of course, I am happy about any engagement/discussion on that.
Much better than silently staying in queue.


> Does it mean for each DPDK release, the distro will release all libraries?
>

First of all it is opt-in. If nobody changes the default setting (="") to
anything, nothing happens.

A distribution _CAN_ use the feature to reliably avoid collisions between
DPDK releases and allow concurrent installations more easily.


> For 16.07:
> acl.2, eal.2, ethdev.4, pdump.1
>
> For 16.11:
> acl.2, eal.3, ethdev.5, pdump.1
>

That example is what we did so far, trying to follow the DPDK ABI
versioning.
But that caused the issue I described with (new)pdump.1->eal.3 and the base
app->eal.2


> Will the dpdk package have the following packages:
> acl.16.07.2, eal.16.07.2, ethdev.16.07.4, pdump.16.07.1
> acl.16.11.2, eal.16.11.3, ethdev.16.11.5, pdump.16.11.1
>

I thought about that, but Jan correctly brought up that if we do override we
should also trivialize and override fully - ignoring the per-lib LIBABIVER.
So it will not be eal.16.11.3 but instead just eal.16.11 (no subversion,
as there is no need). If that is wanted it can easily be done; please let me
know if I should run a v2 with that.

> And for the initial OVS usecase, will it be:
>
> OVS
>  +---> eal.16.07.2
>  +---> pdump.16.11.1
>         +---> eal.16.11.3
>
>
Not quite, the usecase would look like this:
The current DPDK generates LIBABIVER versions: eal.3, pdump.1, ...
OVS
 +---> eal.3
 +---> pdump.1
        +---> eal.3

Note: Packages are initially carried forward from the former distribution
release, so the next release would start where the former ended:
OVS
 +---> eal.3
 +---> pdump.1
        +---> eal.3

Then the new DPDK would come in and, using this feature, would
generate all libraries with the major version: eal.17.02, pdump.17.02, ...
But since OVS was not recompiled yet AND there is no collision OVS
would still look like:
OVS
 +---> eal.3
 +---> pdump.1
        +---> eal.3

Then we can recompile OVS and it will become
OVS
 +---> eal.17.02
 +---> pdump.17.02
        +---> eal.17.02

Going into the future with more apps depending on DPDK there can be many
scenarios:
1. all are fine rebuilding, there will be no dependency left on the older
   dpdk and it will be autoremoved after all upgrades
2. some packages are slow to adapt, but that is fine as we can still provide
   the old dependencies at the same time if needed
3. the time in between #2 and #1 is not wreaking havoc as the
   cross-dependency issue is no more


> Assuming the above understanding is correct :)
>
> - If the same version of the library will be delivered for each DPDK
> release, what is the benefit of having fine grained libraries really?
>

The benefit of the fine-grained versioning is for other ways of
distributing DPDK: recognizing an ABI bump for lib-consuming developers,
bundling it directly with your app, ...



> - The above OVS usage still does not look right, I don't believe this is the
> intention when library-level dependency resolving was introduced.
>
> Overall I am for single library, but I can see the benefit of having
> multiple small libraries, that is why I vote for option 4 in your
> initial mail.
>

A single library would solve it as well, but as mentioned - and as you all
remember - there were people with reasons for the split that I could not
challenge, being too far out of the application scenarios they had in mind.

> And I agree this can cause problems if not automated, but we already know
> the library dependencies, I think a script can be developed to warn at
> least, and they can be updated manually.
>
> And isn't the purpose of increasing LIBABIVER to notify the application that
> the library is modified and can't be used with that app anymore?
> For DPDK, even if the library is not changed, if another library that it
> depends on is modified, this may mean the behavior of the library may be
> changed, so it makes sense to me if the library notifies the user in this
> case, by increasing its version.
>
> Yes, this makes the effect of increasing a core library version big, but I
> believe this is also true: increasing a core library version almost
> means increasing the dpdk version.
>

Interesting - thanks for sharing your opinion here - I have been rethinking
that for a while now.

While this could work I consider it inferior to the approach I submitted
in the patch yesterday [1] for the following reasons:

- If we do infecting bumps (looking at the recent history) we most likely end up
  bumping all libraries at least every other release. Now there isn't much
  difference between bumping all of them +1 and just using a single increasing
  version. Except you could miss a few bumps or track it wrong.

- The new feature is opt-in, allowing those who want to do that major bump,
  but at the same time allowing those who don't to keep on tracking each lib
  individually and build/deliver it that way.

- I learned (often the hard way) that being different often causes problems
  that are hard to foresee.
  The infecting ABI would be "DPDK is different" again, while the major
  override is somewhat established.

For now I'd suggest taking the opt-in feature as suggested in [1] as a means
for those who need it (like us and maybe more downstreams over time).
If DPDK evolves to become more stable and develops a feature like
the #4 "infecting-abi-bump + tracking" it can still be picked up later by us
and by anybody else who needs it.
It will then "just" be a matter of dropping a config option we set before to get back.


TL;DR: I think DPDK is not yet stable enough for option #4 to be worth
implementing for now (it would cause a lot of work and error potential).
But since my code [1] implementing approach #1 and a later approach #4
are not mutually exclusive, I'd ask to go for #1 now and #4 later if
someone needs and implements it.

[1]: http://dpdk.org/ml/archives/dev/2017-February/058121.html


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-22 13:12  7%   ` Christian Ehrhardt
  2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
@ 2017-02-23 18:48  4%     ` Ferruh Yigit
  2017-02-24  7:32  8%       ` Christian Ehrhardt
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-23 18:48 UTC (permalink / raw)
  To: Christian Ehrhardt, Jan Blunck
  Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On 2/22/2017 1:12 PM, Christian Ehrhardt wrote:
> On Tue, Feb 14, 2017 at 9:31 PM, Jan Blunck <jblunck@infradead.org> wrote:
> 
>>> 1. Downstreams to insert Major version into soname
>>> Distributions could insert the DPDK major version (like 16.11) into the
>>> soname and package names. A common example of this is libboost [5].
>>> That would perfectly allow 16.07.<LIBABIVER> to coexist with
>>> 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
>>> Yet it would mean that anything depending on the old library will have to
>>> be recompiled to pick up the new code, even if it depends on an ABI that
>> is
>>> still present in the new release.
>>> Also - not a technical reason - but it is clearly more work to force
>> update
>>> all dependencies and clean out old packages for every release.
>>
>> Actually this isn't exactly what I proposed during the summit. Just
>> keep it simple and fix the ABI version of all libraries at 16.11.0.
>> This is a proven approach and has been used for years with different
>> libraries.
> 
> 
> Since there was no other response I'll try to wrap up.
> 
> Yes #1 also is my preferred solution at the moment.
> We tried with individual following the tracking of LIBABIVER upstream but
> as outlined before we hit too many issues.
> I discussed it in the deb_dpdk group which acked as well to use this as
> general approach.
> The other options have too obvious flaws as I listed on my initial report
> and - thanks btw - you added a few more.
> 
> @Bruce - sorry I don't think dropping config options is the solution. Yet
> my suggestion does not prevent you from doing so.

Hi Christian,

Can you please describe this option more?

Does it mean for each DPDK release, the distro will release all libraries?

For 16.07:
acl.2, eal.2, ethdev.4, pdump.1

For 16.11:
acl.2, eal.3, ethdev.5, pdump.1

Will the dpdk package have the following packages:
acl.16.07.2, eal.16.07.2, ethdev.16.07.4, pdump.16.07.1
acl.16.11.2, eal.16.11.3, ethdev.16.11.5, pdump.16.11.1

And for the initial OVS usecase, will it be:

OVS
 +---> eal.16.07.2
 +---> pdump.16.11.1
        +---> eal.16.11.3


Assuming the above understanding is correct :)

- If the same version of the library will be delivered for each DPDK
release, what is the benefit of having fine grained libraries really?

- The above OVS usage still does not look right, I don't believe this is the
intention when library-level dependency resolving was introduced.

Overall I am for single library, but I can see the benefit of having
multiple small libraries, that is why I vote for option 4 in your
initial mail.

And I agree this can cause problems if not automated, but we already know
the library dependencies, I think a script can be developed to warn at
least, and they can be updated manually.

And isn't the purpose of increasing LIBABIVER to notify the application that
the library is modified and can't be used with that app anymore?
For DPDK, even if the library is not changed, if another library that it
depends on is modified, this may mean the behavior of the library may be
changed, so it makes sense to me if the library notifies the user in this
case, by increasing its version.

Yes, this makes the effect of increasing a core library version big, but I
believe this is also true: increasing a core library version almost
means increasing the dpdk version.

> 
> 
> 
>> You could easily do this independently of us upstream
>> fixing the ABI problems.
> 
> 
> 
> I agree, but I'd like to suggest the mechanism I want to implement.
> An ack by upstream for the Feature to set such a major ABI would be great.
> Actually since it is optional and can help more people integrating DPDK
> getting it accepted upstream be even better.
> 
> I'll send a patch in reply to this thread later today that implements what
> I have in mind.
> 
> 

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1 09/14] ring: allow dequeue fns to return remaining entry count
                     ` (5 preceding siblings ...)
  2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
@ 2017-02-23 17:24  2% ` Bruce Richardson
    7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:24 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the number of remaining objs in the
ring. This information can be used by applications in a number of ways,
for instance, with single-consumer queues, it provides a max
dequeue size which is guaranteed to work.
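
As an illustration (a sketch against the changed API; the consumer loop and
the BURST_SIZE/process() names are invented), a single consumer can size its
next dequeue from the returned count:

	unsigned int avail = 0;
	void *objs[BURST_SIZE];
	unsigned int n;

	do {
		/* 'avail' reports how many entries remained in the ring
		 * after this call; pass NULL when the count is not needed. */
		n = rte_ring_sc_dequeue_burst(r, objs, BURST_SIZE, &avail);
		process(objs, n);
	} while (avail > 0);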

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/pdump/main.c                                   |  2 +-
 app/test-pipeline/runtime.c                        |  6 +-
 app/test/test_link_bonding_mode4.c                 |  3 +-
 app/test/test_pmd_ring_perf.c                      |  7 +-
 app/test/test_ring.c                               | 54 ++++++-------
 app/test/test_ring_perf.c                          | 20 +++--
 app/test/test_table_acl.c                          |  2 +-
 app/test/test_table_pipeline.c                     |  2 +-
 app/test/test_table_ports.c                        |  8 +-
 app/test/virtual_pmd.c                             |  4 +-
 doc/guides/rel_notes/release_17_05.rst             |  8 ++
 drivers/crypto/null/null_crypto_pmd.c              |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |  3 +-
 drivers/net/ring/rte_eth_ring.c                    |  2 +-
 examples/distributor/main.c                        |  2 +-
 examples/load_balancer/runtime.c                   |  6 +-
 .../client_server_mp/mp_client/client.c            |  3 +-
 examples/packet_ordering/main.c                    |  6 +-
 examples/qos_sched/app_thread.c                    |  6 +-
 examples/quota_watermark/qw/main.c                 |  5 +-
 examples/server_node_efd/node/node.c               |  2 +-
 lib/librte_hash/rte_cuckoo_hash.c                  |  3 +-
 lib/librte_mempool/rte_mempool_ring.c              |  4 +-
 lib/librte_port/rte_port_frag.c                    |  3 +-
 lib/librte_port/rte_port_ring.c                    |  6 +-
 lib/librte_ring/rte_ring.h                         | 90 +++++++++++-----------
 26 files changed, 145 insertions(+), 114 deletions(-)

diff --git a/app/pdump/main.c b/app/pdump/main.c
index b88090d..3b13753 100644
--- a/app/pdump/main.c
+++ b/app/pdump/main.c
@@ -496,7 +496,7 @@ pdump_rxtx(struct rte_ring *ring, uint8_t vdev_id, struct pdump_stats *stats)
 
 	/* first dequeue packets from ring of primary process */
 	const uint16_t nb_in_deq = rte_ring_dequeue_burst(ring,
-			(void *)rxtx_bufs, BURST_SIZE);
+			(void *)rxtx_bufs, BURST_SIZE, NULL);
 	stats->dequeue_pkts += nb_in_deq;
 
 	if (nb_in_deq) {
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index c06ff54..8970e1c 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -121,7 +121,8 @@ app_main_loop_worker(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_rx[i],
 			(void **) worker_mbuf->array,
-			app.burst_size_worker_read);
+			app.burst_size_worker_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
@@ -151,7 +152,8 @@ app_main_loop_tx(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_tx[i],
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
-			app.burst_size_tx_read);
+			app.burst_size_tx_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
diff --git a/app/test/test_link_bonding_mode4.c b/app/test/test_link_bonding_mode4.c
index 8df28b4..15091b1 100644
--- a/app/test/test_link_bonding_mode4.c
+++ b/app/test/test_link_bonding_mode4.c
@@ -193,7 +193,8 @@ static uint8_t lacpdu_rx_count[RTE_MAX_ETHPORTS] = {0, };
 static int
 slave_get_pkts(struct slave_conf *slave, struct rte_mbuf **buf, uint16_t size)
 {
-	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf, size);
+	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf,
+			size, NULL);
 }
 
 /*
diff --git a/app/test/test_pmd_ring_perf.c b/app/test/test_pmd_ring_perf.c
index 045a7f2..004882a 100644
--- a/app/test/test_pmd_ring_perf.c
+++ b/app/test/test_pmd_ring_perf.c
@@ -67,7 +67,7 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t eth_start = rte_rdtsc();
@@ -99,7 +99,7 @@ test_single_enqueue_dequeue(void)
 	rte_compiler_barrier();
 	for (i = 0; i < iterations; i++) {
 		rte_ring_enqueue_bulk(r, &burst, 1, NULL);
-		rte_ring_dequeue_bulk(r, &burst, 1);
+		rte_ring_dequeue_bulk(r, &burst, 1, NULL);
 	}
 	const uint64_t sc_end = rte_rdtsc_precise();
 	rte_compiler_barrier();
@@ -133,7 +133,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, (void *)burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, (void *)burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, (void *)burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index b0ca88b..858ebc1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -119,7 +119,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		    __func__, i, rand);
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
 				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
+				NULL) == rand);
 
 		/* fill the ring */
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
@@ -129,7 +130,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
+				NULL) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -186,19 +188,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -232,19 +234,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -265,7 +267,7 @@ test_ring_basic(void)
 		cur_src += MAX_BULK;
 		if (ret == 0)
 			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if (ret == 0)
 			goto fail;
@@ -303,13 +305,13 @@ test_ring_basic(void)
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue2\n");
@@ -390,19 +392,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1) ;
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -451,19 +453,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -505,19 +507,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -539,7 +541,7 @@ test_ring_burst_basic(void)
 		cur_src += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
@@ -578,19 +580,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -613,7 +615,7 @@ test_ring_burst_basic(void)
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret != 2)
 		goto fail;
@@ -753,7 +755,7 @@ test_ring_basic_ex(void)
 		goto fail_test;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2);
+	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
 	if (ret != 2) {
 		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
 		goto fail_test;
diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index f95a8e9..ed89896 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -152,12 +152,12 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t mc_end = rte_rdtsc();
 
 	printf("SC empty dequeue: %.2F\n",
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
@@ -325,7 +325,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -333,7 +334,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
@@ -361,7 +363,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -369,7 +372,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
diff --git a/app/test/test_table_acl.c b/app/test/test_table_acl.c
index b3bfda4..4d43be7 100644
--- a/app/test/test_table_acl.c
+++ b/app/test/test_table_acl.c
@@ -713,7 +713,7 @@ test_pipeline_single_filter(int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0) {
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/app/test/test_table_pipeline.c b/app/test/test_table_pipeline.c
index 36bfeda..b58aa5d 100644
--- a/app/test/test_table_pipeline.c
+++ b/app/test/test_table_pipeline.c
@@ -494,7 +494,7 @@ test_pipeline_single_filter(int test_type, int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0)
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/app/test/test_table_ports.c b/app/test/test_table_ports.c
index 395f4f3..39592ce 100644
--- a/app/test/test_table_ports.c
+++ b/app/test/test_table_ports.c
@@ -163,7 +163,7 @@ test_port_ring_writer(void)
 	rte_port_ring_writer_ops.f_flush(port);
 	expected_pkts = 1;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -7;
@@ -178,7 +178,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -193,7 +193,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -208,7 +208,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -9;
diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index 39e070c..b209355 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -342,7 +342,7 @@ virtual_ethdev_rx_burst_success(void *queue __rte_unused,
 	dev_private = vrtl_eth_dev->data->dev_private;
 
 	rx_count = rte_ring_dequeue_burst(dev_private->rx_queue, (void **) bufs,
-			nb_pkts);
+			nb_pkts, NULL);
 
 	/* increments ipackets count */
 	dev_private->eth_stats.ipackets += rx_count;
@@ -508,7 +508,7 @@ virtual_ethdev_get_mbufs_from_tx_queue(uint8_t port_id,
 
 	dev_private = vrtl_eth_dev->data->dev_private;
 	return rte_ring_dequeue_burst(dev_private->tx_queue, (void **)pkt_burst,
-		burst_length);
+		burst_length, NULL);
 }
 
 static uint8_t
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 249ad6e..563a74c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -123,6 +123,8 @@ API Changes
   * added an extra parameter to the burst/bulk enqueue functions to
     return the number of free spaces in the ring after enqueue. This can
     be used by an application to implement its own watermark functionality.
+  * added an extra parameter to the burst/bulk dequeue functions to return
+    the number of elements remaining in the ring after dequeue.
   * changed the return value of the enqueue and dequeue bulk functions to
     match that of the burst equivalents. In all cases, ring functions which
     operate on multiple packets now return the number of elements enqueued
@@ -135,6 +137,12 @@ API Changes
     - ``rte_ring_sc_dequeue_bulk``
     - ``rte_ring_dequeue_bulk``
 
+    NOTE: the above functions all have different parameters as well as
+    different return values, due to the other listed changes above. This
+    means that all instances of the functions in existing code will be
+    flagged by the compiler. The return value usage should be checked
+    while fixing the compiler error due to the extra parameter.
+
 ABI Changes
 -----------
 
diff --git a/drivers/crypto/null/null_crypto_pmd.c b/drivers/crypto/null/null_crypto_pmd.c
index ed5a9fc..f68ec8d 100644
--- a/drivers/crypto/null/null_crypto_pmd.c
+++ b/drivers/crypto/null/null_crypto_pmd.c
@@ -155,7 +155,7 @@ null_crypto_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	unsigned nb_dequeued;
 
 	nb_dequeued = rte_ring_dequeue_burst(qp->processed_pkts,
-			(void **)ops, nb_ops);
+			(void **)ops, nb_ops, NULL);
 	qp->qp_stats.dequeued_count += nb_dequeued;
 
 	return nb_dequeued;
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index f3ac9e2..96638af 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1008,7 +1008,8 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs,
 		struct port *port = &mode_8023ad_ports[slaves[i]];
 
 		slave_slow_nb_pkts[i] = rte_ring_dequeue_burst(port->tx_ring,
-				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS);
+				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS,
+				NULL);
 		slave_nb_pkts[i] = slave_slow_nb_pkts[i];
 
 		for (j = 0; j < slave_slow_nb_pkts[i]; j++)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index adbf478..77ef3a1 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -88,7 +88,7 @@ eth_ring_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	void **ptrs = (void *)&bufs[0];
 	struct ring_queue *r = q;
 	const uint16_t nb_rx = (uint16_t)rte_ring_dequeue_burst(r->rng,
-			ptrs, nb_bufs);
+			ptrs, nb_bufs, NULL);
 	if (r->rng->flags & RING_F_SC_DEQ)
 		r->rx_pkts.cnt += nb_rx;
 	else
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cfd360b..5cb6185 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -330,7 +330,7 @@ lcore_tx(struct rte_ring *in_r)
 
 			struct rte_mbuf *bufs[BURST_SIZE];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE, NULL);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 1645994..8192c08 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -349,7 +349,8 @@ app_lcore_io_tx(
 			ret = rte_ring_sc_dequeue_bulk(
 				ring,
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
-				bsz_rd);
+				bsz_rd,
+				NULL);
 
 			if (unlikely(ret == 0))
 				continue;
@@ -504,7 +505,8 @@ app_lcore_worker(
 		ret = rte_ring_sc_dequeue_bulk(
 			ring_in,
 			(void **) lp->mbuf_in.array,
-			bsz_rd);
+			bsz_rd,
+			NULL);
 
 		if (unlikely(ret == 0))
 			continue;
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index dca9eb9..01b535c 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -279,7 +279,8 @@ main(int argc, char *argv[])
 		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts,
+				PKT_READ_SIZE, NULL);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index d268350..7719dad 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -462,7 +462,7 @@ worker_thread(void *args_ptr)
 
 		/* dequeue the mbufs from rx_to_workers ring */
 		burst_size = rte_ring_dequeue_burst(ring_in,
-				(void *)burst_buffer, MAX_PKTS_BURST);
+				(void *)burst_buffer, MAX_PKTS_BURST, NULL);
 		if (unlikely(burst_size == 0))
 			continue;
 
@@ -510,7 +510,7 @@ send_thread(struct send_thread_args *args)
 
 		/* deque the mbufs from workers_to_tx ring */
 		nb_dq_mbufs = rte_ring_dequeue_burst(args->ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(nb_dq_mbufs == 0))
 			continue;
@@ -595,7 +595,7 @@ tx_thread(struct rte_ring *ring_in)
 
 		/* deque the mbufs from workers_to_tx ring */
 		dqnum = rte_ring_dequeue_burst(ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(dqnum == 0))
 			continue;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 0c81a15..15f117f 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -179,7 +179,7 @@ app_tx_thread(struct thread_conf **confs)
 
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
-					burst_conf.qos_dequeue);
+					burst_conf.qos_dequeue, NULL);
 		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
@@ -218,7 +218,7 @@ app_worker_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
@@ -254,7 +254,7 @@ app_mixed_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
diff --git a/examples/quota_watermark/qw/main.c b/examples/quota_watermark/qw/main.c
index 57df8ef..2dcddea 100644
--- a/examples/quota_watermark/qw/main.c
+++ b/examples/quota_watermark/qw/main.c
@@ -247,7 +247,8 @@ pipeline_stage(__attribute__((unused)) void *args)
 			}
 
 			/* Dequeue up to quota mbuf from rx */
-			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota);
+			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts,
+					*quota, NULL);
 			if (unlikely(nb_dq_pkts < 0))
 				continue;
 
@@ -305,7 +306,7 @@ send_stage(__attribute__((unused)) void *args)
 
 			/* Dequeue packets from tx and send them */
 			nb_dq_pkts = (uint16_t) rte_ring_dequeue_burst(tx,
-					(void *) tx_pkts, *quota);
+					(void *) tx_pkts, *quota, NULL);
 			rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
 
 			/* TODO: Check if nb_dq_pkts == nb_tx_pkts? */
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index 9ec6a05..f780b92 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) == 0))
+					rx_pkts, NULL) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 6552199..645c0cf 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -536,7 +536,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
 			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
-					cached_free_slots->objs, LCORE_CACHE_SIZE);
+					cached_free_slots->objs,
+					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0)
 				return -ENOSPC;
 
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index 9b8fd2b..5c132bf 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -58,14 +58,14 @@ static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_mc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_sc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_port/rte_port_frag.c b/lib/librte_port/rte_port_frag.c
index 0fcace9..320407e 100644
--- a/lib/librte_port/rte_port_frag.c
+++ b/lib/librte_port/rte_port_frag.c
@@ -186,7 +186,8 @@ rte_port_ring_reader_frag_rx(void *port,
 		/* If "pkts" buffer is empty, read packet burst from ring */
 		if (p->n_pkts == 0) {
 			p->n_pkts = rte_ring_sc_dequeue_burst(p->ring,
-				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX);
+				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX,
+				NULL);
 			RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(p, p->n_pkts);
 			if (p->n_pkts == 0)
 				return n_pkts_out;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 9fadac7..492b0e7 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -111,7 +111,8 @@ rte_port_ring_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
@@ -124,7 +125,8 @@ rte_port_ring_multi_reader_rx(void *port, struct rte_mbuf **pkts,
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index b5a995e..afd5367 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -483,7 +483,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -492,11 +493,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	unsigned int i;
 	uint32_t mask = r->mask;
 
-	/* Avoid the unnecessary cmpset operation below, which is also
-	 * potentially harmful when n equals 0. */
-	if (n == 0)
-		return 0;
-
 	/* move cons.head atomically */
 	do {
 		/* Restore n as it may change every loop */
@@ -511,15 +507,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		entries = (prod_tail - cons_head);
 
 		/* Set the actual entries for dequeue */
-		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED)
-				return 0;
-			else {
-				if (unlikely(entries == 0))
-					return 0;
-				n = entries;
-			}
-		}
+		if (n > entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+		if (unlikely(n == 0))
+			goto end;
 
 		cons_next = cons_head + n;
 		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
@@ -538,7 +530,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		rte_pause();
 
 	r->cons.tail = cons_next;
-
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -562,7 +556,8 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  */
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -577,15 +572,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * and size(ring)-1. */
 	entries = prod_tail - cons_head;
 
-	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED)
-			return 0;
-		else {
-			if (unlikely(entries == 0))
-				return 0;
-			n = entries;
-		}
-	}
+	if (n > entries)
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+	if (unlikely(entries == 0))
+		goto end;
 
 	cons_next = cons_head + n;
 	r->cons.head = cons_next;
@@ -595,6 +586,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -741,9 +735,11 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -760,9 +756,11 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -782,12 +780,13 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
+		unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
 }
 
 /**
@@ -808,7 +807,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -826,7 +825,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -848,7 +847,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -1038,9 +1037,11 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1058,9 +1059,11 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1080,12 +1083,13 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - Number of objects dequeued
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
 }
 
 #ifdef __cplusplus
-- 
2.9.3

^ permalink raw reply	[relevance 2%]
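
To illustrate the dequeue-side change in the patch above, here is a
minimal sketch of a consumer using the new out-parameter; the function
name and burst size are hypothetical, not taken from the patch:

#include <rte_ring.h>

#define BURST_SZ 32 /* hypothetical burst size */

static void
consume(struct rte_ring *r)
{
	void *objs[BURST_SZ];
	unsigned int nb, avail;

	do {
		/* dequeue up to BURST_SZ objects; on return, avail
		 * holds the number of entries still in the ring */
		nb = rte_ring_dequeue_burst(r, objs, BURST_SZ, &avail);

		/* ... process nb objects ... */
	} while (avail > 0);
}

Callers that do not need the count can pass NULL for the last argument,
as the tree-wide updates in the patch do.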

* [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent
                     ` (4 preceding siblings ...)
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support Bruce Richardson
@ 2017-02-23 17:24  2% ` Bruce Richardson
  2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
    7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:24 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

The bulk functions for rings return 0 when all elements are enqueued,
and a negative value when there is no space. Change that to make them
consistent with the burst functions
in returning the number of elements enqueued/dequeued, i.e. 0 or N.
This change also allows the return value from enq/deq to be used directly
without a branch for error checking.
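
As a hedged before/after sketch of this calling convention (the
function name and drop policy are illustrative only; note that a later
patch in this series adds a further out-parameter to these functions):

#include <rte_ring.h>
#include <rte_mbuf.h>

static void
send_burst(struct rte_ring *r, struct rte_mbuf **bufs, unsigned int n)
{
	/* Before this patch the bulk call returned 0 or -ENOBUFS:
	 *	if (rte_ring_sp_enqueue_bulk(r, (void **)bufs, n) != 0)
	 *		... drop the burst ...
	 * After it, the call returns the number enqueued (0 or n),
	 * matching the burst functions:
	 */
	if (rte_ring_sp_enqueue_bulk(r, (void **)bufs, n) == 0) {
		unsigned int i;

		for (i = 0; i < n; i++)
			rte_pktmbuf_free(bufs[i]);
	}
}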

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test-pipeline/pipeline_hash.c                  |   2 +-
 app/test-pipeline/runtime.c                        |   8 +-
 app/test/test_ring.c                               |  46 +++++----
 app/test/test_ring_perf.c                          |   8 +-
 doc/guides/rel_notes/release_17_05.rst             |  11 +++
 doc/guides/sample_app_ug/server_node_efd.rst       |   2 +-
 examples/load_balancer/runtime.c                   |  16 ++-
 .../client_server_mp/mp_client/client.c            |   8 +-
 .../client_server_mp/mp_server/main.c              |   2 +-
 examples/qos_sched/app_thread.c                    |   8 +-
 examples/server_node_efd/node/node.c               |   2 +-
 examples/server_node_efd/server/main.c             |   2 +-
 lib/librte_mempool/rte_mempool_ring.c              |  12 ++-
 lib/librte_ring/rte_ring.h                         | 109 +++++++--------------
 14 files changed, 106 insertions(+), 130 deletions(-)

diff --git a/app/test-pipeline/pipeline_hash.c b/app/test-pipeline/pipeline_hash.c
index 10d2869..1ac0aa8 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -547,6 +547,6 @@ app_main_loop_rx_metadata(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 42a6142..4e20669 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -98,7 +98,7 @@ app_main_loop_rx(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -123,7 +123,7 @@ app_main_loop_worker(void) {
 			(void **) worker_mbuf->array,
 			app.burst_size_worker_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		do {
@@ -131,7 +131,7 @@ app_main_loop_worker(void) {
 				app.rings_tx[i ^ 1],
 				(void **) worker_mbuf->array,
 				app.burst_size_worker_write);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -152,7 +152,7 @@ app_main_loop_tx(void) {
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
 			app.burst_size_tx_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		n_mbufs += app.burst_size_tx_read;
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 666a451..112433b 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -117,20 +117,18 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
 		printf("%s: iteration %u, random shift: %u;\n",
 		    __func__, i, rand);
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rand));
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rand));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand) != 0);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
 
 		/* fill the ring */
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rsz));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz) != 0);
 		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
 		TEST_RING_VERIFY(rsz == rte_ring_count(r));
 		TEST_RING_VERIFY(rte_ring_full(r));
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rsz));
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -171,37 +169,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -217,37 +215,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -264,11 +262,11 @@ test_ring_basic(void)
 	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
 		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 		cur_src += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 		cur_dst += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 	}
 
@@ -294,25 +292,25 @@ test_ring_basic(void)
 
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue2\n");
 		goto fail;
 	}
diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 320c20c..8ccbdef 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -195,13 +195,13 @@ enqueue_bulk(void *p)
 
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_sp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_mp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mp_end = rte_rdtsc();
 
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4e748dc..2b11765 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -120,6 +120,17 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
   * removed the function ``rte_ring_set_water_mark`` as part of a general
     removal of watermarks support in the library.
+  * changed the return value of the enqueue and dequeue bulk functions to
+    match that of the burst equivalents. In all cases, ring functions which
+    operate on multiple packets now return the number of elements enqueued
+    or dequeued, as appropriate. The updated functions are:
+
+    - ``rte_ring_mp_enqueue_bulk``
+    - ``rte_ring_sp_enqueue_bulk``
+    - ``rte_ring_enqueue_bulk``
+    - ``rte_ring_mc_dequeue_bulk``
+    - ``rte_ring_sc_dequeue_bulk``
+    - ``rte_ring_dequeue_bulk``
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/server_node_efd.rst b/doc/guides/sample_app_ug/server_node_efd.rst
index 9b69cfe..e3a63c8 100644
--- a/doc/guides/sample_app_ug/server_node_efd.rst
+++ b/doc/guides/sample_app_ug/server_node_efd.rst
@@ -286,7 +286,7 @@ repeated infinitely.
 
         cl = &nodes[node];
         if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-                cl_rx_buf[node].count) != 0){
+                cl_rx_buf[node].count) != cl_rx_buf[node].count){
             for (j = 0; j < cl_rx_buf[node].count; j++)
                 rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
             cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 6944325..82b10bc 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -146,7 +146,7 @@ app_lcore_io_rx_buffer_to_send (
 		(void **) lp->rx.mbuf_out[worker].array,
 		bsz);
 
-	if (unlikely(ret == -ENOBUFS)) {
+	if (unlikely(ret == 0)) {
 		uint32_t k;
 		for (k = 0; k < bsz; k ++) {
 			struct rte_mbuf *m = lp->rx.mbuf_out[worker].array[k];
@@ -312,7 +312,7 @@ app_lcore_io_rx_flush(struct app_lcore_params_io *lp, uint32_t n_workers)
 			(void **) lp->rx.mbuf_out[worker].array,
 			lp->rx.mbuf_out[worker].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->rx.mbuf_out[worker].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->rx.mbuf_out[worker].array[k];
@@ -349,9 +349,8 @@ app_lcore_io_tx(
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
 				bsz_rd);
 
-			if (unlikely(ret == -ENOENT)) {
+			if (unlikely(ret == 0))
 				continue;
-			}
 
 			n_mbufs += bsz_rd;
 
@@ -505,9 +504,8 @@ app_lcore_worker(
 			(void **) lp->mbuf_in.array,
 			bsz_rd);
 
-		if (unlikely(ret == -ENOENT)) {
+		if (unlikely(ret == 0))
 			continue;
-		}
 
 #if APP_WORKER_DROP_ALL_PACKETS
 		for (j = 0; j < bsz_rd; j ++) {
@@ -559,7 +557,7 @@ app_lcore_worker(
 
 #if APP_STATS
 			lp->rings_out_iters[port] ++;
-			if (ret == 0) {
+			if (ret > 0) {
 				lp->rings_out_count[port] += 1;
 			}
 			if (lp->rings_out_iters[port] == APP_STATS){
@@ -572,7 +570,7 @@ app_lcore_worker(
 			}
 #endif
 
-			if (unlikely(ret == -ENOBUFS)) {
+			if (unlikely(ret == 0)) {
 				uint32_t k;
 				for (k = 0; k < bsz_wr; k ++) {
 					struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
@@ -609,7 +607,7 @@ app_lcore_worker_flush(struct app_lcore_params_worker *lp)
 			(void **) lp->mbuf_out[port].array,
 			lp->mbuf_out[port].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->mbuf_out[port].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index d4f9ca3..dca9eb9 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -276,14 +276,10 @@ main(int argc, char *argv[])
 	printf("[Press Ctrl-C to quit ...]\n");
 
 	for (;;) {
-		uint16_t i, rx_pkts = PKT_READ_SIZE;
+		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		/* try dequeuing max possible packets first, if that fails, get the
-		 * most we can. Loop body should only execute once, maximum */
-		while (rx_pkts > 0 &&
-				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts, rx_pkts) != 0))
-			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring), PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/multi_process/client_server_mp/mp_server/main.c b/examples/multi_process/client_server_mp/mp_server/main.c
index a6dc12d..19c95b2 100644
--- a/examples/multi_process/client_server_mp/mp_server/main.c
+++ b/examples/multi_process/client_server_mp/mp_server/main.c
@@ -227,7 +227,7 @@ flush_rx_queue(uint16_t client)
 
 	cl = &clients[client];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[client].buffer,
-			cl_rx_buf[client].count) != 0){
+			cl_rx_buf[client].count) == 0){
 		for (j = 0; j < cl_rx_buf[client].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[client].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[client].count;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 70fdcdb..dab4594 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -107,7 +107,7 @@ app_rx_thread(struct thread_conf **confs)
 			}
 
 			if (unlikely(rte_ring_sp_enqueue_bulk(conf->rx_ring,
-								(void **)rx_mbufs, nb_rx) != 0)) {
+					(void **)rx_mbufs, nb_rx) == 0)) {
 				for(i = 0; i < nb_rx; i++) {
 					rte_pktmbuf_free(rx_mbufs[i]);
 
@@ -180,7 +180,7 @@ app_tx_thread(struct thread_conf **confs)
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
 					burst_conf.qos_dequeue);
-		if (likely(retval == 0)) {
+		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
 			conf->counter = 0; /* reset empty read loop counter */
@@ -230,7 +230,9 @@ app_worker_thread(struct thread_conf **confs)
 		nb_pkt = rte_sched_port_dequeue(conf->sched_port, mbufs,
 					burst_conf.qos_dequeue);
 		if (likely(nb_pkt > 0))
-			while (rte_ring_sp_enqueue_bulk(conf->tx_ring, (void **)mbufs, nb_pkt) != 0);
+			while (rte_ring_sp_enqueue_bulk(conf->tx_ring,
+					(void **)mbufs, nb_pkt) == 0)
+				; /* empty body */
 
 		conf_idx++;
 		if (confs[conf_idx] == NULL)
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index a6c0c70..9ec6a05 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) != 0))
+					rx_pkts) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/examples/server_node_efd/server/main.c b/examples/server_node_efd/server/main.c
index 1a54d1b..3eb7fac 100644
--- a/examples/server_node_efd/server/main.c
+++ b/examples/server_node_efd/server/main.c
@@ -247,7 +247,7 @@ flush_rx_queue(uint16_t node)
 
 	cl = &nodes[node];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-			cl_rx_buf[node].count) != 0){
+			cl_rx_buf[node].count) != cl_rx_buf[node].count){
 		for (j = 0; j < cl_rx_buf[node].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index b9aa64d..409b860 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -42,26 +42,30 @@ static int
 common_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_mp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_sp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_mc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_sc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e5fc751..6712f1f 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -344,14 +344,10 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -383,7 +379,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOBUFS;
+				return 0;
 			else {
 				/* No free entry available */
 				if (unlikely(free_entries == 0))
@@ -409,7 +405,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -425,14 +421,10 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -452,7 +444,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOBUFS;
+			return 0;
 		else {
 			/* No free entry available */
 			if (unlikely(free_entries == 0))
@@ -469,7 +461,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -490,16 +482,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
 
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -531,7 +518,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOENT;
+				return 0;
 			else {
 				if (unlikely(entries == 0))
 					return 0;
@@ -557,7 +544,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	r->cons.tail = cons_next;
 
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -575,15 +562,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -602,7 +584,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	if (n > entries) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOENT;
+			return 0;
 		else {
 			if (unlikely(entries == 0))
 				return 0;
@@ -618,7 +600,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -634,10 +616,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -654,10 +635,9 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -678,10 +658,9 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned n)
 {
@@ -708,7 +687,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static inline int __attribute__((always_inline))
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -725,7 +704,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -746,10 +725,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue(r, obj);
-	else
-		return rte_ring_mp_enqueue(r, obj);
+	return rte_ring_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -765,11 +741,9 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -786,11 +760,9 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects to dequeue from the ring to the obj_table,
  *   must be strictly positive.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -810,11 +782,9 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	if (r->cons.sc_dequeue)
@@ -841,7 +811,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -859,7 +829,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -881,10 +851,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue(r, obj_p);
-	else
-		return rte_ring_mc_dequeue(r, obj_p);
+	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support
                     ` (3 preceding siblings ...)
  2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
@ 2017-02-23 17:23  2% ` Bruce Richardson
  2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Remove the watermark support. A future commit will add support for having
enqueue functions return the amount of free space in the ring, which will
allow applications to implement their own watermark checks, while also
being more broadly useful.
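
As a rough sketch of the kind of application-level watermark check this
enables (assuming the free-space out-parameter added to the enqueue
functions later in this series; the threshold and names are made up):

#include <rte_ring.h>

#define APP_WM_THRESH 32 /* hypothetical application watermark */

static void
produce(struct rte_ring *r, void **objs, unsigned int n)
{
	unsigned int free_space;

	rte_ring_sp_enqueue_burst(r, objs, n, &free_space);
	if (free_space < APP_WM_THRESH) {
		/* ring nearly full: apply backpressure here, the role
		 * the removed -EDQUOT watermark return used to play */
	}
}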

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/commands.c                    |  52 ------------
 app/test/test_ring.c                   | 149 +--------------------------------
 doc/guides/rel_notes/release_17_05.rst |   2 +
 examples/Makefile                      |   2 +-
 lib/librte_ring/rte_ring.c             |  23 -----
 lib/librte_ring/rte_ring.h             |  58 +------------
 6 files changed, 8 insertions(+), 278 deletions(-)

diff --git a/app/test/commands.c b/app/test/commands.c
index 2df46b0..551c81d 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -228,57 +228,6 @@ cmdline_parse_inst_t cmd_dump_one = {
 
 /****************/
 
-struct cmd_set_ring_result {
-	cmdline_fixed_string_t set;
-	cmdline_fixed_string_t name;
-	uint32_t value;
-};
-
-static void cmd_set_ring_parsed(void *parsed_result, struct cmdline *cl,
-				__attribute__((unused)) void *data)
-{
-	struct cmd_set_ring_result *res = parsed_result;
-	struct rte_ring *r;
-	int ret;
-
-	r = rte_ring_lookup(res->name);
-	if (r == NULL) {
-		cmdline_printf(cl, "Cannot find ring\n");
-		return;
-	}
-
-	if (!strcmp(res->set, "set_watermark")) {
-		ret = rte_ring_set_water_mark(r, res->value);
-		if (ret != 0)
-			cmdline_printf(cl, "Cannot set water mark\n");
-	}
-}
-
-cmdline_parse_token_string_t cmd_set_ring_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, set,
-				 "set_watermark");
-
-cmdline_parse_token_string_t cmd_set_ring_name =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, name, NULL);
-
-cmdline_parse_token_num_t cmd_set_ring_value =
-	TOKEN_NUM_INITIALIZER(struct cmd_set_ring_result, value, UINT32);
-
-cmdline_parse_inst_t cmd_set_ring = {
-	.f = cmd_set_ring_parsed,  /* function to call */
-	.data = NULL,      /* 2nd arg of func */
-	.help_str = "set watermark: "
-			"set_watermark <ring_name> <value>",
-	.tokens = {        /* token list, NULL terminated */
-		(void *)&cmd_set_ring_set,
-		(void *)&cmd_set_ring_name,
-		(void *)&cmd_set_ring_value,
-		NULL,
-	},
-};
-
-/****************/
-
 struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
@@ -419,7 +368,6 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_autotest,
 	(cmdline_parse_inst_t *)&cmd_dump,
 	(cmdline_parse_inst_t *)&cmd_dump_one,
-	(cmdline_parse_inst_t *)&cmd_set_ring,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx_anchor,
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 3891f5d..666a451 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -78,21 +78,6 @@
  *      - Dequeue one object, two objects, MAX_BULK objects
  *      - Check that dequeued pointers are correct
  *
- *    - Test watermark and default bulk enqueue/dequeue:
- *
- *      - Set watermark
- *      - Set default bulk value
- *      - Enqueue objects, check that -EDQUOT is returned when
- *        watermark is exceeded
- *      - Check that dequeued pointers are correct
- *
- * #. Check live watermark change
- *
- *    - Start a loop on another lcore that will enqueue and dequeue
- *      objects in a ring. It will monitor the value of watermark.
- *    - At the same time, change the watermark on the master lcore.
- *    - The slave lcore will check that watermark changes from 16 to 32.
- *
  * #. Performance tests.
  *
  * Tests done in test_ring_perf.c
@@ -115,123 +100,6 @@ static struct rte_ring *r;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-static int
-check_live_watermark_change(__attribute__((unused)) void *dummy)
-{
-	uint64_t hz = rte_get_timer_hz();
-	void *obj_table[MAX_BULK];
-	unsigned watermark, watermark_old = 16;
-	uint64_t cur_time, end_time;
-	int64_t diff = 0;
-	int i, ret;
-	unsigned count = 4;
-
-	/* init the object table */
-	memset(obj_table, 0, sizeof(obj_table));
-	end_time = rte_get_timer_cycles() + (hz / 4);
-
-	/* check that bulk and watermark are 4 and 32 (respectively) */
-	while (diff >= 0) {
-
-		/* add in ring until we reach watermark */
-		ret = 0;
-		for (i = 0; i < 16; i ++) {
-			if (ret != 0)
-				break;
-			ret = rte_ring_enqueue_bulk(r, obj_table, count);
-		}
-
-		if (ret != -EDQUOT) {
-			printf("Cannot enqueue objects, or watermark not "
-			       "reached (ret=%d)\n", ret);
-			return -1;
-		}
-
-		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->watermark;
-		if (watermark != watermark_old &&
-		    (watermark_old != 16 || watermark != 32)) {
-			printf("Bad watermark change %u -> %u\n", watermark_old,
-			       watermark);
-			return -1;
-		}
-		watermark_old = watermark;
-
-		/* dequeue objects from ring */
-		while (i--) {
-			ret = rte_ring_dequeue_bulk(r, obj_table, count);
-			if (ret != 0) {
-				printf("Cannot dequeue (ret=%d)\n", ret);
-				return -1;
-			}
-		}
-
-		cur_time = rte_get_timer_cycles();
-		diff = end_time - cur_time;
-	}
-
-	if (watermark_old != 32 ) {
-		printf(" watermark was not updated (wm=%u)\n",
-		       watermark_old);
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-test_live_watermark_change(void)
-{
-	unsigned lcore_id = rte_lcore_id();
-	unsigned lcore_id2 = rte_get_next_lcore(lcore_id, 0, 1);
-
-	printf("Test watermark live modification\n");
-	rte_ring_set_water_mark(r, 16);
-
-	/* launch a thread that will enqueue and dequeue, checking
-	 * watermark and quota */
-	rte_eal_remote_launch(check_live_watermark_change, NULL, lcore_id2);
-
-	rte_delay_ms(100);
-	rte_ring_set_water_mark(r, 32);
-	rte_delay_ms(100);
-
-	if (rte_eal_wait_lcore(lcore_id2) < 0)
-		return -1;
-
-	return 0;
-}
-
-/* Test for catch on invalid watermark values */
-static int
-test_set_watermark( void ){
-	unsigned count;
-	int setwm;
-
-	struct rte_ring *r = rte_ring_lookup("test_ring_basic_ex");
-	if(r == NULL){
-		printf( " ring lookup failed\n" );
-		goto error;
-	}
-	count = r->size * 2;
-	setwm = rte_ring_set_water_mark(r, count);
-	if (setwm != -EINVAL){
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-
-	count = 0;
-	rte_ring_set_water_mark(r, count);
-	if (r->watermark != r->size) {
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-	return 0;
-
-error:
-	return -1;
-}
-
 /*
  * helper routine for test_ring_basic
  */
@@ -418,8 +286,7 @@ test_ring_basic(void)
 	cur_src = src;
 	cur_dst = dst;
 
-	printf("test watermark and default bulk enqueue / dequeue\n");
-	rte_ring_set_water_mark(r, 20);
+	printf("test default bulk enqueue / dequeue\n");
 	num_elems = 16;
 
 	cur_src = src;
@@ -433,8 +300,8 @@ test_ring_basic(void)
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != -EDQUOT) {
-		printf("Watermark not exceeded\n");
+	if (ret != 0) {
+		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
@@ -930,16 +797,6 @@ test_ring(void)
 		return -1;
 
 	/* basic operations */
-	if (test_live_watermark_change() < 0)
-		return -1;
-
-	if ( test_set_watermark() < 0){
-		printf ("Test failed to detect invalid parameter\n");
-		return -1;
-	}
-	else
-		printf ( "Test detected forced bad watermark values\n");
-
 	if ( test_create_count_odd() < 0){
 			printf ("Test failed to detect odd count\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index c69ca8f..4e748dc 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -118,6 +118,8 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
+  * removed the function ``rte_ring_set_water_mark`` as part of a general
+    removal of watermarks support in the library.
 
 ABI Changes
 -----------
diff --git a/examples/Makefile b/examples/Makefile
index da2bfdd..19cd5ad 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -81,7 +81,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += packet_ordering
 DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched
-DIRS-y += quota_watermark
+#DIRS-y += quota_watermark
 DIRS-$(CONFIG_RTE_ETHDEV_RXTX_CALLBACKS) += rxtx_callbacks
 DIRS-y += skeleton
 ifeq ($(CONFIG_RTE_LIBRTE_HASH),y)
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 90ee63f..18fb644 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -138,7 +138,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
 	r->size = count;
@@ -256,24 +255,6 @@ rte_ring_free(struct rte_ring *r)
 	rte_free(te);
 }
 
-/*
- * change the high water mark. If *count* is 0, water marking is
- * disabled
- */
-int
-rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
-{
-	if (count >= r->size)
-		return -EINVAL;
-
-	/* if count is 0, disable the watermarking */
-	if (count == 0)
-		count = r->size;
-
-	r->watermark = count;
-	return 0;
-}
-
 /* dump the status of the ring on the console */
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
@@ -287,10 +268,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->watermark == r->size)
-		fprintf(f, "  watermark=0\n");
-	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 0f95c84..e5fc751 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -148,7 +148,6 @@ struct rte_ring {
 			/**< Memzone, if any, containing the rte_ring */
 	uint32_t size;           /**< Size of ring. */
 	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_ht_ptr prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
@@ -163,7 +162,6 @@ struct rte_ring {
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
 #define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
@@ -269,26 +267,6 @@ struct rte_ring *rte_ring_create(const char *name, unsigned count,
 void rte_ring_free(struct rte_ring *r);
 
 /**
- * Change the high water mark.
- *
- * If *count* is 0, water marking is disabled. Otherwise, it is set to the
- * *count* value. The *count* value must be greater than 0 and less
- * than the ring size.
- *
- * This function can be called at any time (not necessarily at
- * initialization).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param count
- *   The new water mark value.
- * @return
- *   - 0: Success; water mark changed.
- *   - -EINVAL: Invalid water mark value.
- */
-int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
-
-/**
  * Dump the status of the ring to a file.
  *
  * @param f
@@ -369,8 +347,6 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -385,7 +361,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	int success;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -426,13 +401,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-				(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	/*
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
@@ -441,7 +409,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -460,8 +428,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -474,7 +440,6 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_next, free_entries;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	prod_head = r->prod.head;
 	cons_tail = r->cons.tail;
@@ -503,15 +468,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-			(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -677,8 +635,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -699,8 +655,6 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -725,8 +679,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -751,8 +703,6 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -770,8 +720,6 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -793,8 +741,6 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update
                     ` (2 preceding siblings ...)
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting Bruce Richardson
@ 2017-02-23 17:23  4% ` Bruce Richardson
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support Bruce Richardson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

There was a compile-time setting to make a ring yield the CPU when it
entered the loop in mp or mc rings waiting for the tail pointer update.
Build-time settings are not recommended for enabling/disabling features,
and since this one was off by default, remove it completely. If needed, a
runtime-enabled equivalent can be used.
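
The yield sat inside the library's wait for preceding tail updates and
cannot be re-inserted from outside, but an application can get a similar
effect at runtime by bounding its own retry loop. A minimal sketch for a
blocking consumer (helper name and spin budget are illustrative):

  #include <sched.h>
  #include <rte_ring.h>

  #define APP_MAX_SPINS 1000 /* illustrative spin budget */

  /* Retry on an empty ring for a bounded number of attempts, then
   * yield so a pre-empted thread sharing the core can progress. */
  static inline void
  app_ring_dequeue_blocking(struct rte_ring *r, void **obj)
  {
          unsigned int spins = 0;

          while (rte_ring_dequeue(r, obj) != 0) {
                  if (++spins == APP_MAX_SPINS) {
                          spins = 0;
                          sched_yield();
                  }
          }
  }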

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                              |  1 -
 doc/guides/prog_guide/env_abstraction_layer.rst |  5 ----
 doc/guides/rel_notes/release_17_05.rst          |  1 +
 lib/librte_ring/rte_ring.h                      | 35 +++++--------------------
 4 files changed, 7 insertions(+), 35 deletions(-)

diff --git a/config/common_base b/config/common_base
index b3d8272..d5beadd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
 # Compile librte_mempool
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 10a10a8..7c39cd2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -352,11 +352,6 @@ Known Issues
 
   3. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
 
-  ``RTE_RING_PAUSE_REP_COUNT`` is defined for rte_ring to reduce contention. It's mainly for case 2, a yield is issued after number of times pause repeat.
-
-  It adds a sched_yield() syscall if the thread spins for too long while waiting on the other thread to finish its operations on the ring.
-  This gives the preempted thread a chance to proceed and finish with the ring enqueue/dequeue operation.
-
 + rte_timer
 
   Running  ``rte_timer_manager()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e0ebd71..c69ca8f 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -117,6 +117,7 @@ API Changes
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
+  * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 814f593..0f95c84 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -114,11 +114,6 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-#ifndef RTE_RING_PAUSE_REP_COUNT
-#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-                                    *   if RTE_RING_PAUSE_REP not defined. */
-#endif
-
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
 /* structure to hold a pair of head/tail values and other metadata */
@@ -388,7 +383,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t cons_tail, free_entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -442,18 +437,9 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->prod.tail != prod_head)) {
+	while (unlikely(r->prod.tail != prod_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->prod.tail = prod_next;
 	return ret;
 }
@@ -486,7 +472,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 {
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -563,7 +549,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_next, entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -608,18 +594,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * If there are other dequeues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->cons.tail != cons_head)) {
+	while (unlikely(r->cons.tail != cons_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -654,7 +631,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting
    2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
  2017-02-23 17:23  3% ` [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
@ 2017-02-23 17:23  2% ` Bruce Richardson
  2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

The debug option only provided statistics to the user, most of
which could be tracked by the application itself. Remove it, both as a
compile-time option and as a feature, simplifying the code.
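
Applications that want the counts can keep them in a wrapper of their
own. A minimal per-lcore sketch (struct and helper names are
illustrative; the lcore bound check mirrors the old __RING_STAT_ADD):

  #include <stdint.h>
  #include <rte_lcore.h>
  #include <rte_ring.h>

  /* Per-lcore counters standing in for the removed debug statistics,
   * covering only the events the application cares about. */
  struct app_ring_stats {
          uint64_t enq_success_objs;
          uint64_t enq_fail_objs;
  };

  static struct app_ring_stats app_stats[RTE_MAX_LCORE];

  static inline int
  app_ring_enqueue_counted(struct rte_ring *r, void *obj)
  {
          unsigned int lcore = rte_lcore_id();
          int ret = rte_ring_enqueue(r, obj);

          if (lcore < RTE_MAX_LCORE) {
                  if (ret == 0)
                          app_stats[lcore].enq_success_objs++;
                  else
                          app_stats[lcore].enq_fail_objs++;
          }
          return ret;
  }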

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_ring.c                   | 410 ---------------------------------
 config/common_base                     |   1 -
 doc/guides/prog_guide/ring_lib.rst     |   7 -
 doc/guides/rel_notes/release_17_05.rst |   1 +
 lib/librte_ring/rte_ring.c             |  41 ----
 lib/librte_ring/rte_ring.h             |  97 +-------
 6 files changed, 13 insertions(+), 544 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 5f09097..3891f5d 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -763,412 +763,6 @@ test_ring_burst_basic(void)
 	return -1;
 }
 
-static int
-test_ring_stats(void)
-{
-
-#ifndef RTE_LIBRTE_RING_DEBUG
-	printf("Enable RTE_LIBRTE_RING_DEBUG to test ring stats.\n");
-	return 0;
-#else
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i;
-	unsigned num_items            = 0;
-	unsigned failed_enqueue_ops   = 0;
-	unsigned failed_enqueue_items = 0;
-	unsigned failed_dequeue_ops   = 0;
-	unsigned failed_dequeue_items = 0;
-	unsigned last_enqueue_ops     = 0;
-	unsigned last_enqueue_items   = 0;
-	unsigned last_quota_ops       = 0;
-	unsigned last_quota_items     = 0;
-	unsigned lcore_id = rte_lcore_id();
-	struct rte_ring_debug_stats *ring_stats = &r->stats[lcore_id];
-
-	printf("Test the ring stats.\n");
-
-	/* Reset the watermark in case it was set in another test. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Allocate some dummy object pointers. */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-
-	/* Allocate some memory for copied objects. */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-
-	/* Set the head and tail pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	/* Do Enqueue tests. */
-	printf("Test the dequeue stats.\n");
-
-	/* Fill the ring up to RING_SIZE -1. */
-	printf("Fill the ring.\n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK); i++) {
-		rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK);
-		cur_src += MAX_BULK;
-	}
-
-	/* Adjust for final enqueue = MAX_BULK -1. */
-	cur_src--;
-
-	printf("Verify that the ring is full.\n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-
-	printf("Verify the enqueue success stats.\n");
-	/* Stats should match above enqueue operations to fill the ring. */
-	if (ring_stats->enq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Current max objects is RING_SIZE -1. */
-	if (ring_stats->enq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any failures yet. */
-	if (ring_stats->enq_fail_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_fail_objs != 0)
-		goto fail;
-
-
-	printf("Test stats for SP burst enqueue to a full ring.\n");
-	num_items = 2;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for SP bulk enqueue to a full ring.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP burst enqueue to a full ring.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP bulk enqueue to a full ring.\n");
-	num_items = 16;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	/* Do Dequeue tests. */
-	printf("Test the dequeue stats.\n");
-
-	printf("Empty the ring.\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* There was only RING_SIZE -1 objects to dequeue. */
-	cur_dst++;
-
-	printf("Verify ring is empty.\n");
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	printf("Verify the dequeue success stats.\n");
-	/* Stats should match above dequeue operations. */
-	if (ring_stats->deq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Objects dequeued is RING_SIZE -1. */
-	if (ring_stats->deq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any dequeue failure stats yet. */
-	if (ring_stats->deq_fail_bulk != 0)
-		goto fail;
-
-	printf("Test stats for SC burst dequeue with an empty ring.\n");
-	num_items = 2;
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for SC bulk dequeue with an empty ring.\n");
-	num_items = 4;
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC burst dequeue with an empty ring.\n");
-	num_items = 8;
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC bulk dequeue with an empty ring.\n");
-	num_items = 16;
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test total enqueue/dequeue stats.\n");
-	/* At this point the enqueue and dequeue stats should be the same. */
-	if (ring_stats->enq_success_bulk != ring_stats->deq_success_bulk)
-		goto fail;
-	if (ring_stats->enq_success_objs != ring_stats->deq_success_objs)
-		goto fail;
-	if (ring_stats->enq_fail_bulk    != ring_stats->deq_fail_bulk)
-		goto fail;
-	if (ring_stats->enq_fail_objs    != ring_stats->deq_fail_objs)
-		goto fail;
-
-
-	/* Watermark Tests. */
-	printf("Test the watermark/quota stats.\n");
-
-	printf("Verify the initial watermark stats.\n");
-	/* Watermark stats should be 0 since there is no watermark. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Set a watermark. */
-	rte_ring_set_water_mark(r, 16);
-
-	/* Reset pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue below watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should still be 0. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Success stats should have increased. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops + 1)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items + num_items)
-		goto fail;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue at watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != 1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP burst enqueue above watermark.\n");
-	num_items = 1;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP burst enqueue above watermark.\n");
-	num_items = 2;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP bulk enqueue above watermark.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP bulk enqueue above watermark.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	printf("Test watermark success stats.\n");
-	/* Success stats should be same as last non-watermarked enqueue. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items)
-		goto fail;
-
-
-	/* Cleanup. */
-
-	/* Empty the ring. */
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* Reset the watermark. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
-	return 0;
-
-fail:
-	free(src);
-	free(dst);
-	return -1;
-#endif
-}
-
 /*
  * it will always fail to create ring with a wrong ring size number in this function
  */
@@ -1335,10 +929,6 @@ test_ring(void)
 	if (test_ring_basic() < 0)
 		return -1;
 
-	/* ring stats */
-	if (test_ring_stats() < 0)
-		return -1;
-
 	/* basic operations */
 	if (test_live_watermark_change() < 0)
 		return -1;
diff --git a/config/common_base b/config/common_base
index 099ffda..b3d8272 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_LIBRTE_RING_DEBUG=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 9f69753..d4ab502 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -110,13 +110,6 @@ Once an enqueue operation reaches the high water mark, the producer is notified,
 
 This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
 
-Debug
-~~~~~
-
-When debug is enabled (CONFIG_RTE_LIBRTE_RING_DEBUG is set),
-the library stores some per-ring statistic counters about the number of enqueues/dequeues.
-These statistics are per-core to avoid concurrent accesses or atomic operations.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index ea45e0c..e0ebd71 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -116,6 +116,7 @@ API Changes
   have been made to it:
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
+  * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 80fc356..90ee63f 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -131,12 +131,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 			  RTE_CACHE_LINE_MASK) != 0);
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_LIBRTE_RING_DEBUG
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring_debug_stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
@@ -284,11 +278,6 @@ rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
 {
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats sum;
-	unsigned lcore_id;
-#endif
-
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
@@ -302,36 +291,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		fprintf(f, "  watermark=0\n");
 	else
 		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
-
-	/* sum and dump statistics */
-#ifdef RTE_LIBRTE_RING_DEBUG
-	memset(&sum, 0, sizeof(sum));
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		sum.enq_success_bulk += r->stats[lcore_id].enq_success_bulk;
-		sum.enq_success_objs += r->stats[lcore_id].enq_success_objs;
-		sum.enq_quota_bulk += r->stats[lcore_id].enq_quota_bulk;
-		sum.enq_quota_objs += r->stats[lcore_id].enq_quota_objs;
-		sum.enq_fail_bulk += r->stats[lcore_id].enq_fail_bulk;
-		sum.enq_fail_objs += r->stats[lcore_id].enq_fail_objs;
-		sum.deq_success_bulk += r->stats[lcore_id].deq_success_bulk;
-		sum.deq_success_objs += r->stats[lcore_id].deq_success_objs;
-		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
-		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
-	}
-	fprintf(f, "  size=%"PRIu32"\n", r->size);
-	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
-	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
-	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
-	fprintf(f, "  enq_quota_objs=%"PRIu64"\n", sum.enq_quota_objs);
-	fprintf(f, "  enq_fail_bulk=%"PRIu64"\n", sum.enq_fail_bulk);
-	fprintf(f, "  enq_fail_objs=%"PRIu64"\n", sum.enq_fail_objs);
-	fprintf(f, "  deq_success_bulk=%"PRIu64"\n", sum.deq_success_bulk);
-	fprintf(f, "  deq_success_objs=%"PRIu64"\n", sum.deq_success_objs);
-	fprintf(f, "  deq_fail_bulk=%"PRIu64"\n", sum.deq_fail_bulk);
-	fprintf(f, "  deq_fail_objs=%"PRIu64"\n", sum.deq_fail_objs);
-#else
-	fprintf(f, "  no statistics available\n");
-#endif
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 6e75c15..814f593 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -109,24 +109,6 @@ enum rte_ring_queue_behavior {
 	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
 };
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-/**
- * A structure that stores the ring statistics (per-lcore).
- */
-struct rte_ring_debug_stats {
-	uint64_t enq_success_bulk; /**< Successful enqueues number. */
-	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
-	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
-	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
-	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
-	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
-	uint64_t deq_success_bulk; /**< Successful dequeues number. */
-	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
-	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
-	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
-#endif
-
 #define RTE_RING_MZ_PREFIX "RG_"
 /**< The maximum length of a ring name. */
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
@@ -179,10 +161,6 @@ struct rte_ring {
 	/** Ring consumer status. */
 	struct rte_ring_ht_ptr cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
-
 	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
 	                                     * not volatile so need to be careful
 	                                     * about compiler re-ordering */
@@ -194,27 +172,6 @@ struct rte_ring {
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
- * @internal When debug is enabled, store ring statistics.
- * @param r
- *   A pointer to the ring.
- * @param name
- *   The name of the statistics field to increment in the ring.
- * @param n
- *   The number to add to the object-oriented statistics.
- */
-#ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {                        \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			r->stats[__lcore_id].name##_objs += n;  \
-			r->stats[__lcore_id].name##_bulk += 1;  \
-		}                                               \
-	} while(0)
-#else
-#define __RING_STAT_ADD(r, name, n) do {} while(0)
-#endif
-
-/**
  * Calculate the memory size needed for a ring
  *
  * This function returns the number of bytes needed for a ring, given
@@ -455,17 +412,12 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOBUFS;
-			}
 			else {
 				/* No free entry available */
-				if (unlikely(free_entries == 0)) {
-					__RING_STAT_ADD(r, enq_fail, n);
+				if (unlikely(free_entries == 0))
 					return 0;
-				}
-
 				n = free_entries;
 			}
 		}
@@ -480,15 +432,11 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	/*
 	 * If there are other enqueues in progress that preceded us,
@@ -552,17 +500,12 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, enq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOBUFS;
-		}
 		else {
 			/* No free entry available */
-			if (unlikely(free_entries == 0)) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (unlikely(free_entries == 0))
 				return 0;
-			}
-
 			n = free_entries;
 		}
 	}
@@ -575,15 +518,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	r->prod.tail = prod_next;
 	return ret;
@@ -647,16 +586,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOENT;
-			}
 			else {
-				if (unlikely(entries == 0)){
-					__RING_STAT_ADD(r, deq_fail, n);
+				if (unlikely(entries == 0))
 					return 0;
-				}
-
 				n = entries;
 			}
 		}
@@ -686,7 +620,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 			sched_yield();
 		}
 	}
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -733,16 +666,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	entries = prod_tail - cons_head;
 
 	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, deq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOENT;
-		}
 		else {
-			if (unlikely(entries == 0)){
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (unlikely(entries == 0))
 				return 0;
-			}
-
 			n = entries;
 		}
 	}
@@ -754,7 +682,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	DEQUEUE_PTRS();
 	rte_smp_rmb();
 
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
 }
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields
    2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
@ 2017-02-23 17:23  3% ` Bruce Richardson
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting Bruce Richardson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of those structures and into the
top-level structure so they are not duplicated.
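
The caller-visible effect is only where the fields are read. A minimal
sketch (accessor name is illustrative):

  #include <rte_ring.h>

  /* Code that previously read r->prod.mask or r->cons.mask (two
   * identical copies) now reads the single top-level field. */
  static inline unsigned int
  app_ring_mask(const struct rte_ring *r)
  {
          return r->mask; /* was r->prod.mask == r->cons.mask */
  }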

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_ring.c       |  6 +++---
 lib/librte_ring/rte_ring.c | 20 ++++++++++----------
 lib/librte_ring/rte_ring.h | 32 ++++++++++++++++----------------
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index ebcb896..5f09097 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -148,7 +148,7 @@ check_live_watermark_change(__attribute__((unused)) void *dummy)
 		}
 
 		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->prod.watermark;
+		watermark = r->watermark;
 		if (watermark != watermark_old &&
 		    (watermark_old != 16 || watermark != 32)) {
 			printf("Bad watermark change %u -> %u\n", watermark_old,
@@ -213,7 +213,7 @@ test_set_watermark( void ){
 		printf( " ring lookup failed\n" );
 		goto error;
 	}
-	count = r->prod.size*2;
+	count = r->size * 2;
 	setwm = rte_ring_set_water_mark(r, count);
 	if (setwm != -EINVAL){
 		printf("Test failed to detect invalid watermark count value\n");
@@ -222,7 +222,7 @@ test_set_watermark( void ){
 
 	count = 0;
 	rte_ring_set_water_mark(r, count);
-	if (r->prod.watermark != r->prod.size) {
+	if (r->watermark != r->size) {
 		printf("Test failed to detect invalid watermark count value\n");
 		goto error;
 	}
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4bc6da1..80fc356 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -144,11 +144,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.watermark = count;
+	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-	r->prod.size = r->cons.size = count;
-	r->prod.mask = r->cons.mask = count-1;
+	r->size = count;
+	r->mask = count - 1;
 	r->prod.head = r->cons.head = 0;
 	r->prod.tail = r->cons.tail = 0;
 
@@ -269,14 +269,14 @@ rte_ring_free(struct rte_ring *r)
 int
 rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
-	if (count >= r->prod.size)
+	if (count >= r->size)
 		return -EINVAL;
 
 	/* if count is 0, disable the watermarking */
 	if (count == 0)
-		count = r->prod.size;
+		count = r->size;
 
-	r->prod.watermark = count;
+	r->watermark = count;
 	return 0;
 }
 
@@ -291,17 +291,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->prod.watermark == r->prod.size)
+	if (r->watermark == r->size)
 		fprintf(f, "  watermark=0\n");
 	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->prod.watermark);
+		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_RING_DEBUG
@@ -318,7 +318,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
 		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
 	}
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
 	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
 	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 0c8defd..6e75c15 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -143,13 +143,10 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 struct rte_ring_ht_ptr {
 	volatile uint32_t head;  /**< Prod/consumer head. */
 	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
 	union {
 		uint32_t sp_enqueue; /**< True, if single producer. */
 		uint32_t sc_dequeue; /**< True, if single consumer. */
 	};
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 };
 
 /**
@@ -169,9 +166,12 @@ struct rte_ring {
 	 * next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
+	int flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_ht_ptr prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
@@ -350,7 +350,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * Placed here since identical code needed in both
  * single and multi producer enqueue functions */
 #define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
+	const uint32_t size = r->size; \
 	uint32_t idx = prod_head & mask; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
@@ -377,7 +377,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * single and multi consumer dequeue functions */
 #define DEQUEUE_PTRS() do { \
 	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
+	const uint32_t size = r->size; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
 			obj_table[i] = r->ring[idx]; \
@@ -432,7 +432,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -480,7 +480,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -539,7 +539,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	prod_head = r->prod.head;
@@ -575,7 +575,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -625,7 +625,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -722,7 +722,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
 	prod_tail = r->prod.tail;
@@ -1051,7 +1051,7 @@ rte_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
 }
 
 /**
@@ -1084,7 +1084,7 @@ rte_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
+	return (prod_tail - cons_tail) & r->mask;
 }
 
 /**
@@ -1100,7 +1100,7 @@ rte_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
+	return (cons_tail - prod_tail - 1) & r->mask;
 }
 
 /**
@@ -1114,7 +1114,7 @@ rte_ring_free_count(const struct rte_ring *r)
 static inline unsigned int
 rte_ring_get_size(const struct rte_ring *r)
 {
-	return r->prod.size;
+	return r->size;
 }
 
 /**
-- 
2.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  @ 2017-02-23 17:23  4% ` Bruce Richardson
  2017-02-28 11:35  0%   ` Jerin Jacob
  2017-02-23 17:23  3% ` [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Users compiling DPDK should not need to know or care about the arrangement
of cachelines in the rte_ring structure. Therefore just remove the build
option and always split the structures. For improved performance, use 128B
rather than 64B alignment, since it stops the producer and consumer data
from sharing adjacent cachelines.
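
The layout property this buys can be spelled out as a compile-time
check. A sketch, assuming a C11 toolchain (the assertion is
illustrative, not part of the patch):

  #include <stddef.h>
  #include <rte_memory.h>
  #include <rte_ring.h>

  /* prod and cons are each aligned to two cache lines, so they sit at
   * least 128B apart and the adjacent-cacheline prefetcher on one core
   * cannot drag the other side's line into its cache. */
  _Static_assert(offsetof(struct rte_ring, cons) -
                 offsetof(struct rte_ring, prod) >=
                 2 * RTE_CACHE_LINE_SIZE,
                 "prod and cons must be two cache lines apart");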

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                     | 1 -
 doc/guides/rel_notes/release_17_05.rst | 6 ++++++
 lib/librte_ring/rte_ring.c             | 2 --
 lib/librte_ring/rte_ring.h             | 8 ++------
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..099ffda 100644
--- a/config/common_base
+++ b/config/common_base
@@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 #
 CONFIG_RTE_LIBRTE_RING=y
 CONFIG_RTE_LIBRTE_RING_DEBUG=n
-CONFIG_RTE_RING_SPLIT_PROD_CONS=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..ea45e0c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -110,6 +110,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Reworked rte_ring library**
+
+  The rte_ring library has been reworked and updated. The following changes
+  have been made to it:
+
+  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index ca0a108..4bc6da1 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	/* compilation-time checks */
 	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_RING_SPLIT_PROD_CONS
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_LIBRTE_RING_DEBUG
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 72ccca5..04fe667 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -168,7 +168,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Producer head. */
 		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
+	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
 
 	/** Ring consumer status. */
 	struct cons {
@@ -177,11 +177,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Consumer head. */
 		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
+	} cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
 
 #ifdef RTE_LIBRTE_RING_DEBUG
 	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version
  2017-02-22 13:12  7%   ` Christian Ehrhardt
@ 2017-02-22 13:24 20%     ` Christian Ehrhardt
  2017-02-28  8:34  4%       ` Jan Blunck
  2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Christian Ehrhardt @ 2017-02-22 13:24 UTC (permalink / raw)
  To: dev
  Cc: Christian Ehrhardt, cjcollier @ linuxfoundation . org,
	ricardo.salveti, Luca Boccassi

Downstreams might want to provide different DPDK releases at the same
time to support multiple consumers of DPDK linked against older and newer
sonames.

Also, due to the interdependencies that DPDK libraries can have,
applications might end up with an executable space in which multiple
versions of a library are mapped by ld.so.

Think of LibA that got an ABI bump and LibB that did not get an ABI bump
but is depending on LibA.

    Application
    \-> LibA.old
    \-> LibB.new -> LibA.new

That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
If set CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.

We need to cut arbitrarily long strings after the .so now, and this would
work for any ABI version in LIBABIVER:
  $(Q)ln -s -f $< $(patsubst %.$(LIBABIVER),%,$@)
But using the following instead additionally allows simplifying the
Makefile for the CONFIG_RTE_NEXT_ABI case.
  $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 config/common_base                     |  5 +++++
 doc/guides/contributing/versioning.rst | 25 +++++++++++++++++++++++++
 mk/rte.lib.mk                          | 12 +++++++-----
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..37aa1e1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -75,6 +75,11 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
 CONFIG_RTE_NEXT_ABI=y
 
 #
+# Major ABI to overwrite library specific LIBABIVER
+#
+CONFIG_RTE_MAJOR_ABI=
+
+#
 # Machine's cache line size
 #
 CONFIG_RTE_CACHE_LINE_SIZE=64
diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
index fbc44a7..8aaf370 100644
--- a/doc/guides/contributing/versioning.rst
+++ b/doc/guides/contributing/versioning.rst
@@ -133,6 +133,31 @@ The macros exported are:
   fully qualified function ``p``, so that if a symbol becomes versioned, it
   can still be mapped back to the public symbol name.
 
+Setting a Major ABI version
+---------------------------
+
+Downstreams might want to provide different DPDK releases at the same time to
+support multiple consumers of DPDK linked against older and newer sonames.
+
+Also due to the interdependencies that DPDK libraries can have applications
+might end up with an executable space in which multiple versions of a library
+are mapped by ld.so.
+
+Think of LibA that got an ABI bump and LibB that did not get an ABI bump but is
+depending on LibA.
+
+.. note::
+
+    Application
+    \-> LibA.old
+    \-> LibB.new -> LibA.new
+
+That is a conflict which can be avoided by setting ``CONFIG_RTE_MAJOR_ABI``.
+If set, the value of ``CONFIG_RTE_MAJOR_ABI`` overwrites all - otherwise per
+library - versions defined in the libraries ``LIBABIVER``.
+An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all libraries
+``librte<?>.so.16.11`` instead of ``librte<?>.so.<LIBABIVER>``.
+
 Examples of ABI Macro use
 -------------------------
 
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 33a5f5a..06046c2 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -40,6 +40,12 @@ EXTLIB_BUILD ?= n
 # VPATH contains at least SRCDIR
 VPATH += $(SRCDIR)
 
+ifneq ($(CONFIG_RTE_MAJOR_ABI),)
+ifneq ($(LIBABIVER),)
+LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
+endif
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
 ifeq ($(EXTLIB_BUILD),n)
@@ -156,11 +162,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
 	@[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
 	$(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
-ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
-	$(Q)ln -s -f $< $(basename $(basename $@))
-else
-	$(Q)ln -s -f $< $(basename $@)
-endif
+	$(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
 endif
 
 #
-- 
2.7.4
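
For illustration only, a small C sketch of the failure mode described in
the commit message: both ABI versions of the same library ending up
mapped into one process (the library names here are hypothetical):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* LibA.old, linked directly by the application ... (hypothetical names) */
    void *a_old = dlopen("librte_liba.so.2", RTLD_NOW | RTLD_GLOBAL);
    /* ... and LibA.new, pulled in indirectly through LibB.new. */
    void *a_new = dlopen("librte_liba.so.16.11", RTLD_NOW | RTLD_GLOBAL);

    if (a_old != NULL && a_new != NULL)
        printf("both ABI versions of LibA are mapped: conflict\n");
    return 0;
}

With CONFIG_RTE_MAJOR_ABI set, both dependency chains resolve to the same
soname, so the second mapping never happens.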

^ permalink raw reply	[relevance 20%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-14 20:31  9% ` Jan Blunck
@ 2017-02-22 13:12  7%   ` Christian Ehrhardt
  2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
  2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
  0 siblings, 2 replies; 200+ results
From: Christian Ehrhardt @ 2017-02-22 13:12 UTC (permalink / raw)
  To: Jan Blunck; +Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On Tue, Feb 14, 2017 at 9:31 PM, Jan Blunck <jblunck@infradead.org> wrote:

> > 1. Downstreams to insert Major version into soname
> > Distributions could insert the DPDK major version (like 16.11) into the
> > soname and package names. A common example of this is libboost [5].
> > That would perfectly allow 16.07.<LIBABIVER> to coexist with
> > 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
> > Yet it would mean that anything depending on the old library will have to
> > be recompiled to pick up the new code, even if it depends on an ABI that
> is
> > still present in the new release.
> > Also - not a technical reason - but it is clearly more work to force
> update
> > all dependencies and clean out old packages for every release.
>
> Actually this isn't exactly what I proposed during the summit. Just
> keep it simple and fix the ABI version of all libraries at 16.11.0.
> This is a proven approach and has been used for years with different
> libraries.


Since there was no other response I'll try to wrap up.

Yes, #1 is also my preferred solution at the moment.
We tried following the individual LIBABIVER tracking done upstream, but
as outlined before we hit too many issues.
I discussed it in the deb_dpdk group, which also acked using this as the
general approach.
The other options have too obvious flaws as I listed on my initial report
and - thanks btw - you added a few more.

@Bruce - sorry I don't think dropping config options is the solution. Yet
my suggestion does not prevent you from doing so.



> You could easily do this independently of us upstream
> fixing the ABI problems.



I agree, but I'd like to suggest the mechanism I want to implement.
An upstream ack for the feature to set such a major ABI would be great.
Actually, since it is optional and can help more people integrating DPDK,
getting it accepted upstream would be even better.

I'll send a patch in reply to this thread later today that implements what
I have in mind.


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH v2] lpm: extend IPv6 next hop field
  2017-02-19 17:14  4% [dpdk-dev] [PATCH] lpm: extend IPv6 next hop field Vladyslav Buslov
@ 2017-02-21 14:46  4% ` Vladyslav Buslov
  0 siblings, 0 replies; 200+ results
From: Vladyslav Buslov @ 2017-02-21 14:46 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev

This patch extends the next_hop field from 8 bits to 21 bits in the LPM
library for IPv6.

Added versioning symbols to the changed functions and updated the
libraries and applications that have a dependency on the LPM library.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
---
 app/test/test_lpm6.c                            | 115 ++++++++++++++------
 app/test/test_lpm6_perf.c                       |   4 +-
 doc/guides/prog_guide/lpm6_lib.rst              |   2 +-
 doc/guides/rel_notes/release_17_05.rst          |   5 +
 examples/ip_fragmentation/main.c                |  17 +--
 examples/ip_reassembly/main.c                   |  17 +--
 examples/ipsec-secgw/ipsec-secgw.c              |   2 +-
 examples/l3fwd/l3fwd_lpm_sse.h                  |  24 ++---
 examples/performance-thread/l3fwd-thread/main.c |  11 +-
 lib/librte_lpm/rte_lpm6.c                       | 134 +++++++++++++++++++++---
 lib/librte_lpm/rte_lpm6.h                       |  32 +++++-
 lib/librte_lpm/rte_lpm_version.map              |  10 ++
 lib/librte_table/rte_table_lpm_ipv6.c           |   9 +-
 13 files changed, 292 insertions(+), 90 deletions(-)

diff --git a/app/test/test_lpm6.c b/app/test/test_lpm6.c
index 61134f7..e0e7bf0 100644
--- a/app/test/test_lpm6.c
+++ b/app/test/test_lpm6.c
@@ -79,6 +79,7 @@ static int32_t test24(void);
 static int32_t test25(void);
 static int32_t test26(void);
 static int32_t test27(void);
+static int32_t test28(void);
 
 rte_lpm6_test tests6[] = {
 /* Test Cases */
@@ -110,6 +111,7 @@ rte_lpm6_test tests6[] = {
 	test25,
 	test26,
 	test27,
+	test28,
 };
 
 #define NUM_LPM6_TESTS                (sizeof(tests6)/sizeof(tests6[0]))
@@ -354,7 +356,7 @@ test6(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -392,7 +394,7 @@ test7(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[10][16];
-	int16_t next_hop_return[10];
+	int32_t next_hop_return[10];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -469,7 +471,8 @@ test9(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 16, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 	uint8_t i;
 
@@ -513,7 +516,8 @@ test10(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -557,7 +561,8 @@ test11(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -617,7 +622,8 @@ test12(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -655,7 +661,8 @@ test13(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = 2;
@@ -702,7 +709,8 @@ test14(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 25, next_hop_add = 100;
+	uint8_t depth = 25;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -748,7 +756,8 @@ test15(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 24, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 24;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -784,7 +793,8 @@ test16(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {12,12,1,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 128, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 128;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -828,7 +838,8 @@ test17(void)
 	uint8_t ip1[] = {127,255,255,255,255,255,255,255,255,
 			255,255,255,255,255,255,255};
 	uint8_t ip2[] = {128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -857,7 +868,7 @@ test17(void)
 
 	/* Loop with rte_lpm6_delete. */
 	for (depth = 16; depth >= 1; depth--) {
-		next_hop_add = (uint8_t) (depth - 1);
+		next_hop_add = (depth - 1);
 
 		status = rte_lpm6_delete(lpm, ip2, depth);
 		TEST_LPM_ASSERT(status == 0);
@@ -893,8 +904,9 @@ test18(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16], ip_1[16], ip_2[16];
-	uint8_t depth, depth_1, depth_2, next_hop_add, next_hop_add_1,
-		next_hop_add_2, next_hop_return;
+	uint8_t depth, depth_1, depth_2;
+	uint32_t next_hop_add, next_hop_add_1,
+			next_hop_add_2, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1055,7 +1067,8 @@ test19(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1253,7 +1266,8 @@ test20(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1320,8 +1334,9 @@ test21(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[4][16];
-	uint8_t depth, next_hop_add;
-	int16_t next_hop_return[4];
+	uint8_t depth;
+	uint32_t next_hop_add;
+	int32_t next_hop_return[4];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1378,8 +1393,9 @@ test22(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[5][16];
-	uint8_t depth[5], next_hop_add;
-	int16_t next_hop_return[5];
+	uint8_t depth[5];
+	uint32_t next_hop_add;
+	int32_t next_hop_return[5];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1495,7 +1511,8 @@ test23(void)
 	struct rte_lpm6_config config;
 	uint32_t i;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1579,7 +1596,8 @@ test25(void)
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
 	uint32_t i;
-	uint8_t depth, next_hop_add, next_hop_return, next_hop_expected;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return, next_hop_expected;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1632,10 +1650,10 @@ test26(void)
 	uint8_t d_ip_10_32 = 32;
 	uint8_t	d_ip_10_24 = 24;
 	uint8_t	d_ip_20_25 = 25;
-	uint8_t next_hop_ip_10_32 = 100;
-	uint8_t	next_hop_ip_10_24 = 105;
-	uint8_t	next_hop_ip_20_25 = 111;
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_ip_10_32 = 100;
+	uint32_t next_hop_ip_10_24 = 105;
+	uint32_t next_hop_ip_20_25 = 111;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1650,7 +1668,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_32, &next_hop_return);
-	uint8_t test_hop_10_32 = next_hop_return;
+	uint32_t test_hop_10_32 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_32);
 
@@ -1659,7 +1677,7 @@ test26(void)
 			return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_24, &next_hop_return);
-	uint8_t test_hop_10_24 = next_hop_return;
+	uint32_t test_hop_10_24 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_24);
 
@@ -1668,7 +1686,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_20_25, &next_hop_return);
-	uint8_t test_hop_20_25 = next_hop_return;
+	uint32_t test_hop_20_25 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_20_25);
 
@@ -1707,7 +1725,8 @@ test27(void)
 		struct rte_lpm6 *lpm = NULL;
 		struct rte_lpm6_config config;
 		uint8_t ip[] = {128,128,128,128,128,128,128,128,128,128,128,128,128,128,0,0};
-		uint8_t depth = 128, next_hop_add = 100, next_hop_return;
+		uint8_t depth = 128;
+		uint32_t next_hop_add = 100, next_hop_return;
 		int32_t status = 0;
 		int i, j;
 
@@ -1746,6 +1765,42 @@ test27(void)
 }
 
 /*
+ * Call add, lookup and delete for a single rule with maximum 21bit next_hop
+ * size.
+ * Check that next_hop returned from lookup is equal to provisioned value.
+ * Delete the rule and check that the same test returns a miss.
+ */
+int32_t
+test28(void)
+{
+	struct rte_lpm6 *lpm = NULL;
+	struct rte_lpm6_config config;
+	uint8_t ip[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 0x001FFFFF, next_hop_return = 0;
+	int32_t status = 0;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	status = rte_lpm6_add(lpm, ip, depth, next_hop_add);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm6_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));
+
+	status = rte_lpm6_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	rte_lpm6_free(lpm);
+
+	return PASS;
+}
+
+/*
  * Do all unit tests.
  */
 static int
diff --git a/app/test/test_lpm6_perf.c b/app/test/test_lpm6_perf.c
index 0723081..30be430 100644
--- a/app/test/test_lpm6_perf.c
+++ b/app/test/test_lpm6_perf.c
@@ -86,7 +86,7 @@ test_lpm6_perf(void)
 	struct rte_lpm6_config config;
 	uint64_t begin, total_time;
 	unsigned i, j;
-	uint8_t next_hop_add = 0xAA, next_hop_return = 0;
+	uint32_t next_hop_add = 0xAA, next_hop_return = 0;
 	int status = 0;
 	int64_t count = 0;
 
@@ -148,7 +148,7 @@ test_lpm6_perf(void)
 	count = 0;
 
 	uint8_t ip_batch[NUM_IPS_ENTRIES][16];
-	int16_t next_hops[NUM_IPS_ENTRIES];
+	int32_t next_hops[NUM_IPS_ENTRIES];
 
 	for (i = 0; i < NUM_IPS_ENTRIES; i++)
 		memcpy(ip_batch[i], large_ips_table[i].ip, 16);
diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index 0aea5c5..f791507 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary depending on the case.
 An LPM prefix is represented by a pair of parameters (128-bit key, depth), with depth in the range of 1 to 128.
 An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
 The prefix serves as the unique identifier for the LPM rule.
-In this implementation, the user data is 1-byte long and is called "next hop",
+In this implementation, the user data is 21-bits long and is called "next hop",
 which corresponds to its main use of storing the ID of the next hop in a routing table entry.
 
 The main methods exported for the LPM component are:
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 48fb5bd..723e085 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Increased number of next hops for LPM IPv6 to 2^21.**
+
+  The next_hop field is extended from 8 bits to 21 bits for IPv6.
 
 Resolved Issues
 ---------------
@@ -110,6 +113,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
+  while keeping ABI compatibility.
 
 ABI Changes
 -----------
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index e1e32c6..89d08c8 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		uint8_t queueid, uint8_t port_in)
 {
 	struct rx_queue *rxq;
-	uint32_t i, len, next_hop_ipv4;
-	uint8_t next_hop_ipv6, port_out, ipv6;
+	uint32_t i, len, next_hop;
+	uint8_t port_out, ipv6;
 	int32_t len2;
 
 	ipv6 = 0;
@@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			port_out = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
@@ -326,9 +326,10 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *);
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			port_out = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 50fe422..661b64f 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -346,8 +346,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	struct rte_ip_frag_death_row *dr;
 	struct rx_queue *rxq;
 	void *d_addr_bytes;
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6, dst_port;
+	uint32_t next_hop;
+	uint8_t dst_port;
 
 	rxq = &qconf->rx_queue_list[queue];
 
@@ -390,9 +390,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			dst_port = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
@@ -427,9 +427,10 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			dst_port = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv6);
diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index 5a4c9b7..5744c46 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -618,7 +618,7 @@ route4_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 static inline void
 route6_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 {
-	int16_t hop[MAX_PKT_BURST * 2];
+	int32_t hop[MAX_PKT_BURST * 2];
 	uint8_t dst_ip[MAX_PKT_BURST * 2][16];
 	uint8_t *ip6_dst;
 	uint16_t i, offset;
diff --git a/examples/l3fwd/l3fwd_lpm_sse.h b/examples/l3fwd/l3fwd_lpm_sse.h
index 538fe3d..aa06b6d 100644
--- a/examples/l3fwd/l3fwd_lpm_sse.h
+++ b/examples/l3fwd/l3fwd_lpm_sse.h
@@ -40,8 +40,7 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ipv4_hdr *ipv4_hdr;
 	struct ether_hdr *eth_hdr;
@@ -51,9 +50,11 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
 
-		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct,
-				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop_ipv4) == 0) ?
-						next_hop_ipv4 : portid);
+		return (uint16_t) (
+			(rte_lpm_lookup(qconf->ipv4_lookup_struct,
+					rte_be_to_cpu_32(ipv4_hdr->dst_addr),
+					&next_hop) == 0) ?
+						next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -61,8 +62,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
@@ -78,14 +79,13 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
-			&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+			&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -93,8 +93,8 @@ lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
diff --git a/examples/performance-thread/l3fwd-thread/main.c b/examples/performance-thread/l3fwd-thread/main.c
index 53083df..fa99daf 100644
--- a/examples/performance-thread/l3fwd-thread/main.c
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -909,7 +909,7 @@ static inline uint8_t
 get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid,
 		lookup6_struct_t *ipv6_l3fwd_lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 
 	return (uint8_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct,
 			((struct ipv6_hdr *)ipv6_hdr)->dst_addr, &next_hop) == 0) ?
@@ -1396,15 +1396,14 @@ rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 static inline __attribute__((always_inline)) uint16_t
 get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv4_lookup_struct, dst_ipv4,
-				&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+				&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -1413,8 +1412,8 @@ get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 
 		return (uint16_t) ((rte_lpm6_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0) ? next_hop_ipv6 :
-						portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0) ?
+				next_hop : portid);
 
 	}
 
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 32fdba0..9cc7be7 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -97,7 +97,7 @@ struct rte_lpm6_tbl_entry {
 /** Rules tbl entry structure. */
 struct rte_lpm6_rule {
 	uint8_t ip[RTE_LPM6_IPV6_ADDR_SIZE]; /**< Rule IP address. */
-	uint8_t next_hop; /**< Rule next hop. */
+	uint32_t next_hop; /**< Rule next hop. */
 	uint8_t depth; /**< Rule depth. */
 };
 
@@ -297,7 +297,7 @@ rte_lpm6_free(struct rte_lpm6 *lpm)
  * the nexthop if so. Otherwise it adds a new rule if enough space is available.
  */
 static inline int32_t
-rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
+rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint32_t next_hop, uint8_t depth)
 {
 	uint32_t rule_index;
 
@@ -340,7 +340,7 @@ rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
  */
 static void
 expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
-		uint8_t next_hop)
+		uint32_t next_hop)
 {
 	uint32_t tbl8_group_end, tbl8_gindex_next, j;
 
@@ -377,7 +377,7 @@ expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
 static inline int
 add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
 		struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip, uint8_t bytes,
-		uint8_t first_byte, uint8_t depth, uint8_t next_hop)
+		uint8_t first_byte, uint8_t depth, uint32_t next_hop)
 {
 	uint32_t tbl_index, tbl_range, tbl8_group_start, tbl8_group_end, i;
 	int32_t tbl8_gindex;
@@ -507,9 +507,17 @@ add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
  * Add a route
  */
 int
-rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop)
 {
+	return rte_lpm6_add_v1705(lpm, ip, depth, next_hop);
+}
+VERSION_SYMBOL(rte_lpm6_add, _v20, 2.0);
+
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop)
+{
 	struct rte_lpm6_tbl_entry *tbl;
 	struct rte_lpm6_tbl_entry *tbl_next;
 	int32_t rule_index;
@@ -560,6 +568,10 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_add, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip,
+				uint8_t depth, uint32_t next_hop),
+		rte_lpm6_add_v1705);
 
 /*
  * Takes a pointer to a table entry and inspect one level.
@@ -569,7 +581,7 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 static inline int
 lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		const struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip,
-		uint8_t first_byte, uint8_t *next_hop)
+		uint8_t first_byte, uint32_t *next_hop)
 {
 	uint32_t tbl8_index, tbl_entry;
 
@@ -589,7 +601,7 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		return 1;
 	} else {
 		/* If not extended then we can have a match. */
-		*next_hop = (uint8_t)tbl_entry;
+		*next_hop = ((uint32_t)tbl_entry & RTE_LPM6_TBL8_BITMASK);
 		return (tbl_entry & RTE_LPM6_LOOKUP_SUCCESS) ? 0 : -ENOENT;
 	}
 }
@@ -598,7 +610,26 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
  * Looks up an IP
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_lookup_v1705(lpm, ip, &next_hop32);
+	if (status == 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+}
+VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
+
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop)
 {
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
@@ -625,20 +656,23 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
+				uint32_t *next_hop), rte_lpm6_lookup_v1705);
 
 /*
  * Looks up a group of IP addresses
  */
 int
-rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n)
 {
 	unsigned i;
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
-	uint32_t tbl24_index;
-	uint8_t first_byte, next_hop;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
 	int status;
 
 	/* DEBUG: Check user input arguments. */
@@ -664,11 +698,59 @@ rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		if (status < 0)
 			next_hops[i] = -1;
 		else
-			next_hops[i] = next_hop;
+			next_hops[i] = (int16_t)next_hop;
+	}
+
+	return 0;
+}
+VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
+
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n)
+{
+	unsigned int i;
+	const struct rte_lpm6_tbl_entry *tbl;
+	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
+	int status;
+
+	/* DEBUG: Check user input arguments. */
+	if ((lpm == NULL) || (ips == NULL) || (next_hops == NULL))
+		return -EINVAL;
+
+	for (i = 0; i < n; i++) {
+		first_byte = LOOKUP_FIRST_BYTE;
+		tbl24_index = (ips[i][0] << BYTES2_SIZE) |
+				(ips[i][1] << BYTE_SIZE) | ips[i][2];
+
+		/* Calculate pointer to the first entry to be inspected */
+		tbl = &lpm->tbl24[tbl24_index];
+
+		do {
+			/* Continue inspecting following levels
+			 * until success or failure
+			 */
+			status = lookup_step(lpm, tbl, &tbl_next, ips[i],
+					first_byte++, &next_hop);
+			tbl = tbl_next;
+		} while (status == 1);
+
+		if (status < 0)
+			next_hops[i] = -1;
+		else
+			next_hops[i] = (int32_t)next_hop;
 	}
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup_bulk_func, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+				uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+				int32_t *next_hops, unsigned int n),
+		rte_lpm6_lookup_bulk_func_v1705);
 
 /*
  * Finds a rule in rule table.
@@ -698,8 +780,28 @@ rule_find(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
  * Look for a rule in the high-level rules table
  */
 int
-rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop)
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_is_rule_present_v1705(lpm, ip, depth, &next_hop32);
+	if (status > 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+
+}
+VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
+
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop)
 {
 	uint8_t ip_masked[RTE_LPM6_IPV6_ADDR_SIZE];
 	int32_t rule_index;
@@ -724,6 +826,10 @@ uint8_t *next_hop)
 	/* If rule is not found return 0. */
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_is_rule_present, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_is_rule_present(struct rte_lpm6 *lpm,
+				uint8_t *ip, uint8_t depth, uint32_t *next_hop),
+		rte_lpm6_is_rule_present_v1705);
 
 /*
  * Delete a rule from the rule table.
diff --git a/lib/librte_lpm/rte_lpm6.h b/lib/librte_lpm/rte_lpm6.h
index 13d027f..3a3342d 100644
--- a/lib/librte_lpm/rte_lpm6.h
+++ b/lib/librte_lpm/rte_lpm6.h
@@ -39,6 +39,7 @@
  */
 
 #include <stdint.h>
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -123,7 +124,13 @@ rte_lpm6_free(struct rte_lpm6 *lpm);
  */
 int
 rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
+int
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop);
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
 
 /**
  * Check if a rule is present in the LPM table,
@@ -142,7 +149,13 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
  */
 int
 rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop);
+		uint32_t *next_hop);
+int
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop);
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop);
 
 /**
  * Delete a rule from the LPM table.
@@ -199,7 +212,12 @@ rte_lpm6_delete_all(struct rte_lpm6 *lpm);
  *   -EINVAL for incorrect arguments, -ENOENT on lookup miss, 0 on lookup hit
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
+int
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop);
 
 /**
  * Lookup multiple IP addresses in an LPM table.
@@ -220,7 +238,15 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
 int
 rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
-		int16_t * next_hops, unsigned n);
+		int32_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int16_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 239b371..90beac8 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -34,3 +34,13 @@ DPDK_16.04 {
 	rte_lpm_delete_all;
 
 } DPDK_2.0;
+
+DPDK_17.05 {
+	global:
+
+	rte_lpm6_add;
+	rte_lpm6_is_rule_present;
+	rte_lpm6_lookup;
+	rte_lpm6_lookup_bulk_func;
+
+} DPDK_16.04;
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index 836f4cf..1e1a173 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -211,9 +211,8 @@ rte_table_lpm_ipv6_entry_add(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint32_t nht_pos, nht_pos0_valid;
+	uint32_t nht_pos, nht_pos0, nht_pos0_valid;
 	int status;
-	uint8_t nht_pos0;
 
 	/* Check input parameters */
 	if (lpm == NULL) {
@@ -256,7 +255,7 @@ rte_table_lpm_ipv6_entry_add(
 
 	/* Add rule to low level LPM table */
 	if (rte_lpm6_add(lpm->lpm, ip_prefix->ip, ip_prefix->depth,
-		(uint8_t) nht_pos) < 0) {
+		nht_pos) < 0) {
 		RTE_LOG(ERR, TABLE, "%s: LPM IPv6 rule add failed\n", __func__);
 		return -1;
 	}
@@ -280,7 +279,7 @@ rte_table_lpm_ipv6_entry_delete(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint8_t nht_pos;
+	uint32_t nht_pos;
 	int status;
 
 	/* Check input parameters */
@@ -356,7 +355,7 @@ rte_table_lpm_ipv6_lookup(
 			uint8_t *ip = RTE_MBUF_METADATA_UINT8_PTR(pkt,
 				lpm->offset);
 			int status;
-			uint8_t nht_pos;
+			uint32_t nht_pos;
 
 			status = rte_lpm6_lookup(lpm->lpm, ip, &nht_pos);
 			if (status == 0) {
-- 
2.1.4
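
A short usage sketch of the widened API above (error handling elided;
-1 stands in for SOCKET_ID_ANY):

#include <stdint.h>
#include <rte_lpm6.h>

static int
lpm6_next_hop_demo(void)
{
    struct rte_lpm6_config cfg = {
        .max_rules = 1024,
        .number_tbl8s = 1 << 16,
        .flags = 0,
    };
    uint8_t prefix[16] = { 0x20, 0x01, 0x0d, 0xb8 }; /* 2001:db8::/32 */
    uint32_t next_hop = 0x001FFFFF; /* 21-bit value, impossible with uint8_t */
    uint32_t found = 0;
    struct rte_lpm6 *lpm = rte_lpm6_create("demo", -1, &cfg);

    if (lpm == NULL)
        return -1;
    rte_lpm6_add(lpm, prefix, 32, next_hop);
    rte_lpm6_lookup(lpm, prefix, &found); /* found == 0x001FFFFF */
    rte_lpm6_free(lpm);
    return 0;
}

Existing binaries keep resolving to the _v20 symbols through the version
map, while code compiled against 17.05 picks up the uint32_t variants.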

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 1/8] eal: use different constructor priorities for initcalls
  @ 2017-02-21 12:30  3%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-02-21 12:30 UTC (permalink / raw)
  To: Jan Blunck, dev; +Cc: david.marchand, shreyansh.jain

On 2/20/2017 2:17 PM, Jan Blunck wrote:
> This introduces different initcall macros to allow for late registration of
> the virtual device bus.
> 
> Signed-off-by: Jan Blunck <jblunck@infradead.org>
> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

<...>

>  
> -#define RTE_INIT(func) \
> -static void __attribute__((constructor, used)) func(void)
> +#define RTE_EAL_INIT(func) \
> +static void __attribute__((constructor(101), used)) func(void)
> +
> +#define RTE_POST_EAL_INIT(func) \
> +static void __attribute__((constructor(102), used)) func(void)
> +
> +#define RTE_DEV_INIT(func) \
> +static void __attribute__((constructor(103), used)) func(void)
> +
> +#define RTE_INIT(func) RTE_DEV_INIT(func)

Does it make sense to leave some gaps between the priorities,
i.e. 101, 102, 103 --> 100, 200, 300?

When new priorities are added (not sure if that will ever happen), would
changing the previous priorities cause an ABI breakage?
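
For reference, the ordering semantics in question can be tried in a
standalone program (outside DPDK): lower constructor priority values run
earlier, and the values only need to be ordered, so leaving gaps is
harmless:

#include <stdio.h>

/* Sketch: priorities 0-100 are reserved by the toolchain; among the
 * rest, lower numbers run first regardless of link order. */
static void __attribute__((constructor(101), used)) init_bus(void)
{
    printf("1: bus registered\n");
}

static void __attribute__((constructor(103), used)) init_driver(void)
{
    printf("2: driver registered\n");
}

int main(void)
{
    printf("3: main\n");
    return 0;
}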

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API
  2017-02-10 14:05  1% ` [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
@ 2017-02-21 10:35  0%   ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2017-02-21 10:35 UTC (permalink / raw)
  To: Cristian Dumitrescu, dev; +Cc: thomas.monjalon, jerin.jacob

On 2/10/2017 7:35 PM, Cristian Dumitrescu wrote:
> This patch introduces the generic ethdev API for the hierarchical scheduler
> capability.
>
> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow approach)
> - Capability query API per port and per hierarchy node
> - Scheduling algorithms: strict priority (SP), Weighted Fair Queuing (WFQ),
>   Weighted Round Robin (WRR)
> - Traffic shaping: single/dual rate, private (per node) and shared (by multiple
>   nodes) shapers
> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>   head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>   contexts
> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>   TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
>
> Changes since RFC [1]:
> - Implemented as ethdev plugin (similar to rte_flow) as opposed to more
>   monolithic additions to ethdev itself
> - Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
>   suggested items with only one exception, see the long list below, hopefully
>   nothing was forgotten.
>     - The item not done (hopefully for a good reason): driver-generated object
>       IDs. IMO the choice to have application-generated object IDs adds marginal
>       complexity to the driver (search ID function required), but it provides
>       huge simplification for the application. The app does not need to worry
>       about building & managing tree-like structure for storing driver-generated
>       object IDs, the app can use its own convention for node IDs depending on
>       the specific hierarchy that it needs. Trivial example: identify all
>       level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
>       on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
>       310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
>       112, 113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log for
>       the other related simplification that was implemented: leaf nodes now have
>       predefined IDs that are the same with their Ethernet TX queue ID (
>       therefore no translation is required for leaf nodes).
> - Capability API. Done per port and per node as well.
> - Dual rate shapers
> - Added configuration of private shaper (per node) directly from the shaper
>   profile as part of node API (no shaper ID needed for private shapers), while
>   the shared shapers are configured outside of the node API using shaper profile
>   and communicated to the node using shared shaper ID. So there is no
>   configuration overhead for shared shapers if the app does not use any of them.
> - Leaf nodes now have predefined IDs that are the same with their Ethernet TX
>   queue ID (therefore no translation is required for leaf nodes). This is also
>   used to differentiate between a leaf node and a non-leaf node.
> - Domain-specific errors to give a precise indication of the error cause (same
>   as done by rte_flow)
> - Packet marking API
> - Packet length optional adjustment for shapers, positive (e.g. for adding
>   Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
>   based on IP packet bytes)
>
> Next steps:
> - SW fallback based on librte_sched library (to be later introduced by
>   standalone patch set)
>
> [1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
> [2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
> [3] Hemant's feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
>
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> ---
>  MAINTAINERS                            |    4 +
>  lib/librte_ether/Makefile              |    5 +-
>  lib/librte_ether/rte_ether_version.map |   30 +
>  lib/librte_ether/rte_scheddev.c        |  790 ++++++++++++++++++++
>  lib/librte_ether/rte_scheddev.h        | 1273 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_scheddev_driver.h |  374 ++++++++++
>  6 files changed, 2475 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_ether/rte_scheddev.c
>  create mode 100644 lib/librte_ether/rte_scheddev.h
>  create mode 100644 lib/librte_ether/rte_scheddev_driver.h
>

...<snip>

> +
> +#ifndef __INCLUDE_RTE_SCHEDDEV_H__
> +#define __INCLUDE_RTE_SCHEDDEV_H__
> +
> +/**
> + * @file
> + * RTE Generic Hierarchical Scheduler API
> + *
> + * This interface provides the ability to configure the hierarchical scheduler
> + * feature in a generic way.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_red.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** Ethernet framing overhead
> +  *
> +  * Overhead fields per Ethernet frame:
> +  * 1. Preamble:                                            7 bytes;
> +  * 2. Start of Frame Delimiter (SFD):                      1 byte;
> +  * 3. Inter-Frame Gap (IFG):                              12 bytes.
> +  */
> +#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD                  20
> +
> +/**
> +  * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
> +  * is generated and added at the end of the Ethernet frame on TX side without
> +  * any SW intervention.
> +  */
> +#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS              24
> +
> +/**< Invalid WRED profile ID */
> +#define RTE_SCHEDDEV_WRED_PROFILE_ID_NONE                  UINT32_MAX
> +
> +/**< Invalid shaper profile ID */
> +#define RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE                UINT32_MAX
> +
> +/**< Scheduler hierarchy root node ID */
> +#define RTE_SCHEDDEV_ROOT_NODE_ID                          UINT32_MAX
> +
> +
> +/**
> +  * Scheduler node capabilities
> +  */
> +struct rte_scheddev_node_capabilities {
> +	/**< Private shaper support. */
> +	int shaper_private_supported;
> +
> +	/**< Dual rate shaping support for private shaper. Valid only when
> +	 * private shaper is supported.
> +	 */
> +	int shaper_private_dual_rate_supported;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for private
> +	 * shaper. Valid only when private shaper is supported.
> +	 */
> +	uint64_t shaper_private_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for private
> +	 * shaper. Valid only when private shaper is supported.
> +	 */
> +	uint64_t shaper_private_rate_max;
> +
> +	/**< Maximum number of supported shared shapers. The value of zero
> +	 * indicates that shared shapers are not supported.
> +	 */
> +	uint32_t shaper_shared_n_max;
> +
> +	/**< Items valid only for non-leaf nodes. */
> +	struct {
> +		/**< Maximum number of children nodes. */
> +		uint32_t n_children_max;
> +
> +		/**< Lowest priority supported. The value of 1 indicates that
> +		 * only priority 0 is supported, which essentially means that
> +		 * Strict Priority (SP) algorithm is not supported.
> +		 */
> +		uint32_t sp_priority_min;
> +
This can simply be sp_priority_level, with 0 indicating no SP support,
1 indicating priorities '0' and '1', and 7 indicating priorities '0' to
'7', i.e. a total of 8 priorities.

> +		/**< Maximum number of sibling nodes that can have the same
> +		 * priority at any given time. When equal to *n_children_max*,
> +		 * it indicates that WFQ/WRR algorithms are not supported.
> +		 */
> +		uint32_t sp_n_children_max;
Not clear to me.
OK, more than one child can have the same priority, and then you apply
WRR/WFQ among them.

However, there can be different sets, e.g. priorities '0' and '1' have
only one child each, while priority '2' has 6 children among which you
apply WRR/WFQ; a single per-node maximum does not describe that well.

> +
> +		/**< WFQ algorithm support. */
> +		int scheduling_wfq_supported;
> +
> +		/**< WRR algorithm support. */
> +		int scheduling_wrr_supported;
> +
> +		/**< Maximum WFQ/WRR weight. */
> +		uint32_t scheduling_wfq_wrr_weight_max;
> +	} nonleaf;
> +
> +	/**< Items valid only for leaf nodes. */
> +	struct {
> +		/**< Head drop algorithm support. */
> +		int cman_head_drop_supported;
> +
> +		/**< Private WRED context support. */
> +		int cman_wred_context_private_supported;
> +

The context part is not clear to me.

> +		/**< Maximum number of shared WRED contexts supported. The value
> +		 * of zero indicates that shared WRED contexts are not
> +		 * supported.
> +		 */
> +		uint32_t cman_wred_context_shared_n_max;
> +	} leaf;

Non-leaf nodes may have different capabilities.

Your leaf node is like a QoS queue; are you supporting a shaper on the
leaf node as well?


I would still prefer if you separate the QoS queue from a standard sched
node; the capabilities are different and it will be cleaner, at the cost
of more structures and APIs.

> +};
> +
> +/**
> +  * Scheduler capabilities
> +  */
> +struct rte_scheddev_capabilities {
> +	/**< Maximum number of nodes. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Maximum number of levels (i.e. number of nodes connecting the root
> +	 * node with any leaf node, including the root and the leaf).
> +	 */
> +	uint32_t n_levels_max;
> +
> +	/**< Maximum number of shapers, either private or shared. In case the
> +	 * implementation does not share any resource between private and
> +	 * shared shapers, it is typically equal to the sum between
> +	 * *shaper_private_n_max* and *shaper_shared_n_max*.
> +	 */
> +	uint32_t shaper_n_max;
> +
> +	/**< Maximum number of private shapers. Indicates the maximum number of
> +	 * nodes that can concurrently have the private shaper enabled.
> +	 */
> +	uint32_t shaper_private_n_max;
> +
> +	/**< Maximum number of shared shapers. The value of zero indicates that
> +	  * shared shapers are not supported.
> +	  */
> +	uint32_t shaper_shared_n_max;
> +
> +	/**< Maximum number of nodes that can share the same shared shaper. Only
> +	  * valid when shared shapers are supported.
> +	  */
> +	uint32_t shaper_shared_n_nodes_max;
> +
> +	/**< Maximum number of shared shapers that can be configured with dual
> +	  * rate shaping. The value of zero indicates that dual rate shaping
> +	  * support is not available for shared shapers.
> +	  */
> +	uint32_t shaper_shared_dual_rate_n_max;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for shared
> +	  * shapers. Only valid when shared shapers are supported.
> +	  */
> +	uint64_t shaper_shared_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for shared
> +	  * shaper. Only valid when shared shapers are supported.
> +	  */
> +	uint64_t shaper_shared_rate_max;
> +
> +	/**< Minimum value allowed for packet length adjustment for
> +	  * private/shared shapers.
> +	  */
> +	int shaper_pkt_length_adjust_min;
> +
> +	/**< Maximum value allowed for packet length adjustment for
> +	  * private/shared shapers.
> +	  */
> +	int shaper_pkt_length_adjust_max;
> +
> +	/**< Maximum number of WRED contexts. */
> +	uint32_t cman_wred_context_n_max;
> +
> +	/**< Maximum number of private WRED contexts. Indicates the maximum
> +	  * number of leaf nodes that can concurrently have the private WRED
> +	  * context enabled.
> +	  */
> +	uint32_t cman_wred_context_private_n_max;
> +
> +	/**< Maximum number of shared WRED contexts. The value of zero indicates
> +	  * that shared WRED contexts are not supported.
> +	  */
> +	uint32_t cman_wred_context_shared_n_max;
> +
> +	/**< Maximum number of leaf nodes that can share the same WRED context.
> +	  * Only valid when shared WRED contexts are supported.
> +	  */
> +	uint32_t cman_wred_context_shared_n_nodes_max;
> +
> +	/**< Support for VLAN DEI packet marking. */
> +	int mark_vlan_dei_supported;
> +
> +	/**< Support for IPv4/IPv6 ECN marking of TCP packets. */
> +	int mark_ip_ecn_tcp_supported;
> +
> +	/**< Support for IPv4/IPv6 ECN marking of SCTP packets. */
> +	int mark_ip_ecn_sctp_supported;
> +
> +	/**< Support for IPv4/IPv6 DSCP packet marking. */
> +	int mark_ip_dscp_supported;
> +
> +	/**< Summary of node-level capabilities across all nodes. */
> +	struct rte_scheddev_node_capabilities node;

This should be an array with one entry per level supported in the
system: a non-leaf node at level 2 can have different capabilities than
a node at level 3.

> +};
> +
> +/**
> +  * Congestion management (CMAN) mode
> +  *
> +  * This is used for controlling the admission of packets into a packet queue or
> +  * group of packet queues on congestion. On request of writing a new packet
> +  * into the current queue while the queue is full, the *tail drop* algorithm
> +  * drops the new packet while leaving the queue unmodified, as opposed to *head
> +  * drop* algorithm, which drops the packet at the head of the queue (the oldest
> +  * packet waiting in the queue) and admits the new packet at the tail of the
> +  * queue.
> +  *
> +  * The *Random Early Detection (RED)* algorithm works by proactively dropping
> +  * more and more input packets as the queue occupancy builds up. When the queue
> +  * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> +  * RED* algorithm uses a separate set of RED thresholds for each packet color.
> +  */
> +enum rte_scheddev_cman_mode {
> +	RTE_SCHEDDEV_CMAN_TAIL_DROP = 0, /**< Tail drop */
> +	RTE_SCHEDDEV_CMAN_HEAD_DROP, /**< Head drop */
> +	RTE_SCHEDDEV_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> +};
> +
> +/**
> +  * Color
> +  */
> +enum rte_scheddev_color {
> +	e_RTE_SCHEDDEV_GREEN = 0, /**< Green */
> +	e_RTE_SCHEDDEV_YELLOW,    /**< Yellow */
> +	e_RTE_SCHEDDEV_RED,       /**< Red */
> +	e_RTE_SCHEDDEV_COLORS     /**< Number of colors */
> +};
> +
> +/**
> +  * WRED profile
> +  */
> +struct rte_scheddev_wred_params {
> +	/**< One set of RED parameters per packet color */
> +	struct rte_red_params red_params[e_RTE_SCHEDDEV_COLORS];
> +};
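
To make this concrete, a possible per-color initialization (a sketch:
the field names are those of struct rte_red_params from rte_red.h, the
threshold values are only illustrative):

/* Sketch only; thresholds get progressively tighter per color. */
struct rte_scheddev_wred_params wred = {
    .red_params = {
        [e_RTE_SCHEDDEV_GREEN]  = { .min_th = 48, .max_th = 64,
                                    .maxp_inv = 10, .wq_log2 = 9 },
        [e_RTE_SCHEDDEV_YELLOW] = { .min_th = 32, .max_th = 48,
                                    .maxp_inv = 10, .wq_log2 = 9 },
        [e_RTE_SCHEDDEV_RED]    = { .min_th = 16, .max_th = 32,
                                    .maxp_inv = 10, .wq_log2 = 9 },
    },
};
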
> +
> +/**
> +  * Token bucket
> +  */
> +struct rte_scheddev_token_bucket {
> +	/**< Token bucket rate (bytes per second) */
> +	uint64_t rate;
> +
> +	/**< Token bucket size (bytes), a.k.a. max burst size */
> +	uint64_t size;
> +};
> +
> +/**
> +  * Shaper (rate limiter) profile
> +  *
> +  * Multiple shaper instances can share the same shaper profile. Each node has
> +  * zero or one private shaper (only one node using it) and/or zero, one or
> +  * several shared shapers (multiple nodes use the same shaper instance).
> +  *
> +  * Single rate shapers use a single token bucket. A single rate shaper can be
> +  * configured by setting the rate of the committed bucket to zero, which
> +  * effectively disables this bucket. The peak bucket is used to limit the rate
> +  * and the burst size for the current shaper.
> +  *
> +  * Dual rate shapers use both the committed and the peak token buckets. The
> +  * rate of the committed bucket has to be less than or equal to the rate of the
> +  * peak bucket.
> +  */
> +struct rte_scheddev_shaper_params {
> +	/**< Committed token bucket */
> +	struct rte_scheddev_token_bucket committed;
> +
> +	/**< Peak token bucket */
> +	struct rte_scheddev_token_bucket peak;
> +
> +	/**< Signed value to be added to the length of each packet for the
> +	 * purpose of shaping. Can be used to correct the packet length with
> +	 * the framing overhead bytes that are also consumed on the wire (e.g.
> +	 * RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS).
> +	 */
> +	int32_t pkt_length_adjust;
> +};
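
To make the single-rate convention above concrete, a possible
initialization (sketch; values illustrative): the committed rate of 0
disables that bucket, so only the peak bucket limits rate and burst.

/* Sketch: single-rate shaper at 10 Mbps (1250000 bytes/s), 4 KB burst. */
struct rte_scheddev_shaper_params single_rate = {
    .committed = { .rate = 0, .size = 0 }, /* disabled per the rule above */
    .peak = { .rate = 1250000, .size = 4096 },
    .pkt_length_adjust = RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS,
};
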
> +
> +/**
> +  * Node parameters
> +  *
> +  * Each scheduler hierarchy node has multiple inputs (children nodes of the
> +  * current parent node) and a single output (which is input to its parent
> +  * node). The current node arbitrates its inputs using Strict Priority (SP),
> +  * Weighted Fair Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to
> +  * schedule input packets on its output while observing its shaping (rate
> +  * limiting) constraints.
> +  *
> +  * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc are considered
> +  * approximations of the ideal of WFQ and are assimilated to WFQ, although
> +  * an associated implementation-dependent trade-off on accuracy, performance
> +  * and resource usage might exist.
> +  *
> +  * Children nodes with different priorities are scheduled using the SP
> +  * algorithm, based on their priority, with zero (0) as the highest priority.
> +  * Children with same priority are scheduled using the WFQ or WRR algorithm,
> +  * based on their weight, which is relative to the sum of the weights of all
> +  * siblings with same priority, with one (1) as the lowest weight.
> +  *
> +  * Each leaf node sits on on top of a TX queue of the current Ethernet port.
> +  * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
> +  * where N is the number of TX queues configured for the current Ethernet port.
> +  * The non-leaf nodes have their IDs generated by the application.
> +  */


OK, that means IDs 0 to (N-1) are reserved for leaf nodes. Will the
application choose any value for non-leaf nodes?
What will be the parent node id for the root node?

> +struct rte_scheddev_node_params {
> +	/**< Shaper profile for the private shaper. The absence of the private
> +	 * shaper for the current node is indicated by setting this parameter
> +	 * to RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE.
> +	 */
> +	uint32_t shaper_profile_id;
> +
> +	/**< User allocated array of valid shared shaper IDs. */
> +	uint32_t *shared_shaper_id;
> +
> +	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
> +	uint32_t n_shared_shapers;
> +
> +	union {
> +		/**< Parameters only valid for non-leaf nodes. */
> +		struct {
> +			/**< For each priority, indicates whether the children
> +			 * nodes sharing the same priority are to be scheduled
> +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> +			 * is to be used for all priorities. When non-NULL, it
> +			 * points to a pre-allocated array of *n_priority*
> +			 * elements, with a non-zero value element indicating
> +			 * WFQ and a zero value element for WRR.
> +			 */
> +			int *scheduling_mode_per_priority;

What is the structure of the element this pointer refers to? Just a bool array?

> +
> +			/**< Number of priorities. */
> +			uint32_t n_priorities;
> +		} nonleaf;
> +
> +		/**< Parameters only valid for leaf nodes. */
> +		struct {
> +			/**< Congestion management mode */
> +			enum rte_scheddev_cman_mode cman;
> +
> +			/**< WRED parameters (valid when *cman* is WRED). */
> +			struct {
> +				/**< WRED profile for private WRED context. */
> +				uint32_t wred_profile_id;
> +
> +				/**< User allocated array of shared WRED context
> +				 * IDs. The absence of a private WRED context
> +				 * for current leaf node is indicated by value
> +				 * RTE_SCHEDDEV_WRED_PROFILE_ID_NONE.
> +				 */
> +				uint32_t *shared_wred_context_id;
> +
> +				/**< Number of shared WRED context IDs in the
> +				 * *shared_wred_context_id* array.
> +				 */
> +				uint32_t n_shared_wred_contexts;
> +			} wred;
> +		} leaf;

A bool is_leaf is needed here to differentiate between leaf and non-leaf nodes.

> +	};
> +};
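
For illustration, a sketch (assuming the structure as posted) of leaf node
parameters using tail drop and one shared shaper:

/* Sketch only: leaf node with no private shaper, one shared shaper
 * (ID 5) and tail drop as congestion management.
 */
uint32_t shared_shapers[] = { 5 };
struct rte_scheddev_node_params np = {
	.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
	.shared_shaper_id = shared_shapers,
	.n_shared_shapers = 1,
	.leaf = {
		.cman = RTE_SCHEDDEV_CMAN_TAIL_DROP,
	},
};

Note that the PMD can only tell leaf from non-leaf by the node ID range
(0 .. N-1 for leaves), which is why the is_leaf flag suggested above would
make the union self-describing.
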
> +
> +/**
> +  * Node statistics counter type
> +  */
> +enum rte_scheddev_stats_counter {
> +	/**< Number of packets scheduled from current node. */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS = 1 << 0,
> +
> +	/**< Number of bytes scheduled from current node. */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES = 1 << 1,
> +
> +	/**< Number of packets dropped by current node.  */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_DROPPED = 1 << 2,
> +
> +	/**< Number of bytes dropped by current node.  */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_DROPPED = 1 << 3,
> +
> +	/**< Number of packets currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_QUEUED = 1 << 4,
> +
> +	/**< Number of bytes currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_QUEUED = 1 << 5,
> +};
> +
> +/**
> +  * Node statistics counters
> +  */
> +struct rte_scheddev_node_stats {
> +	/**< Number of packets scheduled from current node. */
> +	uint64_t n_pkts;
> +
> +	/**< Number of bytes scheduled from current node. */
> +	uint64_t n_bytes;
> +
> +	/**< Statistics counters for leaf nodes only. */
> +	struct {
> +		/**< Number of packets dropped by current leaf node. */
> +		uint64_t n_pkts_dropped;
> +
> +		/**< Number of bytes dropped by current leaf node. */
> +		uint64_t n_bytes_dropped;
> +
> +		/**< Number of packets currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_pkts_queued;
> +
> +		/**< Number of bytes currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_bytes_queued;
> +	} leaf;
> +};
> +
> +/**
> + * Verbose error types.
> + *
> + * Most of them provide the type of the object referenced by struct
> + * rte_scheddev_error::cause.
> + */
> +enum rte_scheddev_error_type {
> +	RTE_SCHEDDEV_ERROR_TYPE_NONE, /**< No error. */
> +	RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_GREEN,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_YELLOW,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_RED,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE,
> +	RTE_SCHEDDEV_ERROR_TYPE_SHARED_SHAPER_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PARENT_NODE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PRIORITY,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_WEIGHT,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_CMAN,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_WRED_PROFILE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_SHARED_WRED_CONTEXT_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_ID,
> +};
> +
> +/**
> + * Verbose error structure definition.
> + *
> + * This object is normally allocated by applications and set by PMDs; the
> + * message points to a constant string which does not need to be freed by
> + * the application. However, its pointer can be considered valid only as long
> + * as its associated DPDK port remains configured. Closing the underlying
> + * device or unloading the PMD invalidates it.
> + *
> + * Both cause and message may be NULL regardless of the error type.
> + */
> +struct rte_scheddev_error {
> +	enum rte_scheddev_error_type type; /**< Cause field and error type. */
> +	const void *cause; /**< Object responsible for the error. */
> +	const char *message; /**< Human-readable error message. */
> +};
> +
> +/**
> + * Scheduler capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param cap
> + *   Scheduler capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_capabilities_get(uint8_t port_id,
> +	struct rte_scheddev_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param cap
> + *   Scheduler node capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_capabilities_get(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +

Node capabilities are already part of scheddev_capabilities?

What do you expect to be different here? Unless you support different
capabilities for each level, this may not be useful.

> +/**
> + * Scheduler WRED profile add
> + *
> + * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
> + * is used to create one or several WRED contexts.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + *   WRED profile ID for the new profile. Needs to be unused.
> + * @param profile
> + *   WRED profile parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_wred_profile_add(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_wred_params *profile,
> +	struct rte_scheddev_error *error);
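
As a usage sketch (my example; the rte_red_params field names are assumed
to follow the existing librte_sched RED implementation):

/* Sketch only: drop onset moved earlier for yellow and red packets. */
struct rte_scheddev_wred_params wp = {
	.red_params = {
		[e_RTE_SCHEDDEV_GREEN] = {
			.min_th = 48, .max_th = 64, .maxp_inv = 10,
			.wq_log2 = 9 },
		[e_RTE_SCHEDDEV_YELLOW] = {
			.min_th = 40, .max_th = 64, .maxp_inv = 10,
			.wq_log2 = 9 },
		[e_RTE_SCHEDDEV_RED] = {
			.min_th = 32, .max_th = 64, .maxp_inv = 10,
			.wq_log2 = 9 },
	},
};
int ret = rte_scheddev_wred_profile_add(port_id, 0, &wp, &error);
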
> +
> +/**
> + * Scheduler WRED profile delete
> + *
> + * Delete an existing WRED profile. This operation fails when there is currently
> + * at least one user (i.e. WRED context) of this WRED profile.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_wred_profile_delete(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared WRED context add or update
> + *
> + * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
> + * created by using the WRED profile identified by *wred_profile_id*.
> + *
> + * When *shared_wred_context_id* is valid, this WRED context is no longer using
> + * the profile previously assigned to it and is updated to use the profile
> + * identified by *wred_profile_id*.
> + *
> + * A valid shared WRED context can be assigned to several scheduler hierarchy
> + * leaf nodes configured to use WRED as the congestion management mode.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID
> + * @param wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared WRED context delete
> + *
> + * Delete an existing shared WRED context. This operation fails when there is
> + * currently at least one user (i.e. scheduler hierarchy leaf node) of this
> + * shared WRED context.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shaper profile add
> + *
> + * Create a new shaper profile with ID set to *shaper_profile_id*. The new
> + * shaper profile is used to create one or several shapers.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + *   Shaper profile ID for the new profile. Needs to be unused.
> + * @param profile
> + *   Shaper profile parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shaper_profile_add(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_shaper_params *profile,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shaper profile delete
> + *
> + * Delete an existing shaper profile. This operation fails when there is
> + * currently at least one user (i.e. shaper) of this shaper profile.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shaper_profile_delete(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared shaper add or update
> + *
> + * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
> + * with this ID is created using the shaper profile identified by
> + * *shaper_profile_id*.
> + *
> + * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is no
> + * longer using the shaper profile previously assigned to it and is updated to
> + * use the shaper profile identified by *shaper_profile_id*.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_shaper_id
> + *   Shared shaper ID
> + * @param shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared shaper delete
> + *
> + * Delete an existing shared shaper. This operation fails when there is
> + * currently at least one user (i.e. scheduler hierarchy node) of this shared
> + * shaper.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_shaper_delete(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node add
> + *
> + * When *node_id* is not a valid node ID, a new node with this ID is created and
> + * connected as child to the existing node identified by *parent_node_id*.
> + *
> + * When *node_id* is a valid node ID, this node is disconnected from its current
> + * parent and connected as child to another existing node identified by
> + * *parent_node_id*.
> + *
> + * This function can be called during port initialization phase (before the
> + * Ethernet port is started) for building the scheduler start-up hierarchy.
> + * Subject to the specific Ethernet port supporting on-the-fly scheduler
> + * hierarchy updates, this function can also be called during run-time (after
> + * the Ethernet port is started).

This should be a capability, indicating whether dynamic_hierarchy_updates
are supported or not.

> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID
> + * @param parent_node_id
> + *   Parent node ID. Needs to be valid.

What will be the parent node ID for the root node? How is the root node
created on the Ethernet port?

> + * @param priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param params
> + *   Node parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_add(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_node_params *params,
> +	struct rte_scheddev_error *error);
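
A sketch of building a two-level start-up hierarchy with this call (the
node parameter structures are assumed filled in; ROOT_PARENT_ID is a
placeholder of mine, since, as asked above, the patch does not say what
parent ID the root node takes):

#define ROOT_NODE_ID   100          /* application-picked non-leaf ID */
#define ROOT_PARENT_ID UINT32_MAX   /* assumption, undefined in patch */

uint32_t i;

/* Root first, then the four leaves (IDs 0..3 = TX queues 0..3),
 * all at priority 0 with equal WFQ weights.
 */
rte_scheddev_node_add(port_id, ROOT_NODE_ID, ROOT_PARENT_ID, 0, 1,
		&nonleaf_params, &error);
for (i = 0; i < 4; i++)
	rte_scheddev_node_add(port_id, i, ROOT_NODE_ID, 0, 1,
			&leaf_params, &error);
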
> +
> +/**
> + * Scheduler node delete
> + *
> + * Delete an existing node. This operation fails when this node currently has at
> + * least one user (i.e. child node).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_delete(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node suspend
> + *
> + * Suspend an existing node.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_suspend(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node resume
> + *
> + * Resume an existing node that was previously suspended.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_resume(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler hierarchy set
> + *
> + * This function is called during the port initialization phase (before the
> + * Ethernet port is started) to freeze the scheduler start-up hierarchy.
> + *
> + * This function fails when the currently configured scheduler hierarchy is not
> + * supported by the Ethernet port, in which case the user can abort or try out
> + * another hierarchy configuration (e.g. a hierarchy with less leaf nodes),
> + * which can be built from scratch (when *clear_on_fail* is enabled) or by
> + * modifying the existing hierarchy configuration (when *clear_on_fail* is
> + * disabled).
> + *
> + * Note that, even when the configured scheduler hierarchy is supported (so this
> + * function is successful), the Ethernet port start might still fail due to e.g.
> + * not enough memory being available in the system, etc.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param clear_on_fail
> + *   On function call failure, hierarchy is cleared when this parameter is
> + *   non-zero and preserved when this parameter is equal to zero.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_hierarchy_set(uint8_t port_id,
> +	int clear_on_fail,
> +	struct rte_scheddev_error *error);
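
The clear_on_fail semantics suggest an init-time pattern like this sketch:

/* Sketch only: with clear_on_fail = 0 a rejected hierarchy is kept,
 * so the application can trim it (e.g. delete leaf nodes) and retry.
 */
if (rte_scheddev_hierarchy_set(port_id, 0, &error) != 0) {
	printf("hierarchy rejected: %s\n",
		error.message != NULL ? error.message : "(no message)");
	/* modify the hierarchy here and call hierarchy_set again */
}
rte_eth_dev_start(port_id);
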
> +
> +/**
> + * Scheduler node parent update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param parent_node_id
> + *   Node ID for the new parent. Needs to be valid.
> + * @param priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_parent_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_error *error);
> +

The usage is not clear. How is it different from the node_add API? Is
the intention to update a specific node, or to change the connection of
a specific node to an existing or new parent?


> +/**
> + * Scheduler node private shaper update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param shaper_profile_id
> + *   Shaper profile ID for the private shaper of the current node. Needs to be
> + *   either valid shaper profile ID or RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE, with
> + *   the latter disabling the private shaper of the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node shared shapers update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param add
> + *   Set to non-zero value to add this shared shaper to current node or to zero
> + *   to delete this shared shaper from current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int add,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node scheduling mode update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid non-leaf node ID.
> + * @param scheduling_mode_per_priority
> + *   For each priority, indicates whether the children nodes sharing the same
> + *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates that
> + *   WFQ is to be used for all priorities. When non-NULL, it points to a
> + *   pre-allocated array of *n_priority* elements, with a non-zero value element
> + *   indicating WFQ and a zero value element for WRR.
> + * @param n_priorities
> + *   Number of priorities.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
> +	uint32_t node_id,
> +	int *scheduling_mode_per_priority,
> +	uint32_t n_priorities,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node congestion management mode update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param cman
> + *   Congestion management mode.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_cman_update(uint8_t port_id,
> +	uint32_t node_id,
> +	enum rte_scheddev_cman_mode cman,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node private WRED context update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param wred_profile_id
> + *   WRED profile ID for the private WRED context of the current node. Needs to
> + *   be either valid WRED profile ID or RTE_SCHEDDEV_WRED_PROFILE_ID_NONE, with
> + *   the latter disabling the private WRED context of the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node shared WRED context update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param add
> + *   Set to non-zero value to add this shared WRED context to current node or to
> + *   zero to delete this shared WRED context from current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler packet marking - VLAN DEI (IEEE 802.1Q)
> + *
> + * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
> + * field (3 bits), while IEEE 802.1q maps the drop priority to the VLAN Drop
> + * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
> + * Format Indicator (CFI).
> + *
> + * All VLAN frames of a given color get their DEI bit set if marking is enabled
> + * for this color; otherwise, their DEI bit is left as is (either set or not).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_mark_vlan_dei(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
> + *
> + * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
> + * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
> + * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion Notification
> + * (ECN) field (2 bits). The DSCP field is typically used to encode the traffic
> + * class and/or drop priority (RFC 2597), while the ECN field is used by RFC
> + * 3168 to implement a congestion notification mechanism to be leveraged by
> + * transport layer protocols such as TCP and SCTP that have congestion control
> + * mechanisms.
> + *
> + * When congestion is experienced, as alternative to dropping the packet,
> + * routers can change the ECN field of input packets from 2'b01 or 2'b10 (values
> + * indicating that source endpoint is ECN-capable) to 2'b11 (meaning that
> + * congestion is experienced). The destination endpoint can use the ECN-Echo
> + * (ECE) TCP flag to relay the congestion indication back to the source
> + * endpoint, which acknowledges it back to the destination endpoint with the
> + * Congestion Window Reduced (CWR) TCP flag.
> + *
> + * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
> + * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
> + * enabled for the current color, otherwise the ECN field is left as is.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_mark_ip_ecn(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
> + *
> + * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
> + * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
> + * values proposed by this RFC:
> + *
> + *                       Class 1    Class 2    Class 3    Class 4
> + *                     +----------+----------+----------+----------+
> + *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
> + *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
> + *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
> + *                     +----------+----------+----------+----------+
> + *
> + * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2, as
> + * well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
> + *
> + * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
> + * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
> + * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
> + * for each color; when not enabled for a given color, the DSCP field of all
> + * packets with that color is left as is.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_mark_ip_dscp(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler get statistics counter types enabled for all nodes
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param nonleaf_node_capability_stats_mask
> + *   Statistics counter types available per node for all non-leaf nodes. Needs
> + *   to be pre-allocated.
> + * @param nonleaf_node_enabled_stats_mask
> + *   Statistics counter types currently enabled per node for each non-leaf node.
> + *   This is a subset of *nonleaf_node_capability_stats_mask*. Needs to be
> + *   pre-allocated.
> + * @param leaf_node_capability_stats_mask
> + *   Statistics counter types available per node for all leaf nodes. Needs to
> + *   be pre-allocated.
> + * @param leaf_node_enabled_stats_mask
> + *   Statistics counter types currently enabled for each leaf node. This is
> + *   a subset of *leaf_node_capability_stats_mask*. Needs to be pre-allocated.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_stats_get_enabled(uint8_t port_id,
> +	uint64_t *nonleaf_node_capability_stats_mask,
> +	uint64_t *nonleaf_node_enabled_stats_mask,
> +	uint64_t *leaf_node_capability_stats_mask,
> +	uint64_t *leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler enable selected statistics counters for all nodes
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param nonleaf_node_enabled_stats_mask
> + *   Statistics counter types to be enabled per node for each non-leaf node.
> + *   This needs to be a subset of the statistics counter types available per
> + *   node for all non-leaf nodes. Any statistics counter type not included in
> + *   this set is to be disabled for all non-leaf nodes.
> + * @param leaf_node_enabled_stats_mask
> + *   Statistics counter types to be enabled per node for each leaf node. This
> + *   needs to be a subset of the statistics counter types available per node for
> + *   all leaf nodes. Any statistics counter type not included in this set is to
> + *   be disabled for all leaf nodes.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_stats_enable(uint8_t port_id,
> +	uint64_t nonleaf_node_enabled_stats_mask,
> +	uint64_t leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler get statistics counter types enabled for current node
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param capability_stats_mask
> + *   Statistics counter types available for the current node. Needs to be
> + *   pre-allocated.
> + * @param enabled_stats_mask
> + *   Statistics counter types currently enabled for the current node. This is
> + *   a subset of *capability_stats_mask*. Needs to be pre-allocated.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_stats_get_enabled(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t *capability_stats_mask,
> +	uint64_t *enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler enable selected statistics counters for current node
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param enabled_stats_mask
> + *   Statistics counter types to be enabled for the current node. This needs to
> + *   be a subset of the statistics counter types available for the current node.
> + *   Any statistics counter type not included in this set is to be disabled for
> + *   the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_stats_enable(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node statistics counters read
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param stats
> + *   When non-NULL, it contains the current value for the statistics counters
> + *   enabled for the current node.
> + * @param clear
> + *   When this parameter has a non-zero value, the statistics counters are
> + *   cleared (i.e. set to zero) immediately after they have been read, otherwise
> + *   the statistics counters are left untouched.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_stats_read(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_stats *stats,
> +	int clear,
> +	struct rte_scheddev_error *error);
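
Putting the per-node stats calls together, a sketch of enabling and then
periodically draining packet/byte counters on leaf node 0:

uint64_t mask = RTE_SCHEDDEV_STATS_COUNTER_N_PKTS |
		RTE_SCHEDDEV_STATS_COUNTER_N_BYTES;
struct rte_scheddev_node_stats stats;

rte_scheddev_node_stats_enable(port_id, 0, mask, &error);
/* ... traffic runs ... */
rte_scheddev_node_stats_read(port_id, 0, &stats, 1 /* clear */, &error);
printf("leaf 0: %" PRIu64 " pkts, %" PRIu64 " bytes\n",
	stats.n_pkts, stats.n_bytes);
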
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_SCHEDDEV_H__ */
> diff --git a/lib/librte_ether/rte_scheddev_driver.h b/lib/librte_ether/rte_scheddev_driver.h
> new file mode 100644
> index 0000000..c0a0321
> --- /dev/null
> +++ b/lib/librte_ether/rte_scheddev_driver.h
> @@ -0,0 +1,374 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
> +#define __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
> +
> +/**
> + * @file
> + * RTE Generic Hierarchical Scheduler API (Driver Side)
> + *
> + * This file provides implementation helpers for internal use by PMDs, they
> + * are not intended to be exposed to applications and are not subject to ABI
> + * versioning.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_errno.h>
> +#include "rte_ethdev.h"
> +#include "rte_scheddev.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +typedef int (*rte_scheddev_capabilities_get_t)(struct rte_eth_dev *dev,
> +	struct rte_scheddev_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler capabilities get */
> +
> +typedef int (*rte_scheddev_node_capabilities_get_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node capabilities get */
> +
> +typedef int (*rte_scheddev_wred_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_wred_params *profile,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler WRED profile add */
> +
> +typedef int (*rte_scheddev_wred_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler WRED profile delete */
> +
> +typedef int (*rte_scheddev_shared_wred_context_add_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared WRED context add */
> +
> +typedef int (*rte_scheddev_shared_wred_context_delete_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared WRED context delete */
> +
> +typedef int (*rte_scheddev_shaper_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_shaper_params *profile,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shaper profile add */
> +
> +typedef int (*rte_scheddev_shaper_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shaper profile delete */
> +
> +typedef int (*rte_scheddev_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared shaper add/update */
> +
> +typedef int (*rte_scheddev_shared_shaper_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared shaper delete */
> +
> +typedef int (*rte_scheddev_node_add_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_node_params *params,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node add */
> +
> +typedef int (*rte_scheddev_node_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node delete */
> +
> +typedef int (*rte_scheddev_node_suspend_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node suspend */
> +
> +typedef int (*rte_scheddev_node_resume_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node resume */
> +
> +typedef int (*rte_scheddev_hierarchy_set_t)(struct rte_eth_dev *dev,
> +	int clear_on_fail,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler hierarchy set */
> +
> +typedef int (*rte_scheddev_node_parent_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node parent update */
> +
> +typedef int (*rte_scheddev_node_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node shaper update */
> +
> +typedef int (*rte_scheddev_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int32_t add,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node shared shaper update */
> +
> +typedef int (*rte_scheddev_node_scheduling_mode_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	int *scheduling_mode_per_priority,
> +	uint32_t n_priorities,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node scheduling mode update */
> +
> +typedef int (*rte_scheddev_node_cman_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	enum rte_scheddev_cman_mode cman,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node congestion management mode update */
> +
> +typedef int (*rte_scheddev_node_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node WRED context update */
> +
> +typedef int (*rte_scheddev_node_shared_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node shared WRED context update */
> +
> +typedef int (*rte_scheddev_mark_vlan_dei_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler packet marking - VLAN DEI */
> +
> +typedef int (*rte_scheddev_mark_ip_ecn_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler packet marking - IPv4/IPv6 ECN */
> +
> +typedef int (*rte_scheddev_mark_ip_dscp_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler packet marking - IPv4/IPv6 DSCP */
> +
> +typedef int (*rte_scheddev_stats_get_enabled_t)(struct rte_eth_dev *dev,
> +	uint64_t *nonleaf_node_capability_stats_mask,
> +	uint64_t *nonleaf_node_enabled_stats_mask,
> +	uint64_t *leaf_node_capability_stats_mask,
> +	uint64_t *leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler get set of stats counters enabled for all nodes */
> +
> +typedef int (*rte_scheddev_stats_enable_t)(struct rte_eth_dev *dev,
> +	uint64_t nonleaf_node_enabled_stats_mask,
> +	uint64_t leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler enable selected stats counters for all nodes */
> +
> +typedef int (*rte_scheddev_node_stats_get_enabled_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint64_t *capability_stats_mask,
> +	uint64_t *enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler get set of stats counters enabled for specific node */
> +
> +typedef int (*rte_scheddev_node_stats_enable_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint64_t enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler enable selected stats counters for specific node */
> +
> +typedef int (*rte_scheddev_node_stats_read_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_stats *stats,
> +	int clear,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler read stats counters for specific node */
> +
> +struct rte_scheddev_ops {
> +	/** Scheduler capabilities_get */
> +	rte_scheddev_capabilities_get_t capabilities_get;
> +	/** Scheduler node capabilities get */
> +	rte_scheddev_node_capabilities_get_t node_capabilities_get;
> +
> +	/** Scheduler WRED profile add */
> +	rte_scheddev_wred_profile_add_t wred_profile_add;
> +	/** Scheduler WRED profile delete */
> +	rte_scheddev_wred_profile_delete_t wred_profile_delete;
> +	/** Scheduler shared WRED context add/update */
> +	rte_scheddev_shared_wred_context_add_update_t
> +		shared_wred_context_add_update;
> +	/** Scheduler shared WRED context delete */
> +	rte_scheddev_shared_wred_context_delete_t
> +		shared_wred_context_delete;
> +	/** Scheduler shaper profile add */
> +	rte_scheddev_shaper_profile_add_t shaper_profile_add;
> +	/** Scheduler shaper profile delete */
> +	rte_scheddev_shaper_profile_delete_t shaper_profile_delete;
> +	/** Scheduler shared shaper add/update */
> +	rte_scheddev_shared_shaper_add_update_t shared_shaper_add_update;
> +	/** Scheduler shared shaper delete */
> +	rte_scheddev_shared_shaper_delete_t shared_shaper_delete;
> +
> +	/** Scheduler node add */
> +	rte_scheddev_node_add_t node_add;
> +	/** Scheduler node delete */
> +	rte_scheddev_node_delete_t node_delete;
> +	/** Scheduler node suspend */
> +	rte_scheddev_node_suspend_t node_suspend;
> +	/** Scheduler node resume */
> +	rte_scheddev_node_resume_t node_resume;
> +	/** Scheduler hierarchy set */
> +	rte_scheddev_hierarchy_set_t hierarchy_set;
> +
> +	/** Scheduler node parent update */
> +	rte_scheddev_node_parent_update_t node_parent_update;
> +	/** Scheduler node shaper update */
> +	rte_scheddev_node_shaper_update_t node_shaper_update;
> +	/** Scheduler node shared shaper update */
> +	rte_scheddev_node_shared_shaper_update_t node_shared_shaper_update;
> +	/** Scheduler node scheduling mode update */
> +	rte_scheddev_node_scheduling_mode_update_t node_scheduling_mode_update;
> +	/** Scheduler node congestion management mode update */
> +	rte_scheddev_node_cman_update_t node_cman_update;
> +	/** Scheduler node WRED context update */
> +	rte_scheddev_node_wred_context_update_t node_wred_context_update;
> +	/** Scheduler node shared WRED context update */
> +	rte_scheddev_node_shared_wred_context_update_t
> +		node_shared_wred_context_update;
> +
> +	/** Scheduler packet marking - VLAN DEI */
> +	rte_scheddev_mark_vlan_dei_t mark_vlan_dei;
> +	/** Scheduler packet marking - IPv4/IPv6 ECN */
> +	rte_scheddev_mark_ip_ecn_t mark_ip_ecn;
> +	/** Scheduler packet marking - IPv4/IPv6 DSCP */
> +	rte_scheddev_mark_ip_dscp_t mark_ip_dscp;
> +
> +	/** Scheduler get statistics counter type enabled for all nodes */
> +	rte_scheddev_stats_get_enabled_t stats_get_enabled;
> +	/** Scheduler enable selected statistics counters for all nodes */
> +	rte_scheddev_stats_enable_t stats_enable;
> +	/** Scheduler get statistics counter type enabled for current node */
> +	rte_scheddev_node_stats_get_enabled_t node_stats_get_enabled;
> +	/** Scheduler enable selected statistics counters for current node */
> +	rte_scheddev_node_stats_enable_t node_stats_enable;
> +	/** Scheduler read statistics counters for current node */
> +	rte_scheddev_node_stats_read_t node_stats_read;
> +};
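
On the driver side I would expect a PMD to fill in only what it
implements, e.g. (sketch; the foo_* callbacks are hypothetical):

static const struct rte_scheddev_ops foo_scheddev_ops = {
	.capabilities_get = foo_capabilities_get,
	.node_add = foo_node_add,
	.node_delete = foo_node_delete,
	.hierarchy_set = foo_hierarchy_set,
	/* remaining callbacks left NULL */
};

It may be worth specifying what the generic layer returns when a
callback is NULL.
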
> +
> +/**
> + * Initialize generic error structure.
> + *
> + * This function also sets rte_errno to a given value.
> + *
> + * @param error
> + *   Pointer to error structure (may be NULL).
> + * @param code
> + *   Related error code (rte_errno).
> + * @param type
> + *   Cause field and error type.
> + * @param cause
> + *   Object responsible for the error.
> + * @param message
> + *   Human-readable error message.
> + *
> + * @return
> + *   Error code.
> + */
> +static inline int
> +rte_scheddev_error_set(struct rte_scheddev_error *error,
> +		   int code,
> +		   enum rte_scheddev_error_type type,
> +		   const void *cause,
> +		   const char *message)
> +{
> +	if (error) {
> +		*error = (struct rte_scheddev_error){
> +			.type = type,
> +			.cause = cause,
> +			.message = message,
> +		};
> +	}
> +	rte_errno = code;
> +	return code;
> +}
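
A PMD callback would then report failures along these lines (sketch;
foo_node_delete and the node_exists() helper are hypothetical):

static int
foo_node_delete(struct rte_eth_dev *dev, uint32_t node_id,
	struct rte_scheddev_error *error)
{
	if (!node_exists(dev, node_id))   /* hypothetical helper */
		return rte_scheddev_error_set(error, EINVAL,
			RTE_SCHEDDEV_ERROR_TYPE_NODE_ID,
			NULL, "node id does not exist");

	/* ... detach and free the node ... */
	return 0;
}
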
> +
> +/**
> + * Get generic hierarchical scheduler operations structure from a port
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param error
> + *   Error details
> + *
> + * @return
> + *   The hierarchical scheduler operations structure associated with port_id on
> + *   success, NULL otherwise.
> + */
> +const struct rte_scheddev_ops *
> +rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_SCHEDDEV_DRIVER_H__ */
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
@ 2017-02-21 10:27  0%     ` Hunt, David
  2017-02-24 14:03  0%     ` Bruce Richardson
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-02-21 10:27 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson


On 21/2/2017 3:17 AM, David Hunt wrote:
> Move files out of the way so that we can replace them with new
> versions of the distributor library. Files are named in
> such a way as to match the symbol versioning that we will
> apply for backward ABI compatibility.
>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>
---snip--

Apologies, this patch should have been sent with '--find-renames', thus
reducing the size of this patch significantly and eliminating checkpatch
warnings/errors.

Dave.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] maintainers: fix script paths
@ 2017-02-21 10:22 16% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-21 10:22 UTC (permalink / raw)
  To: dev

The scripts directory does not exist anymore.
The files have been moved but some paths were not updated
in the maintainers list.

Fixes: 9a98f50e890b ("scripts: move to devtools")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8305237..24e0eff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -70,7 +70,7 @@ ABI versioning
 M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
-F: scripts/validate-abi.sh
+F: devtools/validate-abi.sh
 
 Driver information
 F: buildtools/pmdinfogen/
@@ -241,7 +241,7 @@ F: app/test/test_mbuf.c
 Ethernet API
 M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
-F: scripts/test-null.sh
+F: devtools/test-null.sh
 
 Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
-- 
2.7.0

^ permalink raw reply	[relevance 16%]

* [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17  3% ` [dpdk-dev] [PATCH v7 0/17] distributor library " David Hunt
@ 2017-02-21  3:17  1%   ` David Hunt
  2017-02-21 10:27  0%     ` Hunt, David
                       ` (2 more replies)
  2017-02-24 14:01  0%   ` [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
  1 sibling, 3 replies; 200+ results
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c                  |   2 +-
 app/test/test_distributor_perf.c             |   2 +-
 examples/distributor/main.c                  |   2 +-
 lib/librte_distributor/Makefile              |   4 +-
 lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
 lib/librte_distributor/rte_distributor.h     | 247 --------------
 lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++
 8 files changed, 739 insertions(+), 739 deletions(-)
 delete mode 100644 lib/librte_distributor/rte_distributor.c
 delete mode 100644 lib/librte_distributor/rte_distributor.h
 create mode 100644 lib/librte_distributor/rte_distributor_v20.c
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..ba402e2 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -39,7 +39,7 @@
 #include <rte_errno.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..fe0c97d 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -39,7 +39,7 @@
 #include <rte_cycles.h>
 #include <rte_common.h>
 #include <rte_mbuf.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..fba5446 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -43,7 +43,7 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define RX_RING_SIZE 256
 #define TX_RING_SIZE 512
diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..60837ed 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,10 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
deleted file mode 100644
index f3f778c..0000000
--- a/lib/librte_distributor/rte_distributor.c
+++ /dev/null
@@ -1,487 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdio.h>
-#include <sys/queue.h>
-#include <string.h>
-#include <rte_mbuf.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_errno.h>
-#include <rte_string_fns.h>
-#include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, because it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
-
-static struct rte_tailq_elem rte_distributor_tailq = {
-	.name = "RTE_DISTRIBUTOR",
-};
-EAL_REGISTER_TAILQ(rte_distributor_tailq)
-
-/**** APIs called by workers ****/
-
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
-			| RTE_DISTRIB_GET_BUF;
-	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
-		rte_pause();
-	buf->bufptr64 = req;
-}
-
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
-		return NULL;
-
-	/* since bufptr64 is signed, this should be an arithmetic shift */
-	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
-	return (struct rte_mbuf *)((uintptr_t)ret);
-}
-
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
-		rte_pause();
-	return ret;
-}
-
-int
-rte_distributor_return_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
-			| RTE_DISTRIB_RETURN_BUF;
-	buf->bufptr64 = req;
-	return 0;
-}
-
-/**** APIs called on distributor core ***/
-
-/* as name suggests, adds a packet to the backlog for a particular worker */
-static int
-add_to_backlog(struct rte_distributor_backlog *bl, int64_t item)
-{
-	if (bl->count == RTE_DISTRIB_BACKLOG_SIZE)
-		return -1;
-
-	bl->pkts[(bl->start + bl->count++) & (RTE_DISTRIB_BACKLOG_MASK)]
-			= item;
-	return 0;
-}
-
-/* takes the next packet for a worker off the backlog */
-static int64_t
-backlog_pop(struct rte_distributor_backlog *bl)
-{
-	bl->count--;
-	return bl->pkts[bl->start++ & RTE_DISTRIB_BACKLOG_MASK];
-}
-
-/* stores a packet returned from a worker inside the returns array */
-static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
-		unsigned *ret_start, unsigned *ret_count)
-{
-	/* store returns in a circular buffer - code is branch-free */
-	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
-			= (void *)oldbuf;
-	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
-	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
-}
-
-static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
-{
-	d->in_flight_tags[wkr] = 0;
-	d->in_flight_bitmask &= ~(1UL << wkr);
-	d->bufs[wkr].bufptr64 = 0;
-	if (unlikely(d->backlog[wkr].count != 0)) {
-		/* On return of a packet, we need to move the
-		 * queued packets for this core elsewhere.
-		 * Easiest solution is to set things up for
-		 * a recursive call. That will cause those
-		 * packets to be queued up for the next free
-		 * core, i.e. it will return as soon as a
-		 * core becomes free to accept the first
-		 * packet, as subsequent ones will be added to
-		 * the backlog for that core.
-		 */
-		struct rte_mbuf *pkts[RTE_DISTRIB_BACKLOG_SIZE];
-		unsigned i;
-		struct rte_distributor_backlog *bl = &d->backlog[wkr];
-
-		for (i = 0; i < bl->count; i++) {
-			unsigned idx = (bl->start + i) &
-					RTE_DISTRIB_BACKLOG_MASK;
-			pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >>
-					RTE_DISTRIB_FLAG_BITS));
-		}
-		/* recursive call.
-		 * Note that the tags were set before first level call
-		 * to rte_distributor_process.
-		 */
-		rte_distributor_process(d, pkts, i);
-		bl->count = bl->start = 0;
-	}
-}
-
-/* this function is called when process() fn is called without any new
- * packets. It goes through all the workers and clears any returned packets
- * to do a partial flush.
- */
-static int
-process_returns(struct rte_distributor *d)
-{
-	unsigned wkr;
-	unsigned flushed = 0;
-	unsigned ret_start = d->returns.start,
-			ret_count = d->returns.count;
-
-	for (wkr = 0; wkr < d->num_workers; wkr++) {
-
-		const int64_t data = d->bufs[wkr].bufptr64;
-		uintptr_t oldbuf = 0;
-
-		if (data & RTE_DISTRIB_GET_BUF) {
-			flushed++;
-			if (d->backlog[wkr].count)
-				d->bufs[wkr].bufptr64 =
-						backlog_pop(&d->backlog[wkr]);
-			else {
-				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
-				d->in_flight_tags[wkr] = 0;
-				d->in_flight_bitmask &= ~(1UL << wkr);
-			}
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		} else if (data & RTE_DISTRIB_RETURN_BUF) {
-			handle_worker_shutdown(d, wkr);
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		}
-
-		store_return(oldbuf, d, &ret_start, &ret_count);
-	}
-
-	d->returns.start = ret_start;
-	d->returns.count = ret_count;
-
-	return flushed;
-}
-
-/* process a set of packets to distribute them to workers */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs)
-{
-	unsigned next_idx = 0;
-	unsigned wkr = 0;
-	struct rte_mbuf *next_mb = NULL;
-	int64_t next_value = 0;
-	uint32_t new_tag = 0;
-	unsigned ret_start = d->returns.start,
-			ret_count = d->returns.count;
-
-	if (unlikely(num_mbufs == 0))
-		return process_returns(d);
-
-	while (next_idx < num_mbufs || next_mb != NULL) {
-
-		int64_t data = d->bufs[wkr].bufptr64;
-		uintptr_t oldbuf = 0;
-
-		if (!next_mb) {
-			next_mb = mbufs[next_idx++];
-			next_value = (((int64_t)(uintptr_t)next_mb)
-					<< RTE_DISTRIB_FLAG_BITS);
-			/*
-			 * User is advised to set a tag value for each
-			 * mbuf before calling rte_distributor_process.
-			 * User defined tags are used to identify flows,
-			 * or sessions.
-			 */
-			new_tag = next_mb->hash.usr;
-
-			/*
-			 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64
-			 * then the size of match has to be expanded.
-			 */
-			uint64_t match = 0;
-			unsigned i;
-			/*
-			 * to scan for a match use "xor" and "not" to get a 0/1
-			 * value, then use shifting to merge to single "match"
-			 * variable, where a one-bit indicates a match for the
-			 * worker given by the bit-position
-			 */
-			for (i = 0; i < d->num_workers; i++)
-				match |= (!(d->in_flight_tags[i] ^ new_tag)
-					<< i);
-
-			/* Only turned-on bits are considered as match */
-			match &= d->in_flight_bitmask;
-
-			if (match) {
-				next_mb = NULL;
-				unsigned worker = __builtin_ctzl(match);
-				if (add_to_backlog(&d->backlog[worker],
-						next_value) < 0)
-					next_idx--;
-			}
-		}
-
-		if ((data & RTE_DISTRIB_GET_BUF) &&
-				(d->backlog[wkr].count || next_mb)) {
-
-			if (d->backlog[wkr].count)
-				d->bufs[wkr].bufptr64 =
-						backlog_pop(&d->backlog[wkr]);
-
-			else {
-				d->bufs[wkr].bufptr64 = next_value;
-				d->in_flight_tags[wkr] = new_tag;
-				d->in_flight_bitmask |= (1UL << wkr);
-				next_mb = NULL;
-			}
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		} else if (data & RTE_DISTRIB_RETURN_BUF) {
-			handle_worker_shutdown(d, wkr);
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		}
-
-		/* store returns in a circular buffer */
-		store_return(oldbuf, d, &ret_start, &ret_count);
-
-		if (++wkr == d->num_workers)
-			wkr = 0;
-	}
-	/* to finish, check all workers for backlog and schedule work for them
-	 * if they are ready */
-	for (wkr = 0; wkr < d->num_workers; wkr++)
-		if (d->backlog[wkr].count &&
-				(d->bufs[wkr].bufptr64 & RTE_DISTRIB_GET_BUF)) {
-
-			int64_t oldbuf = d->bufs[wkr].bufptr64 >>
-					RTE_DISTRIB_FLAG_BITS;
-			store_return(oldbuf, d, &ret_start, &ret_count);
-
-			d->bufs[wkr].bufptr64 = backlog_pop(&d->backlog[wkr]);
-		}
-
-	d->returns.start = ret_start;
-	d->returns.count = ret_count;
-	return num_mbufs;
-}
-
-/* return to the caller, packets returned from workers */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs)
-{
-	struct rte_distributor_returned_pkts *returns = &d->returns;
-	unsigned retval = (max_mbufs < returns->count) ?
-			max_mbufs : returns->count;
-	unsigned i;
-
-	for (i = 0; i < retval; i++) {
-		unsigned idx = (returns->start + i) & RTE_DISTRIB_RETURNS_MASK;
-		mbufs[i] = returns->mbufs[idx];
-	}
-	returns->start += i;
-	returns->count -= i;
-
-	return retval;
-}
-
-/* return the number of packets in-flight in a distributor, i.e. packets
- * being worked on or queued up in a backlog. */
-static inline unsigned
-total_outstanding(const struct rte_distributor *d)
-{
-	unsigned wkr, total_outstanding;
-
-	total_outstanding = __builtin_popcountl(d->in_flight_bitmask);
-
-	for (wkr = 0; wkr < d->num_workers; wkr++)
-		total_outstanding += d->backlog[wkr].count;
-
-	return total_outstanding;
-}
-
-/* flush the distributor, so that there are no outstanding packets in flight or
- * queued up. */
-int
-rte_distributor_flush(struct rte_distributor *d)
-{
-	const unsigned flushed = total_outstanding(d);
-
-	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
-
-	return flushed;
-}
-
-/* clears the internal returns array in the distributor */
-void
-rte_distributor_clear_returns(struct rte_distributor *d)
-{
-	d->returns.start = d->returns.count = 0;
-#ifndef __OPTIMIZE__
-	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
-#endif
-}
-
-/* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
-		unsigned socket_id,
-		unsigned num_workers)
-{
-	struct rte_distributor *d;
-	struct rte_distributor_list *distributor_list;
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-
-	/* compilation-time checks */
-	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
-	RTE_BUILD_BUG_ON(RTE_DISTRIB_MAX_WORKERS >
-				sizeof(d->in_flight_bitmask) * CHAR_BIT);
-
-	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
-	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
-	if (mz == NULL) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	d = mz->addr;
-	snprintf(d->name, sizeof(d->name), "%s", name);
-	d->num_workers = num_workers;
-
-	distributor_list = RTE_TAILQ_CAST(rte_distributor_tailq.head,
-					  rte_distributor_list);
-
-	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
-	TAILQ_INSERT_TAIL(distributor_list, d, next);
-	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-
-	return d;
-}
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
deleted file mode 100644
index 7d36bc8..0000000
--- a/lib/librte_distributor/rte_distributor.h
+++ /dev/null
@@ -1,247 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
-
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow ID, or tag, in the mbuf will be processed at the same time.
- *
- * The user is advised to set a tag for each mbuf before calling this function.
- * If the user does not set the tag, its value may vary depending on the
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
new file mode 100644
index 0000000..b890947
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -0,0 +1,487 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_v20.h"
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/* we will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits. */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, because it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS	64
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+struct rte_distributor_backlog {
+	unsigned start;
+	unsigned count;
+	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
+};
+
+struct rte_distributor_returned_pkts {
+	unsigned start;
+	unsigned count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned num_workers;                 /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+TAILQ_HEAD(rte_distributor_list, rte_distributor);
+
+static struct rte_tailq_elem rte_distributor_tailq = {
+	.name = "RTE_DISTRIBUTOR",
+};
+EAL_REGISTER_TAILQ(rte_distributor_tailq)
+
+/**** APIs called by workers ****/
+
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
+			| RTE_DISTRIB_GET_BUF;
+	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
+		rte_pause();
+	buf->bufptr64 = req;
+}
+
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned worker_id)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
+		return NULL;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
+	return (struct rte_mbuf *)((uintptr_t)ret);
+}
+
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	struct rte_mbuf *ret;
+	rte_distributor_request_pkt(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+		rte_pause();
+	return ret;
+}
+
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
+			| RTE_DISTRIB_RETURN_BUF;
+	buf->bufptr64 = req;
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* as name suggests, adds a packet to the backlog for a particular worker */
+static int
+add_to_backlog(struct rte_distributor_backlog *bl, int64_t item)
+{
+	if (bl->count == RTE_DISTRIB_BACKLOG_SIZE)
+		return -1;
+
+	bl->pkts[(bl->start + bl->count++) & (RTE_DISTRIB_BACKLOG_MASK)]
+			= item;
+	return 0;
+}
+
+/* takes the next packet for a worker off the backlog */
+static int64_t
+backlog_pop(struct rte_distributor_backlog *bl)
+{
+	bl->count--;
+	return bl->pkts[bl->start++ & RTE_DISTRIB_BACKLOG_MASK];
+}
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
+		unsigned *ret_start, unsigned *ret_count)
+{
+	/* store returns in a circular buffer - code is branch-free */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
+}
+
+static inline void
+handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+{
+	d->in_flight_tags[wkr] = 0;
+	d->in_flight_bitmask &= ~(1UL << wkr);
+	d->bufs[wkr].bufptr64 = 0;
+	if (unlikely(d->backlog[wkr].count != 0)) {
+		/* On return of a packet, we need to move the
+		 * queued packets for this core elsewhere.
+		 * Easiest solution is to set things up for
+		 * a recursive call. That will cause those
+		 * packets to be queued up for the next free
+		 * core, i.e. it will return as soon as a
+		 * core becomes free to accept the first
+		 * packet, as subsequent ones will be added to
+		 * the backlog for that core.
+		 */
+		struct rte_mbuf *pkts[RTE_DISTRIB_BACKLOG_SIZE];
+		unsigned i;
+		struct rte_distributor_backlog *bl = &d->backlog[wkr];
+
+		for (i = 0; i < bl->count; i++) {
+			unsigned idx = (bl->start + i) &
+					RTE_DISTRIB_BACKLOG_MASK;
+			pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >>
+					RTE_DISTRIB_FLAG_BITS));
+		}
+		/* recursive call.
+		 * Note that the tags were set before first level call
+		 * to rte_distributor_process.
+		 */
+		rte_distributor_process(d, pkts, i);
+		bl->count = bl->start = 0;
+	}
+}
+
+/* this function is called when process() fn is called without any new
+ * packets. It goes through all the workers and clears any returned packets
+ * to do a partial flush.
+ */
+static int
+process_returns(struct rte_distributor *d)
+{
+	unsigned wkr;
+	unsigned flushed = 0;
+	unsigned ret_start = d->returns.start,
+			ret_count = d->returns.count;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++) {
+
+		const int64_t data = d->bufs[wkr].bufptr64;
+		uintptr_t oldbuf = 0;
+
+		if (data & RTE_DISTRIB_GET_BUF) {
+			flushed++;
+			if (d->backlog[wkr].count)
+				d->bufs[wkr].bufptr64 =
+						backlog_pop(&d->backlog[wkr]);
+			else {
+				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
+				d->in_flight_tags[wkr] = 0;
+				d->in_flight_bitmask &= ~(1UL << wkr);
+			}
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		} else if (data & RTE_DISTRIB_RETURN_BUF) {
+			handle_worker_shutdown(d, wkr);
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		}
+
+		store_return(oldbuf, d, &ret_start, &ret_count);
+	}
+
+	d->returns.start = ret_start;
+	d->returns.count = ret_count;
+
+	return flushed;
+}
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned num_mbufs)
+{
+	unsigned next_idx = 0;
+	unsigned wkr = 0;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint32_t new_tag = 0;
+	unsigned ret_start = d->returns.start,
+			ret_count = d->returns.count;
+
+	if (unlikely(num_mbufs == 0))
+		return process_returns(d);
+
+	while (next_idx < num_mbufs || next_mb != NULL) {
+
+		int64_t data = d->bufs[wkr].bufptr64;
+		uintptr_t oldbuf = 0;
+
+		if (!next_mb) {
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb)
+					<< RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * User is advised to set a tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			new_tag = next_mb->hash.usr;
+
+			/*
+			 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64
+			 * then the size of match has to be expanded.
+			 */
+			uint64_t match = 0;
+			unsigned i;
+			/*
+			 * to scan for a match use "xor" and "not" to get a 0/1
+			 * value, then use shifting to merge to single "match"
+			 * variable, where a one-bit indicates a match for the
+			 * worker given by the bit-position
+			 */
+			for (i = 0; i < d->num_workers; i++)
+				match |= (!(d->in_flight_tags[i] ^ new_tag)
+					<< i);
+
+			/* Only turned-on bits are considered as match */
+			match &= d->in_flight_bitmask;
+
+			if (match) {
+				next_mb = NULL;
+				unsigned worker = __builtin_ctzl(match);
+				if (add_to_backlog(&d->backlog[worker],
+						next_value) < 0)
+					next_idx--;
+			}
+		}
+
+		if ((data & RTE_DISTRIB_GET_BUF) &&
+				(d->backlog[wkr].count || next_mb)) {
+
+			if (d->backlog[wkr].count)
+				d->bufs[wkr].bufptr64 =
+						backlog_pop(&d->backlog[wkr]);
+
+			else {
+				d->bufs[wkr].bufptr64 = next_value;
+				d->in_flight_tags[wkr] = new_tag;
+				d->in_flight_bitmask |= (1UL << wkr);
+				next_mb = NULL;
+			}
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		} else if (data & RTE_DISTRIB_RETURN_BUF) {
+			handle_worker_shutdown(d, wkr);
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		}
+
+		/* store returns in a circular buffer */
+		store_return(oldbuf, d, &ret_start, &ret_count);
+
+		if (++wkr == d->num_workers)
+			wkr = 0;
+	}
+	/* to finish, check all workers for backlog and schedule work for them
+	 * if they are ready */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		if (d->backlog[wkr].count &&
+				(d->bufs[wkr].bufptr64 & RTE_DISTRIB_GET_BUF)) {
+
+			int64_t oldbuf = d->bufs[wkr].bufptr64 >>
+					RTE_DISTRIB_FLAG_BITS;
+			store_return(oldbuf, d, &ret_start, &ret_count);
+
+			d->bufs[wkr].bufptr64 = backlog_pop(&d->backlog[wkr]);
+		}
+
+	d->returns.start = ret_start;
+	d->returns.count = ret_count;
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned idx = (returns->start + i) & RTE_DISTRIB_RETURNS_MASK;
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/* return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog. */
+static inline unsigned
+total_outstanding(const struct rte_distributor *d)
+{
+	unsigned wkr, total_outstanding;
+
+	total_outstanding = __builtin_popcountl(d->in_flight_bitmask);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/* flush the distributor, so that there are no outstanding packets in flight or
+ * queued up. */
+int
+rte_distributor_flush(struct rte_distributor *d)
+{
+	const unsigned flushed = total_outstanding(d);
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process(d, NULL, 0);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns(struct rte_distributor *d)
+{
+	d->returns.start = d->returns.count = 0;
+#ifndef __OPTIMIZE__
+	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
+#endif
+}
+
+/* creates a distributor instance */
+struct rte_distributor *
+rte_distributor_create(const char *name,
+		unsigned socket_id,
+		unsigned num_workers)
+{
+	struct rte_distributor *d;
+	struct rte_distributor_list *distributor_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+	RTE_BUILD_BUG_ON(RTE_DISTRIB_MAX_WORKERS >
+				sizeof(d->in_flight_bitmask) * CHAR_BIT);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	distributor_list = RTE_TAILQ_CAST(rte_distributor_tailq.head,
+					  rte_distributor_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(distributor_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..7d36bc8
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTE_H_
+#define _RTE_DISTRIBUTE_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned socket_id,
+		unsigned num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow ID, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advised to set a tag for each mbuf before calling this function.
+ * If the user does not set the tag, its value may vary depending on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements
  @ 2017-02-21  3:17  3% ` David Hunt
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-24 14:01  0%   ` [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
  0 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch set aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
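
A minimal sketch of what such a cache-line-sized burst buffer could look
like (the struct and field names below are illustrative for this cover
letter, not necessarily the exact ones used in the library):

#include <stdint.h>
#include <rte_memory.h>

#define DIST_BURST_SIZE 8

/* 8 x 64-bit slots fill one 64-byte cache line; each slot carries an
 * mbuf pointer with the handshake flag bits packed into its low bits,
 * as in the single-packet scheme. */
struct dist_burst_buf {
	volatile int64_t bufptr64[DIST_BURST_SIZE];
} __rte_cache_aligned;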

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.
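
As a rough sketch of the scalar matching step (the array shapes and names
here are assumptions for illustration, not the exact implementation):

#include <stdint.h>

/* For each incoming tag, look for a worker whose in-flight or backlog
 * flows already contain that tag, so the flow stays pinned to it. */
static int
match_flow(uint16_t tag, const uint16_t inflight[][8],
		const uint16_t backlog[][8], unsigned int nb_workers)
{
	unsigned int w, i;

	for (w = 0; w < nb_workers; w++)
		for (i = 0; i < 8; i++)
			if (inflight[w][i] == tag || backlog[w][i] == tag)
				return (int)w; /* keep flow on worker w */

	return -1; /* no match: any free worker may take it */
}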

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 CPU flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
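
The run-time selection itself can use the existing EAL CPU-flag check; a
sketch (the two match function names are placeholders, not the real ones):

#include <stdint.h>
#include <rte_cpuflags.h>

typedef void (*match_fn_t)(void *dist, uint16_t *flow_ids);

static void find_match_scalar(void *dist, uint16_t *flow_ids);
static void find_match_vec(void *dist, uint16_t *flow_ids);

static match_fn_t find_match;

static void
select_match_fn(void)
{
#if defined(RTE_ARCH_X86)
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		find_match = find_match_vec;	/* SSE2 path */
	else
#endif
		find_match = find_match_scalar;	/* portable fallback */
}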

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v4 changes:
   * fixed issue building shared libraries

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them down into easier to review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker
   at a time.
   For performance in matching, flow IDs are 15 bits.
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
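
   For example (illustrative only; flow_id being the app's own flow
   identifier), masking keeps the tag within the range the matcher uses:

	mbufs[i]->hash.usr = flow_id & 0x7fff; /* 15-bit tag, burst mode */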

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[01/17] lib: rename legacy distributor lib files
[02/17] lib: symbol versioning of functions in distributor
[03/17] lib: create rte_distributor_private.h
[04/17] lib: add new burst oriented distributor structs
[05/17] lib: add new distributor code
[06/17] lib: add SIMD flow matching to distributor
[07/17] lib: apply symbol versioning to distributor lib
[08/17] test: change params to distributor autotest
[09/17] test: switch distributor test over to burst API
[10/17] test: test single and burst distributor API
[11/17] test: add perf test for distributor burst mode
[12/17] example: add extra stats to distributor sample
[13/17] sample: distributor: wait for ports to come up
[14/17] sample: switch to new distributor API
[15/17] lib: make v20 header file private
[16/17] doc: distributor library changes for new burst api
[17/17] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] maintainers: claim responsability for xen
  @ 2017-02-20 17:36  3%               ` Joao Martins
  0 siblings, 0 replies; 200+ results
From: Joao Martins @ 2017-02-20 17:36 UTC (permalink / raw)
  To: Jan Blunck, Konrad Rzeszutek Wilk
  Cc: Vincent JARDIN, Thomas Monjalon, Tan, Jianfeng,
	Konrad Rzeszutek Wilk, dev, Bruce Richardson, Yuanhan Liu,
	Xen-devel

On 02/20/2017 09:56 AM, Jan Blunck wrote:
> On Fri, Feb 17, 2017 at 5:07 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Thu, Feb 16, 2017 at 10:51:44PM +0100, Vincent JARDIN wrote:
>>> Le 16/02/2017 à 14:36, Konrad Rzeszutek Wilk a écrit :
>>>>> Is it time now to officially remove Dom0 support?
>>>> So we do have an prototype implementation of netback but it is waiting
>>>> for review of xen-devel to the spec.
>>>>
>>>> And I believe the implementation does utilize some of the dom0
>>>> parts of code in DPDK.
>>>
>>> Please, do you have URLs/pointers about it? It would be interesting to share
>>> it with DPDK community too.
>>
>> Joao, would it be possible to include an tarball of the patches? I know
>> they are no in the right state with the review of the staging
>> grants API - they are incompatible, but it may help folks to get
>> a feel for what DPDK APIs you used?
>>
>> Staging grants API:
>> https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg01878.html
> 
> The topic of the grants API is unrelated to the dom0 memory pool. The
> memory pool which uses xen_create_contiguous_region() is used in cases
> we know that there are no hugepages available.
Correct, I think what Konrad was trying to say was that xen-netback normally
lives in a PV domain which doesn't have superpages, therefore such a driver
would need that memory pool part in order to work. The mentioned spec is a
set of additions to the xen netif ABI that let the backend safely map a fixed
set of grant references (recycled over time, provided by the frontend) with
the purpose of avoiding grant ops - DPDK would be one of the users.

> Joao and I met in Dublin and I whined about not being able to call
> into the grants API from userspace and instead need to kick a kernel
> driver to do the work for every burst. It would be great if that could
> change in the future.
Hm, I recall that discussion. AFAIK you can do both grant alloc/revoke of
pages through the xengntshr_share_pages(...) and xengntshr_unshare(...) APIs
provided by libxengnttab[0] starting with 4.7, or with libxc on older
versions via xc_gntshr_share_pages/xc_gntshr_munmap[2]. For the notification
(or kicks) you can allocate the event channel in the guest with
libxenevtchn[1] starting with 4.7, using xenevtchn_bind_unbound_port(...),
or with libxc on older versions using xc_evtchn_bind_unbound_port(...)[2].
And kick the guest with xenevtchn_notify(...) or xc_evtchn_notify(...) (the
latter on older versions). In short, these APIs are ioctls to /dev/gntdev and
/dev/evtchn. xenstore operations can also be done in userspace with
libxenstore[3].
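
Putting those together, a rough (untested, error handling mostly elided)
sketch of the guest-side calls on >= 4.7 would be:

#include <stdint.h>
#include <xengnttab.h>
#include <xenevtchn.h>

static int
share_page_and_kick(uint32_t backend_domid)
{
	xengntshr_handle *xgs = xengntshr_open(NULL, 0);
	xenevtchn_handle *xce = xenevtchn_open(NULL, 0);
	uint32_t refs[1];
	void *page;
	int port;

	/* share one writable page with the backend domain */
	page = xengntshr_share_pages(xgs, backend_domid, 1, refs, 1);
	if (page == NULL)
		return -1;

	/* allocate an unbound event channel the backend can bind to */
	port = xenevtchn_bind_unbound_port(xce, backend_domid);
	if (port < 0)
		return -1;

	/* publish refs[0] and port via xenstore (omitted), then kick */
	return xenevtchn_notify(xce, port);
}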

To get (similar) behavior to VRING_AVAIL_F_NO_INTERRUPT (i.e. avoiding the
kicks), you "just" don't set rsp_event in the ring (e.g. no calls to
RING_FINAL_CHECK_FOR_RESPONSES), and keep checking for unconsumed Rx/Tx
responses. For guest request notification (to wake up the backend for new
Tx/Rx requests), you depend on whether the backend requests it, since it's
the one setting the req_event index. If it does set it, then you have to use
the evtchn notify described in the previous paragraph.
Hope that helps!

Joao

[0]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libs/gnttab/include/xengnttab.h;hb=HEAD
[1]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libs/evtchn/include/xenevtchn.h;hb=HEAD
[2]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/include/xenctrl_compat.h;hb=HEAD
[3]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/xenstore/include/xenstore.h;hb=HEAD

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 2/2] kni: remove KNI vhost support
  2017-02-20 14:30  5% ` [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes Ferruh Yigit
@ 2017-02-20 14:30  1%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-02-20 14:30 UTC (permalink / raw)
  To: Thomas Monjalon, John McNamara; +Cc: dev, Bruce Richardson, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                             |   3 -
 devtools/test-build.sh                         |   1 -
 doc/guides/prog_guide/index.rst                |   4 -
 doc/guides/prog_guide/kernel_nic_interface.rst | 113 ----
 doc/guides/rel_notes/deprecation.rst           |   6 -
 doc/guides/rel_notes/release_17_05.rst         |   2 +
 lib/librte_eal/linuxapp/kni/Makefile           |   1 -
 lib/librte_eal/linuxapp/kni/kni_dev.h          |  33 -
 lib/librte_eal/linuxapp/kni/kni_fifo.h         |  14 -
 lib/librte_eal/linuxapp/kni/kni_misc.c         |  22 -
 lib/librte_eal/linuxapp/kni/kni_net.c          |  13 -
 lib/librte_eal/linuxapp/kni/kni_vhost.c        | 842 -------------------------
 12 files changed, 2 insertions(+), 1052 deletions(-)
 delete mode 100644 lib/librte_eal/linuxapp/kni/kni_vhost.c

diff --git a/config/common_base b/config/common_base
index 71a4fcb..aeee13e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -584,9 +584,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_KNI_KMOD=n
 CONFIG_RTE_KNI_KMOD_ETHTOOL=n
 CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
-CONFIG_RTE_KNI_VHOST=n
-CONFIG_RTE_KNI_VHOST_MAX_CACHE_SIZE=1024
-CONFIG_RTE_KNI_VHOST_VNET_HDR_EN=n
 
 #
 # Compile the pdump library
diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 0f131fc..84d3165 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -194,7 +194,6 @@ config () # <directory> <target> <options>
 		sed -ri        's,(PMD_OPENSSL=)n,\1y,' $1/.config
 		test "$DPDK_DEP_SSL" != y || \
 		sed -ri            's,(PMD_QAT=)n,\1y,' $1/.config
-		sed -ri        's,(KNI_VHOST.*=)n,\1y,' $1/.config
 		sed -ri           's,(SCHED_.*=)n,\1y,' $1/.config
 		build_config_hook $1 $2 $3
 
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 7f825cb..77f427e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -127,10 +127,6 @@ Programmer's Guide
 
 :numref:`figure_pkt_flow_kni` :ref:`figure_pkt_flow_kni`
 
-:numref:`figure_vhost_net_arch2` :ref:`figure_vhost_net_arch2`
-
-:numref:`figure_kni_traffic_flow` :ref:`figure_kni_traffic_flow`
-
 
 :numref:`figure_pkt_proc_pipeline_qos` :ref:`figure_pkt_proc_pipeline_qos`
 
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 4f25595..6f7fd28 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -168,116 +168,3 @@ The application handlers can be registered upon interface creation or explicitly
 This provides flexibility in multiprocess scenarios
 (where the KNI is created in the primary process but the callbacks are handled in the secondary one).
 The constraint is that a single process can register and handle the requests.
-
-.. _kni_vhost_backend-label:
-
-KNI Working as a Kernel vHost Backend
--------------------------------------
-
-vHost is a kernel module usually working as the backend of virtio (a para- virtualization driver framework)
-to accelerate the traffic from the guest to the host.
-The DPDK Kernel NIC interface provides the ability to hookup vHost traffic into userspace DPDK application.
-Together with the DPDK PMD virtio, it significantly improves the throughput between guest and host.
-In the scenario where DPDK is running as fast path in the host, kni-vhost is an efficient path for the traffic.
-
-Overview
-~~~~~~~~
-
-vHost-net has three kinds of real backend implementations. They are: 1) tap, 2) macvtap and 3) RAW socket.
-The main idea behind kni-vhost is making the KNI work as a RAW socket, attaching it as the backend instance of vHost-net.
-It is using the existing interface with vHost-net, so it does not require any kernel hacking,
-and is fully-compatible with the kernel vhost module.
-As vHost is still taking responsibility for communicating with the front-end virtio,
-it naturally supports both legacy virtio -net and the DPDK PMD virtio.
-There is a little penalty that comes from the non-polling mode of vhost.
-However, it scales throughput well when using KNI in multi-thread mode.
-
-.. _figure_vhost_net_arch2:
-
-.. figure:: img/vhost_net_arch.*
-
-   vHost-net Architecture Overview
-
-
-Packet Flow
-~~~~~~~~~~~
-
-There is only a minor difference from the original KNI traffic flows.
-On transmit side, vhost kthread calls the RAW socket's ops sendmsg and it puts the packets into the KNI transmit FIFO.
-On the receive side, the kni kthread gets packets from the KNI receive FIFO, puts them into the queue of the raw socket,
-and wakes up the task in vhost kthread to begin receiving.
-All the packet copying, irrespective of whether it is on the transmit or receive side,
-happens in the context of vhost kthread.
-Every vhost-net device is exposed to a front end virtio device in the guest.
-
-.. _figure_kni_traffic_flow:
-
-.. figure:: img/kni_traffic_flow.*
-
-   KNI Traffic Flow
-
-
-Sample Usage
-~~~~~~~~~~~~
-
-Before starting to use KNI as the backend of vhost, the CONFIG_RTE_KNI_VHOST configuration option must be turned on.
-Otherwise, by default, KNI will not enable its backend support capability.
-
-Of course, as a prerequisite, the vhost/vhost-net kernel CONFIG should be chosen before compiling the kernel.
-
-#.  Compile the DPDK and insert uio_pci_generic/igb_uio kernel modules as normal.
-
-#.  Insert the KNI kernel module:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko
-
-    If using KNI in multi-thread mode, use the following command line:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko kthread_mode=multiple
-
-#.  Running the KNI sample application:
-
-    .. code-block:: console
-
-        examples/kni/build/app/kni -c -0xf0 -n 4 -- -p 0x3 -P --config="(0,4,6),(1,5,7)"
-
-    This command runs the kni sample application with two physical ports.
-    Each port pins two forwarding cores (ingress/egress) in user space.
-
-#.  Assign a raw socket to vhost-net during qemu-kvm startup.
-    The DPDK does not provide a script to do this since it is easy for the user to customize.
-    The following shows the key steps to launch qemu-kvm with kni-vhost:
-
-    .. code-block:: bash
-
-        #!/bin/bash
-        echo 1 > /sys/class/net/vEth0/sock_en
-        fd=`cat /sys/class/net/vEth0/sock_fd`
-        qemu-kvm \
-        -name vm1 -cpu host -m 2048 -smp 1 -hda /opt/vm-fc16.img \
-        -netdev tap,fd=$fd,id=hostnet1,vhost=on \
-        -device virti-net-pci,netdev=hostnet1,id=net1,bus=pci.0,addr=0x4
-
-It is simple to enable raw socket using sysfs sock_en and get raw socket fd using sock_fd under the KNI device node.
-
-Then, using the qemu-kvm command with the -netdev option to assign such raw socket fd as vhost's backend.
-
-.. note::
-
-    The key word tap must exist as qemu-kvm now only supports vhost with a tap backend, so here we cheat qemu-kvm by an existing fd.
-
-Compatibility Configure Option
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-There is a CONFIG_RTE_KNI_VHOST_VNET_HDR_EN configuration option in DPDK configuration file.
-By default, it set to n, which means do not turn on the virtio net header,
-which is used to support additional features (such as, csum offload, vlan offload, generic-segmentation and so on),
-since the kni-vhost does not yet support those features.
-
-Even if the option is turned on, kni-vhost will ignore the information that the header contains.
-When working with legacy virtio on the guest, it is better to turn off unsupported offload features using ethtool -K.
-Otherwise, there may be problems such as an incorrect L4 checksum error.
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..66ca596 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -113,12 +113,6 @@ Deprecation Notices
   has different feature set, meaning functions like ``rte_vhost_feature_disable``
   need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
 
-* kni: Remove :ref:`kni_vhost_backend-label` feature (KNI_VHOST) in 17.05 release.
-  :doc:`Vhost Library </prog_guide/vhost_lib>` is currently preferred method for
-  guest - host communication. Just for clarification, this is not to remove KNI
-  or VHOST feature, but KNI_VHOST which is a KNI feature enabled via a compile
-  time option, and disabled by default.
-
 * ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
   A pointer to a rte_cryptodev_config structure will be added to the
   function prototype ``cryptodev_configure_t``, as a new parameter.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 59929b0..e25ea9f 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -137,6 +137,8 @@ Removed Items
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* KNI vhost support removed.
+
 
 Shared Library Versions
 -----------------------
diff --git a/lib/librte_eal/linuxapp/kni/Makefile b/lib/librte_eal/linuxapp/kni/Makefile
index 3c22b63..7864a2a 100644
--- a/lib/librte_eal/linuxapp/kni/Makefile
+++ b/lib/librte_eal/linuxapp/kni/Makefile
@@ -61,7 +61,6 @@ DEPDIRS-y += lib/librte_eal/linuxapp/eal
 #
 SRCS-y := kni_misc.c
 SRCS-y += kni_net.c
-SRCS-$(CONFIG_RTE_KNI_VHOST) += kni_vhost.c
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += kni_ethtool.c
 
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_main.c
diff --git a/lib/librte_eal/linuxapp/kni/kni_dev.h b/lib/librte_eal/linuxapp/kni/kni_dev.h
index 58cbadd..002e5fa 100644
--- a/lib/librte_eal/linuxapp/kni/kni_dev.h
+++ b/lib/librte_eal/linuxapp/kni/kni_dev.h
@@ -37,10 +37,6 @@
 #include <linux/spinlock.h>
 #include <linux/list.h>
 
-#ifdef RTE_KNI_VHOST
-#include <net/sock.h>
-#endif
-
 #include <exec-env/rte_kni_common.h>
 #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
 
@@ -102,15 +98,6 @@ struct kni_dev {
 	/* synchro for request processing */
 	unsigned long synchro;
 
-#ifdef RTE_KNI_VHOST
-	struct kni_vhost_queue *vhost_queue;
-
-	volatile enum {
-		BE_STOP = 0x1,
-		BE_START = 0x2,
-		BE_FINISH = 0x4,
-	} vq_status;
-#endif
 	/* buffers */
 	void *pa[MBUF_BURST_SZ];
 	void *va[MBUF_BURST_SZ];
@@ -118,26 +105,6 @@ struct kni_dev {
 	void *alloc_va[MBUF_BURST_SZ];
 };
 
-#ifdef RTE_KNI_VHOST
-uint32_t
-kni_poll(struct file *file, struct socket *sock, poll_table * wait);
-int kni_chk_vhost_rx(struct kni_dev *kni);
-int kni_vhost_init(struct kni_dev *kni);
-int kni_vhost_backend_release(struct kni_dev *kni);
-
-struct kni_vhost_queue {
-	struct sock sk;
-	struct socket *sock;
-	int vnet_hdr_sz;
-	struct kni_dev *kni;
-	int sockfd;
-	uint32_t flags;
-	struct sk_buff *cache;
-	struct rte_kni_fifo *fifo;
-};
-
-#endif
-
 void kni_net_rx(struct kni_dev *kni);
 void kni_net_init(struct net_device *dev);
 void kni_net_config_lo_mode(char *lo_str);
diff --git a/lib/librte_eal/linuxapp/kni/kni_fifo.h b/lib/librte_eal/linuxapp/kni/kni_fifo.h
index 025ec1c..14f4141 100644
--- a/lib/librte_eal/linuxapp/kni/kni_fifo.h
+++ b/lib/librte_eal/linuxapp/kni/kni_fifo.h
@@ -91,18 +91,4 @@ kni_fifo_free_count(struct rte_kni_fifo *fifo)
 	return (fifo->read - fifo->write - 1) & (fifo->len - 1);
 }
 
-#ifdef RTE_KNI_VHOST
-/**
- * Initializes the kni fifo structure
- */
-static inline void
-kni_fifo_init(struct rte_kni_fifo *fifo, uint32_t size)
-{
-	fifo->write = 0;
-	fifo->read = 0;
-	fifo->len = size;
-	fifo->elem_size = sizeof(void *);
-}
-#endif
-
 #endif /* _KNI_FIFO_H_ */
diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c b/lib/librte_eal/linuxapp/kni/kni_misc.c
index 33b61f2..f1f6bea 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -140,11 +140,7 @@ kni_thread_single(void *data)
 		down_read(&knet->kni_list_lock);
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
 			list_for_each_entry(dev, &knet->kni_list_head, list) {
-#ifdef RTE_KNI_VHOST
-				kni_chk_vhost_rx(dev);
-#else
 				kni_net_rx(dev);
-#endif
 				kni_net_poll_resp(dev);
 			}
 		}
@@ -167,11 +163,7 @@ kni_thread_multiple(void *param)
 
 	while (!kthread_should_stop()) {
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
-#ifdef RTE_KNI_VHOST
-			kni_chk_vhost_rx(dev);
-#else
 			kni_net_rx(dev);
-#endif
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
@@ -248,9 +240,6 @@ kni_release(struct inode *inode, struct file *file)
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 	}
@@ -397,10 +386,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	kni->sync_va = dev_info.sync_va;
 	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
 
-#ifdef RTE_KNI_VHOST
-	kni->vhost_queue = NULL;
-	kni->vq_status = BE_STOP;
-#endif
 	kni->mbuf_size = dev_info.mbuf_size;
 
 	pr_debug("tx_phys:      0x%016llx, tx_q addr:      0x%p\n",
@@ -490,10 +475,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 		return -ENODEV;
 	}
 
-#ifdef RTE_KNI_VHOST
-	kni_vhost_init(kni);
-#endif
-
 	ret = kni_run_thread(knet, kni, dev_info.force_bind);
 	if (ret != 0)
 		return ret;
@@ -537,9 +518,6 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num,
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 		ret = 0;
diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c b/lib/librte_eal/linuxapp/kni/kni_net.c
index 4ac99cf..db9f489 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -198,18 +198,6 @@ kni_net_config(struct net_device *dev, struct ifmap *map)
 /*
  * Transmit a packet (called by the kernel)
  */
-#ifdef RTE_KNI_VHOST
-static int
-kni_net_tx(struct sk_buff *skb, struct net_device *dev)
-{
-	struct kni_dev *kni = netdev_priv(dev);
-
-	dev_kfree_skb(skb);
-	kni->stats.tx_dropped++;
-
-	return NETDEV_TX_OK;
-}
-#else
 static int
 kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 {
@@ -289,7 +277,6 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 
 	return NETDEV_TX_OK;
 }
-#endif
 
 /*
  * RX: normal working mode
diff --git a/lib/librte_eal/linuxapp/kni/kni_vhost.c b/lib/librte_eal/linuxapp/kni/kni_vhost.c
deleted file mode 100644
index f54c34b..0000000
--- a/lib/librte_eal/linuxapp/kni/kni_vhost.c
+++ /dev/null
@@ -1,842 +0,0 @@
-/*-
- * GPL LICENSE SUMMARY
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *
- *   This program is free software; you can redistribute it and/or modify
- *   it under the terms of version 2 of the GNU General Public License as
- *   published by the Free Software Foundation.
- *
- *   This program is distributed in the hope that it will be useful, but
- *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *   General Public License for more details.
- *
- *   You should have received a copy of the GNU General Public License
- *   along with this program; if not, write to the Free Software
- *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
- *   The full GNU General Public License is included in this distribution
- *   in the file called LICENSE.GPL.
- *
- *   Contact Information:
- *   Intel Corporation
- */
-
-#include <linux/module.h>
-#include <linux/net.h>
-#include <net/sock.h>
-#include <linux/virtio_net.h>
-#include <linux/wait.h>
-#include <linux/mm.h>
-#include <linux/nsproxy.h>
-#include <linux/sched.h>
-#include <linux/if_tun.h>
-#include <linux/version.h>
-#include <linux/file.h>
-
-#include "compat.h"
-#include "kni_dev.h"
-#include "kni_fifo.h"
-
-#define RX_BURST_SZ 4
-
-#ifdef HAVE_STATIC_SOCK_MAP_FD
-static int kni_sock_map_fd(struct socket *sock)
-{
-	struct file *file;
-	int fd = get_unused_fd_flags(0);
-
-	if (fd < 0)
-		return fd;
-
-	file = sock_alloc_file(sock, 0, NULL);
-	if (IS_ERR(file)) {
-		put_unused_fd(fd);
-		return PTR_ERR(file);
-	}
-	fd_install(fd, file);
-	return fd;
-}
-#endif
-
-static struct proto kni_raw_proto = {
-	.name = "kni_vhost",
-	.owner = THIS_MODULE,
-	.obj_size = sizeof(struct kni_vhost_queue),
-};
-
-static inline int
-kni_vhost_net_tx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	struct rte_kni_mbuf *pkt_kva = NULL;
-	struct rte_kni_mbuf *pkt_va = NULL;
-	int ret;
-
-	pr_debug("tx offset=%d, len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, (int)m->msg_iov->iov_len);
-#endif
-
-	/**
-	 * Check if it has at least one free entry in tx_q and
-	 * one entry in alloc_q.
-	 */
-	if (kni_fifo_free_count(kni->tx_q) == 0 ||
-	    kni_fifo_count(kni->alloc_q) == 0) {
-		/**
-		 * If no free entry in tx_q or no entry in alloc_q,
-		 * drops skb and goes out.
-		 */
-		goto drop;
-	}
-
-	/* dequeue a mbuf from alloc_q */
-	ret = kni_fifo_get(kni->alloc_q, (void **)&pkt_va, 1);
-	if (likely(ret == 1)) {
-		void *data_kva;
-
-		pkt_kva = (void *)pkt_va - kni->mbuf_va + kni->mbuf_kva;
-		data_kva = pkt_kva->buf_addr + pkt_kva->data_off
-			- kni->mbuf_va + kni->mbuf_kva;
-
-#ifdef HAVE_IOV_ITER_MSGHDR
-		copy_from_iter(data_kva, len, &m->msg_iter);
-#else
-		memcpy_fromiovecend(data_kva, m->msg_iov, offset, len);
-#endif
-
-		if (unlikely(len < ETH_ZLEN)) {
-			memset(data_kva + len, 0, ETH_ZLEN - len);
-			len = ETH_ZLEN;
-		}
-		pkt_kva->pkt_len = len;
-		pkt_kva->data_len = len;
-
-		/* enqueue mbuf into tx_q */
-		ret = kni_fifo_put(kni->tx_q, (void **)&pkt_va, 1);
-		if (unlikely(ret != 1)) {
-			/* Failing should not happen */
-			pr_err("Fail to enqueue mbuf into tx_q\n");
-			goto drop;
-		}
-	} else {
-		/* Failing should not happen */
-		pr_err("Fail to dequeue mbuf from alloc_q\n");
-		goto drop;
-	}
-
-	/* update statistics */
-	kni->stats.tx_bytes += len;
-	kni->stats.tx_packets++;
-
-	return 0;
-
-drop:
-	/* update statistics */
-	kni->stats.tx_dropped++;
-
-	return 0;
-}
-
-static inline int
-kni_vhost_net_rx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	uint32_t pkt_len;
-	struct rte_kni_mbuf *kva;
-	struct rte_kni_mbuf *va;
-	void *data_kva;
-	struct sk_buff *skb;
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	/* ensure at least one entry in free_q */
-	if (unlikely(kni_fifo_free_count(kni->free_q) == 0))
-		return 0;
-
-	skb = skb_dequeue(&q->sk.sk_receive_queue);
-	if (unlikely(skb == NULL))
-		return 0;
-
-	kva = (struct rte_kni_mbuf *)skb->data;
-
-	/* free skb to cache */
-	skb->data = NULL;
-	if (unlikely(kni_fifo_put(q->fifo, (void **)&skb, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into rx cache fifo\n");
-
-	pkt_len = kva->data_len;
-	if (unlikely(pkt_len > len))
-		goto drop;
-
-	pr_debug("rx offset=%d, len=%d, pkt_len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, pkt_len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, pkt_len, (int)m->msg_iov->iov_len);
-#endif
-
-	data_kva = kva->buf_addr + kva->data_off - kni->mbuf_va + kni->mbuf_kva;
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter(data_kva, pkt_len, &m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov, data_kva, offset, pkt_len)))
-#endif
-		goto drop;
-
-	/* Update statistics */
-	kni->stats.rx_bytes += pkt_len;
-	kni->stats.rx_packets++;
-
-	/* enqueue mbufs into free_q */
-	va = (void *)kva - kni->mbuf_kva + kni->mbuf_va;
-	if (unlikely(kni_fifo_put(kni->free_q, (void **)&va, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into free_q\n");
-
-	pr_debug("receive done %d\n", pkt_len);
-
-	return pkt_len;
-
-drop:
-	/* Update drop statistics */
-	kni->stats.rx_dropped++;
-
-	return 0;
-}
-
-static uint32_t
-kni_sock_poll(struct file *file, struct socket *sock, poll_table *wait)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t mask = 0;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return POLLERR;
-
-	kni = q->kni;
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("start kni_poll on group %d, wq 0x%16llx\n",
-		  kni->group_id, (uint64_t)sock->wq);
-	poll_wait(file, &sock->wq->wait, wait);
-#else
-	pr_debug("start kni_poll on group %d, wait at 0x%16llx\n",
-		  kni->group_id, (uint64_t)&sock->wait);
-	poll_wait(file, &sock->wait, wait);
-#endif
-
-	if (kni_fifo_count(kni->rx_q) > 0)
-		mask |= POLLIN | POLLRDNORM;
-
-	if (sock_writeable(&q->sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-		(!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#else
-		(!test_and_set_bit(SOCK_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#endif
-		mask |= POLLOUT | POLLWRNORM;
-
-	return mask;
-}
-
-static inline void
-kni_vhost_enqueue(struct kni_dev *kni, struct kni_vhost_queue *q,
-		  struct sk_buff *skb, struct rte_kni_mbuf *va)
-{
-	struct rte_kni_mbuf *kva;
-
-	kva = (void *)(va) - kni->mbuf_va + kni->mbuf_kva;
-	(skb)->data = (unsigned char *)kva;
-	(skb)->len = kva->data_len;
-	skb_queue_tail(&q->sk.sk_receive_queue, skb);
-}
-
-static inline void
-kni_vhost_enqueue_burst(struct kni_dev *kni, struct kni_vhost_queue *q,
-	  struct sk_buff **skb, struct rte_kni_mbuf **va)
-{
-	int i;
-
-	for (i = 0; i < RX_BURST_SZ; skb++, va++, i++)
-		kni_vhost_enqueue(kni, q, *skb, *va);
-}
-
-int
-kni_chk_vhost_rx(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-	uint32_t nb_in, nb_mbuf, nb_skb;
-	const uint32_t BURST_MASK = RX_BURST_SZ - 1;
-	uint32_t nb_burst, nb_backlog, i;
-	struct sk_buff *skb[RX_BURST_SZ];
-	struct rte_kni_mbuf *va[RX_BURST_SZ];
-
-	if (unlikely(BE_STOP & kni->vq_status)) {
-		kni->vq_status |= BE_FINISH;
-		return 0;
-	}
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	nb_skb = kni_fifo_count(q->fifo);
-	nb_mbuf = kni_fifo_count(kni->rx_q);
-
-	nb_in = min(nb_mbuf, nb_skb);
-	nb_in = min_t(uint32_t, nb_in, RX_BURST_SZ);
-	nb_burst   = (nb_in & ~BURST_MASK);
-	nb_backlog = (nb_in & BURST_MASK);
-
-	/* enqueue skb_queue per BURST_SIZE bulk */
-	if (nb_burst != 0) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		kni_vhost_enqueue_burst(kni, q, skb, va);
-	}
-
-	/* all leftover, do one by one */
-	for (i = 0; i < nb_backlog; ++i) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, 1) != 1))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, 1) != 1))
-			goto except;
-
-		kni_vhost_enqueue(kni, q, *skb, *va);
-	}
-
-	/* Ondemand wake up */
-	if ((nb_in == RX_BURST_SZ) || (nb_skb == 0) ||
-	    ((nb_mbuf < RX_BURST_SZ) && (nb_mbuf != 0))) {
-		wake_up_interruptible_poll(sk_sleep(&q->sk),
-				   POLLIN | POLLRDNORM | POLLRDBAND);
-		pr_debug("RX CHK KICK nb_mbuf %d, nb_skb %d, nb_in %d\n",
-			   nb_mbuf, nb_skb, nb_in);
-	}
-
-	return 0;
-
-except:
-	/* Failing should not happen */
-	pr_err("Fail to enqueue fifo, it shouldn't happen\n");
-	BUG_ON(1);
-
-	return 0;
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_sndmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#else
-kni_sock_sndmsg(struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	int vnet_hdr_len = 0;
-	unsigned long len = total_len;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-	pr_debug("kni_sndmsg len %ld, flags 0x%08x, nb_iov %d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   len, q->flags, (int)m->msg_iter.iov->iov_len);
-#else
-		   len, q->flags, (int)m->msg_iovlen);
-#endif
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		if (unlikely(len < vnet_hdr_len))
-			return -EINVAL;
-		len -= vnet_hdr_len;
-	}
-#endif
-
-	if (unlikely(len < ETH_HLEN + q->vnet_hdr_sz))
-		return -EINVAL;
-
-	return kni_vhost_net_tx(q->kni, m, vnet_hdr_len, len);
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_rcvmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#else
-kni_sock_rcvmsg(struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	int vnet_hdr_len = 0;
-	int pkt_len = 0;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	static struct virtio_net_hdr
-		__attribute__ ((unused)) vnet_hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
-	};
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		len -= vnet_hdr_len;
-		if (len < 0)
-			return -EINVAL;
-	}
-#endif
-
-	pkt_len = kni_vhost_net_rx(q->kni, m, vnet_hdr_len, len);
-	if (unlikely(pkt_len == 0))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	/* no need to copy hdr when no pkt received */
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter((void *)&vnet_hdr, vnet_hdr_len,
-		&m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov,
-		(void *)&vnet_hdr, 0, vnet_hdr_len)))
-#endif /* HAVE_IOV_ITER_MSGHDR */
-		return -EFAULT;
-#endif /* RTE_KNI_VHOST_VNET_HDR_EN */
-	pr_debug("kni_rcvmsg expect_len %ld, flags 0x%08x, pkt_len %d\n",
-		   (unsigned long)len, q->flags, pkt_len);
-
-	return pkt_len + vnet_hdr_len;
-}
-
-/* dummy tap like ioctl */
-static int
-kni_sock_ioctl(struct socket *sock, uint32_t cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	struct ifreq __user *ifr = argp;
-	uint32_t __user *up = argp;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t u;
-	int __user *sp = argp;
-	int s;
-	int ret;
-
-	pr_debug("tap ioctl cmd 0x%08x\n", cmd);
-
-	switch (cmd) {
-	case TUNSETIFF:
-		pr_debug("TUNSETIFF\n");
-		/* ignore the name, just look at flags */
-		if (get_user(u, &ifr->ifr_flags))
-			return -EFAULT;
-
-		ret = 0;
-		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
-			ret = -EINVAL;
-		else
-			q->flags = u;
-
-		return ret;
-
-	case TUNGETIFF:
-		pr_debug("TUNGETIFF\n");
-		rcu_read_lock_bh();
-		kni = rcu_dereference_bh(q->kni);
-		if (kni)
-			dev_hold(kni->net_dev);
-		rcu_read_unlock_bh();
-
-		if (!kni)
-			return -ENOLINK;
-
-		ret = 0;
-		if (copy_to_user(&ifr->ifr_name, kni->net_dev->name, IFNAMSIZ)
-				|| put_user(q->flags, &ifr->ifr_flags))
-			ret = -EFAULT;
-		dev_put(kni->net_dev);
-		return ret;
-
-	case TUNGETFEATURES:
-		pr_debug("TUNGETFEATURES\n");
-		u = IFF_TAP | IFF_NO_PI;
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		u |= IFF_VNET_HDR;
-#endif
-		if (put_user(u, up))
-			return -EFAULT;
-		return 0;
-
-	case TUNSETSNDBUF:
-		pr_debug("TUNSETSNDBUF\n");
-		if (get_user(u, up))
-			return -EFAULT;
-
-		q->sk.sk_sndbuf = u;
-		return 0;
-
-	case TUNGETVNETHDRSZ:
-		s = q->vnet_hdr_sz;
-		if (put_user(s, sp))
-			return -EFAULT;
-		pr_debug("TUNGETVNETHDRSZ %d\n", s);
-		return 0;
-
-	case TUNSETVNETHDRSZ:
-		if (get_user(s, sp))
-			return -EFAULT;
-		if (s < (int)sizeof(struct virtio_net_hdr))
-			return -EINVAL;
-
-		pr_debug("TUNSETVNETHDRSZ %d\n", s);
-		q->vnet_hdr_sz = s;
-		return 0;
-
-	case TUNSETOFFLOAD:
-		pr_debug("TUNSETOFFLOAD %lx\n", arg);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		/* not support any offload yet */
-		if (!(q->flags & IFF_VNET_HDR))
-			return  -EINVAL;
-
-		return 0;
-#else
-		return -EINVAL;
-#endif
-
-	default:
-		pr_debug("NOT SUPPORT\n");
-		return -EINVAL;
-	}
-}
-
-static int
-kni_sock_compat_ioctl(struct socket *sock, uint32_t cmd,
-		     unsigned long arg)
-{
-	/* 32 bits app on 64 bits OS to be supported later */
-	pr_debug("Not implemented.\n");
-
-	return -EINVAL;
-}
-
-#define KNI_VHOST_WAIT_WQ_SAFE()                        \
-do {							\
-	while ((BE_FINISH | BE_STOP) == kni->vq_status) \
-		msleep(1);				\
-} while (0)						\
-
-
-static int
-kni_sock_release(struct socket *sock)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-
-	if (q == NULL)
-		return 0;
-
-	kni = q->kni;
-	if (kni != NULL) {
-		kni->vq_status = BE_STOP;
-		KNI_VHOST_WAIT_WQ_SAFE();
-		kni->vhost_queue = NULL;
-		q->kni = NULL;
-	}
-
-	if (q->sockfd != -1)
-		q->sockfd = -1;
-
-	sk_set_socket(&q->sk, NULL);
-	sock->sk = NULL;
-
-	sock_put(&q->sk);
-
-	pr_debug("dummy sock release done\n");
-
-	return 0;
-}
-
-int
-kni_sock_getname(struct socket *sock, struct sockaddr *addr,
-		int *sockaddr_len, int peer)
-{
-	pr_debug("dummy sock getname\n");
-	((struct sockaddr_ll *)addr)->sll_family = AF_PACKET;
-	return 0;
-}
-
-static const struct proto_ops kni_socket_ops = {
-	.getname = kni_sock_getname,
-	.sendmsg = kni_sock_sndmsg,
-	.recvmsg = kni_sock_rcvmsg,
-	.release = kni_sock_release,
-	.poll    = kni_sock_poll,
-	.ioctl   = kni_sock_ioctl,
-	.compat_ioctl = kni_sock_compat_ioctl,
-};
-
-static void
-kni_sk_write_space(struct sock *sk)
-{
-	wait_queue_head_t *wqueue;
-
-	if (!sock_writeable(sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-	    !test_and_clear_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#else
-	    !test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#endif
-		return;
-	wqueue = sk_sleep(sk);
-	if (wqueue && waitqueue_active(wqueue))
-		wake_up_interruptible_poll(
-			wqueue, POLLOUT | POLLWRNORM | POLLWRBAND);
-}
-
-static void
-kni_sk_destruct(struct sock *sk)
-{
-	struct kni_vhost_queue *q =
-		container_of(sk, struct kni_vhost_queue, sk);
-
-	if (!q)
-		return;
-
-	/* make sure there's no packet in buffer */
-	while (skb_dequeue(&sk->sk_receive_queue) != NULL)
-		;
-
-	mb();
-
-	if (q->fifo != NULL) {
-		kfree(q->fifo);
-		q->fifo = NULL;
-	}
-
-	if (q->cache != NULL) {
-		kfree(q->cache);
-		q->cache = NULL;
-	}
-}
-
-static int
-kni_vhost_backend_init(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q;
-	struct net *net = current->nsproxy->net_ns;
-	int err, i, sockfd;
-	struct rte_kni_fifo *fifo;
-	struct sk_buff *elem;
-
-	if (kni->vhost_queue != NULL)
-		return -1;
-
-#ifdef HAVE_SK_ALLOC_KERN_PARAM
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto, 0);
-#else
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto);
-#endif
-	if (!q)
-		return -ENOMEM;
-
-	err = sock_create_lite(AF_UNSPEC, SOCK_RAW, IPPROTO_RAW, &q->sock);
-	if (err)
-		goto free_sk;
-
-	sockfd = kni_sock_map_fd(q->sock);
-	if (sockfd < 0) {
-		err = sockfd;
-		goto free_sock;
-	}
-
-	/* cache init */
-	q->cache = kzalloc(
-		RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(struct sk_buff),
-		GFP_KERNEL);
-	if (!q->cache)
-		goto free_fd;
-
-	fifo = kzalloc(RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(void *)
-			+ sizeof(struct rte_kni_fifo), GFP_KERNEL);
-	if (!fifo)
-		goto free_cache;
-
-	kni_fifo_init(fifo, RTE_KNI_VHOST_MAX_CACHE_SIZE);
-
-	for (i = 0; i < RTE_KNI_VHOST_MAX_CACHE_SIZE; i++) {
-		elem = &q->cache[i];
-		kni_fifo_put(fifo, (void **)&elem, 1);
-	}
-	q->fifo = fifo;
-
-	/* store sockfd in vhost_queue */
-	q->sockfd = sockfd;
-
-	/* init socket */
-	q->sock->type = SOCK_RAW;
-	q->sock->state = SS_CONNECTED;
-	q->sock->ops = &kni_socket_ops;
-	sock_init_data(q->sock, &q->sk);
-
-	/* init sock data */
-	q->sk.sk_write_space = kni_sk_write_space;
-	q->sk.sk_destruct = kni_sk_destruct;
-	q->flags = IFF_NO_PI | IFF_TAP;
-	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	q->flags |= IFF_VNET_HDR;
-#endif
-
-	/* bind kni_dev with vhost_queue */
-	q->kni = kni;
-	kni->vhost_queue = q;
-
-	wmb();
-
-	kni->vq_status = BE_START;
-
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("backend init sockfd=%d, sock->wq=0x%16llx,sk->sk_wq=0x%16llx",
-		  q->sockfd, (uint64_t)q->sock->wq,
-		  (uint64_t)q->sk.sk_wq);
-#else
-	pr_debug("backend init sockfd=%d, sock->wait at 0x%16llx,sk->sk_sleep=0x%16llx",
-		  q->sockfd, (uint64_t)&q->sock->wait,
-		  (uint64_t)q->sk.sk_sleep);
-#endif
-
-	return 0;
-
-free_cache:
-	kfree(q->cache);
-	q->cache = NULL;
-
-free_fd:
-	put_unused_fd(sockfd);
-
-free_sock:
-	q->kni = NULL;
-	kni->vhost_queue = NULL;
-	kni->vq_status |= BE_FINISH;
-	sock_release(q->sock);
-	q->sock->ops = NULL;
-	q->sock = NULL;
-
-free_sk:
-	sk_free((struct sock *)q);
-
-	return err;
-}
-
-/* kni vhost sock sysfs */
-static ssize_t
-show_sock_fd(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	int sockfd = -1;
-
-	if (kni->vhost_queue != NULL)
-		sockfd = kni->vhost_queue->sockfd;
-	return snprintf(buf, 10, "%d\n", sockfd);
-}
-
-static ssize_t
-show_sock_en(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-
-	return snprintf(buf, 10, "%u\n", (kni->vhost_queue == NULL ? 0 : 1));
-}
-
-static ssize_t
-set_sock_en(struct device *dev, struct device_attribute *attr,
-	      const char *buf, size_t count)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	unsigned long en;
-	int err = 0;
-
-	if (kstrtoul(buf, 0, &en) != 0)
-		return -EINVAL;
-
-	if (en)
-		err = kni_vhost_backend_init(kni);
-
-	return err ? err : count;
-}
-
-static DEVICE_ATTR(sock_fd, S_IRUGO | S_IRUSR, show_sock_fd, NULL);
-static DEVICE_ATTR(sock_en, S_IRUGO | S_IWUSR, show_sock_en, set_sock_en);
-static struct attribute *dev_attrs[] = {
-	&dev_attr_sock_fd.attr,
-	&dev_attr_sock_en.attr,
-	NULL,
-};
-
-static const struct attribute_group dev_attr_grp = {
-	.attrs = dev_attrs,
-};
-
-int
-kni_vhost_backend_release(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (q == NULL)
-		return 0;
-
-	/* dettach from kni */
-	q->kni = NULL;
-
-	pr_debug("release backend done\n");
-
-	return 0;
-}
-
-int
-kni_vhost_init(struct kni_dev *kni)
-{
-	struct net_device *dev = kni->net_dev;
-
-	if (sysfs_create_group(&dev->dev.kobj, &dev_attr_grp))
-		sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-
-	kni->vq_status = BE_STOP;
-
-	pr_debug("kni_vhost_init done\n");
-
-	return 0;
-}
-- 
2.9.3

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes
  2017-02-15 13:15  1% [dpdk-dev] [PATCH] kni: remove KNI vhost support Ferruh Yigit
@ 2017-02-20 14:30  5% ` Ferruh Yigit
  2017-02-20 14:30  1%   ` [dpdk-dev] [PATCH v2 2/2] kni: remove KNI vhost support Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-20 14:30 UTC (permalink / raw)
  To: Thomas Monjalon, John McNamara; +Cc: dev, Bruce Richardson, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/guides/rel_notes/release_17_05.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 48fb5bd..59929b0 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -125,6 +125,18 @@ ABI Changes
    =========================================================
 
 
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item in the past
+     tense.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
 
 Shared Library Versions
 -----------------------
-- 
2.9.3

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH] lpm: extend IPv6 next hop field
@ 2017-02-19 17:14  4% Vladyslav Buslov
  2017-02-21 14:46  4% ` [dpdk-dev] [PATCH v2] " Vladyslav Buslov
  0 siblings, 1 reply; 200+ results
From: Vladyslav Buslov @ 2017-02-19 17:14 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev

This patch extends the next_hop field from 8 bits to 21 bits in the LPM
library for IPv6.

Versioning symbols were added to the affected functions, and the library
and applications that depend on the LPM library were updated.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
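
For reference, the versioning pattern used in this patch (condensed from the
rte_lpm6.c hunks below; the macros come from rte_compat.h) keeps the old
8-bit ABI alive while newly built binaries get the 21-bit variant:

    /* old binaries keep resolving rte_lpm6_lookup to the v2.0 symbol */
    int rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip,
            uint8_t *next_hop);
    VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);

    /* newly linked code gets the 21-bit-capable variant by default */
    int rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
            uint32_t *next_hop);
    BIND_DEFAULT_SYMBOL(rte_lpm6_lookup, _v1705, 17.05);
    MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm,
            uint8_t *ip, uint32_t *next_hop), rte_lpm6_lookup_v1705);
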
---
 app/test/test_lpm6.c                            | 114 ++++++++++++++------
 app/test/test_lpm6_perf.c                       |   4 +-
 doc/guides/prog_guide/lpm6_lib.rst              |   2 +-
 doc/guides/rel_notes/release_17_05.rst          |   5 +
 examples/ip_fragmentation/main.c                |  16 +--
 examples/ip_reassembly/main.c                   |  16 +--
 examples/ipsec-secgw/ipsec-secgw.c              |   2 +-
 examples/l3fwd/l3fwd_lpm_sse.h                  |  20 ++--
 examples/performance-thread/l3fwd-thread/main.c |   9 +-
 lib/librte_lpm/rte_lpm6.c                       | 133 +++++++++++++++++++++---
 lib/librte_lpm/rte_lpm6.h                       |  29 +++++-
 lib/librte_lpm/rte_lpm_version.map              |  10 ++
 lib/librte_table/rte_table_lpm_ipv6.c           |   9 +-
 13 files changed, 282 insertions(+), 87 deletions(-)

diff --git a/app/test/test_lpm6.c b/app/test/test_lpm6.c
index 61134f7..2950aae 100644
--- a/app/test/test_lpm6.c
+++ b/app/test/test_lpm6.c
@@ -79,6 +79,7 @@ static int32_t test24(void);
 static int32_t test25(void);
 static int32_t test26(void);
 static int32_t test27(void);
+static int32_t test28(void);
 
 rte_lpm6_test tests6[] = {
 /* Test Cases */
@@ -110,6 +111,7 @@ rte_lpm6_test tests6[] = {
 	test25,
 	test26,
 	test27,
+	test28,
 };
 
 #define NUM_LPM6_TESTS                (sizeof(tests6)/sizeof(tests6[0]))
@@ -354,7 +356,7 @@ test6(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -392,7 +394,7 @@ test7(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[10][16];
-	int16_t next_hop_return[10];
+	int32_t next_hop_return[10];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -469,7 +471,8 @@ test9(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 16, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 	uint8_t i;
 
@@ -513,7 +516,8 @@ test10(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -557,7 +561,8 @@ test11(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -617,7 +622,8 @@ test12(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -655,7 +661,8 @@ test13(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = 2;
@@ -702,7 +709,8 @@ test14(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 25, next_hop_add = 100;
+	uint8_t depth = 25;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -748,7 +756,8 @@ test15(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 24, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 24;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -784,7 +793,8 @@ test16(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {12,12,1,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 128, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 128;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -828,7 +838,8 @@ test17(void)
 	uint8_t ip1[] = {127,255,255,255,255,255,255,255,255,
 			255,255,255,255,255,255,255};
 	uint8_t ip2[] = {128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -857,7 +868,7 @@ test17(void)
 
 	/* Loop with rte_lpm6_delete. */
 	for (depth = 16; depth >= 1; depth--) {
-		next_hop_add = (uint8_t) (depth - 1);
+		next_hop_add = (depth - 1);
 
 		status = rte_lpm6_delete(lpm, ip2, depth);
 		TEST_LPM_ASSERT(status == 0);
@@ -893,8 +904,9 @@ test18(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16], ip_1[16], ip_2[16];
-	uint8_t depth, depth_1, depth_2, next_hop_add, next_hop_add_1,
-		next_hop_add_2, next_hop_return;
+	uint8_t depth, depth_1, depth_2;
+	uint32_t next_hop_add, next_hop_add_1,
+			next_hop_add_2, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1055,7 +1067,8 @@ test19(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1253,7 +1266,8 @@ test20(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1320,8 +1334,9 @@ test21(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[4][16];
-	uint8_t depth, next_hop_add;
-	int16_t next_hop_return[4];
+	uint8_t depth;
+	uint32_t next_hop_add;
+	int32_t next_hop_return[4];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1378,8 +1393,9 @@ test22(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[5][16];
-	uint8_t depth[5], next_hop_add;
-	int16_t next_hop_return[5];
+	uint8_t depth[5];
+	uint32_t next_hop_add;
+	int32_t next_hop_return[5];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1495,7 +1511,8 @@ test23(void)
 	struct rte_lpm6_config config;
 	uint32_t i;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1579,7 +1596,8 @@ test25(void)
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
 	uint32_t i;
-	uint8_t depth, next_hop_add, next_hop_return, next_hop_expected;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return, next_hop_expected;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1632,10 +1650,10 @@ test26(void)
 	uint8_t d_ip_10_32 = 32;
 	uint8_t	d_ip_10_24 = 24;
 	uint8_t	d_ip_20_25 = 25;
-	uint8_t next_hop_ip_10_32 = 100;
-	uint8_t	next_hop_ip_10_24 = 105;
-	uint8_t	next_hop_ip_20_25 = 111;
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_ip_10_32 = 100;
+	uint32_t next_hop_ip_10_24 = 105;
+	uint32_t next_hop_ip_20_25 = 111;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1650,7 +1668,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_32, &next_hop_return);
-	uint8_t test_hop_10_32 = next_hop_return;
+	uint32_t test_hop_10_32 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_32);
 
@@ -1659,7 +1677,7 @@ test26(void)
 			return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_24, &next_hop_return);
-	uint8_t test_hop_10_24 = next_hop_return;
+	uint32_t test_hop_10_24 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_24);
 
@@ -1668,7 +1686,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_20_25, &next_hop_return);
-	uint8_t test_hop_20_25 = next_hop_return;
+	uint32_t test_hop_20_25 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_20_25);
 
@@ -1707,7 +1725,8 @@ test27(void)
 		struct rte_lpm6 *lpm = NULL;
 		struct rte_lpm6_config config;
 		uint8_t ip[] = {128,128,128,128,128,128,128,128,128,128,128,128,128,128,0,0};
-		uint8_t depth = 128, next_hop_add = 100, next_hop_return;
+		uint8_t depth = 128;
+		uint32_t next_hop_add = 100, next_hop_return;
 		int32_t status = 0;
 		int i, j;
 
@@ -1746,6 +1765,41 @@ test27(void)
 }
 
 /*
+ * Call add, lookup and delete for a single rule with the maximum 21-bit next_hop size.
+ * Check that the next_hop returned from lookup is equal to the provisioned value.
+ * Delete the rule and check that the same lookup returns a miss.
+ */
+int32_t
+test28(void)
+{
+	struct rte_lpm6 *lpm = NULL;
+	struct rte_lpm6_config config;
+	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 0x001FFFFF, next_hop_return = 0;
+	int32_t status = 0;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	status = rte_lpm6_add(lpm, ip, depth, next_hop_add);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm6_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));
+
+	status = rte_lpm6_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	rte_lpm6_free(lpm);
+
+	return PASS;
+}
+
+/*
  * Do all unit tests.
  */
 static int
diff --git a/app/test/test_lpm6_perf.c b/app/test/test_lpm6_perf.c
index 0723081..30be430 100644
--- a/app/test/test_lpm6_perf.c
+++ b/app/test/test_lpm6_perf.c
@@ -86,7 +86,7 @@ test_lpm6_perf(void)
 	struct rte_lpm6_config config;
 	uint64_t begin, total_time;
 	unsigned i, j;
-	uint8_t next_hop_add = 0xAA, next_hop_return = 0;
+	uint32_t next_hop_add = 0xAA, next_hop_return = 0;
 	int status = 0;
 	int64_t count = 0;
 
@@ -148,7 +148,7 @@ test_lpm6_perf(void)
 	count = 0;
 
 	uint8_t ip_batch[NUM_IPS_ENTRIES][16];
-	int16_t next_hops[NUM_IPS_ENTRIES];
+	int32_t next_hops[NUM_IPS_ENTRIES];
 
 	for (i = 0; i < NUM_IPS_ENTRIES; i++)
 		memcpy(ip_batch[i], large_ips_table[i].ip, 16);
diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index 0aea5c5..f791507 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary depending on the case.
 An LPM prefix is represented by a pair of parameters (128-bit key, depth), with depth in the range of 1 to 128.
 An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
 The prefix serves as the unique identifier for the LPM rule.
-In this implementation, the user data is 1-byte long and is called "next hop",
+In this implementation, the user data is 21 bits long and is called "next hop",
 which corresponds to its main use of storing the ID of the next hop in a routing table entry.
 
 The main methods exported for the LPM component are:
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 48fb5bd..723e085 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Increased number of next hops for LPM IPv6 to 2^21.**
+
+  The next_hop field is extended from 8 bits to 21 bits for IPv6.
 
 Resolved Issues
 ---------------
@@ -110,6 +113,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
+  while keeping ABI compatibility.
 
 ABI Changes
 -----------
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index e1e32c6..51035f5 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		uint8_t queueid, uint8_t port_in)
 {
 	struct rx_queue *rxq;
-	uint32_t i, len, next_hop_ipv4;
-	uint8_t next_hop_ipv6, port_out, ipv6;
+	uint32_t i, len, next_hop;
+	uint8_t port_out, ipv6;
 	int32_t len2;
 
 	ipv6 = 0;
@@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			port_out = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
@@ -326,9 +326,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *);
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			port_out = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 50fe422..50730a2 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -346,8 +346,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	struct rte_ip_frag_death_row *dr;
 	struct rx_queue *rxq;
 	void *d_addr_bytes;
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6, dst_port;
+	uint32_t next_hop;
+	uint8_t dst_port;
 
 	rxq = &qconf->rx_queue_list[queue];
 
@@ -390,9 +390,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			dst_port = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
@@ -427,9 +427,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			dst_port = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv6);
diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index 5a4c9b7..5744c46 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -618,7 +618,7 @@ route4_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 static inline void
 route6_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 {
-	int16_t hop[MAX_PKT_BURST * 2];
+	int32_t hop[MAX_PKT_BURST * 2];
 	uint8_t dst_ip[MAX_PKT_BURST * 2][16];
 	uint8_t *ip6_dst;
 	uint16_t i, offset;
diff --git a/examples/l3fwd/l3fwd_lpm_sse.h b/examples/l3fwd/l3fwd_lpm_sse.h
index 538fe3d..1ef70d3 100644
--- a/examples/l3fwd/l3fwd_lpm_sse.h
+++ b/examples/l3fwd/l3fwd_lpm_sse.h
@@ -40,8 +40,7 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ipv4_hdr *ipv4_hdr;
 	struct ether_hdr *eth_hdr;
@@ -52,8 +51,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct,
-				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop_ipv4) == 0) ?
-						next_hop_ipv4 : portid);
+				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop) == 0) ?
+						next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -61,8 +60,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
@@ -78,14 +77,13 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
-			&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+			&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -93,8 +91,8 @@ lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
diff --git a/examples/performance-thread/l3fwd-thread/main.c b/examples/performance-thread/l3fwd-thread/main.c
index 53083df..510d6e8 100644
--- a/examples/performance-thread/l3fwd-thread/main.c
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -909,7 +909,7 @@ static inline uint8_t
 get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid,
 		lookup6_struct_t *ipv6_l3fwd_lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 
 	return (uint8_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct,
 			((struct ipv6_hdr *)ipv6_hdr)->dst_addr, &next_hop) == 0) ?
@@ -1396,15 +1396,14 @@ rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 static inline __attribute__((always_inline)) uint16_t
 get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv4_lookup_struct, dst_ipv4,
-				&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+				&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -1413,7 +1412,7 @@ get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 
 		return (uint16_t) ((rte_lpm6_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0) ? next_hop_ipv6 :
+				ipv6_hdr->dst_addr, &next_hop) == 0) ? next_hop :
 						portid);
 
 	}
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 32fdba0..8915fff 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -97,7 +97,7 @@ struct rte_lpm6_tbl_entry {
 /** Rules tbl entry structure. */
 struct rte_lpm6_rule {
 	uint8_t ip[RTE_LPM6_IPV6_ADDR_SIZE]; /**< Rule IP address. */
-	uint8_t next_hop; /**< Rule next hop. */
+	uint32_t next_hop; /**< Rule next hop. */
 	uint8_t depth; /**< Rule depth. */
 };
 
@@ -297,7 +297,7 @@ rte_lpm6_free(struct rte_lpm6 *lpm)
  * the nexthop if so. Otherwise it adds a new rule if enough space is available.
  */
 static inline int32_t
-rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
+rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint32_t next_hop, uint8_t depth)
 {
 	uint32_t rule_index;
 
@@ -340,7 +340,7 @@ rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
  */
 static void
 expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
-		uint8_t next_hop)
+		uint32_t next_hop)
 {
 	uint32_t tbl8_group_end, tbl8_gindex_next, j;
 
@@ -377,7 +377,7 @@ expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
 static inline int
 add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
 		struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip, uint8_t bytes,
-		uint8_t first_byte, uint8_t depth, uint8_t next_hop)
+		uint8_t first_byte, uint8_t depth, uint32_t next_hop)
 {
 	uint32_t tbl_index, tbl_range, tbl8_group_start, tbl8_group_end, i;
 	int32_t tbl8_gindex;
@@ -507,9 +507,17 @@ add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
  * Add a route
  */
 int
-rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop)
 {
+	return rte_lpm6_add_v1705(lpm, ip, depth, next_hop);
+}
+VERSION_SYMBOL(rte_lpm6_add, _v20, 2.0);
+
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop)
+{
 	struct rte_lpm6_tbl_entry *tbl;
 	struct rte_lpm6_tbl_entry *tbl_next;
 	int32_t rule_index;
@@ -560,6 +568,9 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_add, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+				uint32_t next_hop), rte_lpm6_add_v1705);
 
 /*
  * Takes a pointer to a table entry and inspect one level.
@@ -569,7 +580,7 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 static inline int
 lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		const struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip,
-		uint8_t first_byte, uint8_t *next_hop)
+		uint8_t first_byte, uint32_t *next_hop)
 {
 	uint32_t tbl8_index, tbl_entry;
 
@@ -589,7 +600,7 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		return 1;
 	} else {
 		/* If not extended then we can have a match. */
-		*next_hop = (uint8_t)tbl_entry;
+		*next_hop = ((uint32_t)tbl_entry & RTE_LPM6_TBL8_BITMASK);
 		return (tbl_entry & RTE_LPM6_LOOKUP_SUCCESS) ? 0 : -ENOENT;
 	}
 }
@@ -598,7 +609,26 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
  * Looks up an IP
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL) {
+		return -EINVAL;
+	}
+
+	status = rte_lpm6_lookup_v1705(lpm, ip, &next_hop32);
+	if (status == 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+}
+VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
+
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop)
 {
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
@@ -625,20 +655,23 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
+				uint32_t *next_hop), rte_lpm6_lookup_v1705);
 
 /*
  * Looks up a group of IP addresses
  */
 int
-rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n)
 {
 	unsigned i;
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
-	uint32_t tbl24_index;
-	uint8_t first_byte, next_hop;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
 	int status;
 
 	/* DEBUG: Check user input arguments. */
@@ -664,11 +697,58 @@ rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		if (status < 0)
 			next_hops[i] = -1;
 		else
-			next_hops[i] = next_hop;
+			next_hops[i] = (int16_t)next_hop;
 	}
 
 	return 0;
 }
+VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
+
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t * next_hops, unsigned n)
+{
+	unsigned i;
+	const struct rte_lpm6_tbl_entry *tbl;
+	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
+	int status;
+
+	/* DEBUG: Check user input arguments. */
+	if ((lpm == NULL) || (ips == NULL) || (next_hops == NULL)) {
+		return -EINVAL;
+	}
+
+	for (i = 0; i < n; i++) {
+		first_byte = LOOKUP_FIRST_BYTE;
+		tbl24_index = (ips[i][0] << BYTES2_SIZE) |
+				(ips[i][1] << BYTE_SIZE) | ips[i][2];
+
+		/* Calculate pointer to the first entry to be inspected */
+		tbl = &lpm->tbl24[tbl24_index];
+
+		do {
+			/* Continue inspecting following levels until success or failure */
+			status = lookup_step(lpm, tbl, &tbl_next, ips[i], first_byte++,
+					&next_hop);
+			tbl = tbl_next;
+		} while (status == 1);
+
+		if (status < 0)
+			next_hops[i] = -1;
+		else
+			next_hops[i] = (int32_t)next_hop;
+	}
+
+	return 0;
+}
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup_bulk_func, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+				uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+				int32_t * next_hops, unsigned n),
+		rte_lpm6_lookup_bulk_func_v1705);
 
 /*
  * Finds a rule in rule table.
@@ -698,8 +778,29 @@ rule_find(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
  * Look for a rule in the high-level rules table
  */
 int
-rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop)
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL) {
+		return -EINVAL;
+	}
+
+	status = rte_lpm6_is_rule_present_v1705(lpm, ip, depth, &next_hop32);
+	if (status > 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+
+}
+VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
+
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop)
 {
 	uint8_t ip_masked[RTE_LPM6_IPV6_ADDR_SIZE];
 	int32_t rule_index;
@@ -724,6 +825,10 @@ uint8_t *next_hop)
 	/* If rule is not found return 0. */
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_is_rule_present, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip,
+				uint8_t depth, uint32_t *next_hop),
+		rte_lpm6_is_rule_present_v1705);
 
 /*
  * Delete a rule from the rule table.
diff --git a/lib/librte_lpm/rte_lpm6.h b/lib/librte_lpm/rte_lpm6.h
index 13d027f..0ab54d4 100644
--- a/lib/librte_lpm/rte_lpm6.h
+++ b/lib/librte_lpm/rte_lpm6.h
@@ -39,6 +39,7 @@
  */
 
 #include <stdint.h>
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -123,7 +124,13 @@ rte_lpm6_free(struct rte_lpm6 *lpm);
  */
 int
 rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
+int
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop);
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
 
 /**
  * Check if a rule is present in the LPM table,
@@ -142,7 +149,13 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
  */
 int
 rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop);
+		uint32_t *next_hop);
+int
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop);
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop);
 
 /**
  * Delete a rule from the LPM table.
@@ -199,7 +212,11 @@ rte_lpm6_delete_all(struct rte_lpm6 *lpm);
  *   -EINVAL for incorrect arguments, -ENOENT on lookup miss, 0 on lookup hit
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
+int
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
 
 /**
  * Lookup multiple IP addresses in an LPM table.
@@ -220,7 +237,15 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
 int
 rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t * next_hops, unsigned n);
+int
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n);
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t * next_hops, unsigned n);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 239b371..90beac8 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -34,3 +34,13 @@ DPDK_16.04 {
 	rte_lpm_delete_all;
 
 } DPDK_2.0;
+
+DPDK_17.05 {
+	global:
+
+	rte_lpm6_add;
+	rte_lpm6_is_rule_present;
+	rte_lpm6_lookup;
+	rte_lpm6_lookup_bulk_func;
+
+} DPDK_16.04;
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index 836f4cf..1e1a173 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -211,9 +211,8 @@ rte_table_lpm_ipv6_entry_add(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint32_t nht_pos, nht_pos0_valid;
+	uint32_t nht_pos, nht_pos0, nht_pos0_valid;
 	int status;
-	uint8_t nht_pos0;
 
 	/* Check input parameters */
 	if (lpm == NULL) {
@@ -256,7 +255,7 @@ rte_table_lpm_ipv6_entry_add(
 
 	/* Add rule to low level LPM table */
 	if (rte_lpm6_add(lpm->lpm, ip_prefix->ip, ip_prefix->depth,
-		(uint8_t) nht_pos) < 0) {
+		nht_pos) < 0) {
 		RTE_LOG(ERR, TABLE, "%s: LPM IPv6 rule add failed\n", __func__);
 		return -1;
 	}
@@ -280,7 +279,7 @@ rte_table_lpm_ipv6_entry_delete(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint8_t nht_pos;
+	uint32_t nht_pos;
 	int status;
 
 	/* Check input parameters */
@@ -356,7 +355,7 @@ rte_table_lpm_ipv6_lookup(
 			uint8_t *ip = RTE_MBUF_METADATA_UINT8_PTR(pkt,
 				lpm->offset);
 			int status;
-			uint8_t nht_pos;
+			uint32_t nht_pos;
 
 			status = rte_lpm6_lookup(lpm->lpm, ip, &nht_pos);
 			if (status == 0) {
-- 
2.1.4

^ permalink raw reply	[relevance 4%]
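
A note on the versioning macros used in the patch above: VERSION_SYMBOL,
BIND_DEFAULT_SYMBOL and MAP_STATIC_SYMBOL come from rte_compat.h and let a
single shared object export both the old and the new binding of a function.
The sketch below shows the pattern on a hypothetical function demo_get; the
name, its context argument and the wrapper logic are illustrative only,
while the macro usage mirrors what the patch does for rte_lpm6_lookup.

    #include <stdint.h>
    #include <errno.h>
    #include <rte_compat.h>

    int demo_get_v20(void *ctx, uint8_t *out);
    int demo_get_v1705(void *ctx, uint32_t *out);

    /* Old ABI: a thin wrapper that narrows the wider result for
     * binaries built against the uint8_t prototype. */
    int
    demo_get_v20(void *ctx, uint8_t *out)
    {
            uint32_t wide = 0;
            int ret;

            if (out == NULL)
                    return -EINVAL;

            ret = demo_get_v1705(ctx, &wide);
            if (ret == 0)
                    *out = (uint8_t)wide;
            return ret;
    }
    /* Exported as demo_get@DPDK_2.0 for already-linked binaries. */
    VERSION_SYMBOL(demo_get, _v20, 2.0);

    /* New ABI: the real implementation with the widened type. */
    int
    demo_get_v1705(void *ctx, uint32_t *out)
    {
            (void)ctx;
            *out = 0;       /* a real lookup would fill this in */
            return 0;
    }
    /* demo_get@@DPDK_17.05 becomes the default for new links. */
    BIND_DEFAULT_SYMBOL(demo_get, _v1705, 17.05);
    MAP_STATIC_SYMBOL(int demo_get(void *ctx, uint32_t *out),
                    demo_get_v1705);

As in the patch, the versioned function also has to appear in a new
DPDK_17.05 node of the library's version map, otherwise the linker will
not emit the versioned symbols.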

* [dpdk-dev] [PATCH 3/3] doc: remove deprecation notice
  @ 2017-02-17 12:01  5% ` Fan Zhang
  0 siblings, 0 replies; 200+ results
From: Fan Zhang @ 2017-02-17 12:01 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..3e17b20 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -119,10 +119,6 @@ Deprecation Notices
   or VHOST feature, but KNI_VHOST which is a KNI feature enabled via a compile
   time option, and disabled by default.
 
-* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
-  A pointer to a rte_cryptodev_config structure will be added to the
-  function prototype ``cryptodev_configure_t``, as a new parameter.
-
 * cryptodev: A new parameter ``max_nb_sessions_per_qp`` will be added to
   ``rte_cryptodev_info.sym``. Some drivers may support limited number of
   sessions per queue_pair. With this new parameter application will know
-- 
2.7.4

^ permalink raw reply	[relevance 5%]
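
For context, the ops change the removed notice covered had roughly the
following shape; the prototypes below are a sketch reconstructed from the
wording of the notice, not copied verbatim from the cryptodev headers.

    struct rte_cryptodev;
    struct rte_cryptodev_config;

    /* Before the change, the configure callback received only the
     * device:
     *
     *     typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev);
     */

    /* After, as announced: a pointer to the device configuration is
     * passed in as a new parameter, which changes the layout consumers
     * of struct rte_cryptodev_ops were built against - hence the ABI
     * notice. */
    typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev,
                    struct rte_cryptodev_config *config);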

* [dpdk-dev] [PATCH] kni: remove KNI vhost support
@ 2017-02-15 13:15  1% Ferruh Yigit
  2017-02-20 14:30  5% ` [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-15 13:15 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 config/common_base                             |   3 -
 devtools/test-build.sh                         |   1 -
 doc/guides/prog_guide/index.rst                |   4 -
 doc/guides/prog_guide/kernel_nic_interface.rst | 113 ----
 doc/guides/rel_notes/deprecation.rst           |   6 -
 lib/librte_eal/linuxapp/kni/Makefile           |   1 -
 lib/librte_eal/linuxapp/kni/kni_dev.h          |  33 -
 lib/librte_eal/linuxapp/kni/kni_fifo.h         |  14 -
 lib/librte_eal/linuxapp/kni/kni_misc.c         |  22 -
 lib/librte_eal/linuxapp/kni/kni_net.c          |  13 -
 lib/librte_eal/linuxapp/kni/kni_vhost.c        | 842 -------------------------
 11 files changed, 1052 deletions(-)
 delete mode 100644 lib/librte_eal/linuxapp/kni/kni_vhost.c

diff --git a/config/common_base b/config/common_base
index 71a4fcb..aeee13e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -584,9 +584,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_KNI_KMOD=n
 CONFIG_RTE_KNI_KMOD_ETHTOOL=n
 CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
-CONFIG_RTE_KNI_VHOST=n
-CONFIG_RTE_KNI_VHOST_MAX_CACHE_SIZE=1024
-CONFIG_RTE_KNI_VHOST_VNET_HDR_EN=n
 
 #
 # Compile the pdump library
diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 0f131fc..84d3165 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -194,7 +194,6 @@ config () # <directory> <target> <options>
 		sed -ri        's,(PMD_OPENSSL=)n,\1y,' $1/.config
 		test "$DPDK_DEP_SSL" != y || \
 		sed -ri            's,(PMD_QAT=)n,\1y,' $1/.config
-		sed -ri        's,(KNI_VHOST.*=)n,\1y,' $1/.config
 		sed -ri           's,(SCHED_.*=)n,\1y,' $1/.config
 		build_config_hook $1 $2 $3
 
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 7f825cb..77f427e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -127,10 +127,6 @@ Programmer's Guide
 
 :numref:`figure_pkt_flow_kni` :ref:`figure_pkt_flow_kni`
 
-:numref:`figure_vhost_net_arch2` :ref:`figure_vhost_net_arch2`
-
-:numref:`figure_kni_traffic_flow` :ref:`figure_kni_traffic_flow`
-
 
 :numref:`figure_pkt_proc_pipeline_qos` :ref:`figure_pkt_proc_pipeline_qos`
 
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 4f25595..6f7fd28 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -168,116 +168,3 @@ The application handlers can be registered upon interface creation or explicitly
 This provides flexibility in multiprocess scenarios
 (where the KNI is created in the primary process but the callbacks are handled in the secondary one).
 The constraint is that a single process can register and handle the requests.
-
-.. _kni_vhost_backend-label:
-
-KNI Working as a Kernel vHost Backend
--------------------------------------
-
-vHost is a kernel module usually working as the backend of virtio (a para- virtualization driver framework)
-to accelerate the traffic from the guest to the host.
-The DPDK Kernel NIC interface provides the ability to hookup vHost traffic into userspace DPDK application.
-Together with the DPDK PMD virtio, it significantly improves the throughput between guest and host.
-In the scenario where DPDK is running as fast path in the host, kni-vhost is an efficient path for the traffic.
-
-Overview
-~~~~~~~~
-
-vHost-net has three kinds of real backend implementations. They are: 1) tap, 2) macvtap and 3) RAW socket.
-The main idea behind kni-vhost is making the KNI work as a RAW socket, attaching it as the backend instance of vHost-net.
-It is using the existing interface with vHost-net, so it does not require any kernel hacking,
-and is fully-compatible with the kernel vhost module.
-As vHost is still taking responsibility for communicating with the front-end virtio,
-it naturally supports both legacy virtio -net and the DPDK PMD virtio.
-There is a little penalty that comes from the non-polling mode of vhost.
-However, it scales throughput well when using KNI in multi-thread mode.
-
-.. _figure_vhost_net_arch2:
-
-.. figure:: img/vhost_net_arch.*
-
-   vHost-net Architecture Overview
-
-
-Packet Flow
-~~~~~~~~~~~
-
-There is only a minor difference from the original KNI traffic flows.
-On transmit side, vhost kthread calls the RAW socket's ops sendmsg and it puts the packets into the KNI transmit FIFO.
-On the receive side, the kni kthread gets packets from the KNI receive FIFO, puts them into the queue of the raw socket,
-and wakes up the task in vhost kthread to begin receiving.
-All the packet copying, irrespective of whether it is on the transmit or receive side,
-happens in the context of vhost kthread.
-Every vhost-net device is exposed to a front end virtio device in the guest.
-
-.. _figure_kni_traffic_flow:
-
-.. figure:: img/kni_traffic_flow.*
-
-   KNI Traffic Flow
-
-
-Sample Usage
-~~~~~~~~~~~~
-
-Before starting to use KNI as the backend of vhost, the CONFIG_RTE_KNI_VHOST configuration option must be turned on.
-Otherwise, by default, KNI will not enable its backend support capability.
-
-Of course, as a prerequisite, the vhost/vhost-net kernel CONFIG should be chosen before compiling the kernel.
-
-#.  Compile the DPDK and insert uio_pci_generic/igb_uio kernel modules as normal.
-
-#.  Insert the KNI kernel module:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko
-
-    If using KNI in multi-thread mode, use the following command line:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko kthread_mode=multiple
-
-#.  Running the KNI sample application:
-
-    .. code-block:: console
-
-        examples/kni/build/app/kni -c -0xf0 -n 4 -- -p 0x3 -P --config="(0,4,6),(1,5,7)"
-
-    This command runs the kni sample application with two physical ports.
-    Each port pins two forwarding cores (ingress/egress) in user space.
-
-#.  Assign a raw socket to vhost-net during qemu-kvm startup.
-    The DPDK does not provide a script to do this since it is easy for the user to customize.
-    The following shows the key steps to launch qemu-kvm with kni-vhost:
-
-    .. code-block:: bash
-
-        #!/bin/bash
-        echo 1 > /sys/class/net/vEth0/sock_en
-        fd=`cat /sys/class/net/vEth0/sock_fd`
-        qemu-kvm \
-        -name vm1 -cpu host -m 2048 -smp 1 -hda /opt/vm-fc16.img \
-        -netdev tap,fd=$fd,id=hostnet1,vhost=on \
-        -device virti-net-pci,netdev=hostnet1,id=net1,bus=pci.0,addr=0x4
-
-It is simple to enable raw socket using sysfs sock_en and get raw socket fd using sock_fd under the KNI device node.
-
-Then, using the qemu-kvm command with the -netdev option to assign such raw socket fd as vhost's backend.
-
-.. note::
-
-    The key word tap must exist as qemu-kvm now only supports vhost with a tap backend, so here we cheat qemu-kvm by an existing fd.
-
-Compatibility Configure Option
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-There is a CONFIG_RTE_KNI_VHOST_VNET_HDR_EN configuration option in DPDK configuration file.
-By default, it set to n, which means do not turn on the virtio net header,
-which is used to support additional features (such as, csum offload, vlan offload, generic-segmentation and so on),
-since the kni-vhost does not yet support those features.
-
-Even if the option is turned on, kni-vhost will ignore the information that the header contains.
-When working with legacy virtio on the guest, it is better to turn off unsupported offload features using ethtool -K.
-Otherwise, there may be problems such as an incorrect L4 checksum error.
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..66ca596 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -113,12 +113,6 @@ Deprecation Notices
   has different feature set, meaning functions like ``rte_vhost_feature_disable``
   need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
 
-* kni: Remove :ref:`kni_vhost_backend-label` feature (KNI_VHOST) in 17.05 release.
-  :doc:`Vhost Library </prog_guide/vhost_lib>` is currently preferred method for
-  guest - host communication. Just for clarification, this is not to remove KNI
-  or VHOST feature, but KNI_VHOST which is a KNI feature enabled via a compile
-  time option, and disabled by default.
-
 * ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
   A pointer to a rte_cryptodev_config structure will be added to the
   function prototype ``cryptodev_configure_t``, as a new parameter.
diff --git a/lib/librte_eal/linuxapp/kni/Makefile b/lib/librte_eal/linuxapp/kni/Makefile
index 3c22b63..7864a2a 100644
--- a/lib/librte_eal/linuxapp/kni/Makefile
+++ b/lib/librte_eal/linuxapp/kni/Makefile
@@ -61,7 +61,6 @@ DEPDIRS-y += lib/librte_eal/linuxapp/eal
 #
 SRCS-y := kni_misc.c
 SRCS-y += kni_net.c
-SRCS-$(CONFIG_RTE_KNI_VHOST) += kni_vhost.c
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += kni_ethtool.c
 
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_main.c
diff --git a/lib/librte_eal/linuxapp/kni/kni_dev.h b/lib/librte_eal/linuxapp/kni/kni_dev.h
index 58cbadd..002e5fa 100644
--- a/lib/librte_eal/linuxapp/kni/kni_dev.h
+++ b/lib/librte_eal/linuxapp/kni/kni_dev.h
@@ -37,10 +37,6 @@
 #include <linux/spinlock.h>
 #include <linux/list.h>
 
-#ifdef RTE_KNI_VHOST
-#include <net/sock.h>
-#endif
-
 #include <exec-env/rte_kni_common.h>
 #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
 
@@ -102,15 +98,6 @@ struct kni_dev {
 	/* synchro for request processing */
 	unsigned long synchro;
 
-#ifdef RTE_KNI_VHOST
-	struct kni_vhost_queue *vhost_queue;
-
-	volatile enum {
-		BE_STOP = 0x1,
-		BE_START = 0x2,
-		BE_FINISH = 0x4,
-	} vq_status;
-#endif
 	/* buffers */
 	void *pa[MBUF_BURST_SZ];
 	void *va[MBUF_BURST_SZ];
@@ -118,26 +105,6 @@ struct kni_dev {
 	void *alloc_va[MBUF_BURST_SZ];
 };
 
-#ifdef RTE_KNI_VHOST
-uint32_t
-kni_poll(struct file *file, struct socket *sock, poll_table * wait);
-int kni_chk_vhost_rx(struct kni_dev *kni);
-int kni_vhost_init(struct kni_dev *kni);
-int kni_vhost_backend_release(struct kni_dev *kni);
-
-struct kni_vhost_queue {
-	struct sock sk;
-	struct socket *sock;
-	int vnet_hdr_sz;
-	struct kni_dev *kni;
-	int sockfd;
-	uint32_t flags;
-	struct sk_buff *cache;
-	struct rte_kni_fifo *fifo;
-};
-
-#endif
-
 void kni_net_rx(struct kni_dev *kni);
 void kni_net_init(struct net_device *dev);
 void kni_net_config_lo_mode(char *lo_str);
diff --git a/lib/librte_eal/linuxapp/kni/kni_fifo.h b/lib/librte_eal/linuxapp/kni/kni_fifo.h
index 025ec1c..14f4141 100644
--- a/lib/librte_eal/linuxapp/kni/kni_fifo.h
+++ b/lib/librte_eal/linuxapp/kni/kni_fifo.h
@@ -91,18 +91,4 @@ kni_fifo_free_count(struct rte_kni_fifo *fifo)
 	return (fifo->read - fifo->write - 1) & (fifo->len - 1);
 }
 
-#ifdef RTE_KNI_VHOST
-/**
- * Initializes the kni fifo structure
- */
-static inline void
-kni_fifo_init(struct rte_kni_fifo *fifo, uint32_t size)
-{
-	fifo->write = 0;
-	fifo->read = 0;
-	fifo->len = size;
-	fifo->elem_size = sizeof(void *);
-}
-#endif
-
 #endif /* _KNI_FIFO_H_ */
diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c b/lib/librte_eal/linuxapp/kni/kni_misc.c
index 33b61f2..f1f6bea 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -140,11 +140,7 @@ kni_thread_single(void *data)
 		down_read(&knet->kni_list_lock);
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
 			list_for_each_entry(dev, &knet->kni_list_head, list) {
-#ifdef RTE_KNI_VHOST
-				kni_chk_vhost_rx(dev);
-#else
 				kni_net_rx(dev);
-#endif
 				kni_net_poll_resp(dev);
 			}
 		}
@@ -167,11 +163,7 @@ kni_thread_multiple(void *param)
 
 	while (!kthread_should_stop()) {
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
-#ifdef RTE_KNI_VHOST
-			kni_chk_vhost_rx(dev);
-#else
 			kni_net_rx(dev);
-#endif
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
@@ -248,9 +240,6 @@ kni_release(struct inode *inode, struct file *file)
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 	}
@@ -397,10 +386,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	kni->sync_va = dev_info.sync_va;
 	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
 
-#ifdef RTE_KNI_VHOST
-	kni->vhost_queue = NULL;
-	kni->vq_status = BE_STOP;
-#endif
 	kni->mbuf_size = dev_info.mbuf_size;
 
 	pr_debug("tx_phys:      0x%016llx, tx_q addr:      0x%p\n",
@@ -490,10 +475,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 		return -ENODEV;
 	}
 
-#ifdef RTE_KNI_VHOST
-	kni_vhost_init(kni);
-#endif
-
 	ret = kni_run_thread(knet, kni, dev_info.force_bind);
 	if (ret != 0)
 		return ret;
@@ -537,9 +518,6 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num,
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 		ret = 0;
diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c b/lib/librte_eal/linuxapp/kni/kni_net.c
index 4ac99cf..db9f489 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -198,18 +198,6 @@ kni_net_config(struct net_device *dev, struct ifmap *map)
 /*
  * Transmit a packet (called by the kernel)
  */
-#ifdef RTE_KNI_VHOST
-static int
-kni_net_tx(struct sk_buff *skb, struct net_device *dev)
-{
-	struct kni_dev *kni = netdev_priv(dev);
-
-	dev_kfree_skb(skb);
-	kni->stats.tx_dropped++;
-
-	return NETDEV_TX_OK;
-}
-#else
 static int
 kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 {
@@ -289,7 +277,6 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 
 	return NETDEV_TX_OK;
 }
-#endif
 
 /*
  * RX: normal working mode
diff --git a/lib/librte_eal/linuxapp/kni/kni_vhost.c b/lib/librte_eal/linuxapp/kni/kni_vhost.c
deleted file mode 100644
index f54c34b..0000000
--- a/lib/librte_eal/linuxapp/kni/kni_vhost.c
+++ /dev/null
@@ -1,842 +0,0 @@
-/*-
- * GPL LICENSE SUMMARY
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *
- *   This program is free software; you can redistribute it and/or modify
- *   it under the terms of version 2 of the GNU General Public License as
- *   published by the Free Software Foundation.
- *
- *   This program is distributed in the hope that it will be useful, but
- *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *   General Public License for more details.
- *
- *   You should have received a copy of the GNU General Public License
- *   along with this program; if not, write to the Free Software
- *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
- *   The full GNU General Public License is included in this distribution
- *   in the file called LICENSE.GPL.
- *
- *   Contact Information:
- *   Intel Corporation
- */
-
-#include <linux/module.h>
-#include <linux/net.h>
-#include <net/sock.h>
-#include <linux/virtio_net.h>
-#include <linux/wait.h>
-#include <linux/mm.h>
-#include <linux/nsproxy.h>
-#include <linux/sched.h>
-#include <linux/if_tun.h>
-#include <linux/version.h>
-#include <linux/file.h>
-
-#include "compat.h"
-#include "kni_dev.h"
-#include "kni_fifo.h"
-
-#define RX_BURST_SZ 4
-
-#ifdef HAVE_STATIC_SOCK_MAP_FD
-static int kni_sock_map_fd(struct socket *sock)
-{
-	struct file *file;
-	int fd = get_unused_fd_flags(0);
-
-	if (fd < 0)
-		return fd;
-
-	file = sock_alloc_file(sock, 0, NULL);
-	if (IS_ERR(file)) {
-		put_unused_fd(fd);
-		return PTR_ERR(file);
-	}
-	fd_install(fd, file);
-	return fd;
-}
-#endif
-
-static struct proto kni_raw_proto = {
-	.name = "kni_vhost",
-	.owner = THIS_MODULE,
-	.obj_size = sizeof(struct kni_vhost_queue),
-};
-
-static inline int
-kni_vhost_net_tx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	struct rte_kni_mbuf *pkt_kva = NULL;
-	struct rte_kni_mbuf *pkt_va = NULL;
-	int ret;
-
-	pr_debug("tx offset=%d, len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, (int)m->msg_iov->iov_len);
-#endif
-
-	/**
-	 * Check if it has at least one free entry in tx_q and
-	 * one entry in alloc_q.
-	 */
-	if (kni_fifo_free_count(kni->tx_q) == 0 ||
-	    kni_fifo_count(kni->alloc_q) == 0) {
-		/**
-		 * If no free entry in tx_q or no entry in alloc_q,
-		 * drops skb and goes out.
-		 */
-		goto drop;
-	}
-
-	/* dequeue a mbuf from alloc_q */
-	ret = kni_fifo_get(kni->alloc_q, (void **)&pkt_va, 1);
-	if (likely(ret == 1)) {
-		void *data_kva;
-
-		pkt_kva = (void *)pkt_va - kni->mbuf_va + kni->mbuf_kva;
-		data_kva = pkt_kva->buf_addr + pkt_kva->data_off
-			- kni->mbuf_va + kni->mbuf_kva;
-
-#ifdef HAVE_IOV_ITER_MSGHDR
-		copy_from_iter(data_kva, len, &m->msg_iter);
-#else
-		memcpy_fromiovecend(data_kva, m->msg_iov, offset, len);
-#endif
-
-		if (unlikely(len < ETH_ZLEN)) {
-			memset(data_kva + len, 0, ETH_ZLEN - len);
-			len = ETH_ZLEN;
-		}
-		pkt_kva->pkt_len = len;
-		pkt_kva->data_len = len;
-
-		/* enqueue mbuf into tx_q */
-		ret = kni_fifo_put(kni->tx_q, (void **)&pkt_va, 1);
-		if (unlikely(ret != 1)) {
-			/* Failing should not happen */
-			pr_err("Fail to enqueue mbuf into tx_q\n");
-			goto drop;
-		}
-	} else {
-		/* Failing should not happen */
-		pr_err("Fail to dequeue mbuf from alloc_q\n");
-		goto drop;
-	}
-
-	/* update statistics */
-	kni->stats.tx_bytes += len;
-	kni->stats.tx_packets++;
-
-	return 0;
-
-drop:
-	/* update statistics */
-	kni->stats.tx_dropped++;
-
-	return 0;
-}
-
-static inline int
-kni_vhost_net_rx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	uint32_t pkt_len;
-	struct rte_kni_mbuf *kva;
-	struct rte_kni_mbuf *va;
-	void *data_kva;
-	struct sk_buff *skb;
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	/* ensure at least one entry in free_q */
-	if (unlikely(kni_fifo_free_count(kni->free_q) == 0))
-		return 0;
-
-	skb = skb_dequeue(&q->sk.sk_receive_queue);
-	if (unlikely(skb == NULL))
-		return 0;
-
-	kva = (struct rte_kni_mbuf *)skb->data;
-
-	/* free skb to cache */
-	skb->data = NULL;
-	if (unlikely(kni_fifo_put(q->fifo, (void **)&skb, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into rx cache fifo\n");
-
-	pkt_len = kva->data_len;
-	if (unlikely(pkt_len > len))
-		goto drop;
-
-	pr_debug("rx offset=%d, len=%d, pkt_len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, pkt_len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, pkt_len, (int)m->msg_iov->iov_len);
-#endif
-
-	data_kva = kva->buf_addr + kva->data_off - kni->mbuf_va + kni->mbuf_kva;
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter(data_kva, pkt_len, &m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov, data_kva, offset, pkt_len)))
-#endif
-		goto drop;
-
-	/* Update statistics */
-	kni->stats.rx_bytes += pkt_len;
-	kni->stats.rx_packets++;
-
-	/* enqueue mbufs into free_q */
-	va = (void *)kva - kni->mbuf_kva + kni->mbuf_va;
-	if (unlikely(kni_fifo_put(kni->free_q, (void **)&va, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into free_q\n");
-
-	pr_debug("receive done %d\n", pkt_len);
-
-	return pkt_len;
-
-drop:
-	/* Update drop statistics */
-	kni->stats.rx_dropped++;
-
-	return 0;
-}
-
-static uint32_t
-kni_sock_poll(struct file *file, struct socket *sock, poll_table *wait)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t mask = 0;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return POLLERR;
-
-	kni = q->kni;
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("start kni_poll on group %d, wq 0x%16llx\n",
-		  kni->group_id, (uint64_t)sock->wq);
-	poll_wait(file, &sock->wq->wait, wait);
-#else
-	pr_debug("start kni_poll on group %d, wait at 0x%16llx\n",
-		  kni->group_id, (uint64_t)&sock->wait);
-	poll_wait(file, &sock->wait, wait);
-#endif
-
-	if (kni_fifo_count(kni->rx_q) > 0)
-		mask |= POLLIN | POLLRDNORM;
-
-	if (sock_writeable(&q->sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-		(!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#else
-		(!test_and_set_bit(SOCK_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#endif
-		mask |= POLLOUT | POLLWRNORM;
-
-	return mask;
-}
-
-static inline void
-kni_vhost_enqueue(struct kni_dev *kni, struct kni_vhost_queue *q,
-		  struct sk_buff *skb, struct rte_kni_mbuf *va)
-{
-	struct rte_kni_mbuf *kva;
-
-	kva = (void *)(va) - kni->mbuf_va + kni->mbuf_kva;
-	(skb)->data = (unsigned char *)kva;
-	(skb)->len = kva->data_len;
-	skb_queue_tail(&q->sk.sk_receive_queue, skb);
-}
-
-static inline void
-kni_vhost_enqueue_burst(struct kni_dev *kni, struct kni_vhost_queue *q,
-	  struct sk_buff **skb, struct rte_kni_mbuf **va)
-{
-	int i;
-
-	for (i = 0; i < RX_BURST_SZ; skb++, va++, i++)
-		kni_vhost_enqueue(kni, q, *skb, *va);
-}
-
-int
-kni_chk_vhost_rx(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-	uint32_t nb_in, nb_mbuf, nb_skb;
-	const uint32_t BURST_MASK = RX_BURST_SZ - 1;
-	uint32_t nb_burst, nb_backlog, i;
-	struct sk_buff *skb[RX_BURST_SZ];
-	struct rte_kni_mbuf *va[RX_BURST_SZ];
-
-	if (unlikely(BE_STOP & kni->vq_status)) {
-		kni->vq_status |= BE_FINISH;
-		return 0;
-	}
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	nb_skb = kni_fifo_count(q->fifo);
-	nb_mbuf = kni_fifo_count(kni->rx_q);
-
-	nb_in = min(nb_mbuf, nb_skb);
-	nb_in = min_t(uint32_t, nb_in, RX_BURST_SZ);
-	nb_burst   = (nb_in & ~BURST_MASK);
-	nb_backlog = (nb_in & BURST_MASK);
-
-	/* enqueue skb_queue per BURST_SIZE bulk */
-	if (nb_burst != 0) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		kni_vhost_enqueue_burst(kni, q, skb, va);
-	}
-
-	/* all leftover, do one by one */
-	for (i = 0; i < nb_backlog; ++i) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, 1) != 1))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, 1) != 1))
-			goto except;
-
-		kni_vhost_enqueue(kni, q, *skb, *va);
-	}
-
-	/* Ondemand wake up */
-	if ((nb_in == RX_BURST_SZ) || (nb_skb == 0) ||
-	    ((nb_mbuf < RX_BURST_SZ) && (nb_mbuf != 0))) {
-		wake_up_interruptible_poll(sk_sleep(&q->sk),
-				   POLLIN | POLLRDNORM | POLLRDBAND);
-		pr_debug("RX CHK KICK nb_mbuf %d, nb_skb %d, nb_in %d\n",
-			   nb_mbuf, nb_skb, nb_in);
-	}
-
-	return 0;
-
-except:
-	/* Failing should not happen */
-	pr_err("Fail to enqueue fifo, it shouldn't happen\n");
-	BUG_ON(1);
-
-	return 0;
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_sndmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#else
-kni_sock_sndmsg(struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	int vnet_hdr_len = 0;
-	unsigned long len = total_len;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-	pr_debug("kni_sndmsg len %ld, flags 0x%08x, nb_iov %d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   len, q->flags, (int)m->msg_iter.iov->iov_len);
-#else
-		   len, q->flags, (int)m->msg_iovlen);
-#endif
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		if (unlikely(len < vnet_hdr_len))
-			return -EINVAL;
-		len -= vnet_hdr_len;
-	}
-#endif
-
-	if (unlikely(len < ETH_HLEN + q->vnet_hdr_sz))
-		return -EINVAL;
-
-	return kni_vhost_net_tx(q->kni, m, vnet_hdr_len, len);
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_rcvmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#else
-kni_sock_rcvmsg(struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	int vnet_hdr_len = 0;
-	int pkt_len = 0;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	static struct virtio_net_hdr
-		__attribute__ ((unused)) vnet_hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
-	};
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		len -= vnet_hdr_len;
-		if (len < 0)
-			return -EINVAL;
-	}
-#endif
-
-	pkt_len = kni_vhost_net_rx(q->kni, m, vnet_hdr_len, len);
-	if (unlikely(pkt_len == 0))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	/* no need to copy hdr when no pkt received */
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter((void *)&vnet_hdr, vnet_hdr_len,
-		&m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov,
-		(void *)&vnet_hdr, 0, vnet_hdr_len)))
-#endif /* HAVE_IOV_ITER_MSGHDR */
-		return -EFAULT;
-#endif /* RTE_KNI_VHOST_VNET_HDR_EN */
-	pr_debug("kni_rcvmsg expect_len %ld, flags 0x%08x, pkt_len %d\n",
-		   (unsigned long)len, q->flags, pkt_len);
-
-	return pkt_len + vnet_hdr_len;
-}
-
-/* dummy tap like ioctl */
-static int
-kni_sock_ioctl(struct socket *sock, uint32_t cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	struct ifreq __user *ifr = argp;
-	uint32_t __user *up = argp;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t u;
-	int __user *sp = argp;
-	int s;
-	int ret;
-
-	pr_debug("tap ioctl cmd 0x%08x\n", cmd);
-
-	switch (cmd) {
-	case TUNSETIFF:
-		pr_debug("TUNSETIFF\n");
-		/* ignore the name, just look at flags */
-		if (get_user(u, &ifr->ifr_flags))
-			return -EFAULT;
-
-		ret = 0;
-		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
-			ret = -EINVAL;
-		else
-			q->flags = u;
-
-		return ret;
-
-	case TUNGETIFF:
-		pr_debug("TUNGETIFF\n");
-		rcu_read_lock_bh();
-		kni = rcu_dereference_bh(q->kni);
-		if (kni)
-			dev_hold(kni->net_dev);
-		rcu_read_unlock_bh();
-
-		if (!kni)
-			return -ENOLINK;
-
-		ret = 0;
-		if (copy_to_user(&ifr->ifr_name, kni->net_dev->name, IFNAMSIZ)
-				|| put_user(q->flags, &ifr->ifr_flags))
-			ret = -EFAULT;
-		dev_put(kni->net_dev);
-		return ret;
-
-	case TUNGETFEATURES:
-		pr_debug("TUNGETFEATURES\n");
-		u = IFF_TAP | IFF_NO_PI;
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		u |= IFF_VNET_HDR;
-#endif
-		if (put_user(u, up))
-			return -EFAULT;
-		return 0;
-
-	case TUNSETSNDBUF:
-		pr_debug("TUNSETSNDBUF\n");
-		if (get_user(u, up))
-			return -EFAULT;
-
-		q->sk.sk_sndbuf = u;
-		return 0;
-
-	case TUNGETVNETHDRSZ:
-		s = q->vnet_hdr_sz;
-		if (put_user(s, sp))
-			return -EFAULT;
-		pr_debug("TUNGETVNETHDRSZ %d\n", s);
-		return 0;
-
-	case TUNSETVNETHDRSZ:
-		if (get_user(s, sp))
-			return -EFAULT;
-		if (s < (int)sizeof(struct virtio_net_hdr))
-			return -EINVAL;
-
-		pr_debug("TUNSETVNETHDRSZ %d\n", s);
-		q->vnet_hdr_sz = s;
-		return 0;
-
-	case TUNSETOFFLOAD:
-		pr_debug("TUNSETOFFLOAD %lx\n", arg);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		/* not support any offload yet */
-		if (!(q->flags & IFF_VNET_HDR))
-			return  -EINVAL;
-
-		return 0;
-#else
-		return -EINVAL;
-#endif
-
-	default:
-		pr_debug("NOT SUPPORT\n");
-		return -EINVAL;
-	}
-}
-
-static int
-kni_sock_compat_ioctl(struct socket *sock, uint32_t cmd,
-		     unsigned long arg)
-{
-	/* 32 bits app on 64 bits OS to be supported later */
-	pr_debug("Not implemented.\n");
-
-	return -EINVAL;
-}
-
-#define KNI_VHOST_WAIT_WQ_SAFE()                        \
-do {							\
-	while ((BE_FINISH | BE_STOP) == kni->vq_status) \
-		msleep(1);				\
-} while (0)						\
-
-
-static int
-kni_sock_release(struct socket *sock)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-
-	if (q == NULL)
-		return 0;
-
-	kni = q->kni;
-	if (kni != NULL) {
-		kni->vq_status = BE_STOP;
-		KNI_VHOST_WAIT_WQ_SAFE();
-		kni->vhost_queue = NULL;
-		q->kni = NULL;
-	}
-
-	if (q->sockfd != -1)
-		q->sockfd = -1;
-
-	sk_set_socket(&q->sk, NULL);
-	sock->sk = NULL;
-
-	sock_put(&q->sk);
-
-	pr_debug("dummy sock release done\n");
-
-	return 0;
-}
-
-int
-kni_sock_getname(struct socket *sock, struct sockaddr *addr,
-		int *sockaddr_len, int peer)
-{
-	pr_debug("dummy sock getname\n");
-	((struct sockaddr_ll *)addr)->sll_family = AF_PACKET;
-	return 0;
-}
-
-static const struct proto_ops kni_socket_ops = {
-	.getname = kni_sock_getname,
-	.sendmsg = kni_sock_sndmsg,
-	.recvmsg = kni_sock_rcvmsg,
-	.release = kni_sock_release,
-	.poll    = kni_sock_poll,
-	.ioctl   = kni_sock_ioctl,
-	.compat_ioctl = kni_sock_compat_ioctl,
-};
-
-static void
-kni_sk_write_space(struct sock *sk)
-{
-	wait_queue_head_t *wqueue;
-
-	if (!sock_writeable(sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-	    !test_and_clear_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#else
-	    !test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#endif
-		return;
-	wqueue = sk_sleep(sk);
-	if (wqueue && waitqueue_active(wqueue))
-		wake_up_interruptible_poll(
-			wqueue, POLLOUT | POLLWRNORM | POLLWRBAND);
-}
-
-static void
-kni_sk_destruct(struct sock *sk)
-{
-	struct kni_vhost_queue *q =
-		container_of(sk, struct kni_vhost_queue, sk);
-
-	if (!q)
-		return;
-
-	/* make sure there's no packet in buffer */
-	while (skb_dequeue(&sk->sk_receive_queue) != NULL)
-		;
-
-	mb();
-
-	if (q->fifo != NULL) {
-		kfree(q->fifo);
-		q->fifo = NULL;
-	}
-
-	if (q->cache != NULL) {
-		kfree(q->cache);
-		q->cache = NULL;
-	}
-}
-
-static int
-kni_vhost_backend_init(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q;
-	struct net *net = current->nsproxy->net_ns;
-	int err, i, sockfd;
-	struct rte_kni_fifo *fifo;
-	struct sk_buff *elem;
-
-	if (kni->vhost_queue != NULL)
-		return -1;
-
-#ifdef HAVE_SK_ALLOC_KERN_PARAM
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto, 0);
-#else
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto);
-#endif
-	if (!q)
-		return -ENOMEM;
-
-	err = sock_create_lite(AF_UNSPEC, SOCK_RAW, IPPROTO_RAW, &q->sock);
-	if (err)
-		goto free_sk;
-
-	sockfd = kni_sock_map_fd(q->sock);
-	if (sockfd < 0) {
-		err = sockfd;
-		goto free_sock;
-	}
-
-	/* cache init */
-	q->cache = kzalloc(
-		RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(struct sk_buff),
-		GFP_KERNEL);
-	if (!q->cache)
-		goto free_fd;
-
-	fifo = kzalloc(RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(void *)
-			+ sizeof(struct rte_kni_fifo), GFP_KERNEL);
-	if (!fifo)
-		goto free_cache;
-
-	kni_fifo_init(fifo, RTE_KNI_VHOST_MAX_CACHE_SIZE);
-
-	for (i = 0; i < RTE_KNI_VHOST_MAX_CACHE_SIZE; i++) {
-		elem = &q->cache[i];
-		kni_fifo_put(fifo, (void **)&elem, 1);
-	}
-	q->fifo = fifo;
-
-	/* store sockfd in vhost_queue */
-	q->sockfd = sockfd;
-
-	/* init socket */
-	q->sock->type = SOCK_RAW;
-	q->sock->state = SS_CONNECTED;
-	q->sock->ops = &kni_socket_ops;
-	sock_init_data(q->sock, &q->sk);
-
-	/* init sock data */
-	q->sk.sk_write_space = kni_sk_write_space;
-	q->sk.sk_destruct = kni_sk_destruct;
-	q->flags = IFF_NO_PI | IFF_TAP;
-	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	q->flags |= IFF_VNET_HDR;
-#endif
-
-	/* bind kni_dev with vhost_queue */
-	q->kni = kni;
-	kni->vhost_queue = q;
-
-	wmb();
-
-	kni->vq_status = BE_START;
-
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("backend init sockfd=%d, sock->wq=0x%16llx,sk->sk_wq=0x%16llx",
-		  q->sockfd, (uint64_t)q->sock->wq,
-		  (uint64_t)q->sk.sk_wq);
-#else
-	pr_debug("backend init sockfd=%d, sock->wait at 0x%16llx,sk->sk_sleep=0x%16llx",
-		  q->sockfd, (uint64_t)&q->sock->wait,
-		  (uint64_t)q->sk.sk_sleep);
-#endif
-
-	return 0;
-
-free_cache:
-	kfree(q->cache);
-	q->cache = NULL;
-
-free_fd:
-	put_unused_fd(sockfd);
-
-free_sock:
-	q->kni = NULL;
-	kni->vhost_queue = NULL;
-	kni->vq_status |= BE_FINISH;
-	sock_release(q->sock);
-	q->sock->ops = NULL;
-	q->sock = NULL;
-
-free_sk:
-	sk_free((struct sock *)q);
-
-	return err;
-}
-
-/* kni vhost sock sysfs */
-static ssize_t
-show_sock_fd(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	int sockfd = -1;
-
-	if (kni->vhost_queue != NULL)
-		sockfd = kni->vhost_queue->sockfd;
-	return snprintf(buf, 10, "%d\n", sockfd);
-}
-
-static ssize_t
-show_sock_en(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-
-	return snprintf(buf, 10, "%u\n", (kni->vhost_queue == NULL ? 0 : 1));
-}
-
-static ssize_t
-set_sock_en(struct device *dev, struct device_attribute *attr,
-	      const char *buf, size_t count)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	unsigned long en;
-	int err = 0;
-
-	if (kstrtoul(buf, 0, &en) != 0)
-		return -EINVAL;
-
-	if (en)
-		err = kni_vhost_backend_init(kni);
-
-	return err ? err : count;
-}
-
-static DEVICE_ATTR(sock_fd, S_IRUGO | S_IRUSR, show_sock_fd, NULL);
-static DEVICE_ATTR(sock_en, S_IRUGO | S_IWUSR, show_sock_en, set_sock_en);
-static struct attribute *dev_attrs[] = {
-	&dev_attr_sock_fd.attr,
-	&dev_attr_sock_en.attr,
-	NULL,
-};
-
-static const struct attribute_group dev_attr_grp = {
-	.attrs = dev_attrs,
-};
-
-int
-kni_vhost_backend_release(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (q == NULL)
-		return 0;
-
-	/* dettach from kni */
-	q->kni = NULL;
-
-	pr_debug("release backend done\n");
-
-	return 0;
-}
-
-int
-kni_vhost_init(struct kni_dev *kni)
-{
-	struct net_device *dev = kni->net_dev;
-
-	if (sysfs_create_group(&dev->dev.kobj, &dev_attr_grp))
-		sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-
-	kni->vq_status = BE_STOP;
-
-	pr_debug("kni_vhost_init done\n");
-
-	return 0;
-}
-- 
2.9.3

^ permalink raw reply	[relevance 1%]
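
A side note on the FIFO helpers this removal deletes from kni_fifo.h: they
depend on the ring length being a power of two, so wrap-around reduces to a
bitmask instead of a modulo, and one slot is deliberately left unused so a
full ring stays distinguishable from an empty one. A standalone sketch of
that arithmetic follows; the struct and function names are illustrative,
not the kernel module's.

    #include <stdint.h>

    struct demo_fifo {
            volatile uint32_t write;  /* next slot the producer fills */
            volatile uint32_t read;   /* next slot the consumer drains */
            uint32_t len;             /* number of slots, power of two */
    };

    /* Entries currently queued; the mask handles index wrap-around. */
    static inline uint32_t
    demo_fifo_count(const struct demo_fifo *f)
    {
            return (f->write - f->read) & (f->len - 1);
    }

    /* Free slots; the "- 1" keeps one slot empty so that a full ring
     * is never mistaken for an empty one. */
    static inline uint32_t
    demo_fifo_free_count(const struct demo_fifo *f)
    {
            return (f->read - f->write - 1) & (f->len - 1);
    }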

* [dpdk-dev] [PATCH v1] doc: add template release notes for 17.05
@ 2017-02-15 12:38  6% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2017-02-15 12:38 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Add template release notes for DPDK 17.05 with inline
comments and explanations of the various sections.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_17_05.rst | 195 +++++++++++++++++++++++++++++++++
 2 files changed, 196 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_17_05.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index cf8f167..c4d243c 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -36,6 +36,7 @@ Release Notes
     :numbered:
 
     rel_description
+    release_17_05
     release_17_02
     release_16_11
     release_16_07
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
new file mode 100644
index 0000000..e5a0a9e
--- /dev/null
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -0,0 +1,195 @@
+DPDK Release 17.05
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      firefox build/doc/html/guides/rel_notes/release_17_05.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release. Sample
+   format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense. The description
+     should be enough to allow someone scanning the release notes to
+     understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list like
+     this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =========================================================
+
+
+Resolved Issues
+---------------
+
+.. This section should contain bug fixes added to the relevant
+   sections. Sample format:
+
+   * **code/section Fixed issue in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description of the resolved issue in the past
+     tense.
+
+     The title should contain the code/lib section like a commit message.
+
+     Add the entries in alphabetic order in the relevant sections below.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+EAL
+~~~
+
+
+Drivers
+~~~~~~~
+
+
+Libraries
+~~~~~~~~~
+
+
+Examples
+~~~~~~~~
+
+
+Other
+~~~~~
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue in the present
+     tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change. Use fixed width
+     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
+     tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change that was announced
+     in the previous releases and made in this release. Use fixed width quotes
+     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release and prepend with a ``+``
+   sign, like this:
+
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+   This section is a comment. Do not overwrite or remove it.
+   =========================================================
+
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     librte_acl.so.2
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_cryptodev.so.2
+     librte_distributor.so.1
+     librte_eal.so.3
+     librte_ethdev.so.6
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.2
+     librte_mempool.so.2
+     librte_meter.so.1
+     librte_net.so.1
+     librte_pdump.so.1
+     librte_pipeline.so.3
+     librte_pmd_bond.so.1
+     librte_pmd_ring.so.2
+     librte_port.so.3
+     librte_power.so.1
+     librte_reorder.so.1
+     librte_ring.so.1
+     librte_sched.so.1
+     librte_table.so.2
+     librte_timer.so.1
+     librte_vhost.so.3
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested with this
+   release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for cryptodev ops structure
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
  2017-02-14 10:48  4%   ` Doherty, Declan
@ 2017-02-14 20:37  4%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 20:37 UTC (permalink / raw)
  To: Fan Zhang; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-14 10:52  8% [dpdk-dev] Further fun with ABI tracking Christian Ehrhardt
  2017-02-14 16:19  4% ` Bruce Richardson
@ 2017-02-14 20:31  9% ` Jan Blunck
  2017-02-22 13:12  7%   ` Christian Ehrhardt
  1 sibling, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-14 20:31 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On Tue, Feb 14, 2017 at 11:52 AM, Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
> Hi,
> when moving to DPDK 16.11, Debian/Ubuntu packaging of DPDK hit a new
> twist on the (seemingly recurring) topic of DPDK ABI tracking.
>
> I have found ... well, I don't want to call it a solution ... let's say a
> crutch to get around it for the moment. But I wanted to use the example I
> had to share a few thoughts on it and to kick off a wider discussion.
>
>
> *## In library cross-dependencies plus partial ABI bumps ##*
>
> Since the move away from the combined shared library we have had several
> improvements in tracking the ABI versions. These days [1] we have LIBABIVER
> per library, and it gets bumped to reflect that a library breaks with former
> versions, e.g. by removing symbols.
>
> Now in the 16.11 release the ABIs for cryptodev, eal and ethdev got bumped
> by [2] and [3].
>
> OTOH please remember that, in the usual sense, two versions of a shared
> library are meant to be able to coexist on a system without hurting each
> other. I picked a random one on my system.
> Package              Library
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160.0.0
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95.5.0
> Some link against the new, some against the old library - all fine.
> Most programs can simply be rebuilt against the new library, and after
> some time the old one can be dropped. That mechanism gives downstream
> distributions a way to handle transitions and to accommodate consumers of
> libraries which might not all be ready for the same version at any one time.
> And with the per-library versioning via LIBABIVER and the version maps we
> are good - in fact we qualify for all common cases on [4].
>
> Now in DPDK, of the libraries that got an ABI bump, eal and ethdev are
> among those which most of us consider "core libraries", and most other
> libs and PMDs link to them.
> And here DPDK continues to be special: due to that inter-dependency, with
> old and new libraries installed on the same system the following happens
> to openvswitch built for an older version of DPDK:
> ovs-vswitchd-dpdk
>     librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2
>     librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
>         librte_eal.so.3 => /usr/lib/x86_64-linux-gnu/librte_eal.so.3
>
> You can see that openvswitch itself depends on the "old" librte_eal.so.2.
> But because librte_pdump.so.1 did not get an ABI bump, it got upgraded to
> the newer version from DPDK 16.11.
> And since the "new" pdump was built with the new DPDK 16.11, it depends on
> the "new" librte_eal.so.3.
> Having both in the same executable's address space at the same time causes
> segfaults and pain.
>
> As I said, for now I have worked around the issue with a crutch that I'm
> not proud of and would like to avoid in the future. For that reason I'm
> reaching out to you with several suggestions to discuss.
>
>
> *## Thoughts ##*
> None of these seems like a perfect solution to me yet, but they are
> clearly good starting points for discussion.
>
> Options that were in discussion so far and that we might adopt next cycle
> (some of these are upstream changes, some downstream, some require both to
> change - but any of them should have an ack upstream so that we agree
> on how to proceed with those cases).
>
> 1. Downstreams to insert Major version into soname
> Distributions could insert the DPDK major version (like 16.11) into the
> soname and package names. A common example of this is libboost [5].
> That would perfectly allow 16.07.<LIBABIVER> to coexist with
> 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
> Yet it would mean that anything depending on the old library will have to
> be recompiled to pick up the new code, even if it depends on an ABI that is
> still present in the new release.
> Also - not a technical reason - but it is clearly more work to force update
> all dependencies and clean out old packages for every release.

Actually this isn't exactly what I proposed during the summit. Just
keep it simple and fix the ABI version of all libraries at 16.11.0.
This is a proven approach and has been used for years with different
libraries. You could easily do this independently of us upstream
fixing the ABI problems.


> 2. ABI Ranges

ABI is either backwards compatible (same major) or not. A range
doesn't solve the problem.

>
> 3. A lot of conflicts
>

This doesn't allow us to have multiple versions of the library
available at runtime. So in the end it doesn't solve the problem for
the distro either.


>
> 4. ABI bump is infectious
>
> 5. back to single ABI
>

This is very similar to approach 1. It just uses up a lot more ABI versions.


> 6. More
> I'm sure there are more approaches to this, feel free to come up with more.
>

The problem is that we do not detect and fix the ABI changes that
"shine through" the dependencies of our libraries. We need to work on
them and fix them one by one. Long-term we need to invest in keeping
the API/ABI stable, adding backward compatible symbols, and making
structures opaque.
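
For reference, keeping a backward compatible symbol in a DPDK library
roughly follows the pattern below. This is a minimal sketch for
shared-library builds using the VERSION_SYMBOL and BIND_DEFAULT_SYMBOL
macros from rte_compat.h; the function, struct and version nodes are
made up for illustration:

    #include <stdint.h>
    #include <rte_compat.h>

    struct foo_params { uint32_t flags; };

    /* Old implementation: binaries linked against the DPDK_2.0 node of
     * the version map keep resolving rte_foo_init to this code. */
    int
    rte_foo_init_v20(struct foo_params *p)
    {
            return p->flags ? 0 : -1;
    }
    VERSION_SYMBOL(rte_foo_init, _v20, 2.0);

    /* New implementation: the default for newly linked binaries. */
    int
    rte_foo_init_v1611(struct foo_params *p)
    {
            /* new behaviour; the exported name stays rte_foo_init */
            return p->flags ? 0 : -1;
    }
    BIND_DEFAULT_SYMBOL(rte_foo_init, _v1611, 16.11);

The library's version map then lists rte_foo_init under both the
DPDK_2.0 and DPDK_16.11 nodes, so old binaries keep resolving the old
code while newly linked ones pick up the new implementation.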



> I'm sure my five suggestions alone will make the thread messy. Maybe we do
> this in two rounds, sorting out the insane and identifying the preferred
> ones to then in a second run focus on discussing and maybe implementing the
> details of what we like.
>
>
> [1]: http://dpdk.org/browse/dpdk/tree/doc/guides/contributing/versioning.rst
> [2]: http://dpdk.org/browse/dpdk/commit/?id=d7e61ad3ae36
> [3]: http://dpdk.org/browse/dpdk/commit/?id=6ba1affa54108
> [4]: https://wiki.debian.org/TransitionBestPractices
> [5]: https://packages.debian.org/sid/libboost1.62-dev
> [6]:
> https://www.debian.org/doc/debian-policy/ch-relationships.html#s-conflicts
> [7]: https://wiki.ubuntu.com/ProposedMigration
>
> P.S. I beg a pardon for the wall of text
>
> --
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
    2017-02-13 18:02  4% ` Thomas Monjalon
  2017-02-14 13:54  4% ` Maxime Coquelin
@ 2017-02-14 20:28  4% ` Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 20:28 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
    2017-02-13 17:57  4%   ` Thomas Monjalon
@ 2017-02-14 19:37  4%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 19:37 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
                   ` (2 preceding siblings ...)
  2017-02-14  8:33  4% ` Olivier Matz
@ 2017-02-14 18:42  4% ` Thomas Monjalon
  3 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 18:42 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: update release notes for 17.02
  2017-02-14 15:32  4% [dpdk-dev] [PATCH v1] doc: update release notes for 17.02 John McNamara
@ 2017-02-14 16:26  2% ` John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2017-02-14 16:26 UTC (permalink / raw)
  To: dev; +Cc: John McNamara


Fix grammar, spelling and formatting of DPDK 17.02 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_17_02.rst | 241 +++++++++++++++------------------
 1 file changed, 111 insertions(+), 130 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 6420a87..357965a 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -40,46 +40,47 @@ New Features
 
 * **Added support for representing buses in EAL**
 
-  A new structure ``rte_bus`` is introduced in EAL. This allows for devices to
-  be represented by buses they are connected to. A new bus can be added to
-  DPDK by extending the ``rte_bus`` structure and implementing the scan and
-  probe functions. Once a new bus is registered using provided APIs, new
-  devices can be detected and initialized using bus scan and probe callbacks.
+  The ``rte_bus`` structure was introduced into the EAL. This allows for
+  devices to be represented by buses they are connected to. A new bus can be
+  added to DPDK by extending the ``rte_bus`` structure and implementing the
+  scan and probe functions. Once a new bus is registered using the provided
+  APIs, new devices can be detected and initialized using bus scan and probe
+  callbacks.
 
-  With this change, devices other than PCI or VDEV type can also be represented
-  in DPDK framework.
+  With this change, devices other than PCI or VDEV type can be represented
+  in the DPDK framework.
 
 * **Added generic EAL API for I/O device memory read/write operations.**
 
-  This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
-  memory read/write operations along with the relaxed versions.
+  This API introduces 8 bit, 16 bit, 32 bit and 64 bit I/O device
+  memory read/write operations along with "relaxed" versions.
 
-  The weakly-ordered machine like ARM needs additional I/O barrier for
-  device memory read/write access over PCI bus.
-  By introducing the EAL abstraction for I/O device memory read/write access,
-  The drivers can access I/O device memory in architecture-agnostic manner.
-  The relaxed version does not have additional I/O memory barrier, useful in
-  accessing the device registers of integrated controllers which
-  implicitly strongly ordered with respect to memory access.
+  Weakly-ordered architectures like ARM need an additional I/O barrier for
+  device memory read/write access over PCI bus. By introducing the EAL
+  abstraction for I/O device memory read/write access, the drivers can access
+  I/O device memory in an architecture-agnostic manner. The relaxed version
+  does not have an additional I/O memory barrier, which is useful in accessing
+  the device registers of integrated controllers which is implicitly strongly
+  ordered with respect to memory access.
 
 * **Added generic flow API (rte_flow).**
 
   This API provides a generic means to configure hardware to match specific
-  ingress or egress traffic, alter its fate and query related counters
+  ingress or egress traffic, alter its behavior and query related counters
   according to any number of user-defined rules.
 
-  It is slightly higher-level than the legacy filtering framework which it
-  encompasses and supersedes (including all functions and filter types) in
-  order to expose a single interface with an unambiguous behavior that is
-  common to all poll-mode drivers (PMDs).
+  In order to expose a single interface with an unambiguous behavior that is
+  common to all poll-mode drivers (PMDs) the ``rte_flow`` API is slightly
+  higher-level than the legacy filtering framework, which it encompasses and
+  supersedes (including all functions and filter types) .
 
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
 * **Added firmware version get API.**
 
-  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
-  version by a given device.
+  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch the firmware
+  version for a given device.
 
 * **Added APIs for MACsec offload support to the ixgbe PMD.**
 
@@ -90,54 +91,58 @@ New Features
 
   Added support for I219 Intel 1GbE NICs.
 
-* **Added VF Daemon (VFD) on i40e. - EXPERIMENTAL**
-
-  This's an EXPERIMENTAL feature to enhance the capability of DPDK PF as many
-  VF management features are not supported by kernel PF driver.
-  Some new private APIs are implemented in PMD without abstrction layer.
-  They can be used directly by some users who have the need.
-
-  The new APIs to control VFs directly from PF include,
-  1) set VF MAC anti-spoofing
-  2) set VF VLAN anti-spoofing
-  3) set TX loopback
-  4) set VF unicast promiscuous mode
-  5) set VF multicast promiscuous mode
-  6) set VF MTU
-  7) get/reset VF stats
-  8) set VF MAC address
-  9) set VF VLAN stripping
-  10) VF VLAN insertion
-  12) set VF broadcast mode
-  13) set VF VLAN tag
-  14) set VF VLAN filter
-  VFD also includes VF to PF mailbox message management by APP.
-  When PF receives mailbox messages from VF, PF should call the callback
-  provided by APP to know if they're permitted to be processed.
-
-  As an EXPERIMENTAL feature, please aware it can be changed or even
+* **Added VF Daemon (VFD) for i40e. - EXPERIMENTAL**
+
+  This is an EXPERIMENTAL feature to enhance the capability of the DPDK PF as
+  many VF management features are not currently supported by the kernel PF
+  driver. Some new private APIs are implemented directly in the PMD without an
+  abstraction layer. They can be used directly by some users who have the
+  need.
+
+  The new APIs to control VFs directly from PF include:
+
+  * Set VF MAC anti-spoofing.
+  * Set VF VLAN anti-spoofing.
+  * Set TX loopback.
+  * Set VF unicast promiscuous mode.
+  * Set VF multicast promiscuous mode.
+  * Set VF MTU.
+  * Get/reset VF stats.
+  * Set VF MAC address.
+  * Set VF VLAN stripping.
+  * Vf VLAN insertion.
+  * Set VF broadcast mode.
+  * Set VF VLAN tag.
+  * Set VF VLAN filter.
+
+  VFD also includes VF to PF mailbox message management from an application.
+  When the PF receives mailbox messages from the VF the PF should call the
+  callback provided by the application to know if they're permitted to be
+  processed.
+
+  As an EXPERIMENTAL feature, please be aware it can be changed or even
   removed without prior notice.
 
 * **Updated the i40e base driver.**
 
-  updated the i40e base driver, including the following changes:
+  Updated the i40e base driver, including the following changes:
 
-  * replace existing legacy memcpy() calls with i40e_memcpy() calls.
-  * use BIT() macro instead of bit fields
-  * add clear all WoL filters implementation
-  * add broadcast promiscuous control per VLAN
-  * remove unused X722_SUPPORT and I40E_NDIS_SUPPORT MARCOs
+  * Replace existing legacy ``memcpy()`` calls with ``i40e_memcpy()`` calls.
+  * Use ``BIT()`` macro instead of bit fields.
+  * Add clear all WoL filters implementation.
+  * Add broadcast promiscuous control per VLAN.
+  * Remove unused ``X722_SUPPORT`` and ``I40E_NDIS_SUPPORT`` macros.
 
 * **Updated the enic driver.**
 
-  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad.
+  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad checksums.
   * Fix set/remove of MAC addresses. Allow up to 64 addresses per device.
   * Enable TSO on outer headers.
 
 * **Added Solarflare libefx-based network PMD.**
 
-  A new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
-  of 10/40 Gbps adapters has been added.
+  Added a new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
+  of 10/40 Gbps adapters.
 
 * **Updated the mlx4 driver.**
 
@@ -145,8 +150,8 @@ New Features
 
 * **Added support for Mellanox ConnectX-5 adapters (mlx5).**
 
-  Support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps adapters
-  has been added to the existing mlx5 PMD.
+  Added support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps
+  adapters to the existing mlx5 PMD.
 
 * **Updated the mlx5 driver.**
 
@@ -161,47 +166,47 @@ New Features
 
 * **virtio-user with vhost-kernel as another exceptional path.**
 
-  Previously, we upstreamed a virtual device, virtio-user with vhost-user
-  as the backend, as a way for IPC (Inter-Process Communication) and user
+  Previously, we upstreamed a virtual device, virtio-user with vhost-user as
+  the backend as a way of enabling IPC (Inter-Process Communication) and user
   space container networking.
 
-  Virtio-user with vhost-kernel as the backend is a solution for exceptional
-  path, such as KNI, which exchanges packets with kernel networking stack.
+  Virtio-user with vhost-kernel as the backend is a solution for the exception
+  path, such as KNI, which exchanges packets with the kernel networking stack.
   This solution is very promising in:
 
-  * maintenance: vhost and vhost-net (kernel) is upstreamed and extensively
+  * Maintenance: vhost and vhost-net (kernel) is an upstreamed and extensively
     used kernel module.
-  * features: vhost-net is born to be a networking solution, which has
+  * Features: vhost-net is designed to be a networking solution, which has
     lots of networking related features, like multi-queue, TSO, multi-seg
     mbuf, etc.
-  * performance: similar to KNI, this solution would use one or more
+  * Performance: similar to KNI, this solution would use one or more
     kthreads to send/receive packets from user space DPDK applications,
     which has little impact on user space polling thread (except that
     it might enter into kernel space to wake up those kthreads if
     necessary).
 
-* **Added virtio Rx interrupt suppprt.**
+* **Added virtio Rx interrupt support.**
 
-  This feature enables Rx interrupt mode for virtio pci net devices as
-  binded to VFIO (noiommu mode) and drived by virtio PMD.
+  Added a feature to enable Rx interrupt mode for virtio pci net devices as
+  bound to VFIO (noiommu mode) and driven by virtio PMD.
 
-  With this feature, virtio PMD can switch between polling mode and
+  With this feature, the virtio PMD can switch between polling mode and
   interrupt mode, to achieve best performance, and at the same time save
-  power. It can work on both legacy and modern virtio devices. At this mode,
-  each rxq is mapped with an exluded MSIx interrupt.
+  power. It can work on both legacy and modern virtio devices. In this mode,
+  each ``rxq`` is mapped with an excluded MSIx interrupt.
 
   See the :ref:`Virtio Interrupt Mode <virtio_interrupt_mode>` documentation
   for more information.
 
 * **Added ARMv8 crypto PMD.**
 
-  A new crypto PMD has been added, which provides combined mode cryptografic
+  A new crypto PMD has been added, which provides combined mode cryptographic
   operations optimized for ARMv8 processors. The driver can be used to enhance
   performance in processing chained operations such as cipher + HMAC.
 
 * **Updated the QAT PMD.**
 
-  The QAT PMD was updated with additional support for:
+  The QAT PMD has been updated with additional support for:
 
   * DES algorithm.
   * Scatter-gather list (SGL) support.
@@ -210,35 +215,37 @@ New Features
 
   * The Intel(R) Multi Buffer Crypto for IPsec library used in
     AESNI MB PMD has been moved to a new repository, in GitHub.
-  * Support for single operations (cipher only and authentication only).
+  * Support has been added for single operations (cipher only and
+    authentication only).
 
 * **Updated the AES-NI GCM PMD.**
 
-  The AES-NI GCM PMD was migrated from MB library to ISA-L library.
-  The migration entailed the following additional support for:
+  The AES-NI GCM PMD was migrated from the Multi Buffer library to the ISA-L
+  library. The migration entailed adding additional support for:
 
   * GMAC algorithm.
   * 256-bit cipher key.
   * Session-less mode.
   * Out-of place processing
-  * Scatter-gatter support for chained mbufs (only out-of place and destination
+  * Scatter-gather support for chained mbufs (only out-of place and destination
     mbuf must be contiguous)
 
 * **Added crypto performance test application.**
 
-  A new performance test application allows measuring performance parameters
-  of PMDs available in crypto tree.
+  Added a new performance test application for measuring performance
+  parameters of PMDs available in the crypto tree.
 
 * **Added Elastic Flow Distributor library (rte_efd).**
 
-  This new library uses perfect hashing to determine a target/value for a
-  given incoming flow key.
+  Added a new library which uses perfect hashing to determine a target/value
+  for a given incoming flow key.
 
-  It does not store the key itself for lookup operations, and therefore,
-  lookup performance is not dependent on the key size. Also, the target/value
-  can be any arbitrary value (8 bits by default). Finally, the storage requirement
-  is much smaller than a hash-based flow table and therefore, it can better fit for
-  CPU cache, being able to scale to millions of flow keys.
+  The library does not store the key itself for lookup operations, and
+  therefore, lookup performance is not dependent on the key size. Also, the
+  target/value can be any arbitrary value (8 bits by default). Finally, the
+  storage requirement is much smaller than a hash-based flow table and
+  therefore, it can better fit in CPU cache and scale to millions of flow
+  keys.
 
   See the :ref:`Elastic Flow Distributor Library <Efd_Library>` documentation in
   the Programmers Guide document, for more information.
@@ -259,51 +266,24 @@ Resolved Issues
    Also, make sure to start the actual text at the margin.
    =========================================================
 
-
-EAL
-~~~
-
-
 Drivers
 ~~~~~~~
 
 * **net/virtio: Fixed multiple process support.**
 
-  Fixed few regressions introduced in recent releases that break the virtio
+  Fixed a few regressions introduced in recent releases that break the virtio
   multiple process support.
 
 
-Libraries
-~~~~~~~~~
-
-
 Examples
 ~~~~~~~~
 
 * **examples/ethtool: Fixed crash with non-PCI devices.**
 
-  Querying a non-PCI device was dereferencing non-existent PCI data
-  resulting in a segmentation fault.
+  Fixed issue where querying a non-PCI device was dereferencing non-existent
+  PCI data resulting in a segmentation fault.
 
 
-Other
-~~~~~
-
-
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue in the present
-     tense. Add information on any known workarounds.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
 
 API Changes
 -----------
@@ -319,25 +299,26 @@ API Changes
 
 * **Moved five APIs for VF management from the ethdev to the ixgbe PMD.**
 
-  The following five APIs for VF management from the PF have been removed from the ethdev,
-  renamed and added to the ixgbe PMD::
+  The following five APIs for VF management from the PF have been removed from
+  the ethdev, renamed, and added to the ixgbe PMD::
 
-    rte_eth_dev_set_vf_rate_limit
-    rte_eth_dev_set_vf_rx
-    rte_eth_dev_set_vf_rxmode
-    rte_eth_dev_set_vf_tx
-    rte_eth_dev_set_vf_vlan_filter
+     rte_eth_dev_set_vf_rate_limit()
+     rte_eth_dev_set_vf_rx()
+     rte_eth_dev_set_vf_rxmode()
+     rte_eth_dev_set_vf_tx()
+     rte_eth_dev_set_vf_vlan_filter()
 
   The API's have been renamed to the following::
 
-    rte_pmd_ixgbe_set_vf_rate_limit
-    rte_pmd_ixgbe_set_vf_rx
-    rte_pmd_ixgbe_set_vf_rxmode
-    rte_pmd_ixgbe_set_vf_tx
-    rte_pmd_ixgbe_set_vf_vlan_filter
+     rte_pmd_ixgbe_set_vf_rate_limit()
+     rte_pmd_ixgbe_set_vf_rx()
+     rte_pmd_ixgbe_set_vf_rxmode()
+     rte_pmd_ixgbe_set_vf_tx()
+     rte_pmd_ixgbe_set_vf_vlan_filter()
 
   The declarations for the API’s can be found in ``rte_pmd_ixgbe.h``.
 
+
 ABI Changes
 -----------
 
-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-14 10:52  8% [dpdk-dev] Further fun with ABI tracking Christian Ehrhardt
@ 2017-02-14 16:19  4% ` Bruce Richardson
  2017-02-14 20:31  9% ` Jan Blunck
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-14 16:19 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On Tue, Feb 14, 2017 at 11:52:00AM +0100, Christian Ehrhardt wrote:
> Hi,
> when moving to DPDK 16.11, Debian/Ubuntu packaging of DPDK has hit a new
> twist on the (it seems recurring) topic of DPDK ABI tracking.
> 
> I have found, ... well I don't want to call it a solution ..., let's say a
> crutch to get around it for the moment. But I wanted to use the example I
> had to share a few thoughts on it and to kick off a wider discussion.
> 
> 
> *## In library cross-dependencies plus partial ABI bumps ##*
> 
> Since the day we moved away from the combined shared library we have had
> several improvements in tracking ABI versions. These days [1] we have
> LIBABIVER per library and it gets bumped to reflect that it breaks with
> former versions, e.g. by removing symbols.
> 
> Now in the 16.11 release the ABIs for cryptodev, eal and ethdev got bumped
> by [2] and [3].
> 
> OTOH please remember that in general two versions of a shared library in
> the usual sense are meant to be able to stay alongside on a system without
> hurting each other. I picked a random one on my system.
> Package              Library
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160.0.0
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95.5.0
> Some link against the new, some against the old library - all fine.
> Usually most programs can just be rebuilt against the new library and after
> some time the old one can be dropped. That mechanism gives downstream
> distributions a way to handle transitions and consumers of libraries which
> might not all be ready for the same version every time.
> And since we have per-library versioning with LIBABIVER and the version
> maps we are good - in fact we qualify for all common cases on [4].
> 
> Now in DPDK, of those libraries that got an ABI bump, eal and ethdev are
> among those which most of us consider "core libraries", and most other libs
> and pmds link to them.
> And here DPDK continues to be special: due to that inter-dependency, with
> old and new libraries installed on the same system the following happens
> with openvswitch built for an older version of DPDK:
> ovs-vswitchd-dpdk
>     librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2
>     librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
>         librte_eal.so.3 => /usr/lib/x86_64-linux-gnu/librte_eal.so.3
> 
> You can see that Openvswitch itself depends on the "old" librte_eal.so.2.
> But because librte_pdump.so.1 did not get an ABI bump, it got upgraded to
> the newer version from DPDK 16.11.
> But since the "new" pdump got built with the new DPDK 16.11 it depends on
> the "new" librte_eal.so.3.
> And having both in the same executable space at the same time causes
> segfaults and pain.
> 
> As I said, for now I have gotten past the issue with a crutch that I'm not
> proud of and that I'd like to avoid in the future. For that reason I'm
> reaching out to you with several suggestions to discuss.
> 
> 
> *## Thoughts ##*
> None of these seems like a perfect solution to me yet, but clearly good to
> start discussions on them.
> 
> Options that were in discussion so far and that we might adopt next cycle
> (some of these are upstream changes, some downstream, some require both to
> change - but any of them should have an ack upstream so that we agree
> on how to proceed with those cases).
> 
> 1. Downstreams to insert Major version into soname
> Distributions could insert the DPDK major version (like 16.11) into the
> soname and package names. A common example of this is libboost [5].
> That would perfectly allow 16.07.<LIBABIVER> to coexist with
> 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
> Yet it would mean that anything depending on the old library will have to
> be recompiled to pick up the new code, even if it depends on an ABI that is
> still present in the new release.
> Also - not a technical reason - but it is clearly more work to force update
> all dependencies and clean out old packages for every release.
> 
> 
> 2. ABI Ranges
> One could argue that due to the detailed tracking of functions DPDK is
> already close to track not ABI levels but actually ABI ranges. DPDK could
> track LIBABIVERMIN and LIBABIVER.
> Every time functionality is added LIBABIVER would get bumped, but
> LIBABIVERMIN only gets moved to the OLDEST still supported ABI when things
> are dropped.
> So on a given library librte_foo you could have LIBABIVER=5 and
> LIBABIVERMIN=3. The make install would then install the shared lib as:
> librte_foo.so.5
> and additionally links for all compatible versions:
> librte_foo.so.3 -> librte_foo.so.5
> librte_foo.so.4 -> librte_foo.so.5
> Yet, while it has some nice attributes, this might make DPDK even more
> special and cause ABI level proliferation over time.
> Also even with this in place, changes moving LIBABIVERMIN "too fast" (too
> fast is different for each downstream) could still cause an issue like the
> one I initially described.
> 
> 
> 3. A lot of conflicts
> In packaging one can declare a package to conflict with another package [6].
> Now we could declare e.g. librte_eal3 to conflict with librte_eal2 (and the
> same for all other bumps).
> That would make them not co-installable, and working on a new release would
> mean that all former consumers would become not installable as well and
> have to be rebuilt before they all could migrate [7] together.
> That "works" in some sense, but it denies the whole purpose of versioned
> library packages (to be co-installable, to allow different library
> consumers to depend on different versions).
> 
> 
> 4. ABI bump is infectious
> Another way might be to also bump any dependent DPDK library.
> So when core libs like eal are ABI bumped, likely all libs would get a bump.
> If only e.g. mempool gets a bump, only those other parts using it would be
> bumped as well.
> To some extent this might still proliferate ABI versions more than one
> would like.
> Also it surely is hard to track if not automated - think of dependencies
> that exist only in certain config cases.
> 
> 5. back to single ABI
> For the sake of giving everybody a chance to re-open old wounds I wanted to
> mention that DPDK could also decide to go back to a single ABI again.
> This could (but doesn't have to!) be combined with having a single .so file
> again.
> Deciding on this might be a much cleaner and easier-to-track alternative to #4.
> 
> 6. More
> I'm sure there are more approaches to this, feel free to come up with more.
> 
> I'm sure my five suggestions alone will make the thread messy. Maybe we do
> this in two rounds, sorting out the insane and identifying the preferred
> ones to then in a second run focus on discussing and maybe implementing the
> details of what we like.
> 
> 

Of the 5 options you propose, No. 4 looks the most appealing to me. If it
does cause problems with different config cases, then that looks like a good
reason to cut down on the allowed configs. :-)

/Bruce

> [1]: http://dpdk.org/browse/dpdk/tree/doc/guides/contributing/versioning.rst
> [2]: http://dpdk.org/browse/dpdk/commit/?id=d7e61ad3ae36
> [3]: http://dpdk.org/browse/dpdk/commit/?id=6ba1affa54108
> [4]: https://wiki.debian.org/TransitionBestPractices
> [5]: https://packages.debian.org/sid/libboost1.62-dev
> [6]:
> https://www.debian.org/doc/debian-policy/ch-relationships.html#s-conflicts
> [7]: https://wiki.ubuntu.com/ProposedMigration
> 
> P.S. I beg a pardon for the wall of text
> 
> -- 
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] crypto drivers in the API
  2017-02-14 14:46  4%     ` Doherty, Declan
@ 2017-02-14 15:47  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 15:47 UTC (permalink / raw)
  To: Doherty, Declan; +Cc: dev, Pablo DeLara Guarch

2017-02-14 14:46, Doherty, Declan:
> On 14/02/2017 11:04 AM, Thomas Monjalon wrote:
> > 2017-02-14 10:44, Doherty, Declan:
> >> On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
> >>> In the crypto API, the drivers are listed.
> >>> In my opinion, it is a wrong design and these lists should be removed.
> >>> Do we need a deprecation notice to plan this removal in 17.05, while
> >>> working on bus abstraction?
> >>>
> >> ...
> >>>
> ...
> >
> > Yes
> > If you were planning to do this, you should have sent a deprecation notice
> > a few weeks ago.
> > Please send it now and we'll see if we have enough supporters shortly.
> >
> 
> Thomas, there are a couple of other changes we are looking at in the
> cryptodev which would require API changes as well as break the ABI,
> including adding support for multi-device sessions, and changes to the
> crypto operation layout and fields for performance, but these will
> require RFCs or at least more discussion of the proposals. Given the
> time constraints for the V1 deadline for 17.05 I would prefer to work on
> the RFCs and get them out as soon as possible over the next few weeks
> and then make all the ABI breaking changes in R17.08 in a single release.
> 
> Otherwise we will end up breaking the ABI 2 releases in a row, which I
> would like to avoid if possible.

OK, seems good. Thanks

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v1] doc: update release notes for 17.02
@ 2017-02-14 15:32  4% John McNamara
  2017-02-14 16:26  2% ` [dpdk-dev] [PATCH v2] " John McNamara
  0 siblings, 1 reply; 200+ results
From: John McNamara @ 2017-02-14 15:32 UTC (permalink / raw)
  To: dev; +Cc: John McNamara


Fix grammar, spelling and formatting of DPDK 17.02 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---

Note: The "ABI Changes" section is currently empty.


 doc/guides/rel_notes/release_17_02.rst | 255 ++++++++++++++-------------------
 1 file changed, 111 insertions(+), 144 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 6420a87..b7c188a 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -40,46 +40,47 @@ New Features
 
 * **Added support for representing buses in EAL**
 
-  A new structure ``rte_bus`` is introduced in EAL. This allows for devices to
-  be represented by buses they are connected to. A new bus can be added to
-  DPDK by extending the ``rte_bus`` structure and implementing the scan and
-  probe functions. Once a new bus is registered using provided APIs, new
-  devices can be detected and initialized using bus scan and probe callbacks.
+  The ``rte_bus`` structure was introduced into the EAL. This allows for
+  devices to be represented by buses they are connected to. A new bus can be
+  added to DPDK by extending the ``rte_bus`` structure and implementing the
+  scan and probe functions. Once a new bus is registered using the provided
+  APIs, new devices can be detected and initialized using bus scan and probe
+  callbacks.
 
-  With this change, devices other than PCI or VDEV type can also be represented
-  in DPDK framework.
+  With this change, devices other than PCI or VDEV type can be represented
+  in the DPDK framework.
 
 * **Added generic EAL API for I/O device memory read/write operations.**
 
-  This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
-  memory read/write operations along with the relaxed versions.
+  This API introduces 8 bit, 16 bit, 32 bit and 64 bit I/O device
+  memory read/write operations along with "relaxed" versions.
 
-  The weakly-ordered machine like ARM needs additional I/O barrier for
-  device memory read/write access over PCI bus.
-  By introducing the EAL abstraction for I/O device memory read/write access,
-  The drivers can access I/O device memory in architecture-agnostic manner.
-  The relaxed version does not have additional I/O memory barrier, useful in
-  accessing the device registers of integrated controllers which
-  implicitly strongly ordered with respect to memory access.
+  Weakly-ordered architectures like ARM need an additional I/O barrier for
+  device memory read/write access over PCI bus. By introducing the EAL
+  abstraction for I/O device memory read/write access, the drivers can access
+  I/O device memory in an architecture-agnostic manner. The relaxed version
+  does not have an additional I/O memory barrier, which is useful in accessing
+  the device registers of integrated controllers which is implicitly strongly
+  ordered with respect to memory access.
 
 * **Added generic flow API (rte_flow).**
 
   This API provides a generic means to configure hardware to match specific
-  ingress or egress traffic, alter its fate and query related counters
+  ingress or egress traffic, alter its behavior and query related counters
   according to any number of user-defined rules.
 
-  It is slightly higher-level than the legacy filtering framework which it
-  encompasses and supersedes (including all functions and filter types) in
-  order to expose a single interface with an unambiguous behavior that is
-  common to all poll-mode drivers (PMDs).
+  In order to expose a single interface with an unambiguous behavior that is
+  common to all poll-mode drivers (PMDs) the ``rte_flow`` API is slightly
+  higher-level than the legacy filtering framework, which it encompasses and
+  supersedes (including all functions and filter types) .
 
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
 * **Added firmware version get API.**
 
-  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
-  version by a given device.
+  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch the firmware
+  version for a given device.
 
 * **Added APIs for MACsec offload support to the ixgbe PMD.**
 
@@ -90,54 +91,58 @@ New Features
 
   Added support for I219 Intel 1GbE NICs.
 
-* **Added VF Daemon (VFD) on i40e. - EXPERIMENTAL**
-
-  This's an EXPERIMENTAL feature to enhance the capability of DPDK PF as many
-  VF management features are not supported by kernel PF driver.
-  Some new private APIs are implemented in PMD without abstrction layer.
-  They can be used directly by some users who have the need.
-
-  The new APIs to control VFs directly from PF include,
-  1) set VF MAC anti-spoofing
-  2) set VF VLAN anti-spoofing
-  3) set TX loopback
-  4) set VF unicast promiscuous mode
-  5) set VF multicast promiscuous mode
-  6) set VF MTU
-  7) get/reset VF stats
-  8) set VF MAC address
-  9) set VF VLAN stripping
-  10) VF VLAN insertion
-  12) set VF broadcast mode
-  13) set VF VLAN tag
-  14) set VF VLAN filter
-  VFD also includes VF to PF mailbox message management by APP.
-  When PF receives mailbox messages from VF, PF should call the callback
-  provided by APP to know if they're permitted to be processed.
-
-  As an EXPERIMENTAL feature, please aware it can be changed or even
+* **Added VF Daemon (VFD) for i40e. - EXPERIMENTAL**
+
+  This is an EXPERIMENTAL feature to enhance the capability of the DPDK PF as
+  many VF management features are not currently supported by the kernel PF
+  driver. Some new private APIs are implemented directly in the PMD without an
+  abstraction layer. They can be used directly by some users who have the
+  need.
+
+  The new APIs to control VFs directly from PF include:
+
+  * Set VF MAC anti-spoofing.
+  * Set VF VLAN anti-spoofing.
+  * Set TX loopback.
+  * Set VF unicast promiscuous mode.
+  * Set VF multicast promiscuous mode.
+  * Set VF MTU.
+  * Get/reset VF stats.
+  * Set VF MAC address.
+  * Set VF VLAN stripping.
+  * Vf VLAN insertion.
+  * Set VF broadcast mode.
+  * Set VF VLAN tag.
+  * Set VF VLAN filter.
+
+  VFD also includes VF to PF mailbox message management from an application.
+  When the PF receives mailbox messages from the VF the PF should call the
+  callback provided by the application to know if they're permitted to be
+  processed.
+
+  As an EXPERIMENTAL feature, please be aware it can be changed or even
   removed without prior notice.
 
 * **Updated the i40e base driver.**
 
-  updated the i40e base driver, including the following changes:
+  Updated the i40e base driver, including the following changes:
 
-  * replace existing legacy memcpy() calls with i40e_memcpy() calls.
-  * use BIT() macro instead of bit fields
-  * add clear all WoL filters implementation
-  * add broadcast promiscuous control per VLAN
-  * remove unused X722_SUPPORT and I40E_NDIS_SUPPORT MARCOs
+  * Replace existing legacy ``memcpy()`` calls with ``i40e_memcpy()`` calls.
+  * Use ``BIT()`` macro instead of bit fields.
+  * Add clear all WoL filters implementation.
+  * Add broadcast promiscuous control per VLAN.
+  * Remove unused ``X722_SUPPORT`` and ``I40E_NDIS_SUPPORT`` macros.
 
 * **Updated the enic driver.**
 
-  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad.
+  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad checksums.
   * Fix set/remove of MAC addresses. Allow up to 64 addresses per device.
   * Enable TSO on outer headers.
 
 * **Added Solarflare libefx-based network PMD.**
 
-  A new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
-  of 10/40 Gbps adapters has been added.
+  Added a new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
+  of 10/40 Gbps adapters.
 
 * **Updated the mlx4 driver.**
 
@@ -145,8 +150,8 @@ New Features
 
 * **Added support for Mellanox ConnectX-5 adapters (mlx5).**
 
-  Support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps adapters
-  has been added to the existing mlx5 PMD.
+  Added support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps
+  adapters to the existing mlx5 PMD.
 
 * **Updated the mlx5 driver.**
 
@@ -161,47 +166,47 @@ New Features
 
 * **virtio-user with vhost-kernel as another exceptional path.**
 
-  Previously, we upstreamed a virtual device, virtio-user with vhost-user
-  as the backend, as a way for IPC (Inter-Process Communication) and user
+  Previously, we upstreamed a virtual device, virtio-user with vhost-user as
+  the backend as a way of enabling IPC (Inter-Process Communication) and user
   space container networking.
 
-  Virtio-user with vhost-kernel as the backend is a solution for exceptional
-  path, such as KNI, which exchanges packets with kernel networking stack.
+  Virtio-user with vhost-kernel as the backend is a solution for the exception
+  path, such as KNI, which exchanges packets with the kernel networking stack.
   This solution is very promising in:
 
-  * maintenance: vhost and vhost-net (kernel) is upstreamed and extensively
+  * Maintenance: vhost and vhost-net (kernel) is an upstreamed and extensively
     used kernel module.
-  * features: vhost-net is born to be a networking solution, which has
+  * Features: vhost-net is designed to be a networking solution, which has
     lots of networking related features, like multi-queue, TSO, multi-seg
     mbuf, etc.
-  * performance: similar to KNI, this solution would use one or more
+  * Performance: similar to KNI, this solution would use one or more
     kthreads to send/receive packets from user space DPDK applications,
     which has little impact on user space polling thread (except that
     it might enter into kernel space to wake up those kthreads if
     necessary).
 
-* **Added virtio Rx interrupt suppprt.**
+* **Added virtio Rx interrupt support.**
 
-  This feature enables Rx interrupt mode for virtio pci net devices as
-  binded to VFIO (noiommu mode) and drived by virtio PMD.
+  Added a feature to enable Rx interrupt mode for virtio pci net devices as
+  bound to VFIO (noiommu mode) and driven by virtio PMD.
 
-  With this feature, virtio PMD can switch between polling mode and
+  With this feature, the virtio PMD can switch between polling mode and
   interrupt mode, to achieve best performance, and at the same time save
-  power. It can work on both legacy and modern virtio devices. At this mode,
-  each rxq is mapped with an exluded MSIx interrupt.
+  power. It can work on both legacy and modern virtio devices. In this mode,
+  each ``rxq`` is mapped with an excluded MSIx interrupt.
 
   See the :ref:`Virtio Interrupt Mode <virtio_interrupt_mode>` documentation
   for more information.
 
 * **Added ARMv8 crypto PMD.**
 
-  A new crypto PMD has been added, which provides combined mode cryptografic
+  A new crypto PMD has been added, which provides combined mode cryptographic
   operations optimized for ARMv8 processors. The driver can be used to enhance
   performance in processing chained operations such as cipher + HMAC.
 
 * **Updated the QAT PMD.**
 
-  The QAT PMD was updated with additional support for:
+  The QAT PMD has been updated with additional support for:
 
   * DES algorithm.
   * Scatter-gather list (SGL) support.
@@ -210,100 +215,61 @@ New Features
 
   * The Intel(R) Multi Buffer Crypto for IPsec library used in
     AESNI MB PMD has been moved to a new repository, in GitHub.
-  * Support for single operations (cipher only and authentication only).
+  * Support has been added for single operations (cipher only and
+    authentication only).
 
 * **Updated the AES-NI GCM PMD.**
 
-  The AES-NI GCM PMD was migrated from MB library to ISA-L library.
-  The migration entailed the following additional support for:
+  The AES-NI GCM PMD was migrated from the Multi Buffer library to the ISA-L
+  library. The migration entailed adding additional support for:
 
   * GMAC algorithm.
   * 256-bit cipher key.
   * Session-less mode.
   * Out-of place processing
-  * Scatter-gatter support for chained mbufs (only out-of place and destination
+  * Scatter-gather support for chained mbufs (only out-of place and destination
     mbuf must be contiguous)
 
 * **Added crypto performance test application.**
 
-  A new performance test application allows measuring performance parameters
-  of PMDs available in crypto tree.
+  Added a new performance test application for measuring performance
+  parameters of PMDs available in the crypto tree.
 
 * **Added Elastic Flow Distributor library (rte_efd).**
 
-  This new library uses perfect hashing to determine a target/value for a
-  given incoming flow key.
+  Added a new library which uses perfect hashing to determine a target/value
+  for a given incoming flow key.
 
-  It does not store the key itself for lookup operations, and therefore,
-  lookup performance is not dependent on the key size. Also, the target/value
-  can be any arbitrary value (8 bits by default). Finally, the storage requirement
-  is much smaller than a hash-based flow table and therefore, it can better fit for
-  CPU cache, being able to scale to millions of flow keys.
+  The library does not store the key itself for lookup operations, and
+  therefore, lookup performance is not dependent on the key size. Also, the
+  target/value can be any arbitrary value (8 bits by default). Finally, the
+  storage requirement is much smaller than a hash-based flow table and
+  therefore, it can better fit in CPU cache and scale to millions of flow
+  keys.
 
   See the :ref:`Elastic Flow Distributor Library <Efd_Library>` documentation in
   the Programmers Guide document, for more information.
 
 
-Resolved Issues
----------------
-
-.. This section should contain bug fixes added to the relevant sections. Sample format:
-
-   * **code/section Fixed issue in the past tense with a full stop.**
-
-     Add a short 1-2 sentence description of the resolved issue in the past tense.
-     The title should contain the code/lib section like a commit message.
-     Add the entries in alphabetic order in the relevant sections below.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
-EAL
-~~~
-
 
 Drivers
 ~~~~~~~
 
 * **net/virtio: Fixed multiple process support.**
 
-  Fixed few regressions introduced in recent releases that break the virtio
+  Fixed a few regressions introduced in recent releases that break the virtio
   multiple process support.
 
 
-Libraries
-~~~~~~~~~
-
-
 Examples
 ~~~~~~~~
 
 * **examples/ethtool: Fixed crash with non-PCI devices.**
 
-  Querying a non-PCI device was dereferencing non-existent PCI data
-  resulting in a segmentation fault.
+  Fixed issue where querying a non-PCI device was dereferencing non-existent
+  PCI data resulting in a segmentation fault.
 
 
-Other
-~~~~~
-
-
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue in the present
-     tense. Add information on any known workarounds.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
 
 API Changes
 -----------
@@ -319,25 +285,26 @@ API Changes
 
 * **Moved five APIs for VF management from the ethdev to the ixgbe PMD.**
 
-  The following five APIs for VF management from the PF have been removed from the ethdev,
-  renamed and added to the ixgbe PMD::
+  The following five APIs for VF management from the PF have been removed from
+  the ethdev, renamed, and added to the ixgbe PMD::
 
-    rte_eth_dev_set_vf_rate_limit
-    rte_eth_dev_set_vf_rx
-    rte_eth_dev_set_vf_rxmode
-    rte_eth_dev_set_vf_tx
-    rte_eth_dev_set_vf_vlan_filter
+     rte_eth_dev_set_vf_rate_limit()
+     rte_eth_dev_set_vf_rx()
+     rte_eth_dev_set_vf_rxmode()
+     rte_eth_dev_set_vf_tx()
+     rte_eth_dev_set_vf_vlan_filter()
 
   The API's have been renamed to the following::
 
-    rte_pmd_ixgbe_set_vf_rate_limit
-    rte_pmd_ixgbe_set_vf_rx
-    rte_pmd_ixgbe_set_vf_rxmode
-    rte_pmd_ixgbe_set_vf_tx
-    rte_pmd_ixgbe_set_vf_vlan_filter
+     rte_pmd_ixgbe_set_vf_rate_limit()
+     rte_pmd_ixgbe_set_vf_rx()
+     rte_pmd_ixgbe_set_vf_rxmode()
+     rte_pmd_ixgbe_set_vf_tx()
+     rte_pmd_ixgbe_set_vf_vlan_filter()
 
   The declarations for the API’s can be found in ``rte_pmd_ixgbe.h``.
 
+
 ABI Changes
 -----------
 
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] crypto drivers in the API
  2017-02-14 11:04  0%   ` Thomas Monjalon
@ 2017-02-14 14:46  4%     ` Doherty, Declan
  2017-02-14 15:47  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Doherty, Declan @ 2017-02-14 14:46 UTC (permalink / raw)
  To: Thomas Monjalon, dev, Pablo DeLara Guarch

On 14/02/2017 11:04 AM, Thomas Monjalon wrote:
> 2017-02-14 10:44, Doherty, Declan:
>> On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
>>> In the crypto API, the drivers are listed.
>>> In my opinion, it is a wrong design and these lists should be removed.
>>> Do we need a deprecation notice to plan this removal in 17.05, while
>>> working on bus abstraction?
>>>
>> ...
>>>
...
>
> Yes
> If you were planning to do this, you should have sent a deprecation notice
> a few weeks ago.
> Please send it now and we'll see if we have enough supporters shortly.
>

Thomas, there are a couple of other changes we are looking at in the
cryptodev which would require API changes as well as break the ABI,
including adding support for multi-device sessions, and changes to the
crypto operation layout and fields for performance, but these will
require RFCs or at least more discussion of the proposals. Given the
time constraints for the V1 deadline for 17.05 I would prefer to work on
the RFCs and get them out as soon as possible over the next few weeks
and then make all the ABI breaking changes in R17.08 in a single release.

Otherwise we will end up breaking the ABI 2 releases in a row, which I
would like to avoid if possible.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
    2017-02-13 18:02  4% ` Thomas Monjalon
@ 2017-02-14 13:54  4% ` Maxime Coquelin
  2017-02-14 20:28  4% ` Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2017-02-14 13:54 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: Thomas Monjalon, John McNamara

Hi Yuanhan,

On 01/23/2017 02:04 PM, Yuanhan Liu wrote:
> I made a vhost ABI/API refactoring in v16.04, meant to avoid such issues
> forever. Well, apparently, I lied.
>
> People are looking for more vhost-user options nowadays, other than
> vhost-user net only. For example, SPDK (Storage Performance Development
> Kit) is looking at the chance of vhost-user SCSI and vhost-user block.
>
> Apparently, they also need a vhost-user backend; since DPDK already
> has a (mature enough) backend, they don't want to implement it again
> from scratch. They want to leverage the one DPDK provides.
>
> However, the last refactoring hasn't done that right, at least it's
> not friendly for extending vhost-user to add support for more devices.
> For example, different virtio devices have their own feature sets, while
> APIs like rte_vhost_feature_disable(feature_mask) have no option to
> tell the device type. Thus, a more proper API should look like:
>
>     rte_vhost_feature_disable(device_type, feature_mask);

I wonder if we could also change it to be per-instance, instead of
disabling features globally:
rte_vhost_feature_disable(vid, device_type, feature_mask);

It could be useful for live-migration with different backend versions on
the hosts, as it would allow running instances with different compat
modes (like running vhost's DPDK v17.08 with v17.05-only supported
features).
I made a proposal about cross-version migration, but we are far from a
conclusion on the design.
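
For clarity, the three shapes under discussion would look roughly like
the declarations below. This is only a sketch: the suffixed names are
hypothetical, none of this is a committed API, and the return types are
illustrative (the 16.11 prototype lives in rte_virtio_net.h):

    #include <stdint.h>

    /* 16.11 API: one global feature mask, implicitly virtio-net only. */
    int rte_vhost_feature_disable(uint64_t feature_mask);

    /* Shape proposed in the patch: qualify by device type so net, SCSI
     * and block backends can carry distinct feature sets. */
    int rte_vhost_feature_disable_by_type(int device_type,
                                          uint64_t feature_mask);

    /* Per-instance variant suggested above: scope the mask to one vid,
     * e.g. to run instances in different compat modes for migration. */
    int rte_vhost_feature_disable_by_vid(int vid, int device_type,
                                         uint64_t feature_mask);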

>
> Besides that, a few public files and structures should be renamed, so
> that they are not tied to virtio-net. Specifically, they are:
>
> - virtio_net_device_ops --> vhost_device_ops
> - rte_virtio_net.h      --> rte_vhost.h
>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Anyway, the change you propose is necessary:
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-14  8:33  4% ` Olivier Matz
@ 2017-02-14 11:43  4%   ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2017-02-14 11:43 UTC (permalink / raw)
  To: Olivier Matz, Bruce Richardson; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, February 14, 2017 2:34 AM
> To: Bruce Richardson <bruce.richardson@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring
> library
> 
> On Mon, 13 Feb 2017 17:38:30 +0000, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > Document proposed changes for the rings code in the next release.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] crypto drivers in the API
  2017-02-14 10:44  4% ` Doherty, Declan
@ 2017-02-14 11:04  0%   ` Thomas Monjalon
  2017-02-14 14:46  4%     ` Doherty, Declan
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-14 11:04 UTC (permalink / raw)
  To: Doherty, Declan, dev

2017-02-14 10:44, Doherty, Declan:
> On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
> > In the crypto API, the drivers are listed.
> > In my opinion, it is a wrong design and these lists should be removed.
> > Do we need a deprecation notice to plan this removal in 17.05, while
> > working on bus abstraction?
> >
> ...
> >
> 
> Hey Thomas,
> I agree that these need to be removed, and I had planned on doing this 
> for 17.05, but I have a concern about the requirements for ABI breakage in
> relation to this. This enum is unfortunately used in both the
> rte_cryptodev and rte_crypto_sym_session structures, which are part of
> the library's public API. I don't think it would be feasible to maintain
> a set of 17.02-compatible APIs with the changes this would introduce, as
> it would require a large number of functions to have two versions. Is it
> OK to break the ABI for this case?

Yes
If you were planning to do this, you should have sent a deprecation notice
a few weeks ago.
Please send it now and we'll see if we have enough supporters shortly.
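
For the record, the change under discussion is roughly the following
sketch. Only the 16.11 enum is real (and abridged here); the
replacement function names are made up for illustration:

    #include <stdint.h>

    /* 16.11: rte_cryptodev.h hard-codes every known driver, so each new
     * PMD grows this enum and the structures that embed it (abridged). */
    enum rte_cryptodev_type {
            RTE_CRYPTODEV_NULL_PMD = 1,
            RTE_CRYPTODEV_AESNI_MB_PMD,
            /* ... one entry per driver ... */
    };

    /* One possible replacement: drivers register a name at load time
     * and receive an id at runtime, so adding a PMD no longer touches
     * the public headers (hypothetical names). */
    uint8_t rte_cryptodev_allocate_driver_id(const char *driver_name);
    int rte_cryptodev_driver_id_by_name(const char *driver_name);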

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] doc: annouce ABI change for cryptodev ops structure
  2017-02-14 10:48  4%   ` Doherty, Declan
@ 2017-02-14 11:03  4%     ` De Lara Guarch, Pablo
  0 siblings, 0 replies; 200+ results
From: De Lara Guarch, Pablo @ 2017-02-14 11:03 UTC (permalink / raw)
  To: Doherty, Declan, Zhang, Roy Fan, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Doherty, Declan
> Sent: Tuesday, February 14, 2017 10:48 AM
> To: Zhang, Roy Fan; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: annouce ABI change for cryptodev
> ops structure
> 
> On 14/02/2017 10:41 AM, Fan Zhang wrote:
> > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> > ---
> ...
> >
> 
> Acked-by: Declan Doherty <declan.doherty@intel.com>

Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] Further fun with ABI tracking
@ 2017-02-14 10:52  8% Christian Ehrhardt
  2017-02-14 16:19  4% ` Bruce Richardson
  2017-02-14 20:31  9% ` Jan Blunck
  0 siblings, 2 replies; 200+ results
From: Christian Ehrhardt @ 2017-02-14 10:52 UTC (permalink / raw)
  To: dev; +Cc: cjcollier, ricardo.salveti, Luca Boccassi

Hi,
when moving to DPDK 16.11, Debian/Ubuntu packaging of DPDK has hit a new
twist on the (it seems recurring) topic of DPDK ABI tracking.

I have found, ... well I don't want to call it a solution ..., let's say a
crutch to get around it for the moment. But I wanted to use the example I
had to share a few thoughts on it and to kick off a wider discussion.


*## In library cross-dependencies plus partial ABI bumps ##*

Since the day we moved away from the combined shared library we have had
several improvements in tracking ABI versions. These days [1] we have
LIBABIVER per library and it gets bumped to reflect that it breaks with
former versions, e.g. by removing symbols.

Now in the 16.11 release the ABIs for cryptodev, eal and ethdev got bumped
by [2] and [3].

OTOH please remember that in general two versions of a shared library in
the usual sense are meant to be able to stay alongside on a system without
hurting each other. I picked a random one on my system.
Package              Library
libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160
libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160.0.0
libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95
libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95.5.0
Some link against the new, some against the old library - all fine.
Usually most programs can just be rebuilt against the new library and after
some time the old one can be dropped. That mechanism gives downstream
distributions a way to handle transitions and consumers of libraries which
might not all be ready for the same version every time.
And since we have per-library versioning with LIBABIVER and the version
maps we are good - in fact we qualify for all common cases on [4].

Now in DPDK, of those libraries that got an ABI bump, eal and ethdev are
among those which most of us consider "core libraries", and most other libs
and pmds link to them.
And here DPDK continues to be special: due to that inter-dependency, with
old and new libraries installed on the same system the following happens
with openvswitch built for an older version of DPDK:
ovs-vswitchd-dpdk
    librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2
    librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
        librte_eal.so.3 => /usr/lib/x86_64-linux-gnu/librte_eal.so.3

You can see that openvswitch itself depends on the "old" librte_eal.so.2.
But because librte_pdump.so.1 did not get an ABI bump, it got upgraded to
the newer version from DPDK 16.11.
But since the "new" pdump got built with the new DPDK 16.11 it depends on
the "new" librte_eal.so.3.
And having both in the same executable space at the same time causes
segfaults and pain.

As I said, for now I have worked around the issue with a crutch that I'm
not proud of and would like to avoid in the future. For that reason I'm
reaching out to you with several suggestions to discuss.


*## Thoughts ##*
None of these seems like a perfect solution to me yet, but they are
clearly good starting points for discussion.

Options that were in discussion so far and that we might adopt next cycle
(some of these are upstream changes, some downstream, some require both to
change - but any of them should have an ack upstream so that we are
agreeing how to proceed with those cases).

1. Downstreams to insert Major version into soname
Distributions could insert the DPDK major version (like 16.11) into the
soname and package names. A common example of this is libboost [5].
That would perfectly allow 16.07.<LIBABIVER> to coexist with
16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
Yet it would mean that anything depending on the old library will have to
be recompiled to pick up the new code, even if it depends on an ABI that is
still present in the new release.
Also - not a technical reason - it is clearly more work to force-update
all dependencies and clean out old packages for every release.


2. ABI Ranges
One could argue that, due to the detailed tracking of functions, DPDK is
already close to tracking not ABI levels but actually ABI ranges. DPDK
could track LIBABIVERMIN and LIBABIVER.
Every time functionality is added LIBABIVER would get bumped, but
LIBABIVERMIN only gets moved to the OLDEST still supported ABI when things
are dropped.
So on a given library librte_foo you could have LIBABIVER=5 and
LIBABIVERMIN=3. The make install would then install the shared lib as:
librte_foo.so.5
and additionally links for all compatible versions:
librte_foo.so.3 -> librte_foo.so.5
librte_foo.so.4 -> librte_foo.so.5
Yet, while it has some nice attributes, this might make DPDK even more
special and cause ABI level proliferation over time.
Also even with this in place, changes moving LIBABIVERMIN "too fast" (too
fast is different for each downstream) could still cause an issue like the
one I initially described.


3. A lot of conflicts
In packaging one can declare a package to conflict with another package [6].
Now we could declare e.g. librte_eal3 to conflict with librte_eal2 (and the
same for all other bumps).
That would make them not co-installable, and working on a new release
would mean that all former consumers would become uninstallable as well
and have to be rebuilt before they all could migrate [7] together.
That "works" in some sense, but it defeats the whole purpose of versioned
library packages (to be co-installable, to allow different library
consumers to depend on different versions).


4. ABI bump is infecting
Another way might be to also bump any dependent DPDK library.
So when core libs like eal are ABI bumped likely all libs would get a bump.
If only e.g. mempool gets a bump only those other parts using it would be
bumped as well.
To some extent this might still proliferate ABI versions more than one
would like.
Also it surely is hard to track if not automated - think of dependencies
that exist only in certain config cases.

5. back to single ABI
For the sake of giving everybody a chance to re-open old wounds I wanted to
mention that DPDK could also decide to go back to a single ABI again.
This could (but doesn't have to!) be combined with having a single .so file
again.
Deciding on this might be a much cleaner and easier-to-track alternative to #4.

6. More
I'm sure there are more approaches to this; feel free to come up with more.

I'm sure my five suggestions alone will make the thread messy. Maybe we
should do this in two rounds: first sorting out the insane ones and
identifying the preferred ones, then in a second round focusing on
discussing and maybe implementing the details of what we like.


[1]: http://dpdk.org/browse/dpdk/tree/doc/guides/contributing/versioning.rst
[2]: http://dpdk.org/browse/dpdk/commit/?id=d7e61ad3ae36
[3]: http://dpdk.org/browse/dpdk/commit/?id=6ba1affa54108
[4]: https://wiki.debian.org/TransitionBestPractices
[5]: https://packages.debian.org/sid/libboost1.62-dev
[6]:
https://www.debian.org/doc/debian-policy/ch-relationships.html#s-conflicts
[7]: https://wiki.ubuntu.com/ProposedMigration

P.S. I beg your pardon for the wall of text

-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for cryptodev ops structure
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
@ 2017-02-14 10:48  4%   ` Doherty, Declan
  2017-02-14 11:03  4%     ` De Lara Guarch, Pablo
  2017-02-14 20:37  4%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Doherty, Declan @ 2017-02-14 10:48 UTC (permalink / raw)
  To: Fan Zhang, dev

On 14/02/2017 10:41 AM, Fan Zhang wrote:
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
...
>

Acked-by: Declan Doherty <declan.doherty@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] crypto drivers in the API
  @ 2017-02-14 10:44  4% ` Doherty, Declan
  2017-02-14 11:04  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Doherty, Declan @ 2017-02-14 10:44 UTC (permalink / raw)
  To: Thomas Monjalon, Declan Doherty; +Cc: dev

On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
> In the crypto API, the drivers are listed.
> In my opinion, it is a wrong design and these lists should be removed.
> Do we need a deprecation notice to plan this removal in 17.05, while
> working on bus abstraction?
>
...
>

Hey Thomas,
I agree that these need to be removed, and I had planned on doing this 
for 17.05 but I have a concern on the requirements for ABI breakage in 
relation to this. This enum is unfortunately used in both the 
rte_cryptodev and rte_crypto_sym_session structures which are part of 
the library's public API. I don't think it would be feasible to maintain
a set of 17.02-compatible APIs with the changes this would introduce, as
it would require a large number of functions to have 2 versions. Is it
OK to break the ABI for this case?

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: announce ABI change for cryptodev ops structure
  2017-02-10 11:39  9% [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure Fan Zhang
  2017-02-10 13:59  4% ` Trahe, Fiona
@ 2017-02-14 10:41  9% ` Fan Zhang
  2017-02-14 10:48  4%   ` Doherty, Declan
  2017-02-14 20:37  4%   ` Thomas Monjalon
  1 sibling, 2 replies; 200+ results
From: Fan Zhang @ 2017-02-14 10:41 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v2:
Rework the grammar

 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..d64858f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -62,3 +62,7 @@ Deprecation Notices
   PMDs that implement the latter.
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
+
+* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
+  A pointer to an ``rte_cryptodev_config`` structure will be added to the
+  function prototype ``cryptodev_configure_t``, as a new parameter.
-- 
2.7.4

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
  2017-02-14  3:17  4%     ` Jerin Jacob
@ 2017-02-14 10:33  4%       ` Iremonger, Bernard
  0 siblings, 0 replies; 200+ results
From: Iremonger, Bernard @ 2017-02-14 10:33 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon; +Cc: dev, Mcnamara, John



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, February 14, 2017 3:17 AM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: Iremonger, Bernard <bernard.iremonger@intel.com>; dev@dpdk.org;
> Mcnamara, John <john.mcnamara@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for
> ethdev
> 
> On Mon, Feb 13, 2017 at 06:57:20PM +0100, Thomas Monjalon wrote:
> > 2017-01-05 15:25, Bernard Iremonger:
> > > In 17.05 nine rte_eth_dev_* functions will be removed from
> > > librte_ether, renamed and moved to the ixgbe PMD.
> > >
> > > Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> >
> > "ixgbe bypass" should be in the title and the description.
> > I'll reword to:
> >
> > doc: announce move of ethdev bypass function to ixgbe API
> >
> > In 17.05, nine rte_eth_dev_* functions for bypass control, and
> > implemented only in ixgbe, will be removed from ethdev, renamed and
> > moved to the ixgbe PMD-specific API.
> >
> > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization
  2017-02-14  8:32  3%   ` Olivier Matz
@ 2017-02-14  9:39  0%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-14  9:39 UTC (permalink / raw)
  To: Olivier Matz
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev

On Tue, Feb 14, 2017 at 09:32:20AM +0100, Olivier Matz wrote:
> Hi Bruce,
> 
> On Tue,  7 Feb 2017 14:12:38 +0000, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > This patchset makes a set of, sometimes non-backward compatible,
> > cleanup changes to the rte_ring code in order to improve it. The
> > resulting code is shorter*, since the existing functions are
> > restructured to reduce code duplication, as well as being more
> > consistent in behaviour. The specific changes made are explained in
> > each patch which makes that change.
> > 
> > Key incompatibilities:
> > * The biggest, and probably most controversial change is that to the
> >   enqueue and dequeue APIs. The enqueue/deq burst and bulk functions
> > have their function prototypes changed so that they all return an
> > additional parameter, indicating the size of the next call which is
> > guaranteed to succeed. In the case of enq, this is the number of
> > available slots on the ring, and in case of deq, it is the number of
> > objects which can be pulled. As well as this, the return values from
> > the bulk functions have been changed to make them compatible with the
> > burst functions. In all cases, the functions to enq/deq a set of objs
> > now return the number of objects processed, 0 or N, in the case of
> > bulk functions, 0, N or any value in between in the case of the burst
> > ones. [Due to the extra parameter, the compiler will flag all
> > instances of the function to allow the user to also change the return
> > value logic at the same time]
> > * The parameters to the single object enq/deq functions have not been 
> >   changed. Because of that, the return value is also unmodified - as
> > the compiler cannot automatically flag this to the user.
> > 
> > Potential further cleanups:
> > * To a certain extent the rte_ring structure has gone from being a
> > whole ring structure, including a "ring" element itself, to just
> > being a header which can be reused, along with the head/tail update
> > functions to create new rings. For now, the enqueue code works by
> > assuming that the ring data goes immediately after the header, but
> > that can be changed to allow specialised ring implementations to put
> > additional metadata of their own after the ring header. I didn't see
> > this as being needed right now, but it may be worth considering for a
> > V1 patchset.
> > * There are 9 enqueue functions and 9 dequeue functions in
> > rte_ring.h. I suspect not all of those are used, so personally I
> > would consider dropping the functions to enqueue/dequeue a single
> > value using single or multi semantics, i.e. drop 
> >     rte_ring_sp_enqueue
> >     rte_ring_mp_enqueue
> >     rte_ring_sc_dequeue
> >     rte_ring_mc_dequeue
> >   That would still leave a single enqueue and dequeue function for
> > working with a single object at a time.
> > * It should be possible to merge the head update code for enqueue and
> >   dequeue into a single function. The key difference between the two
> > is the calculation of how far the index can be moved. I felt that the
> >   functions for moving the head index are sufficiently complicated
> > with many parameters to them already, that trying to merge in more
> > code would impede readability. However, if so desired this change can
> > be made at a later stage without affecting ABI or API.
> > 
> > PERFORMANCE:
> > I've run performance autotests on a couple of (Intel) platforms.
> > Looking particularly at the core-2-core results, which I expect are
> > the main ones of interest, the performance after this patchset is a
> > few cycles per packet faster in my testing. I'm hoping it should be
> > at least neutral perf-wise.
> > 
> > REQUEST FOR FEEDBACK:
> > * Are all of these changes worth making?
> 
> I've quickly browsed all the patches. I think yes, we should do it: it
> brings a good cleanup, removing features we don't need, restructuring
> the code, and also adding the feature you need :)
> 
> 
> > * Should they be made in existing ring code, or do we look to provide
> > a new fifo library to completely replace the ring one?
> 
> I think it's ok to have it in the existing code. Breaking the ABI
> is never suitable, but I think having 2 libs would be even more
> confusing.
> 
> 
> > * How does the implementation of new ring types using this code
> > compare vs that of the previous RFCs?
> 
> I prefer this version, especially compared to the first RFC.
> 
> 
> Thanks for this big rework. I'll dive into the patches and do a more
> exhaustive review soon.
> 
Great, thanks. I'm aware of a few things that already need to be cleaned
up for V1, e.g. comments are not always correctly updated on functions.

/Bruce

^ permalink raw reply	[relevance 0%]
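
To make the prototype change discussed in the cover letter concrete, here is a
sketch of a caller after the rework; the 4-argument signatures are assumed
from the description above and could still change before V1:

#include <rte_ring.h>

/* Burst enqueue with the proposed extra out-parameter: the return value is
 * the number of objects actually enqueued, and free_space reports the room
 * left in the ring, which is what makes the built-in watermarks unneeded. */
static unsigned int
forward_burst(struct rte_ring *r, void **objs, unsigned int n)
{
	unsigned int free_space;
	unsigned int sent;

	sent = rte_ring_enqueue_burst(r, objs, n, &free_space);
	if (free_space < 32) {
		/* caller-side "watermark": e.g. signal backpressure here */
	}
	return sent;
}

Under the same scheme the bulk variants would return 0 or n through an
identical prototype, which is the burst/bulk consistency mentioned above.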

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
  2017-02-14  0:32  4% ` Mcnamara, John
  2017-02-14  3:25  4% ` Jerin Jacob
@ 2017-02-14  8:33  4% ` Olivier Matz
  2017-02-14 11:43  4%   ` Hemant Agrawal
  2017-02-14 18:42  4% ` [dpdk-dev] " Thomas Monjalon
  3 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-02-14  8:33 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Mon, 13 Feb 2017 17:38:30 +0000, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> Document proposed changes for the rings code in the next release.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization
  2017-02-07 14:12  2% ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
@ 2017-02-14  8:32  3%   ` Olivier Matz
  2017-02-14  9:39  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-02-14  8:32 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev

Hi Bruce,

On Tue,  7 Feb 2017 14:12:38 +0000, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> This patchset makes a set of, sometimes non-backward compatible,
> cleanup changes to the rte_ring code in order to improve it. The
> resulting code is shorter*, since the existing functions are
> restructured to reduce code duplication, as well as being more
> consistent in behaviour. The specific changes made are explained in
> each patch which makes that change.
> 
> Key incompatibilities:
> * The biggest, and probably most controversial change is that to the
>   enqueue and dequeue APIs. The enqueue/deq burst and bulk functions
> have their function prototypes changed so that they all return an
> additional parameter, indicating the size of the next call which is
> guaranteed to succeed. In the case of enq, this is the number of
> available slots on the ring, and in case of deq, it is the number of
> objects which can be pulled. As well as this, the return values from
> the bulk functions have been changed to make them compatible with the
> burst functions. In all cases, the functions to enq/deq a set of objs
> now return the number of objects processed, 0 or N, in the case of
> bulk functions, 0, N or any value in between in the case of the burst
> ones. [Due to the extra parameter, the compiler will flag all
> instances of the function to allow the user to also change the return
> value logic at the same time]
> * The parameters to the single object enq/deq functions have not been 
>   changed. Because of that, the return value is also unmodified - as
> the compiler cannot automatically flag this to the user.
> 
> Potential further cleanups:
> * To a certain extent the rte_ring structure has gone from being a
> whole ring structure, including a "ring" element itself, to just
> being a header which can be reused, along with the head/tail update
> functions to create new rings. For now, the enqueue code works by
> assuming that the ring data goes immediately after the header, but
> that can be changed to allow specialised ring implementations to put
> additional metadata of their own after the ring header. I didn't see
> this as being needed right now, but it may be worth considering for a
> V1 patchset.
> * There are 9 enqueue functions and 9 dequeue functions in
> rte_ring.h. I suspect not all of those are used, so personally I
> would consider dropping the functions to enqueue/dequeue a single
> value using single or multi semantics, i.e. drop 
>     rte_ring_sp_enqueue
>     rte_ring_mp_enqueue
>     rte_ring_sc_dequeue
>     rte_ring_mc_dequeue
>   That would still leave a single enqueue and dequeue function for
> working with a single object at a time.
> * It should be possible to merge the head update code for enqueue and
>   dequeue into a single function. The key difference between the two
> is the calculation of how far the index can be moved. I felt that the
>   functions for moving the head index are sufficiently complicated
> with many parameters to them already, that trying to merge in more
> code would impede readability. However, if so desired this change can
> be made at a later stage without affecting ABI or API.
> 
> PERFORMANCE:
> I've run performance autotests on a couple of (Intel) platforms.
> Looking particularly at the core-2-core results, which I expect are
> the main ones of interest, the performance after this patchset is a
> few cycles per packet faster in my testing. I'm hoping it should be
> at least neutral perf-wise.
> 
> REQUEST FOR FEEDBACK:
> * Are all of these changes worth making?

I've quickly browsed all the patches. I think yes, we should do it: it
brings a good cleanup, removing features we don't need, restructuring
the code, and also adding the feature you need :)


> * Should they be made in existing ring code, or do we look to provide
> a new fifo library to completely replace the ring one?

I think it's ok to have it in the existing code. Breaking the ABI
is never suitable, but I think having 2 libs would be even more
confusing.


> * How does the implementation of new ring types using this code
> compare vs that of the previous RFCs?

I prefer this version, especially compared to the first RFC.


Thanks for this big rework. I'll dive into the patches and do a more
exhaustive review soon.

Regards,
Olivier

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 21:56  0%   ` Jan Blunck
@ 2017-02-14  5:18  0%     ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2017-02-14  5:18 UTC (permalink / raw)
  To: Jan Blunck; +Cc: dev, nhorman, Thomas Monjalon

On Tuesday 14 February 2017 03:26 AM, Jan Blunck wrote:
> On Mon, Feb 13, 2017 at 1:00 PM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
>> On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
>>>
>>> EAL PCI layer is planned to be restructured in 17.05 to unlink it from
>>> generic structures like eth_driver, rte_cryptodev_driver, and also move
>>> it into a PCI Bus.
>>>
>>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>> ---
>>>  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>> b/doc/guides/rel_notes/deprecation.rst
>>> index fbe2fcb..b12d435 100644
>>> --- a/doc/guides/rel_notes/deprecation.rst
>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>> @@ -13,10 +13,14 @@ Deprecation Notices
>>>    has exposed, like the way we have done with uio-pci-generic. This
>>> change
>>>    targets release 17.05.
>>>
>>> -* ``eth_driver`` is planned to be removed in 17.02. This currently serves
>>> as
>>> -  a placeholder for PMDs to register themselves. Changes for ``rte_bus``
>>> will
>>> -  provide a way to handle device initialization currently being done in
>>> -  ``eth_driver``.
>>
>>
>> Just to highlight, above statement was added by me in 16.11.
>> As of now I plan to work on removing rte_pci_driver from eth_driver,
>> rather than removing eth_driver altogether (which, probably, was the
>> better idea).
>> If someone still wishes to work on its complete removal, we can keep
>> the above. (and probably remove the below).
>>
>
> There is no benefit in keeping eth_driver and removing rte_pci_driver
> from it. Technically it isn't even needed today.

I agree with you.
I stopped working on it because I realized that removing it means making
pci_probe call the eth_dev_init handlers directly, or restructuring the
whole pci probe stack - which, because of the pending PCI bus
implementation, was slightly tentative.

Changes are already expected in the EAL PCI code for the bus movement;
probably this task can be combined with that.

>
>>
>>> +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
>>> +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
>>> +  bus.
>>> +
>>> +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in
>>> 17.05.
>>> +  This is to unlink the ethernet driver from PCI dependencies.
>>> +  Similarly, ``rte_pci_driver`` is planned to be removed from
>>> +  ``rte_cryptodev_driver`` in 17.05.
>>>
>>>  * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
>>>    extended with new function pointer ``tx_pkt_prepare`` allowing
>>> verification
>>>
>>
>

^ permalink raw reply	[relevance 0%]
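
For readers without the source at hand, the coupling being discussed is the
first member of the 16.11/17.02 eth_driver. The sketch below is reproduced
from memory with the typedefs expanded, so details may differ slightly from
rte_ethdev.h:

#include <rte_pci.h>

struct rte_eth_dev;

/* sketch of the 16.11-era layout, not a literal copy */
struct eth_driver {
	struct rte_pci_driver pci_drv;               /* embedded PCI driver: the link to cut */
	int (*eth_dev_init)(struct rte_eth_dev *);   /* per-device init hook */
	int (*eth_dev_uninit)(struct rte_eth_dev *); /* per-device teardown hook */
	unsigned int dev_private_size;               /* size of driver-private data */
};

Because pci_drv is embedded as the first member, every ethdev PMD is a PCI
driver by construction; moving scan/probe onto a PCI bus means the
eth_dev_init-style hooks must become reachable without going through
rte_pci_driver, which is the restructuring discussed here.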

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure
  2017-02-14  0:21  4%   ` Hemant Agrawal
@ 2017-02-14  5:11  4%     ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2017-02-14  5:11 UTC (permalink / raw)
  To: Hemant Agrawal, Trahe, Fiona, Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hemant Agrawal
> Sent: Monday, February 13, 2017 6:21 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; Zhang, Roy Fan
> <roy.fan.zhang@intel.com>; dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> structure
> 
> On 2/10/2017 7:59 AM, Trahe, Fiona wrote:
> > Hi Fan,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> >> Sent: Friday, February 10, 2017 11:39 AM
> >> To: dev@dpdk.org
> >> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> >> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> >> structure
> >>
> >> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> >> b/doc/guides/rel_notes/deprecation.rst
> >> index 755dc65..564d93a 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -62,3 +62,7 @@ Deprecation Notices
> >>    PMDs that implement the latter.
> >>    Target release for removal of the legacy API will be defined once most
> >>    PMDs have switched to rte_flow.
> >> +
> >> +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops``
> structure.
> >> +  The field ``cryptodev_configure_t`` function prototype will be
> >> +added a
> >> +  parameter of a struct rte_cryptodev_config type pointer.
> >> --
> >> 2.7.4
> >
> > Can you fix the grammar here please. I'm not sure what the change is?
> >
> I also find it hard to understand it first. Not perfect, but I tried to reword it.
> 
> A new parameter ``struct rte_cryptodev_config *config`` will be added to the
> ``cryptodev_configure_t`` function pointer field.
> 

In any case,
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
  2017-02-14  0:32  4% ` Mcnamara, John
@ 2017-02-14  3:25  4% ` Jerin Jacob
  2017-02-14  8:33  4% ` Olivier Matz
  2017-02-14 18:42  4% ` [dpdk-dev] " Thomas Monjalon
  3 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2017-02-14  3:25 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Mon, Feb 13, 2017 at 05:38:30PM +0000, Bruce Richardson wrote:
> Document proposed changes for the rings code in the next release.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index b49e0a0..e715fc7 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,25 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
>  
> +* ring: Changes are planned to rte_ring APIs in release 17.05. Proposed
> +  changes include:
> +    - Removing build time options for the ring:
> +      CONFIG_RTE_RING_SPLIT_PROD_CONS
> +      CONFIG_RTE_RING_PAUSE_REP_COUNT
> +    - Adding an additional parameter to enqueue functions to return the
> +      amount of free space in the ring
> +    - Adding an additional parameter to dequeue functions to return the
> +      number of remaining elements in the ring
> +    - Removing direct support for watermarks in the rings, since the
> +      additional return value from the enqueue function makes it
> +      unneeded
> +    - Adjusting the return values of the bulk() enq/deq functions to
> +      make them consistent with the burst() equivalents. [Note, parameter
> +      to these functions are changing too, per points above, so compiler
> +      will flag them as needing update in legacy code]
> +    - Updates to some library functions e.g. rte_ring_get_memsize() to
> +      allow for variably-sized ring elements.
> +
>  * igb_uio: iomem mapping and sysfs files created for iomem and ioport in
>    igb_uio will be removed, because we are able to detect these from what Linux
>    has exposed, like the way we have done with uio-pci-generic. This change

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
  2017-02-13 18:02  4% ` Thomas Monjalon
@ 2017-02-14  3:21  4%   ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2017-02-14  3:21 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Yuanhan Liu, dev, Maxime Coquelin, John McNamara, Ben Walker

On Mon, Feb 13, 2017 at 07:02:56PM +0100, Thomas Monjalon wrote:
> 2017-01-23 21:04, Yuanhan Liu:
> > I made a vhost ABI/API refactoring at v16.04, meant to avoid such issues
> > forever. Well, apparently, I lied.
> > 
> > People are looking for more vhost-user options nowadays, other than
> > vhost-user net only. For example, SPDK (Storage Performance Development
> > Kit) is looking at the chance of vhost-user SCSI and vhost-user block.
> > 
> > Apparently, they also need a vhost-user backend; since DPDK already
> > has a (mature enough) backend, they don't want to implement it again
> > from scratch. They want to leverage the one DPDK provides.
> > 
> > However, the last refactoring hasn't done that right, at least it's
> > not friendly for extending vhost-user to add support for more devices.
> > For example, different virtio devices have their own feature sets, while
> > APIs like rte_vhost_feature_disable(feature_mask) have no option to
> > tell the device type. Thus, a more proper API should look like:
> > 
> >     rte_vhost_feature_disable(device_type, feature_mask);
> > 
> > Besides that, a few public files and structures should be renamed, to
> > not let it bind to virtio-net. Specifically, they are:
> > 
> > - virtio_net_device_ops --> vhost_device_ops
> > - rte_virtio_net.h      --> rte_vhost.h
> > 
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> 
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]
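
A sketch of the API shape the refactoring implies; the 16.11 prototype is
quoted from memory, and the device-type enum with its values is an assumption
for illustration, not the final 17.05 naming:

#include <stdint.h>

/* 16.11-era shape (rte_virtio_net.h, from memory): one global mask,
 * implicitly virtio-net only:
 *
 *   int rte_vhost_feature_disable(uint64_t feature_mask);
 */

/* Proposed shape: the device type becomes an explicit argument. */
enum rte_vhost_dev_type {
	RTE_VHOST_DEV_NET,	/* assumed name */
	RTE_VHOST_DEV_SCSI,	/* assumed name */
	RTE_VHOST_DEV_BLK,	/* assumed name */
};

int rte_vhost_feature_disable(enum rte_vhost_dev_type type,
		uint64_t feature_mask);

The rename of rte_virtio_net.h to rte_vhost.h follows the same logic: the
backend stops being virtio-net specific, so the public header and the ops
structure drop "net" from their names.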

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter
  @ 2017-02-14  3:19  4%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2017-02-14  3:19 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Lu, Wenzhuo, Adrien Mazarguil, Liu, Yong, dev

On Fri, Jan 20, 2017 at 03:57:28PM +0100, Thomas Monjalon wrote:
> 2017-01-20 02:14, Lu, Wenzhuo:
> > Hi Adrien, Thomas, Yong,
> > 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > > Sent: Friday, January 20, 2017 2:46 AM
> > > To: Thomas Monjalon
> > > Cc: Liu, Yong; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter
> > > 
> > > On Thu, Jan 19, 2017 at 10:06:34AM +0100, Thomas Monjalon wrote:
> > > > 2017-01-19 13:34, Yong Liu:
> > > > > +* ABI changes are planned for 17.05: structure
> > > > > +``rte_eth_tunnel_filter_conf``
> > > > > +  will be extended with a new member ``vf_id`` in order to enable
> > > > > +cloud filter
> > > > > +  on VF device.
> > > >
> > > > I think we should stop rely on this API, and migrate to rte_flow instead.
> > > > Adrien any thought?
> > > 
> > > I'm all for using rte_flow in any case. I've already documented an approach to
> > > convert TUNNEL filter rules to rte_flow rules [1], although it may be
> > > incomplete due to my limited experience with this filter type. We already
> > > know several tunnel item types must be added (currently only VXLAN is
> > > defined).
> > > 
> > > I understand ixgbe/i40e currently map rte_flow on top of the legacy
> > > framework, therefore extending this structure might still be needed in the
> > > meantime. Not sure we should prevent this change as long as such rules can be
> > > configured through rte_flow as well.
> > > 
> > > [1] http://dpdk.org/doc/guides/prog_guide/rte_flow.html#tunnel-to-eth-ipv4-
> > > ipv6-vxlan-or-other-queue
> > The problem is we haven't finished transferring all the functions from the regular filters to the generic filters. 
> > For example, igb, fm10k and enic do not support generic filters yet. Ixgbe and i40e have supported the basic functions, but some advanced features are not transferred to generic filters yet.
> > It seems it's not the time to remove the regular filters. Yong, I suggest supporting both the generic filter and the regular filter in parallel.
> > So, we need to announce an ABI change for the regular filter, until someday we remove the regular filter API.
> 
> I disagree.
> There is a new API framework (rte_flow) and we must focus on this transition.
> It means we must stop any work on the legacy API.

I agree with Thomas here.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
  2017-02-13 17:57  4%   ` Thomas Monjalon
@ 2017-02-14  3:17  4%     ` Jerin Jacob
  2017-02-14 10:33  4%       ` Iremonger, Bernard
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-14  3:17 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Bernard Iremonger, dev, john.mcnamara

On Mon, Feb 13, 2017 at 06:57:20PM +0100, Thomas Monjalon wrote:
> 2017-01-05 15:25, Bernard Iremonger:
> > In 17.05 nine rte_eth_dev_* functions will be removed from
> > librte_ether, renamed and moved to the ixgbe PMD.
> > 
> > Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> 
> "ixgbe bypass" should be in the title and the description.
> I'll reword to:
> 
> doc: announce move of ethdev bypass function to ixgbe API
> 
> In 17.05, nine rte_eth_dev_* functions for bypass control,
> and implemented only in ixgbe, will be removed from ethdev,
> renamed and moved to the ixgbe PMD-specific API.
> 
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
@ 2017-02-14  0:32  4% ` Mcnamara, John
  2017-02-14  3:25  4% ` Jerin Jacob
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2017-02-14  0:32 UTC (permalink / raw)
  To: Richardson, Bruce, dev; +Cc: Richardson, Bruce



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Monday, February 13, 2017 5:39 PM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>
> Subject: [dpdk-dev] [PATCH] doc: add ABI change notification for ring
> library
> 
> Document proposed changes for the rings code in the next release.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: John McNamara <john.mcnamara@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure
  2017-02-10 13:59  4% ` Trahe, Fiona
  2017-02-13 16:07  7%   ` Zhang, Roy Fan
@ 2017-02-14  0:21  4%   ` Hemant Agrawal
  2017-02-14  5:11  4%     ` Hemant Agrawal
  1 sibling, 1 reply; 200+ results
From: Hemant Agrawal @ 2017-02-14  0:21 UTC (permalink / raw)
  To: Trahe, Fiona, Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo

On 2/10/2017 7:59 AM, Trahe, Fiona wrote:
> Hi Fan,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
>> Sent: Friday, February 10, 2017 11:39 AM
>> To: dev@dpdk.org
>> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
>> structure
>>
>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> b/doc/guides/rel_notes/deprecation.rst
>> index 755dc65..564d93a 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -62,3 +62,7 @@ Deprecation Notices
>>    PMDs that implement the latter.
>>    Target release for removal of the legacy API will be defined once most
>>    PMDs have switched to rte_flow.
>> +
>> +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
>> +  The field ``cryptodev_configure_t`` function prototype will be added a
>> +  parameter of a struct rte_cryptodev_config type pointer.
>> --
>> 2.7.4
>
> Can you fix the grammar here please. I'm not sure what the change is?
>
I also found it hard to understand at first. Not perfect, but I tried to
reword it.

A new parameter ``struct rte_cryptodev_config *config`` will be added to 
the ``cryptodev_configure_t`` function pointer field.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 12:00  0% ` Shreyansh Jain
  2017-02-13 14:44  0%   ` Thomas Monjalon
@ 2017-02-13 21:56  0%   ` Jan Blunck
  2017-02-14  5:18  0%     ` Shreyansh Jain
  1 sibling, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-13 21:56 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, nhorman, Thomas Monjalon

On Mon, Feb 13, 2017 at 1:00 PM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
> On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
>>
>> EAL PCI layer is planned to be restructured in 17.05 to unlink it from
>> generic structures like eth_driver, rte_cryptodev_driver, and also move
>> it into a PCI Bus.
>>
>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> b/doc/guides/rel_notes/deprecation.rst
>> index fbe2fcb..b12d435 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -13,10 +13,14 @@ Deprecation Notices
>>    has exposed, like the way we have done with uio-pci-generic. This
>> change
>>    targets release 17.05.
>>
>> -* ``eth_driver`` is planned to be removed in 17.02. This currently serves
>> as
>> -  a placeholder for PMDs to register themselves. Changes for ``rte_bus``
>> will
>> -  provide a way to handle device initialization currently being done in
>> -  ``eth_driver``.
>
>
> Just to highlight, above statement was added by me in 16.11.
> As of now I plan to work on removing rte_pci_driver from eth_driver,
> rather than removing eth_driver altogether (which, probably, was the
> better idea).
> If someone still wishes to work on its complete removal, we can keep
> the above. (and probably remove the below).
>

There is no benefit in keeping eth_driver and removing rte_pci_driver
from it. Technically it isn't even needed today.

>
>> +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
>> +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
>> +  bus.
>> +
>> +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in
>> 17.05.
>> +  This is to unlink the ethernet driver from PCI dependencies.
>> +  Similarly, ``rte_pci_driver`` is planned to be removed from
>> +  ``rte_cryptodev_driver`` in 17.05.
>>
>>  * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
>>    extended with new function pointer ``tx_pkt_prepare`` allowing
>> verification
>>
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
  @ 2017-02-13 18:02  4% ` Thomas Monjalon
  2017-02-14  3:21  4%   ` Jerin Jacob
  2017-02-14 13:54  4% ` Maxime Coquelin
  2017-02-14 20:28  4% ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 18:02 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Maxime Coquelin, John McNamara, Ben Walker

2017-01-23 21:04, Yuanhan Liu:
> I made a vhost ABI/API refactoring at v16.04, meant to avoid such issues
> forever. Well, apparently, I lied.
> 
> People are looking for more vhost-user options nowadays, other than
> vhost-user net only. For example, SPDK (Storage Performance Development
> Kit) is looking at the chance of vhost-user SCSI and vhost-user block.
> 
> Apparently, they also need a vhost-user backend; since DPDK already
> has a (mature enough) backend, they don't want to implement it again
> from scratch. They want to leverage the one DPDK provides.
> 
> However, the last refactoring hasn't done that right, at least it's
> not friendly for extending vhost-user to add support for more devices.
> For example, different virtio devices have their own feature sets, while
> APIs like rte_vhost_feature_disable(feature_mask) have no option to
> tell the device type. Thus, a more proper API should look like:
> 
>     rte_vhost_feature_disable(device_type, feature_mask);
> 
> Besides that, a few public files and structures should be renamed, to
> not let it bind to virtio-net. Specifically, they are:
> 
> - virtio_net_device_ops --> vhost_device_ops
> - rte_virtio_net.h      --> rte_vhost.h
> 
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
  @ 2017-02-13 17:57  4%   ` Thomas Monjalon
  2017-02-14  3:17  4%     ` Jerin Jacob
  2017-02-14 19:37  4%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 17:57 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev, john.mcnamara

2017-01-05 15:25, Bernard Iremonger:
> In 17.05 nine rte_eth_dev_* functions will be removed from
> librte_ether, renamed and moved to the ixgbe PMD.
> 
> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>

"ixgbe bypass" should be in the title and the description.
I'll reword to:

doc: announce move of ethdev bypass function to ixgbe API

In 17.05, nine rte_eth_dev_* functions for bypass control,
and implemented only in ixgbe, will be removed from ethdev,
renamed and moved to the ixgbe PMD-specific API.

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

^ permalink raw reply	[relevance 4%]
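
The practical effect on application code, sketched below; the 17.02 calls are
the existing ethdev bypass API (built only when RTE_NIC_BYPASS is enabled),
while the rte_pmd_ixgbe_* spellings are assumptions, since the final names
were not fixed at the time of this notice:

#include <stdint.h>
#include <rte_ethdev.h>

static void
bypass_setup(uint8_t port_id, uint32_t new_state)
{
	/* 17.02: generic ethdev calls, implemented only by ixgbe */
	rte_eth_dev_bypass_init(port_id);
	rte_eth_dev_bypass_state_set(port_id, &new_state);

	/* 17.05 (assumed): same operations as an ixgbe-specific API,
	 * e.g. in a header like rte_pmd_ixgbe.h:
	 *
	 *   rte_pmd_ixgbe_bypass_init(port_id);
	 *   rte_pmd_ixgbe_bypass_state_set(port_id, &new_state);
	 */
}

Applications using bypass would then include the PMD-specific header and link
against the ixgbe driver explicitly, instead of relying on generic ethdev.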

* [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
@ 2017-02-13 17:38  9% Bruce Richardson
  2017-02-14  0:32  4% ` Mcnamara, John
                   ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Bruce Richardson @ 2017-02-13 17:38 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Document proposed changes for the rings code in the next release.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..e715fc7 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,25 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* ring: Changes are planned to rte_ring APIs in release 17.05. Proposed
+  changes include:
+    - Removing build time options for the ring:
+      CONFIG_RTE_RING_SPLIT_PROD_CONS
+      CONFIG_RTE_RING_PAUSE_REP_COUNT
+    - Adding an additional parameter to enqueue functions to return the
+      amount of free space in the ring
+    - Adding an additional parameter to dequeue functions to return the
+      number of remaining elements in the ring
+    - Removing direct support for watermarks in the rings, since the
+      additional return value from the enqueue function makes it
+      unneeded
+    - Adjusting the return values of the bulk() enq/deq functions to
+      make them consistent with the burst() equivalents. [Note, parameter
+      to these functions are changing too, per points above, so compiler
+      will flag them as needing update in legacy code]
+    - Updates to some library functions e.g. rte_ring_get_memsize() to
+      allow for variably-sized ring elements.
+
 * igb_uio: iomem mapping and sysfs files created for iomem and ioport in
   igb_uio will be removed, because we are able to detect these from what Linux
   has exposed, like the way we have done with uio-pci-generic. This change
-- 
2.9.3

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:46  4%   ` Ferruh Yigit
  2017-02-13 17:21  0%     ` Dumitrescu, Cristian
@ 2017-02-13 17:38  3%     ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 17:38 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Dumitrescu, Cristian, dev, Richardson, Bruce, Wiles, Keith

2017-02-13 16:46, Ferruh Yigit:
> On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
> > 2017-02-13 16:02, Dumitrescu, Cristian:
> >> Hi Thomas,
> >>
> >> When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. change notice one release before)?
> >>
> >> IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?
> > 
> > You are right, it is an internal struct.
> > So no need of a deprecation notice.
> 
> When DPDK is compiled as a dynamic library, the application will load
> PMDs dynamically as plugins.
> Does this use case cause an ABI compatibility issue?
> 
> I think the drivers <--> libraries interface can cause ABI breakages in
> the dynamic library case, although I am not sure how common this use case is.

Yes, it is a problem for the drivers/library interface.
It is not an ABI, which is an application/library interface.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 17:21  0%     ` Dumitrescu, Cristian
@ 2017-02-13 17:36  0%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-02-13 17:36 UTC (permalink / raw)
  To: Dumitrescu, Cristian, Thomas Monjalon
  Cc: dev, Richardson, Bruce, Wiles, Keith

On 2/13/2017 5:21 PM, Dumitrescu, Cristian wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Monday, February 13, 2017 4:46 PM
>> To: Thomas Monjalon <thomas.monjalon@6wind.com>; Dumitrescu, Cristian
>> <cristian.dumitrescu@intel.com>
>> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>; Wiles,
>> Keith <keith.wiles@intel.com>
>> Subject: Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
>>
>> On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
>>> 2017-02-13 16:02, Dumitrescu, Cristian:
>>>> Hi Thomas,
>>>>
>>>> When a new member (function pointer) is added to struct eth_dev_ops
>> (as the last member), does it need to go through the ABI change process (e.g.
>> change notice one release before)?
>>>>
>>>> IMO the answer is no: struct eth_dev_ops is marked as internal and its
>> instances are only accessed through pointers, so the rte_eth_devices array
>> should not be impacted by the ops structure expanding at its end. Unless
>> there is something that I am missing?
>>>
>>> You are right, it is an internal struct.
>>> So no need of a deprecation notice.
>>
>> When DPDK is compiled as a dynamic library, the application will load
>> PMDs dynamically as plugins.
>> Does this use case cause an ABI compatibility issue?
>>
>> I think the drivers <--> libraries interface can cause ABI breakages in
>> the dynamic library case, although I am not sure how common this use case is.
>>
> 
> Do you have a specific example that might cause an issue when adding a new function at the end of the ethdev ops structure? I cannot think of any, given that the ops structure is marked as internal and it is only accessed through pointers.

Adding at the end of the struct is probably safe.

> 
>>
>>>
>>> We must clearly separate API and internal code in ethdev.
>>>
>>>> My question is in the context of this patch under review for 17.5 release:
>> http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.
>>>
>>> I did not look at it yet. Will do after the release.
>>>
>>>
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure
  2017-02-13 16:07  7%   ` Zhang, Roy Fan
@ 2017-02-13 17:34  4%     ` Trahe, Fiona
  0 siblings, 0 replies; 200+ results
From: Trahe, Fiona @ 2017-02-13 17:34 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo

Thanks Fan, now it makes sense.

> -----Original Message-----
> From: Zhang, Roy Fan
> Sent: Monday, February 13, 2017 4:07 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> structure
> 
> Hi Fiona,
> 
> Sorry for my bad English, I will try to explain better here.
> 
> "cryptodev_configure_t" is a function prototype with only "rte_cryptodev
> *dev"
> as its sole parameter. The structure ``rte_cryptodev_ops`` holds one
> function pointer of this type, "dev_configure".
> 
> The patch announces the addition of a "struct rte_cryptodev_config"
> pointer parameter, so that the function prototype would look
> like:
> 
> typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev, struct
> rte_cryptodev_config *config);
> 
> Without this parameter, a specific crypto PMD may not have enough
> information to configure itself. This may not be a big problem for other
> cryptodevs, as all configuration is done in rte_cryptodev_configure(),
> but it is important for the scheduler PMD, as it needs this parameter to
> configure all its slaves. Currently the user has to configure every
> slave one by one.
> 
> The problem is that, although I only want to change the API of the
> function prototype "cryptodev_configure_t", in order to do that I have
> to break the ABI of the structure "rte_cryptodev_ops". Any help on the
> grammar for stating this more nicely would be appreciated.
> 
> Best regards,
> Fan
> 
> 
> 
> 
> > -----Original Message-----
> > From: Trahe, Fiona
> > Sent: Friday, February 10, 2017 2:00 PM
> > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> > <fiona.trahe@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> > structure
> >
> > Hi Fan,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> > > Sent: Friday, February 10, 2017 11:39 AM
> > > To: dev@dpdk.org
> > > Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> > > Subject: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> > > structure
> > >
> > > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > > ---
> > >  doc/guides/rel_notes/deprecation.rst | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/doc/guides/rel_notes/deprecation.rst
> > > b/doc/guides/rel_notes/deprecation.rst
> > > index 755dc65..564d93a 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -62,3 +62,7 @@ Deprecation Notices
> > >    PMDs that implement the latter.
> > >    Target release for removal of the legacy API will be defined once most
> > >    PMDs have switched to rte_flow.
> > > +
> > > +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops``
> structure.
> > > +  The field ``cryptodev_configure_t`` function prototype will be
> > > +added a
> > > +  parameter of a struct rte_cryptodev_config type pointer.
> > > --
> > > 2.7.4
> >
> > Can you fix the grammar here please. I'm not sure what the change is?

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:46  4%   ` Ferruh Yigit
@ 2017-02-13 17:21  0%     ` Dumitrescu, Cristian
  2017-02-13 17:36  0%       ` Ferruh Yigit
  2017-02-13 17:38  3%     ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2017-02-13 17:21 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon; +Cc: dev, Richardson, Bruce, Wiles, Keith



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Monday, February 13, 2017 4:46 PM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>; Wiles,
> Keith <keith.wiles@intel.com>
> Subject: Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
> 
> On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
> > 2017-02-13 16:02, Dumitrescu, Cristian:
> >> Hi Thomas,
> >>
> >> When a new member (function pointer) is added to struct eth_dev_ops
> (as the last member), does it need to go through the ABI change process (e.g.
> change notice one release before)?
> >>
> >> IMO the answer is no: struct eth_dev_ops is marked as internal and its
> instances are only accessed through pointers, so the rte_eth_devices array
> should not be impacted by the ops structure expanding at its end. Unless
> there is something that I am missing?
> >
> > You are right, it is an internal struct.
> > So no need of a deprecation notice.
> 
> When DPDK is compiled as a dynamic library, the application will load
> PMDs dynamically as plugins.
> Does this use case cause an ABI compatibility issue?
> 
> I think the drivers <--> libraries interface can cause ABI breakages in
> the dynamic library case, although I am not sure how common this use case is.
> 

Do you have a specific example that might cause an issue when adding a new function at the end of the ethdev ops structure? I cannot think of any, given that the ops structure is marked as internal and it is only accessed through pointers.

> 
> >
> > We must clearly separate API and internal code in ethdev.
> >
> >> My question is in the context of this patch under review for the 17.05 release:
> http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.
> >
> > I did not look at it yet. Will do after the release.
> >
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:09  0% ` Thomas Monjalon
@ 2017-02-13 16:46  4%   ` Ferruh Yigit
  2017-02-13 17:21  0%     ` Dumitrescu, Cristian
  2017-02-13 17:38  3%     ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2017-02-13 16:46 UTC (permalink / raw)
  To: Thomas Monjalon, Dumitrescu, Cristian
  Cc: dev, Richardson, Bruce, Wiles, Keith

On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
> 2017-02-13 16:02, Dumitrescu, Cristian:
>> Hi Thomas,
>>
>> When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. change notice one release before)?
>>
>> IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?
> 
> You are right, it is an internal struct.
> So no need of a deprecation notice.

When DPDK is compiled as a dynamic library, the application will load
PMDs dynamically as plugins.
Does this use case cause an ABI compatibility issue?

I think the drivers <--> libraries interface can cause ABI breakages in
the dynamic library case, although I am not sure how common this use case is.


> 
> We must clearly separate API and internal code in ethdev.
> 
>> My question is in the context of this patch under review for the 17.05 release: http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.
> 
> I did not look at it yet. Will do after the release.
> 
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:02  3% [dpdk-dev] doc: deprecation notice for ethdev ops? Dumitrescu, Cristian
@ 2017-02-13 16:09  0% ` Thomas Monjalon
  2017-02-13 16:46  4%   ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 16:09 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev, Richardson, Bruce, Yigit, Ferruh, Wiles, Keith

2017-02-13 16:02, Dumitrescu, Cristian:
> Hi Thomas,
> 
> When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. change notice one release before)?
> 
> IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?

You are right, it is an internal struct.
So no need of a deprecation notice.

We must clearly separate API and internal code in ethdev.

> My question is in the context of this patch under review for 17.5 release: http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.

I did not look at it yet. Will do after the release.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure
  2017-02-10 13:59  4% ` Trahe, Fiona
@ 2017-02-13 16:07  7%   ` Zhang, Roy Fan
  2017-02-13 17:34  4%     ` Trahe, Fiona
  2017-02-14  0:21  4%   ` Hemant Agrawal
  1 sibling, 1 reply; 200+ results
From: Zhang, Roy Fan @ 2017-02-13 16:07 UTC (permalink / raw)
  To: Trahe, Fiona, dev; +Cc: De Lara Guarch, Pablo

Hi Fiona,

Sorry for my bad English, I will try to explain better here.

"cryptodev_configure_t" is a function prototype with only "rte_cryptodev *dev"
as sole parameter. Structure ``rte_cryptodev_ops`` holds one function pointer
"dev_configure" of it. 

The patch announces the addition of a "struct rte_cryptodev_config" pointer
parameter, so that the function prototype would look like:

typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev, struct rte_cryptodev_config *config);

Without this parameter, a specific crypto PMD may not have enough information to
configure itself. This may not be a big problem for other cryptodevs, as all
configuration is done in rte_cryptodev_configure(), but it is important for the
scheduler PMD as it needs this parameter to configure all its slaves. Currently
the user has to configure every slave one by one.
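
As a hypothetical sketch (the context struct and field names below are
invented for illustration), the scheduler PMD could then propagate the
configuration to its slaves like this:

	/* hypothetical slave bookkeeping, for illustration only */
	struct scheduler_ctx {
		uint8_t slave_dev_ids[8];
		uint32_t nb_slaves;
	};

	static int
	scheduler_pmd_config(struct rte_cryptodev *dev,
			struct rte_cryptodev_config *config)
	{
		struct scheduler_ctx *ctx = dev->data->dev_private;
		uint32_t i;
		int ret;

		/* forward the device configuration to every slave */
		for (i = 0; i < ctx->nb_slaves; i++) {
			ret = rte_cryptodev_configure(
					ctx->slave_dev_ids[i], config);
			if (ret < 0)
				return ret;
		}
		return 0;
	}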

The problem is that although I only want to change the API of the function
prototype "cryptodev_configure_t", in order to do that I have to break the ABI
of the structure "rte_cryptodev_ops". Any help with the grammar for stating
this more clearly would be appreciated.

Best regards,
Fan




> -----Original Message-----
> From: Trahe, Fiona
> Sent: Friday, February 10, 2017 2:00 PM
> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>
> Subject: RE: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> structure
> 
> Hi Fan,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> > Sent: Friday, February 10, 2017 11:39 AM
> > To: dev@dpdk.org
> > Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> > Subject: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> > structure
> >
> > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index 755dc65..564d93a 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -62,3 +62,7 @@ Deprecation Notices
> >    PMDs that implement the latter.
> >    Target release for removal of the legacy API will be defined once most
> >    PMDs have switched to rte_flow.
> > +
> > +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
> > +  The field ``cryptodev_configure_t`` function prototype will be
> > +added a
> > +  parameter of a struct rte_cryptodev_config type pointer.
> > --
> > 2.7.4
> 
> Can you fix the grammar here please. I'm not sure what the change is?

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] doc: deprecation notice for ethdev ops?
@ 2017-02-13 16:02  3% Dumitrescu, Cristian
  2017-02-13 16:09  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2017-02-13 16:02 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Richardson, Bruce, Yigit, Ferruh, Wiles, Keith

Hi Thomas,

When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. change notice one release before)?

IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?

My question is in the context of this patch under review for 17.5 release: http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.

Thanks,
Cristian

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] cryptodev - Session and queue pair relationship
  @ 2017-02-13 15:09  3%       ` Trahe, Fiona
  0 siblings, 0 replies; 200+ results
From: Trahe, Fiona @ 2017-02-13 15:09 UTC (permalink / raw)
  To: Akhil Goyal, Doherty, Declan, dev, De Lara Guarch, Pablo, Jain, Deepak K
  Cc: hemant.agrawal, Trahe, Fiona

Hi Akhil, 

> -----Original Message-----
> From: Trahe, Fiona
> Sent: Monday, February 13, 2017 2:45 PM
> To: Akhil Goyal <akhil.goyal@nxp.com>; Doherty, Declan
> <declan.doherty@intel.com>; dev@dpdk.org; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Jain, Deepak K <deepak.k.jain@intel.com>
> Cc: hemant.agrawal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
> Subject: RE: cryptodev - Session and queue pair relationship
> 
> Hi Akhil
> 
> > -----Original Message-----
> > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > Sent: Monday, February 13, 2017 2:39 PM
> > To: Doherty, Declan <declan.doherty@intel.com>; dev@dpdk.org; De Lara
> > Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Jain, Deepak K
> > <deepak.k.jain@intel.com>
> > Cc: hemant.agrawal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
> > Subject: Re: cryptodev - Session and queue pair relationship
> >
> > On 2/8/2017 2:22 AM, Declan Doherty wrote:
> > > On 06/02/17 13:35, Akhil Goyal wrote:
> > >> Hi,
> > >>
> > > Hey Akhil, see my thoughts inline
> > >
> > >> I have some issues w.r.t. the mapping of sessions and queue pairs.
> > >>
> > >> As per my understanding:
> > >> - Number of sessions may be large - they are independent of number of
> > >> queue pairs
> > >
> > > Yes, cryptodev assumes no implicit connection between sessions and
> > > queue pairs, the current PMDs just use the crypto session to store the
> > > immutable data (keys etc) for a particular crypto transform or chain of
> > > transforms in a format specific to that PMD with no stateful information.
> > >
> > >> - Queue pairs are L-core specific
> > >
> > > Not exactly, queue pairs like ethdev queues are not thread safe, so we
> > > assume that only a single l-core will be using a queue pair at any time
> > > unless the application layer has introduced a locking mechanism to
> > > provide thread safety.
> > >
> > >> - Depending on the implementation, one queue pair can be mapped to many
> > >> sessions. Or, only one queue pair for every session - especially in
> > >> systems having a large number of (hw) queues.
> > >
> > > Currently none of the software crypto PMDs or Intel QuickAssist hardware
> > > accelerated PMD make any assumptions regarding coupling/mapping of
> > > sessions to queue pairs, so today a user could freely change the queue
> > > pair on which a session is processed, or even go as far as using the same
> > > session for processing on different queues simultaneously, as the sessions
> > > are stateless; obviously this could introduce issues for stateful
> > > higher-level protocols using the cryptodev PMD service, but the cryptodev
> > > API doesn't prohibit this usage model.
> > >
> > >
> > >> - Sessions can be created on the fly - typical rekeying use-cases.
> > >> Generally done by the control threads.
> > >>
> > >
> > > Sure, there is no restriction on session creation other than an element
> > > being free in the mempool which the session is being created on.
> > >
> > >> There seems to be no straightforward way for the underlying driver
> > >> implementation to know which sessions are mapped to a particular
> > >> queue pair. The session and queue pair information is first exposed
> > >> in the enqueue command.
> > >>
> > >> One of the NXP Crypto Hardware drivers uses per session data structures
> > >> (descriptors) which need to be configured for hardware queues.  Though
> > >> this information can be extracted from the first enqueue command for a
> > >> particular session, it will add checks in the data path. Also, it will
> > >> bring down the connection setup rate.
> > >
> > > We haven't had to support this model of coupling sessions to queue pairs
> > > in any PMDs before. If I understand correctly, in the hardware model you
> > > need to support, a queue pair can only be configured to support the
> > > processing of a single session at any one time and it only supports that
> > > session until it is reconfigured, is this correct? So if a session needs
> > > to be re-keyed the queue pair would need to be reconfigured?
> > yes it is correct.
> > >
> > >>
> > >> In the API rte_cryptodev_sym_session_create(), we create a session on a
> > >> particular device, but no information about the queue pair is
> > >> shared.
> > >>
> > >> 1. We want to propose to change the session create/config API to also
> > >> take queue pair id as argument.
> > >> struct rte_cryptodev_sym_session *
> > >> rte_cryptodev_sym_session_create(uint8_t dev_id,
> > >>                               struct rte_crypto_sym_xform *xform) to
> > >> also take "uint16_t qp;"
> > >>
> > >> This will also return an "in-use" error, if the underlying hardware only
> > >> supports 1 session/descriptor per qp.
> > >
> > > In my mind the idea of coupling the session_create function to the queue
> > > pair of a device doesn't feel right, as it would certainly put an
> > > unnecessary constraint on all existing PMDs' queue pairs.
> > >
> > > One possible approach would be to extend the queue_pair_setup
> > > function to take an opaque parameter, which would allow you to pass a
> > > session through and would be an approach more in keeping with the
> > > current cryptodev model, but you would then still need to verify that
> > > the operations being enqueued have the same session as the configured
> > > device, assuming that the packets are being enqueued from the host.
> > >
> > > If you need to re-key or change the session you could re-initialize the
> > > queue pair while the device is still active, by stopping the queue pair.
> > >
> > > Following a sequence something like:
> > > stop_qp()
> > > setup_qp()
> > > start_qp()
> > >
> > >
> > > Another option Fiona suggested would be to add 2 new APIs
> > >
> > >
> > > rte_cryptodev_queue_pair_attach_sym_session /
> > > rte_cryptodev_queue_pair_detach_sym_session; this
> > > would allow dynamic attaching of one or more sessions to a device if it
> > > supported this sort of static mapping of sessions to queue pairs.
> > >
> > >
> > >>
> > >> 2. Currently the application configures the *nb_descriptors* in the
> > >> *rte_cryptodev_queue_pair_setup*. Should we add the queue pair
> > >> capability API?
> > >>
> > >
> > > Regarding capabilities, I think this should be just propagated through
> > > the device capabilities, something like a max number of sessions mapped
> > > per queue pair, which would be zero for all/most current devices, and
> > > could be 1 or greater for your device. This is assuming that all queue
> > > pairs support the same crypto transform capabilities; different queue
> > > pairs having different capabilities could get very messy to discover.
> > >
> > >>
> > >> Please share your feedback, I will submit the patch accordingly.
> > >>
> > >> Regards,
> > >> Akhil
> > >>
> > >>
> > >>
> > >
> > >
> > Thanks for your feedback Declan,
> > The suggestion from Fiona looks good. Should I send the patch for this
> > or is it already in discussion in some different thread?
> 
> No, it's not under discussion in any other thread that I'm aware of.
> Go ahead and send it.

It may be useful to add max_nb_sessions_per_qp to
struct rte_cryptodev_info.sym.
I'm assuming that where there is a limit, it would be the same for all qps on the device?
0 would mean unlimited, >0 limited to that number.
This could be used by the application to know whether it needs to use the attach API or not.
This will cause an ABI breakage, so it must be flagged before being changed.
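
A hedged sketch of the two suggested APIs (names taken from the discussion
above; the exact signatures are assumptions):

	int rte_cryptodev_queue_pair_attach_sym_session(uint16_t qp_id,
		struct rte_cryptodev_sym_session *session);
	int rte_cryptodev_queue_pair_detach_sym_session(uint16_t qp_id,
		struct rte_cryptodev_sym_session *session);

With the proposed info field, an application could then decide whether the
attach call is needed at all (dev_id, qp_id and sess assumed from the
application):

	struct rte_cryptodev_info info;

	rte_cryptodev_info_get(dev_id, &info);
	if (info.sym.max_nb_sessions_per_qp > 0)	/* 0 = unlimited */
		rte_cryptodev_queue_pair_attach_sym_session(qp_id, sess);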

> 
> >
> > Also, if this new API is added, there would be a corresponding change in
> > the ipsec-secgw application as well.
> > This API should be optional, and the underlying implementation may or may
> > not implement it.
> >
> > Regards,
> > Akhil
> >

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 12:00  0% ` Shreyansh Jain
@ 2017-02-13 14:44  0%   ` Thomas Monjalon
  2017-02-13 21:56  0%   ` Jan Blunck
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:44 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, Jan Blunck, Stephen Hemminger

2017-02-13 17:30, Shreyansh Jain:
> On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
> > The EAL PCI layer is planned to be restructured in 17.05 to unlink it from
> > generic structures like eth_driver and rte_cryptodev_driver, and also to
> > move it into a PCI bus.
> >
> > Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index fbe2fcb..b12d435 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -13,10 +13,14 @@ Deprecation Notices
> >    has exposed, like the way we have done with uio-pci-generic. This change
> >    targets release 17.05.
> >
> > -* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
> > -  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
> > -  provide a way to handle device initialization currently being done in
> > -  ``eth_driver``.
> 
> Just to highlight, the above statement was added by me in 16.11.
> As of now I plan to work on removing rte_pci_driver from eth_driver,
> rather than removing eth_driver altogether (which was probably the
> better idea).
> If someone still wishes to work on its complete removal, we can keep
> the above (and probably remove the below).

Yes I think we should keep the original idea.
I will work on it with Jan Blunck and Stephen Hemminger I think.

> > +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
> > +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
> > +  bus.
> > +
> > +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in 17.05.
> > +  This is to unlink the ethernet driver from PCI dependencies.
> > +  Similarly, ``rte_pci_driver`` is planned to be removed from
> > +  ``rte_cryptodev_driver`` in 17.05.

I am going to reword it in a v2.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus
  2017-02-13 11:55  5% [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus Shreyansh Jain
@ 2017-02-13 14:36  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:36 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev

2017-02-13 17:25, Shreyansh Jain:
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> ---
> -* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
> -  impacted because of introduction of a new ``rte_bus`` hierarchy. This would
> -  also impact the way devices are identified by EAL. A bus-device-driver model
> -  will be introduced providing a hierarchical view of devices.

Applied, thanks

rte_device/rte_driver have not been impacted and should not be when implementing
the buses.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: postpone API change in ethdev
@ 2017-02-13 14:26  4% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:26 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev

The change of _rte_eth_dev_callback_process has not been done in 17.02.
Let's postpone to 17.05.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 3d72241..6532482 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -23,8 +23,8 @@ Deprecation Notices
   provide a way to handle device initialization currently being done in
   ``eth_driver``.
 
-* ethdev: an API change is planned for 17.02 for the function
-  ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
+* ethdev: an API change is planned for 17.05 for the function
+  ``_rte_eth_dev_callback_process``. In 17.05 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
 
 * ABI changes are planned for 17.05 in the ``rte_mbuf`` structure: some fields
-- 
2.7.0
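
For clarity, the before/after prototypes implied by the notice (the current
three-parameter form is inferred from the "fourth parameter" wording and may
differ in detail):

	/* 17.02 */
	void _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
		enum rte_eth_event_type event, void *cb_arg);

	/* planned for 17.05 */
	int _rte_eth_dev_callback_process(struct rte_eth_dev *dev,
		enum rte_eth_event_type event, void *cb_arg,
		void *ret_param);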

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: remove announce of Tx preparation
  2017-02-13 10:56  9% [dpdk-dev] [PATCH] doc: remove announce of Tx preparation Thomas Monjalon
@ 2017-02-13 14:22  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:22 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

2017-02-13 11:56, Thomas Monjalon:
> The feature is part of 17.02, so the ABI change notice can be removed.
> 
> Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Applied

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05
  2017-02-13 11:05 19% [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05 Olivier Matz
@ 2017-02-13 14:21  4% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:21 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, john.mcnamara

2017-02-13 12:05, Olivier Matz:
> Postpone the ABI changes for mempool and mbuf that were planned
> for 17.02 to 17.05.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Applied, thanks

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 11:55  9% [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL Shreyansh Jain
@ 2017-02-13 12:00  0% ` Shreyansh Jain
  2017-02-13 14:44  0%   ` Thomas Monjalon
  2017-02-13 21:56  0%   ` Jan Blunck
  0 siblings, 2 replies; 200+ results
From: Shreyansh Jain @ 2017-02-13 12:00 UTC (permalink / raw)
  To: dev; +Cc: nhorman, thomas.monjalon

On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
> The EAL PCI layer is planned to be restructured in 17.05 to unlink it from
> generic structures like eth_driver and rte_cryptodev_driver, and also to
> move it into a PCI bus.
>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index fbe2fcb..b12d435 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -13,10 +13,14 @@ Deprecation Notices
>    has exposed, like the way we have done with uio-pci-generic. This change
>    targets release 17.05.
>
> -* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
> -  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
> -  provide a way to handle device initialization currently being done in
> -  ``eth_driver``.

Just to highlight, the above statement was added by me in 16.11.
As of now I plan to work on removing rte_pci_driver from eth_driver,
rather than removing eth_driver altogether (which was probably the
better idea).
If someone still wishes to work on its complete removal, we can keep
the above (and probably remove the below).

> +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
> +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
> +  bus.
> +
> +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in 17.05.
> +  This is to unlink the ethernet driver from PCI dependencies.
> +  Similarly, ``rte_pci_driver`` is planned to be removed from
> +  ``rte_cryptodev_driver`` in 17.05.
>
>  * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
>    extended with new function pointer ``tx_pkt_prepare`` allowing verification
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus
@ 2017-02-13 11:55  5% Shreyansh Jain
  2017-02-13 14:36  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2017-02-13 11:55 UTC (permalink / raw)
  To: dev; +Cc: nhorman, thomas.monjalon, Shreyansh Jain

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..fbe2fcb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -13,11 +13,6 @@ Deprecation Notices
   has exposed, like the way we have done with uio-pci-generic. This change
   targets release 17.05.
 
-* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
-  impacted because of introduction of a new ``rte_bus`` hierarchy. This would
-  also impact the way devices are identified by EAL. A bus-device-driver model
-  will be introduced providing a hierarchical view of devices.
-
 * ``eth_driver`` is planned to be removed in 17.02. This currently serves as
   a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
   provide a way to handle device initialization currently being done in
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
@ 2017-02-13 11:55  9% Shreyansh Jain
  2017-02-13 12:00  0% ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2017-02-13 11:55 UTC (permalink / raw)
  To: dev; +Cc: nhorman, thomas.monjalon, Shreyansh Jain

The EAL PCI layer is planned to be restructured in 17.05 to unlink it from
generic structures like eth_driver and rte_cryptodev_driver, and also to
move it into a PCI bus.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index fbe2fcb..b12d435 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -13,10 +13,14 @@ Deprecation Notices
   has exposed, like the way we have done with uio-pci-generic. This change
   targets release 17.05.
 
-* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
-  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
-  provide a way to handle device initialization currently being done in
-  ``eth_driver``.
+* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
+  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
+  bus.
+
+* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in 17.05.
+  This is to unlink the ethernet driver from PCI dependencies.
+  Similarly, ``rte_pci_driver`` is planned to be removed from
+  ``rte_cryptodev_driver`` in 17.05.
 
 * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
   extended with new function pointer ``tx_pkt_prepare`` allowing verification
-- 
2.7.4

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05
@ 2017-02-13 11:05 19% Olivier Matz
  2017-02-13 14:21  4% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-02-13 11:05 UTC (permalink / raw)
  To: dev, john.mcnamara, thomas.monjalon

Postpone the ABI changes for mempool and mbuf that were planned
for 17.02 to 17.05.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..9d01e86 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -34,7 +34,7 @@ Deprecation Notices
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
 
-* ABI changes are planned for 17.02 in the ``rte_mbuf`` structure: some fields
+* ABI changes are planned for 17.05 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
   store address is not naturally aligned. Other mbuf fields, such as the
@@ -44,15 +44,15 @@ Deprecation Notices
 * The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
   are respectively replaced by PKT_RX_VLAN_STRIPPED and
   PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
-  their behavior will be kept until 16.11 and will be removed in 17.02.
+  their behavior will be kept until 17.02 and will be removed in 17.05.
 
 * mempool: The functions ``rte_mempool_count`` and ``rte_mempool_free_count``
-  will be removed in 17.02.
+  will be removed in 17.05.
   They are replaced by ``rte_mempool_avail_count`` and
   ``rte_mempool_in_use_count`` respectively.
 
 * mempool: The functions for single/multi producer/consumer are deprecated
-  and will be removed in 17.02.
+  and will be removed in 17.05.
   It is replaced by ``rte_mempool_generic_get/put`` functions.
 
 * ethdev: the legacy filter API, including
-- 
2.8.1
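
For reference, a minimal migration sketch for the mempool accessors named in
the notice (mp being any valid struct rte_mempool pointer):

	unsigned int avail, used;

	/* deprecated, to be removed in 17.05 */
	avail = rte_mempool_count(mp);
	used = rte_mempool_free_count(mp);

	/* replacements */
	avail = rte_mempool_avail_count(mp);
	used = rte_mempool_in_use_count(mp);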

^ permalink raw reply	[relevance 19%]

* [dpdk-dev] [PATCH] doc: remove announce of Tx preparation
@ 2017-02-13 10:56  9% Thomas Monjalon
  2017-02-13 14:22  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 10:56 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

The feature is part of 17.02, so the ABI change notice can be removed.

Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..326fde4 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -23,13 +23,6 @@ Deprecation Notices
   provide a way to handle device initialization currently being done in
   ``eth_driver``.
 
-* In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
-  extended with new function pointer ``tx_pkt_prepare`` allowing verification
-  and processing of packet burst to meet HW specific requirements before
-  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
-  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
-  segments limit to be transmitted by device for TSO/non-TSO packets.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
-- 
2.7.0
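
Since the feature is in 17.02, a typical TX path can now call the prepare
step before the burst; a minimal sketch (error handling omitted; port_id,
queue_id, pkts and nb_pkts assumed from the application):

	uint16_t nb_prep, nb_sent;

	nb_prep = rte_eth_tx_prepare(port_id, queue_id, pkts, nb_pkts);
	/* packets beyond nb_prep failed the checks and should not be sent */
	nb_sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_prep);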

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API
  @ 2017-02-10 14:05  1% ` Cristian Dumitrescu
  2017-02-21 10:35  0%   ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Cristian Dumitrescu @ 2017-02-10 14:05 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, jerin.jacob, hemant.agrawal

This patch introduces the generic ethdev API for the hierarchical scheduler
capability.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow approach)
- Capability query API per port and per hierarchy node
- Scheduling algorithms: strict priority (SP), Weighted Fair Queuing (WFQ),
  Weighted Round Robin (WRR)
- Traffic shaping: single/dual rate, private (per node) and shared (by multiple
  nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Changes since RFC [1]:
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception; see the long list below (hopefully
  nothing was forgotten).
    - The item not done (hopefully for a good reason): driver-generated object
      IDs. IMO the choice to have application-generated object IDs adds marginal
      complexity to the driver (a search-by-ID function is required), but it
      provides a huge simplification for the application. The app does not need
      to worry about building & managing a tree-like structure for storing
      driver-generated object IDs; the app can use its own convention for node
      IDs depending on the specific hierarchy that it needs. Trivial example:
      identify all level-2 nodes with IDs like 100, 200, 300, … and the level-3
      nodes based on their level-2 parents: 110, 120, 130, 140, …, 210, 220,
      230, 240, …, 310, 320, 330, … and level-4 nodes based on their level-3
      parents: 111, 112, 113, 114, …, 121, 122, 123, 124, … Moreover, see the
      change log for the other related simplification that was implemented:
      leaf nodes now have predefined IDs that are the same as their Ethernet TX
      queue ID (therefore no translation is required for leaf nodes); a usage
      sketch of this convention follows the list below.
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the shaper
  profile as part of node API (no shaper ID needed for private shapers), while
  the shared shapers are configured outside of the node API using shaper profile
  and communicated to the node using shared shaper ID. So there is no
  configuration overhead for shared shapers if the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet TX
  queue ID (therefore no translation is required for leaf nodes). This is also
  used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause (same
  as done by rte_flow)
- Packet marking API
- Optional packet length adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
  based on IP packet bytes)
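
A short usage sketch of the node ID convention above, building a trivial
two-level hierarchy (port_id and nb_txq are assumed application variables,
priority 0 and weight 1 are arbitrary, and the node parameter fields are
left zeroed for brevity; see rte_scheddev.h below for the full structures):

	struct rte_scheddev_error error;
	struct rte_scheddev_node_params np;
	uint32_t root_id = 100;	/* application-chosen non-leaf node ID */
	uint16_t q;

	memset(&np, 0, sizeof(np));	/* fill real fields as needed */
	rte_scheddev_node_add(port_id, root_id, RTE_SCHEDDEV_ROOT_NODE_ID,
		0, 1, &np, &error);
	for (q = 0; q < nb_txq; q++)
		/* leaf node IDs are predefined as the Ethernet TX queue IDs */
		rte_scheddev_node_add(port_id, q, root_id, 0, 1, &np, &error);
	rte_scheddev_hierarchy_set(port_id, 1 /* clear_on_fail */, &error);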

Next steps:
- SW fallback based on the librte_sched library (to be introduced later by a
  standalone patch set)

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_scheddev.c        |  790 ++++++++++++++++++++
 lib/librte_ether/rte_scheddev.h        | 1273 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_scheddev_driver.h |  374 ++++++++++
 6 files changed, 2475 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_scheddev.c
 create mode 100644 lib/librte_ether/rte_scheddev.h
 create mode 100644 lib/librte_ether/rte_scheddev_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cc3bf98..666931d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -247,6 +247,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+SchedDev API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_scheddev*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 1d095a9..7e0527f 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_scheddev.c
 
 #
 # Export include files
@@ -54,6 +55,8 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_scheddev.h
+SYMLINK-y-include += rte_scheddev_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d00cb5c..6b3c84f 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -159,5 +159,35 @@ DPDK_17.05 {
 	global:
 
 	rte_eth_dev_capability_control;
+	rte_scheddev_capabilities_get;
+	rte_scheddev_node_capabilities_get;
+	rte_scheddev_wred_profile_add;
+	rte_scheddev_wred_profile_delete;
+	rte_scheddev_shared_wred_context_add_update;
+	rte_scheddev_shared_wred_context_delete;
+	rte_scheddev_shaper_profile_add;
+	rte_scheddev_shaper_profile_delete;
+	rte_scheddev_shared_shaper_add_update;
+	rte_scheddev_shared_shaper_delete;
+	rte_scheddev_node_add;
+	rte_scheddev_node_delete;
+	rte_scheddev_node_suspend;
+	rte_scheddev_node_resume;
+	rte_scheddev_hierarchy_set;
+	rte_scheddev_node_parent_update;
+	rte_scheddev_node_shaper_update;
+	rte_scheddev_node_shared_shaper_update;
+	rte_scheddev_node_scheduling_mode_update;
+	rte_scheddev_node_cman_update;
+	rte_scheddev_node_wred_context_update;
+	rte_scheddev_node_shared_wred_context_update;
+	rte_scheddev_mark_vlan_dei;
+	rte_scheddev_mark_ip_ecn;
+	rte_scheddev_mark_ip_dscp;
+	rte_scheddev_stats_get_enabled;
+	rte_scheddev_stats_enable;
+	rte_scheddev_node_stats_get_enabled;
+	rte_scheddev_node_stats_enable;
+	rte_scheddev_node_stats_read;
 
 } DPDK_17.02;
diff --git a/lib/librte_ether/rte_scheddev.c b/lib/librte_ether/rte_scheddev.c
new file mode 100644
index 0000000..679a22d
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.c
@@ -0,0 +1,790 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev_driver.h"
+#include "rte_scheddev.h"
+
+/* Get generic scheduler operations structure from a port. */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_scheddev_error_set(error,
+			ENODEV,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->cap_ctrl == NULL) ||
+		dev->dev_ops->cap_ctrl(dev, RTE_ETH_CAPABILITY_SCHED, &ops) ||
+		(ops == NULL)) {
+		rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+/* Get capabilities */
+int rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->capabilities_get(dev, cap, error);
+}
+
+/* Get node capabilities */
+int rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_capabilities_get(dev, node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_add(dev, wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_delete(dev, wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_add_update(dev, shared_wred_context_id,
+		wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_delete(dev, shared_wred_context_id,
+		error);
+}
+
+/* Add shaper profile */
+int rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_add(dev, shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_delete(dev, shaper_profile_id, error);
+}
+
+/* Add/update shared shaper */
+int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_add_update(dev, shared_shaper_id,
+		shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_delete(dev, shared_shaper_id, error);
+}
+
+/* Add node to port scheduler hierarchy */
+int rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_add(dev, node_id, parent_node_id, priority, weight,
+		params, error);
+}
+
+/* Delete node from scheduler hierarchy */
+int rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_delete(dev, node_id, error);
+}
+
+/* Suspend node */
+int rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_suspend == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_suspend(dev, node_id, error);
+}
+
+/* Resume node */
+int rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_resume == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_resume(dev, node_id, error);
+}
+
+/* Set the initial port scheduler hierarchy */
+int rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->hierarchy_set == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->hierarchy_set(dev, clear_on_fail, error);
+}
+
+/* Update node parent */
+int rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_parent_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_parent_update(dev, node_id, parent_node_id, priority,
+		weight, error);
+}
+
+/* Update node private shaper */
+int rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shaper_update(dev, node_id, shaper_profile_id,
+		error);
+}
+
+/* Update node shared shapers */
+int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_shaper_update(dev, node_id, shared_shaper_id,
+		add, error);
+}
+
+/* Update scheduling mode */
+int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_scheduling_mode_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_scheduling_mode_update(dev, node_id,
+		scheduling_mode_per_priority, n_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_cman_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_cman_update(dev, node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_wred_context_update(dev, node_id, wred_profile_id,
+		error);
+}
+
+/* Update node shared WRED context */
+int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_wred_context_update(dev, node_id,
+		shared_wred_context_id, add, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_vlan_dei == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_vlan_dei(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_ecn == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_ecn(dev, mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_dscp == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_dscp(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
+
+/* Get set of stats counter types currently enabled for all nodes */
+int rte_scheddev_stats_get_enabled(uint8_t port_id,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->stats_get_enabled == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->stats_get_enabled(dev,
+		nonleaf_node_capability_stats_mask,
+		nonleaf_node_enabled_stats_mask,
+		leaf_node_capability_stats_mask,
+		leaf_node_enabled_stats_mask,
+		error);
+}
+
+/* Enable specified set of stats counter types for all nodes */
+int rte_scheddev_stats_enable(uint8_t port_id,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->stats_enable == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->stats_enable(dev,
+		nonleaf_node_enabled_stats_mask,
+		leaf_node_enabled_stats_mask,
+		error);
+}
+
+/* Get set of stats counter types currently enabled for specific node */
+int rte_scheddev_node_stats_get_enabled(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_get_enabled == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_get_enabled(dev,
+		node_id,
+		capability_stats_mask,
+		enabled_stats_mask,
+		error);
+}
+
+/* Enable specified set of stats counter types for specific node */
+int rte_scheddev_node_stats_enable(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_enable == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_enable(dev, node_id, enabled_stats_mask, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	int clear,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_read == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_read(dev, node_id, stats, clear, error);
+}
diff --git a/lib/librte_ether/rte_scheddev.h b/lib/librte_ether/rte_scheddev.h
new file mode 100644
index 0000000..fed3df2
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.h
@@ -0,0 +1,1386 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_H__
+#define __INCLUDE_RTE_SCHEDDEV_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API
+ *
+ * This interface provides the ability to configure the hierarchical scheduler
+ * feature in a generic way.
+ */
+
+#include <stdint.h>
+
+#include <rte_red.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Ethernet framing overhead
+  *
+  * Overhead fields per Ethernet frame:
+  * 1. Preamble:                                            7 bytes;
+  * 2. Start of Frame Delimiter (SFD):                      1 byte;
+  * 3. Inter-Frame Gap (IFG):                              12 bytes.
+  */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD                  20
+
+/**
+  * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
+  * is generated and added at the end of the Ethernet frame on TX side without
+  * any SW intervention.
+  */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS              24
+
+/** Invalid WRED profile ID */
+#define RTE_SCHEDDEV_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/** Invalid shaper profile ID */
+#define RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/** Scheduler hierarchy root node ID */
+#define RTE_SCHEDDEV_ROOT_NODE_ID                          UINT32_MAX
+
+
+/**
+  * Scheduler node capabilities
+  */
+struct rte_scheddev_node_capabilities {
+	/**< Private shaper support. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper. Valid only when
+	 * private shaper is supported.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of supported shared shapers. The value of zero
+	 * indicates that shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Items valid only for non-leaf nodes. */
+	struct {
+		/**< Maximum number of children nodes. */
+		uint32_t n_children_max;
+
+		/**< Lowest priority supported. The value of 1 indicates that
+		 * only priority 0 is supported, which essentially means that
+		 * Strict Priority (SP) algorithm is not supported.
+		 */
+		uint32_t sp_priority_min;
+
+		/**< Maximum number of sibling nodes that can have the same
+		 * priority at any given time. When equal to one (1), it
+		 * indicates that WFQ/WRR algorithms are not supported.
+		 */
+		uint32_t sp_n_children_max;
+
+		/**< WFQ algorithm support. */
+		int scheduling_wfq_supported;
+
+		/**< WRR algorithm support. */
+		int scheduling_wrr_supported;
+
+		/**< Maximum WFQ/WRR weight. */
+		uint32_t scheduling_wfq_wrr_weight_max;
+	} nonleaf;
+
+	/**< Items valid only for leaf nodes. */
+	struct {
+		/**< Head drop algorithm support. */
+		int cman_head_drop_supported;
+
+		/**< Private WRED context support. */
+		int cman_wred_context_private_supported;
+
+		/**< Maximum number of shared WRED contexts supported. The value
+		 * of zero indicates that shared WRED contexts are not
+		 * supported.
+		 */
+		uint32_t cman_wred_context_shared_n_max;
+	} leaf;
+};
+
+/**
+  * Scheduler capabilities
+  */
+struct rte_scheddev_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resource between private and
+	 * shared shapers, it is typically equal to the sum between
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have the private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for shared
+	 * shapers. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for shared
+	 * shapers. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of WRED contexts. */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have the private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero
+	 * indicates that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_max;
+
+	/**< Support for VLAN DEI packet marking. */
+	int mark_vlan_dei_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets. */
+	int mark_ip_ecn_tcp_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets. */
+	int mark_ip_ecn_sctp_supported;
+
+	/**< Support for IPv4/IPv6 DSCP packet marking. */
+	int mark_ip_dscp_supported;
+
+	/**< Summary of node-level capabilities across all nodes. */
+	struct rte_scheddev_node_capabilities node;
+};
+
+/**
+  * Congestion management (CMAN) mode
+  *
+  * This is used for controlling the admission of packets into a packet queue or
+  * group of packet queues on congestion. On request of writing a new packet
+  * into the current queue while the queue is full, the *tail drop* algorithm
+  * drops the new packet while leaving the queue unmodified, as opposed to *head
+  * drop* algorithm, which drops the packet at the head of the queue (the oldest
+  * packet waiting in the queue) and admits the new packet at the tail of the
+  * queue.
+  *
+  * The *Random Early Detection (RED)* algorithm works by proactively dropping
+  * more and more input packets as the queue occupancy builds up. When the queue
+  * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+  * RED* algorithm uses a separate set of RED thresholds for each packet color.
+  */
+enum rte_scheddev_cman_mode {
+	RTE_SCHEDDEV_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_SCHEDDEV_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_SCHEDDEV_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+  * Color
+  */
+enum rte_scheddev_color {
+	e_RTE_SCHEDDEV_GREEN = 0, /**< Green */
+	e_RTE_SCHEDDEV_YELLOW,    /**< Yellow */
+	e_RTE_SCHEDDEV_RED,       /**< Red */
+	e_RTE_SCHEDDEV_COLORS     /**< Number of colors */
+};
+
+/**
+  * WRED profile
+  */
+struct rte_scheddev_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_red_params red_params[e_RTE_SCHEDDEV_COLORS];
+};
+
+/**
+  * Token bucket
+  */
+struct rte_scheddev_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+  * Shaper (rate limiter) profile
+  *
+  * Multiple shaper instances can share the same shaper profile. Each node has
+  * zero or one private shaper (only one node using it) and/or zero, one or
+  * several shared shapers (multiple nodes use the same shaper instance).
+  *
+  * Single rate shapers use a single token bucket. A single rate shaper can be
+  * configured by setting the rate of the committed bucket to zero, which
+  * effectively disables this bucket. The peak bucket is used to limit the rate
+  * and the burst size for the current shaper.
+  *
+  * Dual rate shapers use both the committed and the peak token buckets. The
+  * rate of the committed bucket has to be less than or equal to the rate of the
+  * peak bucket.
+  */
+struct rte_scheddev_shaper_params {
+	/**< Committed token bucket */
+	struct rte_scheddev_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_scheddev_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
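+
+/*
+ * Example (informative): a single rate shaper profile can be built by
+ * disabling the committed bucket (rate and size set to zero), leaving
+ * only the peak bucket active. The rate, burst size and profile ID 5
+ * below are arbitrary application-chosen values:
+ *
+ *    struct rte_scheddev_shaper_params params = {
+ *        .committed = {.rate = 0, .size = 0},
+ *        .peak = {.rate = 10000000, .size = 4096},
+ *        .pkt_length_adjust = RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS,
+ *    };
+ *
+ *    status = rte_scheddev_shaper_profile_add(port_id, 5, &params, &error);
+ */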
+
+/**
+  * Node parameters
+  *
+  * Each scheduler hierarchy node has multiple inputs (children nodes of the
+  * current parent node) and a single output (which is input to its parent
+  * node). The current node arbitrates its inputs using Strict Priority (SP),
+  * Weighted Fair Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to
+  * schedule input packets on its output while observing its shaping (rate
+  * limiting) constraints.
+  *
+  * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc. are considered
+  * approximations of ideal WFQ and are treated as WFQ here, although an
+  * implementation-dependent trade-off on accuracy, performance and resource
+  * usage might exist.
+  *
+  * Children nodes with different priorities are scheduled using the SP
+  * algorithm, based on their priority, with zero (0) as the highest priority.
+  * Children with same priority are scheduled using the WFQ or WRR algorithm,
+  * based on their weight, which is relative to the sum of the weights of all
+  * siblings with same priority, with one (1) as the lowest weight.
+  *
+  * Each leaf node sits on top of a TX queue of the current Ethernet port.
+  * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
+  * where N is the number of TX queues configured for the current Ethernet port.
+  * The non-leaf nodes have their IDs generated by the application.
+  */
+struct rte_scheddev_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< For each priority, indicates whether the children
+			 * nodes sharing the same priority are to be scheduled
+			 * by WFQ or by WRR. When NULL, it indicates that WFQ
+			 * is to be used for all priorities. When non-NULL, it
+			 * points to a pre-allocated array of *n_priority*
+			 * elements, with a non-zero value element indicating
+			 * WFQ and a zero value element for WRR.
+			 */
+			int *scheduling_mode_per_priority;
+
+			/**< Number of priorities. */
+			uint32_t n_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_scheddev_cman_mode cman;
+
+			/**< WRED parameters (valid when *cman* is WRED). */
+			struct {
+				/**< WRED profile for the private WRED
+				 * context. The absence of a private WRED
+				 * context for the current leaf node is
+				 * indicated by setting this parameter to
+				 * RTE_SCHEDDEV_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of valid shared WRED
+				 * context IDs.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of shared WRED context IDs in the
+				 * *shared_wred_context_id* array.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+};
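+
+/*
+ * Example (informative): parameters for a leaf node with tail drop
+ * congestion management and no private or shared shapers:
+ *
+ *    struct rte_scheddev_node_params np = {
+ *        .shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
+ *        .shared_shaper_id = NULL,
+ *        .n_shared_shapers = 0,
+ *        .leaf = {.cman = RTE_SCHEDDEV_CMAN_TAIL_DROP},
+ *    };
+ */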
+
+/**
+  * Node statistics counter type
+  */
+enum rte_scheddev_stats_counter {
+	/**< Number of packets scheduled from current node. */
+	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES = 1 << 1,
+
+	/**< Number of packets dropped by current node.  */
+	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_DROPPED = 1 << 2,
+
+	/**< Number of bytes dropped by current node.  */
+	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_DROPPED = 1 << 3,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_QUEUED = 1 << 4,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_QUEUED = 1 << 5,
+};
+
+/**
+  * Node statistics counters
+  */
+struct rte_scheddev_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node. */
+		uint64_t n_pkts_dropped;
+
+		/**< Number of bytes dropped by current leaf node. */
+		uint64_t n_bytes_dropped;
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_scheddev_error::cause.
+ */
+enum rte_scheddev_error_type {
+	RTE_SCHEDDEV_ERROR_TYPE_NONE, /**< No error. */
+	RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PARENT_NODE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PRIORITY,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_WEIGHT,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_CMAN,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs. The
+ * message points to a constant string which does not need to be freed by
+ * the application; however, its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_scheddev_error {
+	enum rte_scheddev_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Scheduler capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Scheduler capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
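+
+/*
+ * Example (informative): querying the port capabilities before building
+ * a hierarchy; the error handling shown is a minimal sketch:
+ *
+ *    struct rte_scheddev_capabilities cap;
+ *    struct rte_scheddev_error error;
+ *
+ *    if (rte_scheddev_capabilities_get(port_id, &cap, &error) != 0)
+ *        printf("scheddev: %s\n",
+ *            error.message ? error.message : rte_strerror(rte_errno));
+ */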
+
+/**
+ * Scheduler node capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param cap
+ *   Scheduler node capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
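+
+/*
+ * Example (informative): a WRED profile using the same RED parameters
+ * for all packet colors; the threshold and weight values below are
+ * illustrative only (see rte_red.h for their exact semantics):
+ *
+ *    struct rte_scheddev_wred_params wp;
+ *    uint32_t i;
+ *
+ *    for (i = 0; i < e_RTE_SCHEDDEV_COLORS; i++) {
+ *        wp.red_params[i].min_th = 16;
+ *        wp.red_params[i].max_th = 32;
+ *        wp.red_params[i].maxp_inv = 10;
+ *        wp.red_params[i].wq_log2 = 9;
+ *    }
+ *    status = rte_scheddev_wred_profile_add(port_id, 1, &wp, &error);
+ */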
+
+/**
+ * Scheduler WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is currently
+ * at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several scheduler hierarchy
+ * leaf nodes configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy leaf node) of this
+ * shared WRED context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is no
+ * longer using the shaper profile previously assigned to it and is updated to
+ * use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
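+
+/*
+ * Example (informative): creating shared shaper 0 from an existing
+ * shaper profile, e.g. to rate limit a group of nodes in aggregate:
+ *
+ *    status = rte_scheddev_shared_shaper_add_update(port_id, 0,
+ *        shaper_profile_id, &error);
+ */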
+
+/**
+ * Scheduler shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy node) of this shared
+ * shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node add
+ *
+ * When *node_id* is not a valid node ID, a new node with this ID is created and
+ * connected as child to the existing node identified by *parent_node_id*.
+ *
+ * When *node_id* is a valid node ID, this node is disconnected from its current
+ * parent and connected as child to another existing node identified by
+ * *parent_node_id*.
+ *
+ * This function can be called during port initialization phase (before the
+ * Ethernet port is started) for building the scheduler start-up hierarchy.
+ * Subject to the specific Ethernet port supporting on-the-fly scheduler
+ * hierarchy updates, this function can also be called during run-time (after
+ * the Ethernet port is started).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
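+
+/*
+ * Example (informative): creating non-leaf node 1000 (an arbitrary ID
+ * outside the 0 .. N-1 leaf range) under the root, then connecting
+ * predefined leaf node 0 (TX queue 0) under it; *nonleaf_params* and
+ * *leaf_params* are application-built node parameters:
+ *
+ *    status = rte_scheddev_node_add(port_id, 1000,
+ *        RTE_SCHEDDEV_ROOT_NODE_ID, 0, 1, &nonleaf_params, &error);
+ *    if (status == 0)
+ *        status = rte_scheddev_node_add(port_id, 0, 1000, 0, 1,
+ *            &leaf_params, &error);
+ */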
+
+/**
+ * Scheduler node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has at
+ * least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node suspend
+ *
+ * Suspend an existing node.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node resume
+ *
+ * Resume an existing node that was previously suspended.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the scheduler start-up hierarchy.
+ *
+ * This function fails when the currently configured scheduler hierarchy is not
+ * supported by the Ethernet port, in which case the user can abort or try out
+ * another hierarchy configuration (e.g. a hierarchy with fewer leaf nodes),
+ * which can be built from scratch (when *clear_on_fail* is enabled) or by
+ * modifying the existing hierarchy configuration (when *clear_on_fail* is
+ * disabled).
+ *
+ * Note that, even when the configured scheduler hierarchy is supported (so this
+ * function is successful), the Ethernet port start might still fail due to e.g.
+ * not enough memory being available in the system, etc.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
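+
+/*
+ * Example (informative): freezing the start-up hierarchy and retrying
+ * with a simpler hierarchy on failure; build_hierarchy() and
+ * build_fallback_hierarchy() are hypothetical application helpers:
+ *
+ *    build_hierarchy(port_id);
+ *    if (rte_scheddev_hierarchy_set(port_id, 1, &error) != 0) {
+ *        build_fallback_hierarchy(port_id);
+ *        status = rte_scheddev_hierarchy_set(port_id, 0, &error);
+ *    }
+ */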
+
+/**
+ * Scheduler node parent update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private shaper update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared shapers update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node scheduling mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid non-leaf node ID.
+ * @param scheduling_mode_per_priority
+ *   For each priority, indicates whether the children nodes sharing the same
+ *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates that
+ *   WFQ is to be used for all priorities. When non-NULL, it points to a
+ *   pre-allocated array of *n_priority* elements, with a non-zero value element
+ *   indicating WFQ and a zero value element for WRR.
+ * @param n_priorities
+ *   Number of priorities.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node congestion management mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_SCHEDDEV_WRED_PROFILE_ID_NONE, with
+ *   the latter disabling the private WRED context of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared WRED context to current node or to
+ *   zero to delete this shared WRED context from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion Notification
+ * (ECN) field (2 bits). The DSCP field is typically used to encode the traffic
+ * class and/or drop priority (RFC 2597), while the ECN field is used by RFC
+ * 3168 to implement a congestion notification mechanism to be leveraged by
+ * transport layer protocols such as TCP and SCTP that have congestion control
+ * mechanisms.
+ *
+ * When congestion is experienced, as alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10 (values
+ * indicating that source endpoint is ECN-capable) to 2'b11 (meaning that
+ * congestion is experienced). The destination endpoint can use the ECN-Echo
+ * (ECE) TCP flag to relay the congestion indication back to the source
+ * endpoint, which acknowledges it back to the destination endpoint with the
+ * Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color; otherwise, the ECN field is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
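+
+/*
+ * Example (informative): enabling ECN marking for yellow and red
+ * packets while leaving green packets unmarked:
+ *
+ *    status = rte_scheddev_mark_ip_ecn(port_id, 0, 1, 1, &error);
+ */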
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2, as
+ * well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler get statistics counter types enabled for all nodes
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param nonleaf_node_capability_stats_mask
+ *   Statistics counter types available per node for all non-leaf nodes. Needs
+ *   to be pre-allocated.
+ * @param nonleaf_node_enabled_stats_mask
+ *   Statistics counter types currently enabled per node for each non-leaf node.
+ *   This is a subset of *nonleaf_node_capability_stats_mask*. Needs to be
+ *   pre-allocated.
+ * @param leaf_node_capability_stats_mask
+ *   Statistics counter types available per node for all leaf nodes. Needs to
+ *   be pre-allocated.
+ * @param leaf_node_enabled_stats_mask
+ *   Statistics counter types currently enabled for each leaf node. This is
+ *   a subset of *leaf_node_capability_stats_mask*. Needs to be pre-allocated.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_stats_get_enabled(uint8_t port_id,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler enable selected statistics counters for all nodes
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param nonleaf_node_enabled_stats_mask
+ *   Statistics counter types to be enabled per node for each non-leaf node.
+ *   This needs to be a subset of the statistics counter types available per
+ *   node for all non-leaf nodes. Any statistics counter type not included in
+ *   this set is to be disabled for all non-leaf nodes.
+ * @param leaf_node_enabled_stats_mask
+ *   Statistics counter types to be enabled per node for each leaf node. This
+ *   needs to be a subset of the statistics counter types available per node for
+ *   all leaf nodes. Any statistics counter type not included in this set is to
+ *   be disabled for all leaf nodes.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_stats_enable(uint8_t port_id,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler get statistics counter types enabled for current node
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param capability_stats_mask
+ *   Statistics counter types available for the current node. Needs to be
+ *   pre-allocated.
+ * @param enabled_stats_mask
+ *   Statistics counter types currently enabled for the current node. This is
+ *   a subset of *capability_stats_mask*. Needs to be pre-allocated.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_stats_get_enabled(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler enable selected statistics counters for current node
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param enabled_stats_mask
+ *   Statistics counter types to be enabled for the current node. This needs to
+ *   be a subset of the statistics counter types available for the current node.
+ *   Any statistics counter type not included in this set is to be disabled for
+ *   the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_stats_enable(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read, otherwise
+ *   the statistics counters are left untouched.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	int clear,
+	struct rte_scheddev_error *error);
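+
+/*
+ * Example (informative): enabling packet and byte counters on leaf
+ * node 0, then reading them back with clear-on-read:
+ *
+ *    struct rte_scheddev_node_stats stats;
+ *    uint64_t mask = RTE_SCHEDDEV_STATS_COUNTER_N_PKTS |
+ *        RTE_SCHEDDEV_STATS_COUNTER_N_BYTES;
+ *
+ *    status = rte_scheddev_node_stats_enable(port_id, 0, mask, &error);
+ *    if (status == 0)
+ *        status = rte_scheddev_node_stats_read(port_id, 0, &stats, 1,
+ *            &error);
+ */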
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_H__ */
diff --git a/lib/librte_ether/rte_scheddev_driver.h b/lib/librte_ether/rte_scheddev_driver.h
new file mode 100644
index 0000000..c0a0321
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev_driver.h
@@ -0,0 +1,404 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+#define __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs; they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_scheddev_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler capabilities get */
+
+typedef int (*rte_scheddev_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node capabilities get */
+
+typedef int (*rte_scheddev_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile add */
+
+typedef int (*rte_scheddev_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile delete */
+
+typedef int (*rte_scheddev_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context add/update */
+
+typedef int (*rte_scheddev_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context delete */
+
+typedef int (*rte_scheddev_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile add */
+
+typedef int (*rte_scheddev_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile delete */
+
+typedef int (*rte_scheddev_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper add/update */
+
+typedef int (*rte_scheddev_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper delete */
+
+typedef int (*rte_scheddev_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node add */
+
+typedef int (*rte_scheddev_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node delete */
+
+typedef int (*rte_scheddev_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node suspend */
+
+typedef int (*rte_scheddev_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node resume */
+
+typedef int (*rte_scheddev_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler hierarchy set */
+
+typedef int (*rte_scheddev_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node parent update */
+
+typedef int (*rte_scheddev_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shaper update */
+
+typedef int (*rte_scheddev_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared shaper update */
+
+typedef int (*rte_scheddev_node_scheduling_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node scheduling mode update */
+
+typedef int (*rte_scheddev_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node congestion management mode update */
+
+typedef int (*rte_scheddev_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node WRED context update */
+
+typedef int (*rte_scheddev_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared WRED context update */
+
+typedef int (*rte_scheddev_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - VLAN DEI */
+
+typedef int (*rte_scheddev_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_scheddev_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 DSCP */
+
+typedef int (*rte_scheddev_stats_get_enabled_t)(struct rte_eth_dev *dev,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler get set of stats counters enabled for all nodes */
+
+typedef int (*rte_scheddev_stats_enable_t)(struct rte_eth_dev *dev,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler enable selected stats counters for all nodes */
+
+typedef int (*rte_scheddev_node_stats_get_enabled_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler get set of stats counters enabled for specific node */
+
+typedef int (*rte_scheddev_node_stats_enable_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler enable selected stats counters for specific node */
+
+typedef int (*rte_scheddev_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	int clear,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler read stats counters for specific node */
+
+struct rte_scheddev_ops {
+	/** Scheduler capabilities get */
+	rte_scheddev_capabilities_get_t capabilities_get;
+	/** Scheduler node capabilities get */
+	rte_scheddev_node_capabilities_get_t node_capabilities_get;
+
+	/** Scheduler WRED profile add */
+	rte_scheddev_wred_profile_add_t wred_profile_add;
+	/** Scheduler WRED profile delete */
+	rte_scheddev_wred_profile_delete_t wred_profile_delete;
+	/** Scheduler shared WRED context add/update */
+	rte_scheddev_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Scheduler shared WRED context delete */
+	rte_scheddev_shared_wred_context_delete_t
+		shared_wred_context_delete;
+	/** Scheduler shaper profile add */
+	rte_scheddev_shaper_profile_add_t shaper_profile_add;
+	/** Scheduler shaper profile delete */
+	rte_scheddev_shaper_profile_delete_t shaper_profile_delete;
+	/** Scheduler shared shaper add/update */
+	rte_scheddev_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Scheduler shared shaper delete */
+	rte_scheddev_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Scheduler node add */
+	rte_scheddev_node_add_t node_add;
+	/** Scheduler node delete */
+	rte_scheddev_node_delete_t node_delete;
+	/** Scheduler node suspend */
+	rte_scheddev_node_suspend_t node_suspend;
+	/** Scheduler node resume */
+	rte_scheddev_node_resume_t node_resume;
+	/** Scheduler hierarchy set */
+	rte_scheddev_hierarchy_set_t hierarchy_set;
+
+	/** Scheduler node parent update */
+	rte_scheddev_node_parent_update_t node_parent_update;
+	/** Scheduler node shaper update */
+	rte_scheddev_node_shaper_update_t node_shaper_update;
+	/** Scheduler node shared shaper update */
+	rte_scheddev_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Scheduler node scheduling mode update */
+	rte_scheddev_node_scheduling_mode_update_t node_scheduling_mode_update;
+	/** Scheduler node congestion management mode update */
+	rte_scheddev_node_cman_update_t node_cman_update;
+	/** Scheduler node WRED context update */
+	rte_scheddev_node_wred_context_update_t node_wred_context_update;
+	/** Scheduler node shared WRED context update */
+	rte_scheddev_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+
+	/** Scheduler packet marking - VLAN DEI */
+	rte_scheddev_mark_vlan_dei_t mark_vlan_dei;
+	/** Scheduler packet marking - IPv4/IPv6 ECN */
+	rte_scheddev_mark_ip_ecn_t mark_ip_ecn;
+	/** Scheduler packet marking - IPv4/IPv6 DSCP */
+	rte_scheddev_mark_ip_dscp_t mark_ip_dscp;
+
+	/** Scheduler get statistics counter type enabled for all nodes */
+	rte_scheddev_stats_get_enabled_t stats_get_enabled;
+	/** Scheduler enable selected statistics counters for all nodes */
+	rte_scheddev_stats_enable_t stats_enable;
+	/** Scheduler get statistics counter type enabled for current node */
+	rte_scheddev_node_stats_get_enabled_t node_stats_get_enabled;
+	/** Scheduler enable selected statistics counters for current node */
+	rte_scheddev_node_stats_enable_t node_stats_enable;
+	/** Scheduler read statistics counters for current node */
+	rte_scheddev_node_stats_read_t node_stats_read;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param error
+ *   Pointer to error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error type.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_scheddev_error_set(struct rte_scheddev_error *error,
+		   int code,
+		   enum rte_scheddev_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_scheddev_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic hierarchical scheduler operations structure from a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param error
+ *   Error details
+ *
+ * @return
+ *   The hierarchical scheduler operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_DRIVER_H__ */
-- 
2.5.0
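
For PMD authors reading along: a minimal sketch of how a driver wires this
up, using the node_stats_read callback whose typedef appears above. The
driver name and the error-type constant are illustrative assumptions, not
part of the patch.

static int
xyz_node_stats_read(struct rte_eth_dev *dev, uint32_t node_id,
	struct rte_scheddev_node_stats *stats, int clear,
	struct rte_scheddev_error *error)
{
	if (stats == NULL) {
		/* RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED is assumed here */
		rte_scheddev_error_set(error, EINVAL,
			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
			NULL, "stats pointer cannot be NULL");
		return -EINVAL;
	}
	/* ... read, and optionally clear, the hardware counters ... */
	RTE_SET_USED(dev);
	RTE_SET_USED(node_id);
	RTE_SET_USED(clear);
	return 0;
}

static const struct rte_scheddev_ops xyz_scheddev_ops = {
	.node_stats_read = xyz_node_stats_read,
	/* the remaining callbacks are filled in as the PMD implements them */
};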

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure
  2017-02-10 11:39  9% [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure Fan Zhang
@ 2017-02-10 13:59  4% ` Trahe, Fiona
  2017-02-13 16:07  7%   ` Zhang, Roy Fan
  2017-02-14  0:21  4%   ` Hemant Agrawal
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
  1 sibling, 2 replies; 200+ results
From: Trahe, Fiona @ 2017-02-10 13:59 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo, Trahe, Fiona

Hi Fan,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> Sent: Friday, February 10, 2017 11:39 AM
> To: dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops
> structure
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 755dc65..564d93a 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -62,3 +62,7 @@ Deprecation Notices
>    PMDs that implement the latter.
>    Target release for removal of the legacy API will be defined once most
>    PMDs have switched to rte_flow.
> +
> +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
> +  The field ``cryptodev_configure_t`` function prototype will be added a
> +  parameter of a struct rte_cryptodev_config type pointer.
> --
> 2.7.4

Can you fix the grammar here, please? I'm not sure what the change is.
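
For reference, my reading of the notice is a change along these lines (a
sketch only; the exact 17.05 signature is an assumption at this point):

    /* today */
    typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev);

    /* proposed: pass the device configuration explicitly */
    typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev,
		struct rte_cryptodev_config *config);

i.e. the ``cryptodev_configure_t`` prototype will gain a pointer parameter
of type ``struct rte_cryptodev_config``.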

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure
@ 2017-02-10 11:39  9% Fan Zhang
  2017-02-10 13:59  4% ` Trahe, Fiona
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
  0 siblings, 2 replies; 200+ results
From: Fan Zhang @ 2017-02-10 11:39 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 755dc65..564d93a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -62,3 +62,7 @@ Deprecation Notices
   PMDs that implement the latter.
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
+
+* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
+  The field ``cryptodev_configure_t`` function prototype will be added a
+  parameter of a struct rte_cryptodev_config type pointer.
-- 
2.7.4

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-10 10:44  4%       ` Thomas Monjalon
@ 2017-02-10 11:20  4%         ` Tan, Jianfeng
  0 siblings, 0 replies; 200+ results
From: Tan, Jianfeng @ 2017-02-10 11:20 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Yigit, Ferruh, dev, Mcnamara, John, yuanhan.liu, stephen



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, February 10, 2017 6:44 PM
> To: Tan, Jianfeng
> Cc: Yigit, Ferruh; dev@dpdk.org; Mcnamara, John;
> yuanhan.liu@linux.intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
> 
> 2017-02-09 17:40, Ferruh Yigit:
> > On 2/9/2017 4:06 PM, Jianfeng Tan wrote:
> > > This ABI change removes the iomem and ioport mapping in igb_uio. The
> > > purpose of this change was to fix a bug: when a DPDK app crashes,
> > > the devices bound to igb_uio are stopped by neither the DPDK PMD
> > > nor the igb_uio driver.
> > >
> > > Then it was pointed out by Stephen Hemminger that this has a
> > > backward compatibility issue: an old version of DPDK cannot run on
> > > the modified igb_uio.
> > >
> > > However, we still have not figured out a new way to fix this bug
> > > without this change. Let's postpone this deprecation announcement
> > > in case this change cannot be avoided.
> > >
> > > Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from
> > > igb_uio")
> > >
> > > Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> > > Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Applied, thanks
> 
> The images are not real vector images and are almost unreadable.
> Please make the effort to use inkscape in order to have images
> we can update.

Apologies for that. I've submitted a patch to change the images. And thank you for the solution.

> 
> I did some changes: s/virtio_user/virtio-user/ in order to be consistent.
> Like for vhost-user, we use the underscore only in code.

Thank you for that.

Regards,
Jianfeng

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-09 17:40  4%     ` Ferruh Yigit
@ 2017-02-10 10:44  4%       ` Thomas Monjalon
  2017-02-10 11:20  4%         ` Tan, Jianfeng
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-10 10:44 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: Ferruh Yigit, dev, john.mcnamara, yuanhan.liu, stephen

2017-02-09 17:40, Ferruh Yigit:
> On 2/9/2017 4:06 PM, Jianfeng Tan wrote:
> > This ABI change removes the iomem and ioport mapping in igb_uio. The
> > purpose of this change was to fix a bug: when a DPDK app crashes,
> > the devices bound to igb_uio are stopped by neither the DPDK PMD
> > nor the igb_uio driver.
> > 
> > Then it was pointed out by Stephen Hemminger that this has a
> > backward compatibility issue: an old version of DPDK cannot run on
> > the modified igb_uio.
> > 
> > However, we still have not figured out a new way to fix this bug
> > without this change. Let's postpone this deprecation announcement
> > in case this change cannot be avoided.
> > 
> > Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")
> > 
> > Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> > Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied, thanks

The images are not real vector images and are almost unreadable.
Please make the effort to use inkscape in order to have images
we can update.

I did some changes: s/virtio_user/virtio-user/ in order to be consistent.
Like for vhost-user, we use the underscore only in code.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-09 16:06 12%   ` [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio Jianfeng Tan
@ 2017-02-09 17:40  4%     ` Ferruh Yigit
  2017-02-10 10:44  4%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-09 17:40 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: thomas.monjalon, john.mcnamara, yuanhan.liu, stephen

On 2/9/2017 4:06 PM, Jianfeng Tan wrote:
> This ABI change removes the iomem and ioport mapping in igb_uio. The
> purpose of this change was to fix a bug: when a DPDK app crashes,
> the devices bound to igb_uio are stopped by neither the DPDK PMD
> nor the igb_uio driver.
> 
> Then it was pointed out by Stephen Hemminger that this has a
> backward compatibility issue: an old version of DPDK cannot run on
> the modified igb_uio.
> 
> However, we still have not figured out a new way to fix this bug
> without this change. Let's postpone this deprecation announcement
> in case this change cannot be avoided.
> 
> Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")
> 
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] Kill off PCI dependencies
  2017-02-08 22:56  3% [dpdk-dev] Kill off PCI dependencies Stephen Hemminger
@ 2017-02-09 16:26  3% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-09 16:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2017-02-08 14:56, Stephen Hemminger:
> I am trying to make DPDK more agnostic about bus type. The existing API still
> has hardwired into it that ethernet devices are either PCI or not PCI (i.e. pci_dev == NULL).
> Jan, Jerin, and Shreyansh started the process but it hasn't gone far enough.
> 
> It would make more sense if the existing generic device was used everywhere
> including rte_ethdev, rte_ethdev_info, etc.

Yes

> The ABI breakage is not catastrophic. Just change pci_dev to a device pointer.
> One option would be to use NEXT_ABI and/or two different calls and data structures.
> Messy but compatible. Something like
>     rte_dev_info_get returns rte_dev_info but is marked deprecated
>     rte_device_info_get returns rte_device_info

Or we can break the ABI to avoid messy code.

> One fallout is that the existing testpmd code makes lots of assumptions that
> it is working with a PCI device. Things like ability to get/set PCI registers.
> I suspect this is already broken if one tries to run it on a virtual device like TAP.
> 
> Can we just turn off that functionality?

Which functionality exactly?

> Also KNI has more dependencies that ethernet devices are PCI.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-09 16:06  4% ` [dpdk-dev] [PATCH v2 " Jianfeng Tan
@ 2017-02-09 16:06 12%   ` Jianfeng Tan
  2017-02-09 17:40  4%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Jianfeng Tan @ 2017-02-09 16:06 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, john.mcnamara, yuanhan.liu, stephen, Jianfeng Tan

This ABI change removes the iomem and ioport mapping in igb_uio. The
purpose of this change was to fix a bug: when a DPDK app crashes,
the devices bound to igb_uio are stopped by neither the DPDK PMD
nor the igb_uio driver.

Then it was pointed out by Stephen Hemminger that this has a
backward compatibility issue: an old version of DPDK cannot run on
the modified igb_uio.

However, we still have not figured out a new way to fix this bug
without this change. Let's postpone this deprecation announcement
in case this change cannot be avoided.

Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 755dc65..b49e0a0 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,7 +11,7 @@ Deprecation Notices
 * igb_uio: iomem mapping and sysfs files created for iomem and ioport in
   igb_uio will be removed, because we are able to detect these from what Linux
   has exposed, like the way we have done with uio-pci-generic. This change
-  targets release 17.02.
+  targets release 17.05.
 
 * ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
   impacted because of introduction of a new ``rte_bus`` hierarchy. This would
-- 
2.7.4

^ permalink raw reply	[relevance 12%]

* [dpdk-dev] [PATCH v2 0/3] doc updates
      2017-02-09 14:45  0% ` [dpdk-dev] [PATCH 0/3] doc updates Thomas Monjalon
@ 2017-02-09 16:06  4% ` Jianfeng Tan
  2017-02-09 16:06 12%   ` [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio Jianfeng Tan
  2 siblings, 1 reply; 200+ results
From: Jianfeng Tan @ 2017-02-09 16:06 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, john.mcnamara, yuanhan.liu, stephen, Jianfeng Tan

v2:
  - Change svg files.
  - Postpone instead of remove ABI changes in igb_uio.

Patch 1: howto doc of virtio_user for container networking.
Patch 2: howto doc of virtio_user as exceptional path.
Patch 3: postpone ABI changes in igb_uio

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Jianfeng Tan (3):
  doc: add guide to use virtio_user for container networking
  doc: add guide to use virtio_user as exceptional path
  doc: postpone ABI changes in igb_uio

 .../use_models_for_running_dpdk_in_containers.svg  | 574 ++++++++++++++++++
 .../howto/img/virtio_user_as_exceptional_path.svg  | 386 +++++++++++++
 .../img/virtio_user_for_container_networking.svg   | 638 +++++++++++++++++++++
 doc/guides/howto/index.rst                         |   2 +
 .../howto/virtio_user_as_exceptional_path.rst      | 142 +++++
 .../howto/virtio_user_for_container_networking.rst | 142 +++++
 doc/guides/rel_notes/deprecation.rst               |   2 +-
 7 files changed, 1885 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/howto/img/use_models_for_running_dpdk_in_containers.svg
 create mode 100644 doc/guides/howto/img/virtio_user_as_exceptional_path.svg
 create mode 100644 doc/guides/howto/img/virtio_user_for_container_networking.svg
 create mode 100644 doc/guides/howto/virtio_user_as_exceptional_path.rst
 create mode 100644 doc/guides/howto/virtio_user_for_container_networking.rst

-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 0/3] doc updates
    @ 2017-02-09 14:45  0% ` Thomas Monjalon
  2017-02-09 16:06  4% ` [dpdk-dev] [PATCH v2 " Jianfeng Tan
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-09 14:45 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, john.mcnamara, yuanhan.liu, stephen

2017-01-24 07:34, Jianfeng Tan:
> Patch 1: howto doc of virtio_user for container networking.
> Patch 2: howto doc of virtio_user as exceptional path.
> Patch 3: remove ABI changes in igb_uio

For patch 3, we are waiting for a new revision postponing the notice.

For the first 2 patches, the SVG files are embedding some PNG pictures.
Please try to convert them to full SVG. By the way, the series fails to
apply because of the PNG part.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Kill off PCI dependencies
@ 2017-02-08 22:56  3% Stephen Hemminger
  2017-02-09 16:26  3% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2017-02-08 22:56 UTC (permalink / raw)
  To: dev

I am trying to make DPDK more agnostic about bus type. The existing API still
has hardwired into it that ethernet devices are either PCI or not PCI (i.e. pci_dev == NULL).
Jan, Jerin, and Shreyansh started the process but it hasn't gone far enough.

It would make more sense if the existing generic device was used everywhere
including rte_ethdev, rte_ethdev_info, etc.

The ABI breakage is not catastrophic. Just change pci_dev to a device pointer.
One option would be to use NEXT_ABI and/or two different calls and data structures.
Messy but compatible. Something like
    rte_dev_info_get returns rte_dev_info but is marked deprecated
    rte_device_info_get returns rte_device_info

One fallout is that the existing testpmd code makes lots of assumptions that
it is working with a PCI device. Things like ability to get/set PCI registers.
I suspect this is already broken if one tries to run it on a virtual device like TAP.

Can we just turn off that functionality?

Also KNI has more dependencies that ethernet devices are PCI.
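
Roughly what the two-call option could look like (a sketch only; the call
names follow the ones above, the struct contents are placeholders):

struct rte_device_info {
	struct rte_device *device;	/* generic handle, works for any bus */
	const char *driver_name;
	unsigned int if_index;
	/* ... the rest of the existing, non-PCI-specific fields ... */
};

/* legacy call, keeps the embedded PCI pointer, marked deprecated */
__rte_deprecated
void rte_dev_info_get(uint8_t port_id, struct rte_dev_info *dev_info);

/* new bus-agnostic call */
void rte_device_info_get(uint8_t port_id, struct rte_device_info *dev_info);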

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields
    2017-02-07 14:12  2% ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
@ 2017-02-07 14:12  3% ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-07 14:12 UTC (permalink / raw)
  To: olivier.matz
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev,
	Bruce Richardson

The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of those structures into the
top-level structure so they are only stored once.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_ring.c       |  6 +++---
 lib/librte_ring/rte_ring.c | 20 ++++++++++----------
 lib/librte_ring/rte_ring.h | 32 ++++++++++++++++----------------
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index ebcb896..af74e7d 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -148,7 +148,7 @@ check_live_watermark_change(__attribute__((unused)) void *dummy)
 		}
 
 		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->prod.watermark;
+		watermark = r->watermark;
 		if (watermark != watermark_old &&
 		    (watermark_old != 16 || watermark != 32)) {
 			printf("Bad watermark change %u -> %u\n", watermark_old,
@@ -213,7 +213,7 @@ test_set_watermark( void ){
 		printf( " ring lookup failed\n" );
 		goto error;
 	}
-	count = r->prod.size*2;
+	count = r->size*2;
 	setwm = rte_ring_set_water_mark(r, count);
 	if (setwm != -EINVAL){
 		printf("Test failed to detect invalid watermark count value\n");
@@ -222,7 +222,7 @@ test_set_watermark( void ){
 
 	count = 0;
 	rte_ring_set_water_mark(r, count);
-	if (r->prod.watermark != r->prod.size) {
+	if (r->watermark != r->size) {
 		printf("Test failed to detect invalid watermark count value\n");
 		goto error;
 	}
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4bc6da1..183594f 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -144,11 +144,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.watermark = count;
+	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-	r->prod.size = r->cons.size = count;
-	r->prod.mask = r->cons.mask = count-1;
+	r->size = count;
+	r->mask = count-1;
 	r->prod.head = r->cons.head = 0;
 	r->prod.tail = r->cons.tail = 0;
 
@@ -269,14 +269,14 @@ rte_ring_free(struct rte_ring *r)
 int
 rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
-	if (count >= r->prod.size)
+	if (count >= r->size)
 		return -EINVAL;
 
 	/* if count is 0, disable the watermarking */
 	if (count == 0)
-		count = r->prod.size;
+		count = r->size;
 
-	r->prod.watermark = count;
+	r->watermark = count;
 	return 0;
 }
 
@@ -291,17 +291,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->prod.watermark == r->prod.size)
+	if (r->watermark == r->size)
 		fprintf(f, "  watermark=0\n");
 	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->prod.watermark);
+		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_RING_DEBUG
@@ -318,7 +318,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
 		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
 	}
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
 	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
 	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 75bbcc1..1e4b8ad 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -143,13 +143,10 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 struct rte_ring_ht_ptr {
 	volatile uint32_t head;  /**< Prod/consumer head. */
 	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
 	union {
 		uint32_t sp_enqueue; /**< True, if single producer. */
 		uint32_t sc_dequeue; /**< True, if single consumer. */
 	};
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 };
 
 /**
@@ -169,9 +166,12 @@ struct rte_ring {
 	 * next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
+	int flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_ht_ptr prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
@@ -350,7 +350,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * Placed here since identical code needed in both
  * single and multi producer enqueue functions */
 #define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
+	const uint32_t size = r->size; \
 	uint32_t idx = prod_head & mask; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
@@ -377,7 +377,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * single and multi consumer dequeue functions */
 #define DEQUEUE_PTRS() do { \
 	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
+	const uint32_t size = r->size; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
 			obj_table[i] = r->ring[idx]; \
@@ -432,7 +432,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -480,7 +480,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -539,7 +539,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	prod_head = r->prod.head;
@@ -575,7 +575,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -625,7 +625,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -722,7 +722,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
 	prod_tail = r->prod.tail;
@@ -1051,7 +1051,7 @@ rte_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
 }
 
 /**
@@ -1084,7 +1084,7 @@ rte_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
+	return (prod_tail - cons_tail) & r->mask;
 }
 
 /**
@@ -1100,7 +1100,7 @@ rte_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
+	return (cons_tail - prod_tail - 1) & r->mask;
 }
 
 /**
@@ -1114,7 +1114,7 @@ rte_ring_free_count(const struct rte_ring *r)
 static inline unsigned int
 rte_ring_get_size(struct rte_ring *r)
 {
-	return r->prod.size;
+	return r->size;
 }
 
 /**
-- 
2.9.3
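
For out-of-tree code that touches the ring internals directly, the migration
is mechanical (a before/after sketch, not part of the patch itself):

	/* before this patch */       /* after this patch */
	n = r->prod.size;             n = r->size;   /* or rte_ring_get_size(r) */
	m = r->cons.mask;             m = r->mask;
	w = r->prod.watermark;        w = r->watermark;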

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization
  @ 2017-02-07 14:12  2% ` Bruce Richardson
  2017-02-14  8:32  3%   ` Olivier Matz
  2017-02-07 14:12  3% ` [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-07 14:12 UTC (permalink / raw)
  To: olivier.matz
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev,
	Bruce Richardson

This patchset makes a set of, sometimes non-backward-compatible, cleanup
changes to the rte_ring code in order to improve it. The resulting code is
shorter*, since the existing functions are restructured to reduce code
duplication, as well as being more consistent in behaviour. The specific
changes made are explained in each patch which makes that change.

Key incompatibilities:
* The biggest, and probably most controversial, change is to the
  enqueue and dequeue APIs. The enqueue/dequeue burst and bulk functions
  have their prototypes changed so that they all return an additional
  value via a new parameter, indicating the size of the next call that
  is guaranteed to succeed. In the case of enq, this is the number of
  available slots on the ring, and in the case of deq, it is the number
  of objects which can be pulled. As well as this, the return values
  from the bulk functions have been changed to make them compatible
  with the burst functions. In all cases, the functions to enq/deq a
  set of objs now return the number of objects processed: 0 or N for
  the bulk functions, and 0, N or any value in between for the burst
  ones (see the prototype sketch after this list). [Due to the extra
  parameter, the compiler will flag all instances of the function,
  allowing the user to also change the return value logic at the same
  time.]
* The parameters to the single object enq/deq functions have not been 
  changed. Because of that, the return value is also unmodified - as the
  compiler cannot automatically flag this to the user.
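
For concreteness, the reworked burst prototypes look roughly like this (a
sketch; the exact parameter names are settled in the patches themselves):

    static inline unsigned int
    rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
		unsigned int n, unsigned int *free_space);

    static inline unsigned int
    rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
		unsigned int n, unsigned int *available);

Both return the number of objects actually processed, and report through the
extra pointer how many slots/objects a follow-up call is guaranteed to get.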

Potential further cleanups:
* To a certain extent the rte_ring structure has gone from being a whole
  ring structure, including a "ring" element itself, to just being a
  header which can be reused, along with the head/tail update functions
  to create new rings. For now, the enqueue code works by assuming
  that the ring data goes immediately after the header, but that can
  be changed to allow specialised ring implementations to put additional
  metadata of their own after the ring header. I didn't see this as being
  needed right now, but it may be worth considering for a V1 patchset.
* There are 9 enqueue functions and 9 dequeue functions in rte_ring.h. I
  suspect not all of those are used, so personally I would consider
  dropping the functions to enqueue/dequeue a single value using single
  or multi semantics, i.e. drop 
    rte_ring_sp_enqueue
    rte_ring_mp_enqueue
    rte_ring_sc_dequeue
    rte_ring_mc_dequeue
  That would still leave a single enqueue and dequeue function for working
  with a single object at a time.
* It should be possible to merge the head update code for enqueue and
  dequeue into a single function. The key difference between the two is
  the calculation of how far the index can be moved. I felt that the
  functions for moving the head index are sufficiently complicated with
  many parameters to them already, that trying to merge in more code would
  impede readability. However, if so desired this change can be made at a
  later stage without affecting ABI or API.

PERFORMANCE:
I've run performance autotests on a couple of (Intel) platforms. Looking
particularly at the core-2-core results, which I expect are the main ones
of interest, the performance after this patchset is a few cycles per packet
faster in my testing. I'm hoping it should be at least neutral perf-wise.

REQUEST FOR FEEDBACK:
* Are all of these changes worth making?
* Should they be made in existing ring code, or do we look to provide a 
  new fifo library to completely replace the ring one?
* How does the implementation of new ring types using this code compare vs
  that of the previous RFCs?

[*] LOC original rte_ring.h: 462. After patchset: 363. [Numbers generated
using David A. Wheeler's 'SLOCCount'.]

Bruce Richardson (19):
  app/pdump: fix duplicate macro definition
  ring: remove split cacheline build setting
  ring: create common structure for prod and cons metadata
  ring: add a function to return the ring size
  crypto/null: use ring size function
  ring: eliminate duplication of size and mask fields
  ring: remove debug setting
  ring: remove the yield when waiting for tail update
  ring: remove watermark support
  ring: make bulk and burst fn return vals consistent
  ring: allow enq fns to return free space value
  examples/quota_watermark: use ring space for watermarks
  ring: allow dequeue fns to return remaining entry count
  ring: reduce scope of local variables
  ring: separate out head index manipulation for enq/deq
  ring: create common function for updating tail idx
  ring: allow macros to work with any type of object
  ring: add object size parameter to memory size calculation
  ring: add event ring implementation

 app/pdump/main.c                                   |   3 +-
 app/test-pipeline/pipeline_hash.c                  |   5 +-
 app/test-pipeline/runtime.c                        |  19 +-
 app/test/Makefile                                  |   1 +
 app/test/commands.c                                |  52 --
 app/test/test_event_ring.c                         |  85 +++
 app/test/test_link_bonding_mode4.c                 |   6 +-
 app/test/test_pmd_ring_perf.c                      |  12 +-
 app/test/test_ring.c                               | 704 ++-----------------
 app/test/test_ring_perf.c                          |  36 +-
 app/test/test_table_acl.c                          |   2 +-
 app/test/test_table_pipeline.c                     |   2 +-
 app/test/test_table_ports.c                        |  12 +-
 app/test/virtual_pmd.c                             |   8 +-
 config/common_base                                 |   3 -
 doc/guides/prog_guide/env_abstraction_layer.rst    |   5 -
 doc/guides/prog_guide/ring_lib.rst                 |   7 -
 doc/guides/sample_app_ug/server_node_efd.rst       |   2 +-
 drivers/crypto/null/null_crypto_pmd.c              |   2 +-
 drivers/crypto/null/null_crypto_pmd_ops.c          |   2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |   3 +-
 drivers/net/ring/rte_eth_ring.c                    |   4 +-
 examples/distributor/main.c                        |   5 +-
 examples/load_balancer/runtime.c                   |  34 +-
 .../client_server_mp/mp_client/client.c            |   9 +-
 .../client_server_mp/mp_server/main.c              |   2 +-
 examples/packet_ordering/main.c                    |  13 +-
 examples/qos_sched/app_thread.c                    |  14 +-
 examples/quota_watermark/qw/init.c                 |   5 +-
 examples/quota_watermark/qw/main.c                 |  15 +-
 examples/quota_watermark/qw/main.h                 |   1 +
 examples/quota_watermark/qwctl/commands.c          |   2 +-
 examples/quota_watermark/qwctl/qwctl.c             |   2 +
 examples/quota_watermark/qwctl/qwctl.h             |   1 +
 examples/server_node_efd/node/node.c               |   2 +-
 examples/server_node_efd/server/main.c             |   2 +-
 lib/librte_hash/rte_cuckoo_hash.c                  |   5 +-
 lib/librte_mempool/rte_mempool_ring.c              |  12 +-
 lib/librte_pdump/rte_pdump.c                       |   2 +-
 lib/librte_port/rte_port_frag.c                    |   3 +-
 lib/librte_port/rte_port_ras.c                     |   2 +-
 lib/librte_port/rte_port_ring.c                    |  34 +-
 lib/librte_ring/Makefile                           |   2 +
 lib/librte_ring/rte_event_ring.c                   | 220 ++++++
 lib/librte_ring/rte_event_ring.h                   | 507 ++++++++++++++
 lib/librte_ring/rte_ring.c                         |  82 +--
 lib/librte_ring/rte_ring.h                         | 762 ++++++++-------------
 47 files changed, 1340 insertions(+), 1373 deletions(-)
 create mode 100644 app/test/test_event_ring.c
 create mode 100644 lib/librte_ring/rte_event_ring.c
 create mode 100644 lib/librte_ring/rte_event_ring.h

-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
  @ 2017-02-07  7:50  0%           ` Yang, Zhiyong
  0 siblings, 0 replies; 200+ results
From: Yang, Zhiyong @ 2017-02-07  7:50 UTC (permalink / raw)
  To: Adrien Mazarguil, Richardson, Bruce
  Cc: Ananyev, Konstantin, Andrew Rybchenko, dev, thomas.monjalon

Hi, Adrien:

	Sorry for the late reply due to the Chinese New Year.

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, January 24, 2017 12:36 AM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>; Yang, Zhiyong
> <zhiyong.yang@intel.com>; dev@dpdk.org; thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching
> behavior
> 
> On Fri, Jan 20, 2017 at 11:48:22AM +0000, Bruce Richardson wrote:
> > On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > > >
> > > > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > > > Sent: Friday, January 20, 2017 10:26 AM
> > > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> > > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com>
> > > > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD
> > > > batching behavior
> > > >
> > > > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > > > The rte_eth_tx_burst() function in the file Rte_ethdev.h is
> > > > invoked to transmit output packets on the output queue for DPDK
> > > > applications as follows.
> > > >
> > > > static inline uint16_t
> > > > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> > > >                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> > > >
> > > > Note: The fourth parameter nb_pkts: The number of packets to
> transmit.
> > > > The rte_eth_tx_burst() function returns the number of packets it
> > > > actually sent. The return value equal to *nb_pkts* means that all
> > > > packets have been sent, and this is likely to signify that other
> > > > output packets could be immediately transmitted again.
> > > > Applications that implement a "send as many packets to transmit as
> > > > possible" policy can check this specific case and keep invoking
> > > > the rte_eth_tx_burst() function until a value less than
> > > > *nb_pkts* is returned.
> > > >
> > > > When you call TX only once in rte_eth_tx_burst, you may get
> > > > different behaviors from different PMDs. One problem that every
> > > > DPDK user has to face is that they need to take the policy into
> > > > consideration at the app- lication level when using any specific
> > > > PMD to send the packets whether or not it is necessary, which
> > > > brings usage complexities and makes DPDK users easily confused
> > > > since they have to learn the details on TX function limit of
> > > > specific PMDs and have to handle the different return value: the
> > > > number of packets transmitted successfully for various PMDs. Some
> > > > PMDs Tx func- tions have a limit of sending at most 32 packets for
> > > > every invoking, some PMDs have another limit of at most 64 packets
> > > > once, another ones have imp- lemented to send as many packets to
> transmit as possible, etc. This will easily cause wrong usage for DPDK users.
> > > >
> > > > This patch proposes to implement the above policy in DPDK lib in
> > > > order to simplify the application implementation and avoid the
> > > > incorrect invoking as well. So, DPDK Users don't need to consider
> > > > the implementation policy and to write duplicated code at the
> > > > application level again when sending packets. In addition to it,
> > > > the users don't need to know the difference of specific PMD TX and
> > > > can transmit the arbitrary number of packets as they expect when
> > > > invoking TX API rte_eth_tx_burst, then check the return value to get
> the number of packets actually sent.
> > > >
> > > > How to implement the policy in DPDK lib? Two solutions are proposed
> below.
> > > >
> > > > Solution 1:
> > > > Implement the wrapper functions to remove some limits for each
> > > > specific PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple
> do like that.
> > > >
> > > > > IMHO, the solution is a bit better since it:
> > > > > 1. Does not affect other PMDs at all
> > > > > 2. Could be a bit faster for the PMDs which require it since has
> > > > >no indirect
> > > > >    function call on each iteration
> > > > > 3. No ABI change
> > >
> > > I also would prefer solution number 1 for the reasons outlined by Andrew
> above.
> > > Also, IMO current limitation for number of packets to TX in some
> > > Intel PMD TX routines are sort of artificial:
> > > - they are not caused by any real HW limitations
> > > - avoiding them at PMD level shouldn't cause any performance or
> functional degradation.
> > > So I don't see any good reason why instead of fixing these
> > > limitations in our own PMDs we are trying to push them to the upper
> (rte_ethdev) layer.
> 
> For what it's worth, I agree with Konstantin. Wrappers should be as thin as
> possible on top of PMD functions, they are not helpers. We could define a
> set of higher level functions for this purpose though.
> 
> In the meantime, exposing and documenting PMD limitations seems safe
> enough.
> 
> We could assert that RX/TX burst requests larger than the size of the target
> queue are unlikely to be fully met (i.e. PMDs usually do not check for
> completions in the middle of a TX burst).

As a tool,  it is very important for its users to easily consume it and make it work
in a right way.  Sort of artificial limits will make things look like a little confused  and
make some users probably get into trouble when writing drivers. 
Why do we correct it and make it easier?  :)

Zhiyong

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get
  2017-02-01 16:53  3% [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get Stephen Hemminger
  2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
@ 2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
  1 sibling, 0 replies; 200+ results
From: Mrozowicz, SlawomirX @ 2017-02-03 12:26 UTC (permalink / raw)
  To: Stephen Hemminger, Doherty, Declan; +Cc: dev, De Lara Guarch, Pablo



>-----Original Message-----
>From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>Sent: Wednesday, February 1, 2017 5:54 PM
>To: Mrozowicz, SlawomirX <slawomirx.mrozowicz@intel.com>; Doherty,
>Declan <declan.doherty@intel.com>
>Cc: dev@dpdk.org
>Subject: bugs and glitches in rte_cryptodev_devices_get
>
>The function rte_cryptodev_devices_get has several issues. I was just going
>to fix it, but think it needs to be explained.
>
>One potentially serious one (reported by coverity) is:
>
>*** CID 141067:    (BAD_COMPARE)
>/lib/librte_cryptodev/rte_cryptodev.c: 503 in rte_cryptodev_devices_get()
>497     				&& (*devs + i)->attached ==
>498     						RTE_CRYPTODEV_ATTACHED)
>{
>499
>500     			dev = (*devs + i)->device;
>501
>502     			if (dev)
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>/lib/librte_cryptodev/rte_cryptodev.c: 507 in rte_cryptodev_devices_get()
>501
>502     			if (dev)
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>509     						strlen(dev_name));
>510
>511     			if (cmp == 0)
>512     				devices[count++] = (*devs + i)->data->dev_id;
>
>
>But also:
>
>1. Incorrect function signature:
>    * function returns int but never a negative value. should be unsigned.
>    * devices argument is not modified should be const.
>
>2. Original ABI seems short-sighted with a limit of 256 cryptodevs
>    * this seems like an 8-bit mindset; should really use unsigned int instead
>      of uint8_t for number of devices.
>
>3. Wacky indentation of the if statement.
>
>4. Make variables local to the block they are used (cmp, dev)
>
>5. Use array instead of pointer:
>     ie. instead of *devs + i use devs[i]
>
We reconsidered your suggestions and addressed all of your changes except adding const to the devices argument, since in our opinion it is not necessary.

>
>The overall code in question is:
>
>
>int
>rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
>	uint8_t nb_devices)
>{
>	uint8_t i, cmp, count = 0;
>	struct rte_cryptodev **devs = &rte_cryptodev_globals->devs;
>	struct rte_device *dev;
>
>	for (i = 0; i < rte_cryptodev_globals->max_devs && count <
>nb_devices;
>			i++) {
>
>		if ((*devs + i)
>				&& (*devs + i)->attached ==
>						RTE_CRYPTODEV_ATTACHED)
>{
>
>			dev = (*devs + i)->device;
>
>			if (dev)
>				cmp = strncmp(dev->driver->name,
>						dev_name,
>						strlen(dev_name));
>			else
>				cmp = strncmp((*devs + i)->data->name,
>						dev_name,
>						strlen(dev_name));
>
>			if (cmp == 0)
>				devices[count++] = (*devs + i)->data->dev_id;
>		}
>	}
>
>	return count;
>}
>
>Please fix it.
>
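
For the record, the direction being asked for is roughly this (a sketch
applying points 1-5 above, not the committed code; the uint8_t count is kept
here since widening it is the ABI question raised in point 2):

uint8_t
rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
	uint8_t nb_devices)
{
	uint8_t i, count = 0;
	struct rte_cryptodev *devs = rte_cryptodev_globals->devs;

	for (i = 0; i < rte_cryptodev_globals->max_devs && count < nb_devices;
			i++) {
		if (devs[i].attached != RTE_CRYPTODEV_ATTACHED)
			continue;

		struct rte_device *dev = devs[i].device;
		int cmp;	/* int, so the strncmp result is not truncated */

		if (dev)
			cmp = strncmp(dev->driver->name, dev_name,
					strlen(dev_name));
		else
			cmp = strncmp(devs[i].data->name, dev_name,
					strlen(dev_name));

		if (cmp == 0)
			devices[count++] = devs[i].data->dev_id;
	}

	return count;
}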

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v10 3/7] lib: add bitrate statistics library
    2017-02-03 10:33  1% ` [dpdk-dev] [PATCH v10 1/7] lib: add information metrics library Remy Horton
@ 2017-02-03 10:33  2% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-02-03 10:33 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a library that calculates peak and average data-rate
statistics for ethernet devices. These statistics are reported using
the metrics library.
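
(For reviewers: the average is the standard EWMA recurrence with alpha fixed
at 20%, done in integer arithmetic. Condensed, what rte_stats_bitrate_calc()
does per sampling period, shown for the incoming direction, is:

	delta = sample_bits - ewma;
	delta = (delta * alpha_percent + (delta > 0 ? 50 : -50)) / 100;
	ewma += delta;

where the +/-50 term rounds the division to nearest.)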

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/prog_guide/metrics_lib.rst              |  63 ++++++++++
 doc/guides/rel_notes/release_17_02.rst             |   6 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 +++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 132 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 +++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 12 files changed, 356 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index eceebaa..375adc9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -631,6 +631,10 @@ Metrics
 M: Remy Horton <remy.horton@intel.com>
 F: lib/librte_metrics/
 
+Bit-rate statistics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_bitratestats/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index b819932..e7b0e5c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -633,3 +633,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the crypto performance application
 #
 CONFIG_RTE_APP_CRYPTO_PERF=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 26a26b7..8492bce 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -157,4 +157,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [device metrics]     (@ref rte_metrics.h),
+  [bitrate statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index e2e070f..4010340 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -37,6 +37,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_eal/common/include \
                           lib/librte_eal/common/include/generic \
                           lib/librte_acl \
+                          lib/librte_bitratestats \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
index 87f806d..c06023c 100644
--- a/doc/guides/prog_guide/metrics_lib.rst
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -178,3 +178,66 @@ print out all metrics for a given port:
         free(metrics);
         free(names);
     }
+
+
+Bit-rate statistics library
+---------------------------
+
+The bit-rate library calculates the exponentially-weighted moving
+average and peak bit-rates for each active port (i.e. network device).
+These statistics are reported via the metrics library using the
+following names:
+
+    - ``mean_bits_in``: Average inbound bit-rate
+    - ``mean_bits_out``:  Average outbound bit-rate
+    - ``peak_bits_in``:  Peak inbound bit-rate
+    - ``peak_bits_out``:  Peak outbound bit-rate
+
+Once initialised and clocked at the appropriate frequency, these
+statistics can be obtained by querying the metrics library.
+
+Initialization
+~~~~~~~~~~~~~~
+
+Before it is used, the bit-rate statistics library has to be initialised
+by calling ``rte_stats_bitrate_create()``, which will return a bit-rate
+calculation object. Since the bit-rate library uses the metrics library
+to report the calculated statistics, the bit-rate library then needs to
+register the calculated statistics with the metrics library. This is
+done using the helper function ``rte_stats_bitrate_reg()``.
+
+.. code-block:: c
+
+    struct rte_stats_bitrates *bitrate_data;
+
+    bitrate_data = rte_stats_bitrate_create();
+    if (bitrate_data == NULL)
+        rte_exit(EXIT_FAILURE, "Could not allocate bit-rate data.\n");
+    rte_stats_bitrate_reg(bitrate_data);
+
+Controlling the sampling rate
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since the library works by periodic sampling but does not use an
+internal thread, the application has to periodically call
+``rte_stats_bitrate_calc()``. The frequency at which this function
+is called should be the intended sampling rate required for the
+calculated statistics. For instance, if per-second statistics are
+desired, this function should be called once a second.
+
+.. code-block:: c
+
+    tics_datum = rte_rdtsc();
+    tics_per_1sec = rte_get_timer_hz();
+
+    while (1) {
+        /* ... */
+        tics_current = rte_rdtsc();
+        if (tics_current - tics_datum >= tics_per_1sec) {
+            /* Periodic bitrate calculation */
+            for (idx_port = 0; idx_port < cnt_ports; idx_port++)
+                rte_stats_bitrate_calc(bitrate_data, idx_port);
+            tics_datum = tics_current;
+        }
+        /* ... */
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 68581e4..98729e8 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -46,6 +46,11 @@ New Features
   reporting mechanism that is independent of other libraries such
   as ethdev.
 
+* **Added bit-rate calculation library.**
+
+  A library that can be used to calculate device bit-rates. Calculated
+  bit-rates are reported using the metrics library.
+
 * **Added generic EAL API for I/O device memory read/write operations.**
 
   This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
@@ -348,6 +353,7 @@ The libraries prepended with a plus sign were incremented in this version.
 .. code-block:: diff
 
      librte_acl.so.2
+   + librte_bitratestats.so.1
      librte_cfgfile.so.2
      librte_cmdline.so.2
      librte_cryptodev.so.2
diff --git a/lib/Makefile b/lib/Makefile
index 29f6a81..ecc54c0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -50,6 +50,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..743b62c
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..2c20272
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,132 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates {
+	struct rte_stats_bitrate port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+struct rte_stats_bitrates *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates),
+		RTE_CACHE_LINE_SIZE);
+}
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data)
+{
+	const char * const names[] = {
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_names(&names[0], 4);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[4];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming bitrate. This is an iteratively calculated EWMA
+	 * (Exponentially Weighted Moving Average) that uses a
+	 * weighting factor of alpha_percent.
+	 */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +-50 fixes integer rounding during division */
+	if (delta > 0)
+		delta = (delta * alpha_percent + 50) / 100;
+	else
+		delta = (delta * alpha_percent - 50) / 100;
+	port_data->ewma_ibits += delta;
+
+	/* Outgoing bitrate (also EWMA) */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_obits += delta;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->peak_ibits;
+	values[3] = port_data->peak_obits;
+	rte_metrics_update_values(port_id, bitrate_data->id_stats_set,
+		values, 4);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..564e4f7
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+/**
+ *  Bitrate statistics data structure.
+ *  This data structure is intentionally opaque.
+ */
+struct rte_stats_bitrates;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics with the metric library.
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_bitrate_create()
+ *
+ * @return
+ *   - Zero on success
+ *   - Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period with which
+ * this function is called should be the intended sampling window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..66f232f
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_17.02 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 46de3d3..8f1f8d7 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -100,6 +100,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v10 1/7] lib: add information metrics library
  @ 2017-02-03 10:33  1% ` Remy Horton
  2017-02-03 10:33  2% ` [dpdk-dev] [PATCH v10 3/7] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-02-03 10:33 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a new information metrics library. This Metrics
library implements a mechanism by which producers can publish
numeric information for later querying by consumers. Metrics
themselves are statistics that are not generated by PMDs, and
hence are not reported via ethdev extended statistics.

Metric information is populated using a push model, where
producers update the values contained within the metric
library by calling an update function on the relevant metrics.
Consumers receive metric information by querying the central
metric data, which is held in shared memory.
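
In practice the flow looks like this (a minimal sketch using only the
functions added by this patch; error handling is omitted, and port_id
and bits_in are assumed to exist in the caller):

    /* once, from the primary process */
    rte_metrics_init(rte_socket_id());

    /* producer: declare a metric, then publish updates to it */
    int key = rte_metrics_reg_name("mean_bits_in");
    rte_metrics_update_value(port_id, key, bits_in);

    /* consumer: size the buffer, then bulk-query a port's metrics */
    int len = rte_metrics_get_values(port_id, NULL, 0);
    struct rte_metric_value *vals = malloc(sizeof(*vals) * len);
    rte_metrics_get_values(port_id, vals, len);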

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   4 +
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/prog_guide/index.rst            |   1 +
 doc/guides/prog_guide/metrics_lib.rst      | 180 +++++++++++++++++
 doc/guides/rel_notes/release_17_02.rst     |   9 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 299 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 240 +++++++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 13 files changed, 807 insertions(+)
 create mode 100644 doc/guides/prog_guide/metrics_lib.rst
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 27f999b..eceebaa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -627,6 +627,10 @@ F: lib/librte_jobstats/
 F: examples/l2fwd-jobstats/
 F: doc/guides/sample_app_ug/l2_forward_job_stats.rst
 
+Metrics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_metrics/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 71a4fcb..b819932 100644
--- a/config/common_base
+++ b/config/common_base
@@ -501,6 +501,11 @@ CONFIG_RTE_LIBRTE_EFD=y
 CONFIG_RTE_LIBRTE_JOBSTATS=y
 
 #
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index eb39f69..26a26b7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -156,4 +156,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [device metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index b8a5fd8..e2e070f 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -53,6 +53,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_mbuf \
                           lib/librte_mempool \
                           lib/librte_meter \
+                          lib/librte_metrics \
                           lib/librte_net \
                           lib/librte_pdump \
                           lib/librte_pipeline \
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 7f825cb..fea651c 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -62,6 +62,7 @@ Programmer's Guide
     packet_classif_access_ctrl
     packet_framework
     vhost_lib
+    metrics_lib
     port_hotplug_framework
     source_org
     dev_kit_build_system
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
new file mode 100644
index 0000000..87f806d
--- /dev/null
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -0,0 +1,180 @@
+..  BSD LICENSE
+    Copyright(c) 2017 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Metrics_Library:
+
+Metrics Library
+===============
+
+The Metrics library implements a mechanism by which *producers* can
+publish numeric information for later querying by *consumers*. In
+practice producers will typically be other libraries or primary
+processes, whereas consumers will typically be applications.
+
+Metrics themselves are statistics that are not generated by PMDs. Metric
+information is populated using a push model, where producers update the
+values contained within the metric library by calling an update function
+on the relevant metrics. Consumers receive metric information by querying
+the central metric data, which is held in shared memory.
+
+For each metric, a separate value is maintained for each port id, and
+when publishing metric values the producers need to specify which port is
+being updated. In addition there is a special id ``RTE_METRICS_GLOBAL``
+that is intended for global statistics that are not associated with any
+individual device. Since the metrics library is self-contained, the only
+restriction on port numbers is that they are less than ``RTE_MAX_ETHPORTS``
+- there is no requirement for the ports to actually exist.
+
+Initialising the library
+------------------------
+
+Before the library can be used, it has to be initialized by calling
+``rte_metrics_init()`` which sets up the metric store in shared memory.
+This is where producers will publish metric information to, and where
+consumers will query it from.
+
+.. code-block:: c
+
+    rte_metrics_init(rte_socket_id());
+
+This function **must** be called from a primary process, but otherwise
+producers and consumers can be in either primary or secondary processes.
+
+Registering metrics
+-------------------
+
+Metrics must first be *registered*, which is the way producers declare
+the names of the metrics they will be publishing. Registration can either
+be done individually, or a set of metrics can be registered as a group.
+Individual registration is done using ``rte_metrics_reg_name()``:
+
+.. code-block:: c
+
+    id_1 = rte_metrics_reg_name("mean_bits_in");
+    id_2 = rte_metrics_reg_name("mean_bits_out");
+    id_3 = rte_metrics_reg_name("peak_bits_in");
+    id_4 = rte_metrics_reg_name("peak_bits_out");
+
+or alternatively, a set of metrics can be registered together using
+``rte_metrics_reg_names()``:
+
+.. code-block:: c
+
+    const char * const names[] = {
+        "mean_bits_in", "mean_bits_out",
+        "peak_bits_in", "peak_bits_out",
+    };
+    id_set = rte_metrics_reg_names(&names[0], 4);
+
+If the return value is negative, it means registration failed. Otherwise
+the return value is the *key* for the metric, which is used when updating
+values. A table mapping together these key values and the metrics' names
+can be obtained using ``rte_metrics_get_names()``.
+
+Updating metric values
+----------------------
+
+Once registered, producers can update the metric for a given port using
+the ``rte_metrics_update_value()`` function. This uses the metric key
+that is returned when registering the metric, and can also be looked up
+using ``rte_metrics_get_names()``.
+
+.. code-block:: c
+
+    rte_metrics_update_value(port_id, id_1, values[0]);
+    rte_metrics_update_value(port_id, id_2, values[1]);
+    rte_metrics_update_value(port_id, id_3, values[2]);
+    rte_metrics_update_value(port_id, id_4, values[3]);
+
+If metrics were registered as a single set, they can either be updated
+individually using ``rte_metrics_update_value()``, or updated together
+using the ``rte_metrics_update_values()`` function:
+
+.. code-block:: c
+
+    rte_metrics_update_value(port_id, id_set, values[0]);
+    rte_metrics_update_value(port_id, id_set + 1, values[1]);
+    rte_metrics_update_value(port_id, id_set + 2, values[2]);
+    rte_metrics_update_value(port_id, id_set + 3, values[3]);
+
+    rte_metrics_update_values(port_id, id_set, values, 4);
+
+Note that ``rte_metrics_update_values()`` cannot be used to update
+metric values from *multiple* *sets*, as there is no guarantee two
+sets registered one after the other have contiguous id values.
+
+Querying metrics
+----------------
+
+Consumers can obtain metric values by querying the metrics library using
+the ``rte_metrics_get_values()`` function that return an array of
+``struct rte_metric_value``. Each entry within this array contains a metric
+value and its associated key. A key-name mapping can be obtained using the
+``rte_metrics_get_names()`` function that returns an array of
+``struct rte_metric_name`` that is indexed by the key. The following will
+print out all metrics for a given port:
+
+.. code-block:: c
+
+    void print_metrics(int port_id) {
+        struct rte_metric_name *names;
+        struct rte_metric_value *metrics;
+        int i, len, ret;
+        len = rte_metrics_get_names(NULL, 0);
+        if (len < 0) {
+            printf("Cannot get metrics count\n");
+            return;
+        }
+        if (len == 0) {
+            printf("No metrics to display (none have been registered)\n");
+            return;
+        }
+        metrics = malloc(sizeof(struct rte_metric_value) * len);
+        names = malloc(sizeof(struct rte_metric_name) * len);
+        if (metrics == NULL || names == NULL) {
+            printf("Cannot allocate memory\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        ret = rte_metrics_get_values(port_id, metrics, len);
+        if (ret < 0 || ret > len) {
+            printf("Cannot get metrics values\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        printf("Metrics for port %i:\n", port_id);
+        for (i = 0; i < len; i++)
+            printf("  %s: %"PRIu64"\n",
+                names[metrics[i].key].name, metrics[i].value);
+        free(metrics);
+        free(names);
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 83519dc..68581e4 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -38,6 +38,14 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added information metric library.**
+
+  A library that allows information metrics to be added and updated
+  by producers, typically other libraries, for later retrieval by
+  consumers such as applications. It is intended to provide a
+  reporting mechanism that is independent of other libraries such
+  as ethdev.
+
 * **Added generic EAL API for I/O device memory read/write operations.**
 
   This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
@@ -355,6 +363,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_mbuf.so.2
      librte_mempool.so.2
      librte_meter.so.1
+   + librte_metrics.so.1
      librte_net.so.1
      librte_pdump.so.1
      librte_pipeline.so.3
diff --git a/lib/Makefile b/lib/Makefile
index 4178325..29f6a81 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..889d377
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,299 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ */
+struct rte_metrics_meta_s {
+	/** Name of metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	/** Current value for metric */
+	uint64_t value[RTE_MAX_ETHPORTS];
+	/** Used for global metrics */
+	uint64_t nonport_value;
+	/** Index of next root element (zero for none) */
+	uint16_t idx_next_set;
+	/** Index of next metric in set (zero for none) */
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same virtual addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	/**   Index of last metadata entry with valid data.
+	 * This value is not valid if cnt_stats is zero.
+	 */
+	uint16_t idx_last_set;
+	/**   Number of metrics. */
+	uint16_t cnt_stats;
+	/** Metric data memory block. */
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	/** Metric data access lock */
+	rte_spinlock_t lock;
+};
+
+void
+rte_metrics_init(int socket_id)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), socket_id, 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+int
+rte_metrics_reg_name(const char *name)
+{
+	const char * const list_names[] = {name};
+
+	return rte_metrics_reg_names(list_names, 1);
+}
+
+int
+rte_metrics_reg_names(const char * const *names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually the first set. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+int
+rte_metrics_update_value(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_values(port_id, key, &value, 1);
+}
+
+int
+rte_metrics_update_values(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id >= RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_GLOBAL)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].nonport_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id >= RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		if (port_id == RTE_METRICS_GLOBAL)
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->nonport_value;
+			}
+		else
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->value[port_id];
+			}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..71c57c6
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,240 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * DPDK Metrics module
+ *
+ * Metrics are statistics that are not generated by PMDs, and hence
+ * are better reported through a mechanism that is independent from
+ * the ethdev-based extended statistics. Providers will typically
+ * be other libraries and consumers will typically be applications.
+ *
+ * Metric information is populated using a push model, where producers
+ * update the values contained within the metric library by calling
+ * an update function on the relevant metrics. Consumers receive
+ * metric information by querying the central metric data, which is
+ * held in shared memory. Currently only bulk querying of metrics
+ * by consumers is supported.
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of metric name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/**
+ * Global (rather than port-specific) metric special id.
+ *
+ * When used as the port_id parameter when calling
+ * rte_metrics_update_value() or rte_metrics_update_values(),
+ * the global metrics, which are not associated with any specific
+ * port (i.e. device), are updated.
+ */
+#define RTE_METRICS_GLOBAL -1
+
+
+/**
+ * A name-key lookup for metrics.
+ *
+ * An array of this structure is returned by rte_metrics_get_names().
+ * The struct rte_metric_value references these names via their array index.
+ */
+struct rte_metric_name {
+	/** String describing metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Metric value structure.
+ *
+ * This structure is used by rte_metrics_get_values() to return metrics,
+ * which are statistics that are not generated by PMDs. It maps a name key,
+ * which corresponds to an index in the array returned by
+ * rte_metrics_get_names().
+ */
+struct rte_metric_value {
+	/** Numeric identifier of metric. */
+	uint16_t key;
+	/** Value for metric */
+	uint64_t value;
+};
+
+
+/**
+ * Initializes metric module. This function must be called from
+ * a primary process before metrics are used.
+ *
+ * @param socket_id
+ *   Socket to use for shared memory allocation.
+ */
+void rte_metrics_init(int socket_id);
+
+/**
+ * Register a metric, making it available as a reporting parameter.
+ *
+ * Registering a metric is the way producers declare a parameter
+ * that they wish to be reported. Once registered, the associated
+ * numeric key can be obtained via rte_metrics_get_names(), which
+ * is required for updating said metric's value.
+ *
+ * @param name
+ *   Metric name
+ *
+ * @return
+ *  - Zero or positive: Success (index key of new metric)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_name(const char *name);
+
+/**
+ * Register a set of metrics.
+ *
+ * This is a bulk version of rte_metrics_reg_name() and aside from
+ * handling multiple keys at once is functionally identical.
+ *
+ * @param names
+ *   List of metric names
+ *
+ * @param cnt_names
+ *   Number of metrics in set
+ *
+ * @return
+ *  - Zero or positive: Success (index key of start of set)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_names(const char * const *names, uint16_t cnt_names);
+
+/**
+ * Get metric name-key lookup table.
+ *
+ * @param names
+ *   A struct rte_metric_name array of at least *capacity* in size to
+ *   receive key names. If this is NULL, function returns the required
+ *   number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_name array.
+ *   Disregarded if names is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *names* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Get metric value table.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   A struct rte_metric_value array of at least *capacity* in size to
+ *   receive metric ids and values. If this is NULL, function returns
+ *   the required number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_value array.
+ *   Disregarded if values is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *values* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a metric
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Id of metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_value(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Base id of metrics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds metric set size
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_values(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..ee28fa0
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_17.02 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_name;
+	rte_metrics_reg_names;
+	rte_metrics_update_value;
+	rte_metrics_update_values;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0d0a970..46de3d3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.5.5

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get
  2017-02-01 16:53  3% [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get Stephen Hemminger
@ 2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
  2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
  1 sibling, 0 replies; 200+ results
From: Mrozowicz, SlawomirX @ 2017-02-02 13:55 UTC (permalink / raw)
  To: Stephen Hemminger, Doherty, Declan; +Cc: dev



>-----Original Message-----
>From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>Sent: Wednesday, February 1, 2017 5:54 PM
>To: Mrozowicz, SlawomirX <slawomirx.mrozowicz@intel.com>; Doherty,
>Declan <declan.doherty@intel.com>
>Cc: dev@dpdk.org
>Subject: bugs and glitches in rte_cryptodev_devices_get
>
>The function rte_cryptodev_devices_get has several issues. I was just going
>to fix it, but think it need to be explained.
>
>One potentially serious one (reported by coverity) is:
>
>*** CID 141067:    (BAD_COMPARE)
>/lib/librte_cryptodev/rte_cryptodev.c: 503 in rte_cryptodev_devices_get()
>497     				&& (*devs + i)->attached ==
>498     						RTE_CRYPTODEV_ATTACHED)
>{
>499
>500     			dev = (*devs + i)->device;
>501
>502     			if (dev)
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>/lib/librte_cryptodev/rte_cryptodev.c: 507 in rte_cryptodev_devices_get()
>501
>502     			if (dev)
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>509     						strlen(dev_name));
>510
>511     			if (cmp == 0)
>512     				devices[count++] = (*devs + i)->data->dev_id;
>
>
>But also:
>
>1. Incorrect function signature:
>    * function returns int but never a negative value. should be unsigned.
>    * devices argument is not modified should be const.

[SM] Ok. To be changed.

>
>2. Original ABI seems short sighted with limit of 256 cryptodevs
>    * this seems like 8 bit mindset,  should really use unsigned int instead
>      of uint8_t for number of devices.

[SM] Ok. To be changed to uint8_t.

>
>3. Wacky indention of the if statement.

[SM] To be changed.

>
>4. Make variables local to the block they are used (cmp, dev)

[SM] Ok. To be changed.

>
>5. Use array instead of pointer:
>     ie. instead of *devs + i use devs[i]

[SM] We can't change it like this. devs[i] provide wrong address (null) for i>0

>
>
>The overall code in question is:
>
>
>int
>rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
>	uint8_t nb_devices)
>{
>	uint8_t i, cmp, count = 0;
>	struct rte_cryptodev **devs = &rte_cryptodev_globals->devs;
>	struct rte_device *dev;
>
>	for (i = 0; i < rte_cryptodev_globals->max_devs && count <
>nb_devices;
>			i++) {
>
>		if ((*devs + i)
>				&& (*devs + i)->attached ==
>						RTE_CRYPTODEV_ATTACHED)
>{
>
>			dev = (*devs + i)->device;
>
>			if (dev)
>				cmp = strncmp(dev->driver->name,
>						dev_name,
>						strlen(dev_name));
>			else
>				cmp = strncmp((*devs + i)->data->name,
>						dev_name,
>						strlen(dev_name));
>
>			if (cmp == 0)
>				devices[count++] = (*devs + i)->data->dev_id;
>		}
>	}
>
>	return count;
>}
>
>Please fix it.
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get
@ 2017-02-01 16:53  3% Stephen Hemminger
  2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
  2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2017-02-01 16:53 UTC (permalink / raw)
  To: Slawomir Mrozowicz, Declan Doherty; +Cc: dev

The function rte_cryptodev_devices_get has several issues. I was just going to
fix it, but think it needs to be explained.
 
One potentially serious one (reported by coverity) is:

*** CID 141067:    (BAD_COMPARE)
/lib/librte_cryptodev/rte_cryptodev.c: 503 in rte_cryptodev_devices_get()
497     				&& (*devs + i)->attached ==
498     						RTE_CRYPTODEV_ATTACHED) {
499     
500     			dev = (*devs + i)->device;
501     
502     			if (dev)
>>>     CID 141067:    (BAD_COMPARE)
>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0, or 1.  
503     				cmp = strncmp(dev->driver->name,
504     						dev_name,
505     						strlen(dev_name));
506     			else
507     				cmp = strncmp((*devs + i)->data->name,
508     						dev_name,
/lib/librte_cryptodev/rte_cryptodev.c: 507 in rte_cryptodev_devices_get()
501     
502     			if (dev)
503     				cmp = strncmp(dev->driver->name,
504     						dev_name,
505     						strlen(dev_name));
506     			else
>>>     CID 141067:    (BAD_COMPARE)
>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0, or 1.  
507     				cmp = strncmp((*devs + i)->data->name,
508     						dev_name,
509     						strlen(dev_name));
510     
511     			if (cmp == 0)
512     				devices[count++] = (*devs + i)->data->dev_id;


But also:

1. Incorrect function signature:
    * function returns int but never a negative value. should be unsigned.
    * devices argument is not modified should be const.

2. Original ABI seems short sighted with limit of 256 cryptodevs
    * this seems like 8 bit mindset,  should really use unsigned int instead
      of uint8_t for number of devices.

3. Wacky indentation of the if statement.

4. Make variables local to the block they are used (cmp, dev)

5. Use array instead of pointer:
     ie. instead of *devs + i use devs[i]


The overall code in question is:


int
rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
	uint8_t nb_devices)
{
	uint8_t i, cmp, count = 0;
	struct rte_cryptodev **devs = &rte_cryptodev_globals->devs;
	struct rte_device *dev;

	for (i = 0; i < rte_cryptodev_globals->max_devs && count < nb_devices;
			i++) {

		if ((*devs + i)
				&& (*devs + i)->attached ==
						RTE_CRYPTODEV_ATTACHED) {

			dev = (*devs + i)->device;

			if (dev)
				cmp = strncmp(dev->driver->name,
						dev_name,
						strlen(dev_name));
			else
				cmp = strncmp((*devs + i)->data->name,
						dev_name,
						strlen(dev_name));

			if (cmp == 0)
				devices[count++] = (*devs + i)->data->dev_id;
		}
	}

	return count;
}

Please fix it.
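
For reference, a minimal sketch (not compile-tested) of what the function
could look like after the points above, assuming rte_cryptodev_globals->devs
points at the base of the device array, which is how the existing
*devs + i arithmetic already treats it. The devices argument stays
non-const since it is the output array:

unsigned int
rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
	uint8_t nb_devices)
{
	/* local alias so plain array indexing works */
	struct rte_cryptodev *devs = rte_cryptodev_globals->devs;
	unsigned int count = 0;
	uint8_t i;

	for (i = 0; i < rte_cryptodev_globals->max_devs &&
			count < nb_devices; i++) {
		int cmp;

		if (devs[i].attached != RTE_CRYPTODEV_ATTACHED)
			continue;

		if (devs[i].device)
			cmp = strncmp(devs[i].device->driver->name,
					dev_name, strlen(dev_name));
		else
			cmp = strncmp(devs[i].data->name,
					dev_name, strlen(dev_name));

		if (cmp == 0)
			devices[count++] = devs[i].data->dev_id;
	}

	return count;
}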

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-02-01 12:06  0%               ` Jan Blunck
@ 2017-02-01 14:18  0%                 ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-01 14:18 UTC (permalink / raw)
  To: Jan Blunck
  Cc: Adrien Mazarguil, Thomas Monjalon, Aaron Conole, dev, Stephen Hemminger

On Wed, Feb 01, 2017 at 01:06:03PM +0100, Jan Blunck wrote:
> On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
> <adrien.mazarguil@6wind.com> wrote:
> > On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> >> 2017-01-30 13:38, Aaron Conole:
> >> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > >> > Why use rte_errno?
> >> > >> > Most DPDK calls just return negative value on error which
> >> > >> > corresponds to error number.
> >> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > >> > because before all these
> >> > >> > errors were panics, no working application is going to care.
> >> > >>
> >> > >> Either will work, but I actually prefer this way. I view using rte_errno
> >> > >> to be better as it can work in just about all cases, including with
> >> > >> functions which return pointers. This allows you to have a standard
> >> > >> method across all functions for returning error codes, and it only
> >> > >> requires a single sentinel value to indicate error, rather than using a
> >> > >> whole range of values.
> >> > >
> >> > > The problem is DPDK is getting more inconsistent on how this is done.
> >> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> >> > > matter much which way the value is returned from a technical point of view
> >> > > but the inconsistency is sure to be a usability problem and source of errors.
> >> >
> >> > I am using rte_errno here because I assumed it was the preferred
> >> > method.  In fact, looking at some recently contributed modules (for
> >> > instance pdump), it seems that folks are using it.
> >> >
> >> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> >> > it'd be helpful to know if there's some consensus on reflecting errors
> >> > via this variable, or on returning error codes.  Whichever is the more
> >> > consistent with the way the DPDK project does things, I'm game :).
> >>
> >> I think we can use both return value and rte_errno.
> >> We could try to enforce rte_errno as mandatory everywhere.
> >>
> >> Adrien did the recent rte_flow API.
> >> Please Adrien, could you give your thought?
> >
> > Sure, actually as already pointed out in this thread, both approaches have
> > pros and cons depending on the use-case.
> >
> > Through return value:
> >
> > Pros
> > ----
> >
> > - Most common approach used in DPDK today.
> > - Used internally by the Linux kernel (negative errno) and in the pthreads
> >   library (positive errno).
> > - Avoids the need to access an external, global variable requiring its own
> >   thread-local storage.
> > - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> > - Returned value is also the error code, two facts reported at once.
> 
> Caller can decide to ignore return value if no error handling is wanted.
>
Not always the case. In the case of an rx or tx burst call, a negative
error return must be checked for, or clamped to zero in some cases, to
make other logic in the path work sanely, e.g. when updating an array
of stats using the return value; see the sketch below.
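
To illustrate, a contrived sketch of what every caller would need if
rx_burst signalled errors through negative return values (it does not
today; the stats array and BURST_SIZE here are hypothetical):

	int nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST_SIZE);

	/* clamp before use, since a negative "count" breaks the maths */
	if (nb_rx < 0)
		nb_rx = 0;
	stats[port].rx_packets += nb_rx;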

> >
> > Cons
> > ----
> >
> > - Difficult to use with functions returning anything other than signed
> >   integers with negative values having no other meaning.
> > - The returned value must be assigned to a local variable in order not to
> >   discard it and process it later most of the time.
> 
> I believe this is a Pro, since rte_errno itself needs to be assigned
> to a thread-local variable anyway.

No, it's a con, since for errno the value will be preserved in the
absence of other errors. The application can delay handling the error as
long as it wants, in the absence of causes of subsequent errors.

> 
> > - All function calls must be tested for errors.
> 
> The rte_errno approach needs to do this too, to decide if it needs to
> assign a value to rte_errno.
> 
That's inside the called function, not the application. See my earlier
comment above about having to check your return value is in the valid
"logical range" expected from the call. Having a negative number of
packets received does not make logical sense, so you have to check the
return value when updating stats etc.


> >
> > Through rte_errno:
> >
> > Pros
> > ----
> >
> > - errno-like, well known behavior defined by the C standard and used
> >   everywhere in the C library.
> > - Testing return values is not mandatory, e.g. rte_errno can be initialized
> >   to zero before calling a group of functions and checking its value
> >   afterward (rte_errno is only updated in case of error).
> > - Assigning a local variable to store its value is not necessary as long as
> >   another function that may affect rte_errno is not called.
> >
> > Cons
> > ----
> >
> > - Not fully reentrant, thread-safety is fine for most purposes but signal
> >   handlers affecting it still cause undefined behavior (they must at least
> >   save and restore its value in case they modify it).
> > - Accessing non-local storage may affect CPU cycle-sensitive functions such
> >   as TX/RX burst.
> 
> Actually testing for errors means you also have to reset the rte_errno
> variable beforehand. That also means you have to access thread-local
> storage twice.
> 
Not true. Your return value still indicates an error via a single
sentinel value. Only in that case do you (the app) access the global value,
to find out the exact error reason.

> Besides that the problem of rte_errno is that you do error handling
> twice because the implementation still needs to check for the error
> condition before assigning a meaningful error value to rte_errno.
> After that again the user code needs to check for the return value to
> decide if looking at rte_errno makes any sense.
> 
Yes, in the case of an error occurring there will be an extra write to a
global variable, and a subsequent read from that value (which should not
be a problem, as the write will have occurred in the same thread).
However, this is irrelevant to normal path processing. Error should be
the exception not the rule.

> 
> > My opinion is that rte_errno is best for control path operations while using
> > the return value makes more sense in the data path. The major issue being
> > that function returning anything other than int (e.g. TX/RX burst) cannot
> > describe any kind of error to the application.
> >
> > I went with both in rte_flow (return + rte_errno) mostly due to the return
> > type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> > consistent while maintaining compatibility with other DPDK APIs. Note there
> > is little overhead for API functions to set rte_errno _and_ return its
> > value, it's mostly free.
+1, and error cases should be rare, even if there is a small cost.
> >
> > I think using both is best also because it leaves applications the choice of
> > error-handling method, however if I had to pick one I'd go with rte_errno
> > and standardize on -1 as the default error value (as in the C library).
> >
+1
though I think the sentinel value will vary depending on each case. I would
look to keep the standard packet rx/tx functions and ones like them
returning a zero on any error, to simplify programming logic, and also
because in many cases the only real cause of error they can produce is
bad parameters.
Functions returning pointers obviously will use NULL as error value.
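
i.e. something like this (sketch; the attr/pattern/actions setup and the
rte_flow_error argument handling are omitted):

	struct rte_flow *flow;

	flow = rte_flow_create(port_id, &attr, pattern, actions, &err);
	if (flow == NULL) {
		/* NULL is the single sentinel; rte_errno has the cause */
		printf("flow rule rejected: %s\n", strerror(rte_errno));
	}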


> > Below are a bunch of use-case examples to illustrate how rte_errno could
> > be convenient to applications.
> >
> > Easily creating many flow rules during init in a all-or-nothing fashion:
> >
> >  rte_errno = 0;
> >  for (i = 0; i != num; ++i)
> >      rule[i] = rte_flow_create(port, ...);
> >  if (unlikely(rte_errno)) {
> >      rte_flow_flush(port);
> >      return -1;
> >  }
> >
> > Complete TX packet burst failure with explanation (could also detect partial
> > failures by initializing rte_errno to 0):
> >
> >  sent = rte_eth_tx_burst(...);
> >  if (unlikely(!sent)) {
> >      switch (rte_errno) {
> >          case E2BIG:
> >              // too many packets in burst
> >          ...
> >          case EMSGSIZE:
> >              // first packet is too large
> >          ...
> >          case ENOBUFS:
> >              // TX queue is full
> >          ...
> >      }
> >  }
> >
> > TX burst functions in PMDs could be modified as follows with minimal impact
> > on their performance and no ABI change:
> >
> >      uint16_t sent = 0;
> >      int error; // new variable
> >
> >      [process burst]
> >      if (unlikely([something went wrong])) { // this check already exists
> >          error = EPROBLEM; // new assignment
> >          goto error; // instead of "return sent"
> >      }
> >      [process burst]
> >      return sent;
> >  error:
> >      rte_errno = error;
> >      return sent;
> >
> > --
> > Adrien Mazarguil
> > 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-02-01 10:54  3%             ` Adrien Mazarguil
@ 2017-02-01 12:06  0%               ` Jan Blunck
  2017-02-01 14:18  0%                 ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-01 12:06 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, Aaron Conole, dev, Stephen Hemminger, Bruce Richardson

On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
<adrien.mazarguil@6wind.com> wrote:
> On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
>> 2017-01-30 13:38, Aaron Conole:
>> > Stephen Hemminger <stephen@networkplumber.org> writes:
>> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
>> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
>> > >> > Why use rte_errno?
>> > >> > Most DPDK calls just return negative value on error which
>> > >> > corresponds to error number.
>> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
>> > >> > because before all these
> > >> > errors were panics, no working application is going to care.
>> > >>
>> > >> Either will work, but I actually prefer this way. I view using rte_errno
>> > >> to be better as it can work in just about all cases, including with
>> > >> functions which return pointers. This allows you to have a standard
>> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinel value to indicate error, rather than using a
>> > >> whole range of values.
>> > >
>> > > The problem is DPDK is getting more inconsistent on how this is done.
>> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
>> > > matter much which way the value is returned from a technical point of view
>> > > but the inconsistency is sure to be a usability problem and source of errors.
>> >
>> > I am using rte_errno here because I assumed it was the preferred
>> > method.  In fact, looking at some recently contributed modules (for
>> > instance pdump), it seems that folks are using it.
>> >
>> > I'm not really sure the purpose of having rte_errno if it isn't used, so
>> > it'd be helpful to know if there's some consensus on reflecting errors
>> > via this variable, or on returning error codes.  Whichever is the more
>> > consistent with the way the DPDK project does things, I'm game :).
>>
>> I think we can use both return value and rte_errno.
>> We could try to enforce rte_errno as mandatory everywhere.
>>
>> Adrien did the recent rte_flow API.
>> Please Adrien, could you give your thought?
>
> Sure, actually as already pointed out in this thread, both approaches have
> pros and cons depending on the use-case.
>
> Through return value:
>
> Pros
> ----
>
> - Most common approach used in DPDK today.
> - Used internally by the Linux kernel (negative errno) and in the pthreads
>   library (positive errno).
> - Avoids the need to access an external, global variable requiring its own
>   thread-local storage.
> - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> - Returned value is also the error code, two facts reported at once.

The caller can decide to ignore the return value if no error handling is wanted.

>
> Cons
> ----
>
> - Difficult to use with functions returning anything other than signed
>   integers with negative values having no other meaning.
> - The returned value must be assigned to a local variable in order not to
>   discard it and process it later most of the time.

I believe this is actually a pro, since rte_errno likewise needs to be
assigned, and to a thread-local variable at that.

> - All function calls must be tested for errors.

The rte_errno approach needs to do this too, in order to decide whether
to assign a value to rte_errno.

>
> Through rte_errno:
>
> Pros
> ----
>
> - errno-like, well known behavior defined by the C standard and used
>   everywhere in the C library.
> - Testing return values is not mandatory, e.g. rte_errno can be initialized
>   to zero before calling a group of functions and checking its value
>   afterward (rte_errno is only updated in case of error).
> - Assigning a local variable to store its value is not necessary as long as
>   another function that may affect rte_errno is not called.
>
> Cons
> ----
>
> - Not fully reentrant, thread-safety is fine for most purposes but signal
>   handlers affecting it still cause undefined behavior (they must at least
>   save and restore its value in case they modify it).
> - Accessing non-local storage may affect CPU cycle-sensitive functions such
>   as TX/RX burst.

Actually, testing for errors means you also have to reset the rte_errno
variable beforehand. That means you have to access thread-local storage
twice.
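
A minimal sketch of that double access (do_work() here is a hypothetical
API call that reports failure through rte_errno):

    rte_errno = 0;          /* first thread-local access: reset */
    do_work();
    if (rte_errno != 0) {   /* second thread-local access: test */
        /* handle the error */
    }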

Besides that, the problem with rte_errno is that you do the error
handling twice: the implementation still needs to check for the error
condition before assigning a meaningful error value to rte_errno, and
after that the user code needs to check the return value again to
decide whether looking at rte_errno makes any sense.
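
In other words, sketched with hypothetical names:

    int do_work(void)
    {
        if (condition_failed) { /* first check, inside the implementation */
            rte_errno = EIO;
            return -1;
        }
        return 0;
    }

    /* ... and the user code still has to check again: */
    if (do_work() < 0)
        handle_error(rte_errno); /* second check, in the caller */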


> My opinion is that rte_errno is best for control path operations while using
> the return value makes more sense in the data path. The major issue being
> that functions returning anything other than int (e.g. TX/RX burst) cannot
> describe any kind of error to the application.
>
> I went with both in rte_flow (return + rte_errno) mostly due to the return
> type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> consistent while maintaining compatibility with other DPDK APIs. Note there
> is little overhead for API functions to set rte_errno _and_ return its
> value, it's mostly free.
>
> I think using both is best also because it leaves applications the choice of
> error-handling method, however if I had to pick one I'd go with rte_errno
> and standardize on -1 as the default error value (as in the C library).
>
> Below are a bunch of use-case examples to illustrate how rte_errno could
> be convenient to applications.
>
> Easily creating many flow rules during init in an all-or-nothing fashion:
>
>  rte_errno = 0;
>  for (i = 0; i != num; ++i)
>      rule[i] = rte_flow_create(port, ...);
>  if (unlikely(rte_errno)) {
>      rte_flow_flush(port);
>      return -1;
>  }
>
> Complete TX packet burst failure with explanation (could also detect partial
> failures by initializing rte_errno to 0):
>
>  sent = rte_eth_tx_burst(...);
>  if (unlikely(!sent)) {
>      switch (rte_errno) {
>          case E2BIG:
>              // too many packets in burst
>          ...
>          case EMSGSIZE:
>              // first packet is too large
>          ...
>          case ENOBUFS:
>              // TX queue is full
>          ...
>      }
>  }
>
> TX burst functions in PMDs could be modified as follows with minimal impact
> on their performance and no ABI change:
>
>      uint16_t sent = 0;
>      int error; // new variable
>
>      [process burst]
>      if (unlikely([something went wrong])) { // this check already exists
>          error = EPROBLEM; // new assignment
>          goto error; // instead of "return sent"
>      }
>      [process burst]
>      return sent;
>  error:
>      rte_errno = error;
>      return sent;
>
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  @ 2017-02-01 10:54  3%             ` Adrien Mazarguil
  2017-02-01 12:06  0%               ` Jan Blunck
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2017-02-01 10:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Aaron Conole, dev, Stephen Hemminger, Bruce Richardson

On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> 2017-01-30 13:38, Aaron Conole:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panics, no working application is going to care.
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinel value to indicate error, rather than using a
> > >> whole range of values.
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> 
> I think we can use both return value and rte_errno.
> We could try to enforce rte_errno as mandatory everywhere.
> 
> Adrien did the recent rte_flow API.
> Please Adrien, could you give your thought?

Sure, actually as already pointed out in this thread, both approaches have
pros and cons depending on the use-case.

Through return value:

Pros
----

- Most common approach used in DPDK today.
- Used internally by the Linux kernel (negative errno) and in the pthreads
  library (positive errno).
- Avoids the need to access an external, global variable requiring its own
  thread-local storage.
- Inherently thread-safe and reentrant (i.e. safe with signal handlers).
- Returned value is also the error code, two facts reported at once.
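
For instance, under the negative-errno convention a single test on a
control-path call both detects the failure and yields the error code
(port_id and port_conf are assumed set up):

    int ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
    if (ret < 0)
        printf("configure failed: %s\n", strerror(-ret));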

Cons
----

- Difficult to use with functions returning anything other than signed
  integers whose negative values have no other meaning.
- Most of the time, the returned value must be assigned to a local
  variable so it is not discarded and can be processed later.
- All function calls must be tested for errors.

Through rte_errno:

Pros
----

- errno-like, well known behavior defined by the C standard and used
  everywhere in the C library.
- Testing return values is not mandatory, e.g. rte_errno can be initialized
  to zero before calling a group of functions and checking its value
  afterward (rte_errno is only updated in case of error).
- Assigning a local variable to store its value is not necessary as long as
  another function that may affect rte_errno is not called.

Cons
----

- Not fully reentrant: thread-safety is fine for most purposes, but signal
  handlers affecting it still cause undefined behavior (they must at least
  save and restore its value in case they modify it).
- Accessing non-local storage may affect CPU cycle-sensitive functions such
  as TX/RX burst.
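
To illustrate the first caveat above, a signal handler that may touch
rte_errno has to save and restore it around its work (sketch only):

    static void sig_handler(int signum)
    {
        int saved = rte_errno; /* preserve the interrupted code's value */
        /* ... work that may modify rte_errno ... */
        rte_errno = saved;     /* restore before returning */
    }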

My opinion is that rte_errno is best for control path operations, while
using the return value makes more sense in the data path. The major issue
is that functions returning anything other than int (e.g. TX/RX burst)
cannot describe any kind of error to the application.

I went with both in rte_flow (return + rte_errno) mostly due to the return
type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
consistent while maintaining compatibility with other DPDK APIs. Note there
is little overhead for API functions to set rte_errno _and_ return its
value, it's mostly free.

I think using both is best also because it leaves applications the choice of
error-handling method; however, if I had to pick one I'd go with rte_errno
and standardize on -1 as the default error value (as in the C library).

Below are a bunch of use-case examples to illustrate how rte_errno could
be convenient to applications.

Easily creating many flow rules during init in an all-or-nothing fashion:

 rte_errno = 0;
 for (i = 0; i != num; ++i)
     rule[i] = rte_flow_create(port, ...);
 if (unlikely(rte_errno)) {
     rte_flow_flush(port);
     return -1;
 }

Complete TX packet burst failure with explanation (could also detect partial
failures by initializing rte_errno to 0):

 sent = rte_eth_tx_burst(...);
 if (unlikely(!sent)) {
     switch (rte_errno) {
         case E2BIG:
             // too many packets in burst
         ...
         case EMSGSIZE:
             // first packet is too large
         ...
         case ENOBUFS:
             // TX queue is full
         ...
     }
 }
 
TX burst functions in PMDs could be modified as follows with minimal impact
on their performance and no ABI change:

     uint16_t sent = 0;
     int error; // new variable
 
     [process burst]
     if (unlikely([something went wrong])) { // this check already exists
         error = EPROBLEM; // new assignment
         goto error; // instead of "return sent"
     }
     [process burst]
     return sent;
 error:
     rte_errno = error;
     return sent;

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio
  @ 2017-02-01  7:24  4%       ` Tan, Jianfeng
  0 siblings, 0 replies; 200+ results
From: Tan, Jianfeng @ 2017-02-01  7:24 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit; +Cc: dev, john.mcnamara, yuanhan.liu, stephen



On 1/31/2017 1:52 AM, Thomas Monjalon wrote:
> 2017-01-24 13:35, Ferruh Yigit:
>> On 1/24/2017 7:34 AM, Jianfeng Tan wrote:
>>> We announced ABI changes to remove iomem and ioport mapping in
>>> igb_uio. But it has a potential backward compatibility issue: old
>>> versions of DPDK cannot run on the modified igb_uio.
>>>
>>> The purpose of this change was to fix a bug: when a DPDK app crashes,
>>> devices bound by igb_uio are not stopped by either the DPDK PMD
>>> driver or the igb_uio driver. We need to figure out a new way to fix
>>> this bug.
>> Hi Jianfeng,
>>
>> I believe it would be good to fix this potential defect.
>>
>> Is "remove iomem and ioport" a must for that fix? If so, I suggest
>> re-thinking it.
>>
>> If I see correctly, dpdk1.8 and older use igb_uio iomem files, so
>> backward compatibility is a possible issue for dpdk1.8 and older.
>> Since v1.8 is two years old, I would prefer fixing the defect instead
>> of keeping that backward compatibility.
>>
>> Jianfeng, Thomas,
>>
>> What do you think about postponing this deprecation notice to the next
>> release, instead of removing it, and discussing it more?
>>
>>
>> And overall, if "remove iomem and ioport" is not a must for this fix,
>> there is no problem with removing the deprecation notice.
> I have no strong opinion here.
> Jianfeng, do you agree with Ferruh?

Hi Ferruh & Thomas,

I agree with Ferruh to postpone this deprecation notice.

In another thread, we discussed the possibility of fixing this problem
without the deprecation, but I have no time to verify it in this release
cycle. Let's postpone it then.

Thanks,
Jianfeng

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  @ 2017-01-31 16:56  0%             ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2017-01-31 16:56 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Aaron Conole, dev

On Tue, 31 Jan 2017 09:33:45 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Jan 30, 2017 at 01:38:00PM -0500, Aaron Conole wrote:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >   
> > > On Fri, 27 Jan 2017 16:47:40 +0000
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >  
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:  
> > >> > On Fri, 27 Jan 2017 09:57:03 -0500
> > >> > Aaron Conole <aconole@redhat.com> wrote:
> > >> >     
> > >> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > >> > > index 03fee50..46e427f 100644
> > >> > > --- a/lib/librte_eal/common/include/rte_eal.h
> > >> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> > >> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> > >> > >   *     function call and should not be further interpreted by the
> > >> > >   *     application.  The EAL does not take any ownership of the memory used
> > >> > >   *     for either the argv array, or its members.
> > >> > > - *   - On failure, a negative error value.
> > >> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > >> > > + *     for failure.
> > >> > > + *
> > >> > > + *   The error codes returned via rte_errno:
> > >> > > + *     EACCES indicates a permissions issue.
> > >> > > + *
> > >> > > + *     EAGAIN indicates either a bus or system resource was not available,
> > >> > > + *            try again.
> > >> > > + *
> > >> > > + *     EALREADY indicates that the rte_eal_init function has already been
> > >> > > + *              called, and cannot be called again.
> > >> > > + *
> > >> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > >> > > + *
> > >> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > >> > > + *         caused by an out-of-memory condition.
> > >> > > + *
> > >> > > + *     ENODEV indicates memory setup issues.
> > >> > > + *
> > >> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > >> > > + *
> > >> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > >> > > + *             readable by the eal.
> > >> > >   */
> > >> > >  int rte_eal_init(int argc, char **argv);    
> > >> > 
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panics, no working application is going to care.
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinel value to indicate error, rather than using a
> > >> whole range of values.  
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.  
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> >   
> Unfortunately, this is one area where DPDK is inconsistent, and both
> schemes are widely used. I much prefer using the rte_errno method, but
> returning error codes directly is also common in DPDK.

One argument in favor of returning error codes directly is that it makes
things safer in the application when one user function passes an error code
back up through its internal call tree.
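
For example (a sketch; the helper names are hypothetical), a negative
errno-style return can be forwarded unchanged up the call tree:

    static int setup_port(uint16_t port)
    {
        int ret = configure_queues(port);
        if (ret < 0)
            return ret; /* propagate the callee's error code as-is */
        return do_start(port);
    }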

Also, the API does not really do a good job of distinguishing between normal
conditions (no data present) and exceptional ones (NIC has died). At least it
doesn't depend on something like Structured Exception Handling...

Feel free to clean the stables on this one.

^ permalink raw reply	[relevance 0%]

2016-11-07  7:38     [dpdk-dev] [PATCH] maintainers: claim responsability for xen Jianfeng Tan
2016-11-10 18:59     ` Konrad Rzeszutek Wilk
2016-11-10 20:49       ` Tan, Jianfeng
2017-02-16 11:06         ` Thomas Monjalon
2017-02-16 13:36           ` Konrad Rzeszutek Wilk
2017-02-16 21:51             ` Vincent JARDIN
2017-02-17 16:07               ` Konrad Rzeszutek Wilk
2017-02-20  9:56                 ` Jan Blunck
2017-02-20 17:36  3%               ` Joao Martins
2017-01-05 10:44     [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev Bernard Iremonger
2017-01-05 15:25     ` [dpdk-dev] [PATCH v2] " Bernard Iremonger
2017-02-13 17:57  4%   ` Thomas Monjalon
2017-02-14  3:17  4%     ` Jerin Jacob
2017-02-14 10:33  4%       ` Iremonger, Bernard
2017-02-14 19:37  4%   ` Thomas Monjalon
2017-01-19  5:34     [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter Yong Liu
2017-01-19 18:45     ` Adrien Mazarguil
2017-01-20  2:14       ` Lu, Wenzhuo
2017-01-20 14:57         ` Thomas Monjalon
2017-02-14  3:19  4%       ` Jerin Jacob
2017-01-20  9:51     [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior Zhiyong Yang
2017-01-20 10:26     ` Andrew Rybchenko
     [not found]       ` <2601191342CEEE43887BDE71AB9772583F108924@irsmsx105.ger.corp.intel.com>
2017-01-20 11:24         ` Ananyev, Konstantin
2017-01-20 11:48           ` Bruce Richardson
2017-01-23 16:36             ` Adrien Mazarguil
2017-02-07  7:50  0%           ` Yang, Zhiyong
2017-01-23  9:24     [dpdk-dev] [PATCH v6 1/6] lib: distributor performance enhancements David Hunt
2017-02-21  3:17  3% ` [dpdk-dev] [PATCH v7 0/17] distributor library " David Hunt
2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
2017-02-21 10:27  0%     ` Hunt, David
2017-02-24 14:03  0%     ` Bruce Richardson
2017-03-01  9:55  0%       ` Hunt, David
2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
2017-03-15  6:19  1%               ` [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-20 10:08  2%                 ` [dpdk-dev] [PATCH v11 0/18] distributor lib performance enhancements David Hunt
2017-03-20 10:08  1%                   ` [dpdk-dev] [PATCH v11 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-20 10:08  2%                   ` [dpdk-dev] [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
2017-03-27 13:02  3%                     ` Thomas Monjalon
2017-03-15  6:19  2%               ` [dpdk-dev] [PATCH v10 " David Hunt
2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
2017-03-10 16:22  0%             ` Bruce Richardson
2017-03-13 10:17  0%               ` Hunt, David
2017-03-13 10:28  0%               ` Hunt, David
2017-03-13 11:01  0%                 ` Van Haaren, Harry
2017-03-13 11:02  0%                   ` Hunt, David
2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
2017-03-01 14:50  0%         ` Hunt, David
2017-02-24 14:01  0%   ` [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
2017-01-23 13:04     [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost Yuanhan Liu
2017-02-13 18:02  4% ` Thomas Monjalon
2017-02-14  3:21  4%   ` Jerin Jacob
2017-02-14 13:54  4% ` Maxime Coquelin
2017-02-14 20:28  4% ` Thomas Monjalon
2017-01-24  7:34     [dpdk-dev] [PATCH 0/3] doc upates Jianfeng Tan
2017-01-24  7:34     ` [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio Jianfeng Tan
2017-01-24 13:35       ` Ferruh Yigit
2017-01-30 17:52         ` Thomas Monjalon
2017-02-01  7:24  4%       ` Tan, Jianfeng
2017-02-09 14:45  0% ` [dpdk-dev] [PATCH 0/3] doc upates Thomas Monjalon
2017-02-09 16:06  4% ` [dpdk-dev] [PATCH v2 " Jianfeng Tan
2017-02-09 16:06 12%   ` [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio Jianfeng Tan
2017-02-09 17:40  4%     ` Ferruh Yigit
2017-02-10 10:44  4%       ` Thomas Monjalon
2017-02-10 11:20  4%         ` Tan, Jianfeng
2017-01-25 12:14     [dpdk-dev] rte_ring features in use (or not) Bruce Richardson
2017-02-07 14:12  2% ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
2017-02-14  8:32  3%   ` Olivier Matz
2017-02-14  9:39  0%     ` Bruce Richardson
2017-02-07 14:12  3% ` [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-01-27 14:56     [dpdk-dev] [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
2017-01-27 14:57     ` [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes Aaron Conole
2017-01-27 16:33       ` Stephen Hemminger
2017-01-27 16:47         ` Bruce Richardson
2017-01-27 17:37           ` Stephen Hemminger
2017-01-30 18:38             ` Aaron Conole
2017-01-30 20:19               ` Thomas Monjalon
2017-02-01 10:54  3%             ` Adrien Mazarguil
2017-02-01 12:06  0%               ` Jan Blunck
2017-02-01 14:18  0%                 ` Bruce Richardson
2017-01-31  9:33               ` Bruce Richardson
2017-01-31 16:56  0%             ` Stephen Hemminger
2017-02-01 16:53  3% [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get Stephen Hemminger
2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
2017-02-03 10:33     [dpdk-dev] [PATCH v10 0/7] Expanded statistics reporting Remy Horton
2017-02-03 10:33  1% ` [dpdk-dev] [PATCH v10 1/7] lib: add information metrics library Remy Horton
2017-02-03 10:33  2% ` [dpdk-dev] [PATCH v10 3/7] lib: add bitrate statistics library Remy Horton
2017-02-06 13:35     [dpdk-dev] cryptodev - Session and queue pair relationship Akhil Goyal
2017-02-07 20:52     ` Declan Doherty
2017-02-13 14:38       ` Akhil Goyal
2017-02-13 14:44         ` Trahe, Fiona
2017-02-13 15:09  3%       ` Trahe, Fiona
2017-02-08 22:56  3% [dpdk-dev] Kill off PCI dependencies Stephen Hemminger
2017-02-09 16:26  3% ` Thomas Monjalon
2017-02-10 11:39  9% [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure Fan Zhang
2017-02-10 13:59  4% ` Trahe, Fiona
2017-02-13 16:07  7%   ` Zhang, Roy Fan
2017-02-13 17:34  4%     ` Trahe, Fiona
2017-02-14  0:21  4%   ` Hemant Agrawal
2017-02-14  5:11  4%     ` Hemant Agrawal
2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
2017-02-14 10:48  4%   ` Doherty, Declan
2017-02-14 11:03  4%     ` De Lara Guarch, Pablo
2017-02-14 20:37  4%   ` Thomas Monjalon
2017-02-10 14:05     [dpdk-dev] [PATCH 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
2017-02-10 14:05  1% ` [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
2017-02-21 10:35  0%   ` Hemant Agrawal
2017-02-13 10:56  9% [dpdk-dev] [PATCH] doc: remove announce of Tx preparation Thomas Monjalon
2017-02-13 14:22  0% ` Thomas Monjalon
2017-02-13 11:05 19% [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05 Olivier Matz
2017-02-13 14:21  4% ` Thomas Monjalon
2017-02-13 11:55  9% [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL Shreyansh Jain
2017-02-13 12:00  0% ` Shreyansh Jain
2017-02-13 14:44  0%   ` Thomas Monjalon
2017-02-13 21:56  0%   ` Jan Blunck
2017-02-14  5:18  0%     ` Shreyansh Jain
2017-02-13 11:55  5% [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus Shreyansh Jain
2017-02-13 14:36  0% ` Thomas Monjalon
2017-02-13 13:25     [dpdk-dev] crypto drivers in the API Thomas Monjalon
2017-02-14 10:44  4% ` Doherty, Declan
2017-02-14 11:04  0%   ` Thomas Monjalon
2017-02-14 14:46  4%     ` Doherty, Declan
2017-02-14 15:47  0%       ` Thomas Monjalon
2017-02-13 14:26  4% [dpdk-dev] [PATCH] doc: postpone API change in ethdev Thomas Monjalon
2017-02-13 16:02  3% [dpdk-dev] doc: deprecation notice for ethdev ops? Dumitrescu, Cristian
2017-02-13 16:09  0% ` Thomas Monjalon
2017-02-13 16:46  4%   ` Ferruh Yigit
2017-02-13 17:21  0%     ` Dumitrescu, Cristian
2017-02-13 17:36  0%       ` Ferruh Yigit
2017-02-13 17:38  3%     ` Thomas Monjalon
2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
2017-02-14  0:32  4% ` Mcnamara, John
2017-02-14  3:25  4% ` Jerin Jacob
2017-02-14  8:33  4% ` Olivier Matz
2017-02-14 11:43  4%   ` Hemant Agrawal
2017-02-14 18:42  4% ` [dpdk-dev] " Thomas Monjalon
2017-02-14 10:52  8% [dpdk-dev] Further fun with ABI tracking Christian Ehrhardt
2017-02-14 16:19  4% ` Bruce Richardson
2017-02-14 20:31  9% ` Jan Blunck
2017-02-22 13:12  7%   ` Christian Ehrhardt
2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
2017-02-28  8:34  4%       ` Jan Blunck
2017-03-01  9:31  4%         ` Christian Ehrhardt
2017-03-01  9:34 20%           ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
2017-03-01 14:35  4%             ` Jan Blunck
2017-03-16 17:19  4%               ` Thomas Monjalon
2017-03-17  8:27  4%                 ` Christian Ehrhardt
2017-03-17  9:16  4%                   ` Jan Blunck
2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
2017-02-24  7:32  8%       ` Christian Ehrhardt
2017-02-14 15:32  4% [dpdk-dev] [PATCH v1] doc: update release notes for 17.02 John McNamara
2017-02-14 16:26  2% ` [dpdk-dev] [PATCH v2] " John McNamara
2017-02-15 10:02     [dpdk-dev] [PATCH 0/7] Rework vdev probing to use rte_bus infrastructure Jan Blunck
2017-02-20 14:17     ` [dpdk-dev] [PATCH v2 1/8] eal: use different constructor priorities for initcalls Jan Blunck
2017-02-21 12:30  3%   ` Ferruh Yigit
2017-02-15 12:38  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 17.05 John McNamara
2017-02-15 13:15  1% [dpdk-dev] [PATCH] kni: remove KNI vhost support Ferruh Yigit
2017-02-20 14:30  5% ` [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes Ferruh Yigit
2017-02-20 14:30  1%   ` [dpdk-dev] [PATCH v2 2/2] kni: remove KNI vhost support Ferruh Yigit
2017-02-17 12:00     [dpdk-dev] [PATCH 0/3] cryptodev: change device configuration API Fan Zhang
2017-02-17 12:01  5% ` [dpdk-dev] [PATCH 3/3] doc: remove deprecation notice Fan Zhang
2017-02-19 17:14  4% [dpdk-dev] [PATCH] lpm: extend IPv6 next hop field Vladyslav Buslov
2017-02-21 14:46  4% ` [dpdk-dev] [PATCH v2] " Vladyslav Buslov
2017-02-21 10:22 16% [dpdk-dev] [PATCH] maintainers: fix script paths Thomas Monjalon
2017-02-22 16:09     [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler
2017-03-01 11:11  3% ` [dpdk-dev] [PATCH v2 0/1] net/mlx5: " Shahaf Shuler
2017-02-23 17:23     [dpdk-dev] [PATCH v1 00/14] refactor and cleanup of rte_ring Bruce Richardson
2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
2017-02-28 11:35  0%   ` Jerin Jacob
2017-02-28 11:57  0%     ` Bruce Richardson
2017-02-28 12:08  0%       ` Jerin Jacob
2017-02-28 13:52  0%         ` Bruce Richardson
2017-02-28 17:54  0%           ` Jerin Jacob
2017-03-01  9:47  0%             ` Bruce Richardson
2017-02-23 17:23  3% ` [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting Bruce Richardson
2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support Bruce Richardson
2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
2017-03-07 11:32     ` [dpdk-dev] [PATCH v2 00/14] refactor and cleanup of rte_ring Bruce Richardson
2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting Bruce Richardson
2017-03-07 11:32  3%   ` [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting Bruce Richardson
2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
2017-03-24 17:09       ` [dpdk-dev] [PATCH v3 00/14] refactor and cleanup of rte_ring Bruce Richardson
2017-03-24 17:09  5%     ` [dpdk-dev] [PATCH v3 01/14] ring: remove split cacheline build setting Bruce Richardson
2017-03-24 17:09  3%     ` [dpdk-dev] [PATCH v3 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-03-27  9:52           ` Thomas Monjalon
2017-03-27 10:13  3%         ` Bruce Richardson
2017-03-24 17:09  2%     ` [dpdk-dev] [PATCH v3 04/14] ring: remove debug setting Bruce Richardson
2017-03-24 17:09  4%     ` [dpdk-dev] [PATCH v3 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 06/14] ring: remove watermark support Bruce Richardson
2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
2017-03-24 17:10  2%     ` [dpdk-dev] [PATCH v3 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
2017-02-24 16:28     [dpdk-dev] [PATCH v2 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
2017-02-24 16:28  1% ` [dpdk-dev] [PATCH v2 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
2017-02-25  1:22     [dpdk-dev] [PATCH 00/16] Wind River Systems AVP PMD Allain Legacy
2017-02-25  1:23  3% ` [dpdk-dev] [PATCH 04/16] net/avp: add PMD version map file Allain Legacy
2017-02-26 19:08     ` [dpdk-dev] [PATCH v2 00/16] Wind River Systems AVP PMD Allain Legacy
2017-02-26 19:08  3%   ` [dpdk-dev] [PATCH v2 04/15] net/avp: add PMD version map file Allain Legacy
2017-03-02  0:19       ` [dpdk-dev] [PATCH v3 00/16] Wind River Systems AVP PMD Allain Legacy
2017-03-02  0:19  3%     ` [dpdk-dev] [PATCH v3 04/16] net/avp: add PMD version map file Allain Legacy
2017-03-13 19:16         ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD Allain Legacy
2017-03-13 19:16  3%       ` [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file Allain Legacy
2017-03-16 14:52  0%         ` Ferruh Yigit
2017-03-14 17:37           ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? Vincent JARDIN
2017-03-15  4:10             ` O'Driscoll, Tim
2017-03-16 23:17  3%           ` Stephen Hemminger
2017-03-16 23:41  0%             ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back Vincent JARDIN
2017-03-17  0:08  0%               ` Wiles, Keith
2017-03-17  0:15  0%                 ` O'Driscoll, Tim
2017-03-17  0:11  0%               ` Wiles, Keith
2017-03-17  0:14  0%                 ` Stephen Hemminger
2017-03-02  4:03  3% [dpdk-dev] [PATCH 0/6] introduce prgdev abstraction library Chen Jing D(Mark)
2017-03-02  4:03  4% ` [dpdk-dev] [PATCH 5/6] prgdev: add ABI control info Chen Jing D(Mark)
2017-03-02 19:29     [dpdk-dev] [PATCH 0/5] librte_cfgfile enhancement Allain Legacy
2017-03-02 19:29     ` [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character Allain Legacy
2017-03-02 21:10       ` Bruce Richardson
2017-03-03  0:53         ` Yuanhan Liu
2017-03-03 11:17           ` Dumitrescu, Cristian
2017-03-03 11:31             ` Legacy, Allain
2017-03-03 12:10  4%           ` Bruce Richardson
2017-03-03 12:17  0%             ` Legacy, Allain
2017-03-03 13:10  0%               ` Bruce Richardson
2017-03-03  9:31     [dpdk-dev] [PATCH 0/4] support replace filter function Beilei Xing
2017-03-03  9:31     ` [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type Beilei Xing
2017-03-08 15:50       ` Ferruh Yigit
2017-03-09  5:59  3%     ` Xing, Beilei
2017-03-09 10:01  0%       ` Ferruh Yigit
2017-03-09 10:43  0%         ` Xing, Beilei
2017-03-03  9:31     ` [dpdk-dev] [PATCH 4/4] net/i40e: refine consistent tunnel filter Beilei Xing
2017-03-08 15:50       ` Ferruh Yigit
2017-03-09  6:11  3%     ` Xing, Beilei
2017-03-03  9:51  4% [dpdk-dev] [PATCH 00/17] vhost: generic vhost API Yuanhan Liu
2017-03-03  9:51  3% ` [dpdk-dev] [PATCH 16/17] vhost: rename header file Yuanhan Liu
2017-03-23  7:10  4% ` [dpdk-dev] [PATCH v2 00/22] vhost: generic vhost API Yuanhan Liu
2017-03-23  7:10  5%   ` [dpdk-dev] [PATCH v2 02/22] net/vhost: remove feature related APIs Yuanhan Liu
2017-03-23  7:10  3%   ` [dpdk-dev] [PATCH v2 04/22] vhost: make notify ops per vhost driver Yuanhan Liu
2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 10/22] vhost: export the number of vrings Yuanhan Liu
2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 12/22] vhost: drop the Rx and Tx queue macro Yuanhan Liu
2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 13/22] vhost: do not include net specific headers Yuanhan Liu
2017-03-23  7:10  4%   ` [dpdk-dev] [PATCH v2 14/22] vhost: rename device ops struct Yuanhan Liu
2017-03-23  7:10  3%   ` [dpdk-dev] [PATCH v2 18/22] vhost: introduce API to start a specific driver Yuanhan Liu
2017-03-23  7:10  5%   ` [dpdk-dev] [PATCH v2 19/22] vhost: rename header file Yuanhan Liu
2017-03-03 15:40     [dpdk-dev] [PATCH 00/12] introduce fail-safe PMD Gaetan Rivet
2017-03-14 14:49     ` [dpdk-dev] [PATCH v2 00/13] " Gaëtan Rivet
2017-03-15  3:28       ` Bruce Richardson
2017-03-15 11:15         ` Thomas Monjalon
2017-03-15 14:25           ` Gaëtan Rivet
2017-03-16 20:50             ` Neil Horman
2017-03-17 10:56               ` Gaëtan Rivet
2017-03-18 19:51  3%             ` Neil Horman
2017-03-04  1:10     [dpdk-dev] [PATCH v3 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
2017-03-04  1:10  1% ` [dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
2017-03-06 16:57     ` [dpdk-dev] [PATCH v3 1/2] ethdev: add capability control API Thomas Monjalon
2017-03-06 18:28       ` Dumitrescu, Cristian
2017-03-06 20:21         ` Thomas Monjalon
2017-03-06 20:41  3%       ` Wiles, Keith
2017-03-07 11:11     [dpdk-dev] Issues with ixgbe and rte_flow Le Scouarnec Nicolas
2017-03-08  3:16     ` Lu, Wenzhuo
2017-03-08  9:24       ` Le Scouarnec Nicolas
2017-03-08 15:41  3%     ` Adrien Mazarguil
2017-03-09 15:37     [dpdk-dev] [PATCH v2] lpm: extend IPv6 next hop field Thomas Monjalon
2017-03-14 17:17  4% ` [dpdk-dev] [PATCH v3] " Vladyslav Buslov
2017-03-09 16:25     [dpdk-dev] [PATCH v11 0/7] Expanded statistics reporting Remy Horton
2017-03-09 16:25  1% ` [dpdk-dev] [PATCH v11 1/7] lib: add information metrics library Remy Horton
2017-03-09 16:25  2% ` [dpdk-dev] [PATCH v11 3/7] lib: add bitrate statistics library Remy Horton
2017-03-17 21:15  1% [dpdk-dev] [PATCH] net/ark: poll-mode driver for AtomicRules Arkville Ed Czeck
2017-03-20 21:14  1% [dpdk-dev] [PATCH v2] " Ed Czeck
2017-03-21 21:43  3% [dpdk-dev] [PATCH v3 1/7] net/ark: PMD for Atomic Rules Arkville driver stub Ed Czeck
2017-03-22 18:16  0% ` Ferruh Yigit
2017-03-23  1:03  3% [dpdk-dev] [PATCH v4 " Ed Czeck
2017-03-23 22:59  3% ` [dpdk-dev] [PATCH v5 " Ed Czeck
2017-03-23 10:02     [dpdk-dev] [PATCH v3 0/5] pipeline personalization profile support Beilei Xing
2017-03-24 10:19     ` [dpdk-dev] [PATCH v4 " Beilei Xing
2017-03-24 10:19       ` [dpdk-dev] [PATCH v4 1/5] net/i40e: add pipeline personalization profile processing Beilei Xing
2017-03-24 14:52         ` Chilikin, Andrey
2017-03-25  4:04           ` Xing, Beilei
2017-03-25 21:03  3%         ` Chilikin, Andrey
2017-03-27  2:09  0%           ` Xing, Beilei
