DPDK patches and discussions
* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
  @ 2017-01-20 10:26  3% ` Andrew Rybchenko
       [not found]       ` <2601191342CEEE43887BDE71AB9772583F108924@irsmsx105.ger.corp.intel.com>
  2017-01-21  4:13  2%   ` Yang, Zhiyong
  0 siblings, 2 replies; 200+ results
From: Andrew Rybchenko @ 2017-01-20 10:26 UTC (permalink / raw)
  To: Zhiyong Yang, dev; +Cc: thomas.monjalon, bruce.richardson, konstantin.ananyev

On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> The rte_eth_tx_burst() function in the file rte_ethdev.h is invoked by
> DPDK applications to transmit output packets on an output queue, as
> follows.
>
> static inline uint16_t
> rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>                   struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
>
> Note: The fourth parameter nb_pkts: The number of packets to transmit.
> The rte_eth_tx_burst() function returns the number of packets it actually
> sent. The return value equal to *nb_pkts* means that all packets have been
> sent, and this is likely to signify that other output packets could be
> immediately transmitted again. Applications that implement a "send as many
> packets to transmit as possible" policy can check this specific case and
> keep invoking the rte_eth_tx_burst() function until a value less than
> *nb_pkts* is returned.
>
> When you call rte_eth_tx_burst() only once, you may get different
> behaviors from different PMDs. One problem that every DPDK user has to
> face is that they need to take this policy into consideration at the
> application level when using any specific PMD to send packets, whether
> or not it is necessary. This brings usage complexity and easily confuses
> DPDK users, since they have to learn the details of the TX limits of
> specific PMDs and handle the different return values (the number of
> packets transmitted successfully) for the various PMDs. Some PMD TX
> functions have a limit of sending at most 32 packets per invocation,
> some PMDs have a limit of at most 64 packets at once, other ones are
> implemented to send as many packets as possible, etc. This easily leads
> to incorrect usage by DPDK users.
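
In practice, the policy described above amounts to a small loop in the
application, and the difficulty is that a safe per-call chunk size differs
between PMDs. A minimal sketch of such a loop (illustrative only, not part
of the RFC; app_tx_drain() and APP_TX_CHUNK are hypothetical names):

#include <rte_ethdev.h>

/* Send in chunks no PMD is expected to cap below, and stop as soon as a
 * burst returns fewer packets than requested (the TX ring is full). */
#define APP_TX_CHUNK 32

static inline uint16_t
app_tx_drain(uint8_t port_id, uint16_t queue_id,
             struct rte_mbuf **pkts, uint16_t nb)
{
        uint16_t sent = 0;

        while (sent < nb) {
                uint16_t req = RTE_MIN((uint16_t)APP_TX_CHUNK,
                                       (uint16_t)(nb - sent));
                uint16_t n = rte_eth_tx_burst(port_id, queue_id,
                                              &pkts[sent], req);

                sent += n;
                if (n < req)
                        break;
        }
        return sent;
}

The burden the RFC wants to remove is precisely having to pick an
APP_TX_CHUNK that no PMD caps below.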
>
> This patch proposes to implement the above policy in the DPDK library in
> order to simplify the application implementation and avoid incorrect
> invocations as well. DPDK users then don't need to consider the
> implementation policy or write duplicated code at the application level
> when sending packets. In addition, users don't need to know the
> differences between specific PMD TX functions and can transmit an
> arbitrary number of packets when invoking the TX API rte_eth_tx_burst(),
> then check the return value to get the number of packets actually sent.
>
> How to implement the policy in DPDK lib? Two solutions are proposed below.
>
> Solution 1:
> Implement wrapper functions that remove such limits in each specific
> PMD, as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple already do.

IMHO, this solution is a bit better since it:
  1. Does not affect other PMDs at all
  2. Could be a bit faster for the PMDs which require it, since it has no
     indirect function call on each iteration
  3. No ABI change
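
For comparison, a Solution-1-style wrapper kept inside a PMD might look
roughly like the sketch below; my_pmd_xmit_pkts(), xmit_fixed_burst() and
MY_PMD_TX_MAX_BURST are placeholders, not code from any existing driver:

/* Hypothetical PMD-internal wrapper (Solution 1): loop over the driver's
 * capped TX routine so callers may pass an arbitrary nb_pkts. */
static uint16_t
my_pmd_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
        uint16_t nb_tx = 0;

        while (nb_pkts) {
                uint16_t num = RTE_MIN(nb_pkts,
                                       (uint16_t)MY_PMD_TX_MAX_BURST);
                uint16_t ret = xmit_fixed_burst(tx_queue, &tx_pkts[nb_tx],
                                                num);

                nb_tx += ret;
                nb_pkts -= ret;
                if (ret < num)
                        break;  /* descriptor ring full */
        }
        return nb_tx;
}

This is essentially what i40e_xmit_pkts_simple() and ixgbe_xmit_pkts_simple()
already do, which is why this approach needs no indirect call per chunk and
no ABI change.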

> Solution 2:
> Implement the policy in the function rte_eth_tx_burst() at the ethdev
> layer in a more consistent batching way. Make a best effort to send
> *nb_pkts* packets with bursts of no more than 32 by default, since many
> DPDK TX PMDs are using this max TX burst size (32). In addition, one data
> member which defines the max TX burst size, such as
> "uint16_t max_tx_burst_pkts;", will be added to rte_eth_dev_data, which
> drivers can override if they work with bursts of 64 or another number
> (thanks for Bruce <bruce.richardson@intel.com>'s suggestion). This keeps
> the performance impact to a minimum.

I see no noticeable difference in performance, so I don't mind if this is
finally chosen.
Just be sure that you update all PMDs to set reasonable default values, or
maybe even better, set UINT16_MAX in a generic place - 0 is a bad default
here. (I lost a few seconds wondering why nothing was sent and why it would
not stop: with a zero limit the new loop keeps requesting bursts of zero
packets.)
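
One way to address that concern (a sketch only, assuming the RFC's new
fields are added to rte_eth_dev_data; where this initialisation would live,
e.g. when a port's data is allocated, is an assumption, not something
settled in this thread):

#include <stdint.h>
#include <rte_ethdev.h>

/* Give the new fields a safe "no limit" default so a PMD that never
 * overrides them still sends and receives packets. */
static void
eth_dev_data_burst_defaults(struct rte_eth_dev_data *data)
{
        data->max_rx_burst_pkts = UINT16_MAX;
        data->max_tx_burst_pkts = UINT16_MAX;
}

A PMD with a real per-call cap would then override the value during setup,
e.g. set max_tx_burst_pkts to 32.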

> I prefer the latter of the 2 solutions because it makes DPDK code more
> consistent and easier, and avoids writing too much duplicate logic in DPDK
> source code. In addition, I think solution 2 brings no or only a little
> performance drop. But an ABI change will be introduced.
>
> In fact, the current rte_eth_rx_burst() function is using the similar
> mechanism and faces the same problem as rte_eth_tx_burst().
>
> static inline uint16_t
> rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
>                   struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
>
> Applications are responsible for implementing the policy "retrieve as many
> received packets as possible": they check this specific case and keep
> invoking the rte_eth_rx_burst() function until a value less than *nb_pkts*
> is returned.
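
The corresponding RX policy is a mirror image of the TX loop sketched
earlier (again illustrative only; app_rx_drain() and APP_RX_CHUNK are
hypothetical names):

#include <rte_ethdev.h>

#define APP_RX_CHUNK 32

/* Poll in fixed-size chunks and stop once a burst comes back short,
 * i.e. the RX queue is drained for now. */
static inline uint16_t
app_rx_drain(uint8_t port_id, uint16_t queue_id,
             struct rte_mbuf **pkts, uint16_t max)
{
        uint16_t recvd = 0;

        while (recvd < max) {
                uint16_t req = RTE_MIN((uint16_t)APP_RX_CHUNK,
                                       (uint16_t)(max - recvd));
                uint16_t n = rte_eth_rx_burst(port_id, queue_id,
                                              &pkts[recvd], req);

                recvd += n;
                if (n < req)
                        break;
        }
        return recvd;
}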
>
> The patch proposes to apply the above method to rte_eth_rx_burst() as well.
>
> In summary, the purpose of the RFC is to make the job easier and simpler
> for driver writers and to avoid writing too much duplicate code at the
> application level.
>
> Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
> ---
>   lib/librte_ether/rte_ethdev.h | 41 +++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 1c356c1..6fa83cf 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1712,6 +1712,9 @@ struct rte_eth_dev_data {
>   	uint32_t min_rx_buf_size;
>   	/**< Common rx buffer size handled by all queues */
>   
> +	uint16_t max_rx_burst_pkts;
> +	uint16_t max_tx_burst_pkts;
> +
>   	uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */
>   	struct ether_addr* mac_addrs;/**< Device Ethernet Link address. */
>   	uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
> @@ -2695,11 +2698,15 @@ int rte_eth_dev_set_vlan_pvid(uint8_t port_id, uint16_t pvid, int on);
>    *   of pointers to *rte_mbuf* structures effectively supplied to the
>    *   *rx_pkts* array.
>    */
> +
>   static inline uint16_t
>   rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
>   		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
>   {
>   	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	int16_t nb_rx = 0;
> +	uint16_t pkts = 0;
> +	uint16_t rx_nb_pkts = nb_pkts;
>   
>   #ifdef RTE_LIBRTE_ETHDEV_DEBUG
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> @@ -2710,8 +2717,20 @@ rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
>   		return 0;
>   	}
>   #endif
> -	int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> +	if (likely(nb_pkts <= dev->data->max_rx_burst_pkts))
> +		return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
>   			rx_pkts, nb_pkts);
> +	while (rx_nb_pkts) {
> +		uint16_t num_burst = RTE_MIN(nb_pkts,
> +					      dev->data->max_rx_burst_pkts);
> +
> +		pkts = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> +						&rx_pkts[nb_rx], num_burst);
> +		nb_rx += pkts;
> +		rx_nb_pkts -= pkts;
> +		if (pkts < num_burst)
> +			break;
> +	}
>   
>   #ifdef RTE_ETHDEV_RXTX_CALLBACKS
>   	struct rte_eth_rxtx_callback *cb = dev->post_rx_burst_cbs[queue_id];
> @@ -2833,11 +2852,13 @@ rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t offset)
>    *   the transmit ring. The return value can be less than the value of the
>    *   *tx_pkts* parameter when the transmit ring is full or has been filled up.
>    */
> +
>   static inline uint16_t
>   rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>   		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>   {
>   	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	uint16_t nb_tx = 0;
>   
>   #ifdef RTE_LIBRTE_ETHDEV_DEBUG
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
> @@ -2860,8 +2881,24 @@ rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>   		} while (cb != NULL);
>   	}
>   #endif
> +	if (likely(nb_pkts <= dev->data->max_tx_burst_pkts))
> +		return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
> +						tx_pkts, nb_pkts);
> +
> +	while (nb_pkts) {
> +		uint16_t num_burst = RTE_MIN(nb_pkts,
> +					     dev->data->max_tx_burst_pkts);
> +		uint16_t pkts;
> +
> +		pkts = (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
> +						&tx_pkts[nb_tx], num_burst);
> +		nb_tx += pkts;
> +		nb_pkts -= pkts;
> +		if (pkts < num_burst)
> +			break;
> +	}
>   
> -	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
> +	return nb_tx;
>   }
>   
>   /**


* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
       [not found]       ` <2601191342CEEE43887BDE71AB9772583F108924@irsmsx105.ger.corp.intel.com>
@ 2017-01-20 11:24  0%     ` Ananyev, Konstantin
  2017-01-20 11:48  0%       ` Bruce Richardson
  2017-01-21  4:07  0%       ` Yang, Zhiyong
  0 siblings, 2 replies; 200+ results
From: Ananyev, Konstantin @ 2017-01-20 11:24 UTC (permalink / raw)
  To: Andrew Rybchenko, Yang, Zhiyong, dev; +Cc: thomas.monjalon, Richardson, Bruce

> 
> From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> Sent: Friday, January 20, 2017 10:26 AM
> To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> Cc: thomas.monjalon@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
> 
> On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to
> transmit output packets on the output queue for DPDK applications as
> follows.
> 
> static inline uint16_t
> rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> 
> Note: The fourth parameter nb_pkts: The number of packets to transmit.
> The rte_eth_tx_burst() function returns the number of packets it actually
> sent. The return value equal to *nb_pkts* means that all packets have been
> sent, and this is likely to signify that other output packets could be
> immediately transmitted again. Applications that implement a "send as many
> packets to transmit as possible" policy can check this specific case and
> keep invoking the rte_eth_tx_burst() function until a value less than
> *nb_pkts* is returned.
> 
> When you call TX only once in rte_eth_tx_burst, you may get different
> behaviors from different PMDs. One problem that every DPDK user has to
> face is that they need to take the policy into consideration at the app-
> lication level when using any specific PMD to send the packets whether or
> not it is necessary, which brings usage complexities and makes DPDK users
> easily confused since they have to learn the details on TX function limit
> of specific PMDs and have to handle the different return value: the number
> of packets transmitted successfully for various PMDs. Some PMDs Tx func-
> tions have a limit of sending at most 32 packets for every invoking, some
> PMDs have another limit of at most 64 packets once, another ones have imp-
> lemented to send as many packets to transmit as possible, etc. This will
> easily cause wrong usage for DPDK users.
> 
> This patch proposes to implement the above policy in DPDK lib in order to
> simplify the application implementation and avoid the incorrect invoking
> as well. So, DPDK Users don't need to consider the implementation policy
> and to write duplicated code at the application level again when sending
> packets. In addition to it, the users don't need to know the difference of
> specific PMD TX and can transmit the arbitrary number of packets as they
> expect when invoking TX API rte_eth_tx_burst, then check the return value
> to get the number of packets actually sent.
> 
> How to implement the policy in DPDK lib? Two solutions are proposed below.
> 
> Solution 1:
> Implement the wrapper functions to remove some limits for each specific
> PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple do like that.
> 
> > IMHO, the solution is a bit better since it:
> > 1. Does not affect other PMDs at all
> > 2. Could be a bit faster for the PMDs which require it since has no indirect
> >    function call on each iteration
> > 3. No ABI change

I also would prefer solution number 1 for the reasons outlined by Andrew above.
Also, IMO the current limitations on the number of packets to TX in some Intel
PMD TX routines are sort of artificial:
- they are not caused by any real HW limitations
- avoiding them at the PMD level shouldn't cause any performance or functional degradation.
So I don't see any good reason why, instead of fixing these limitations in
our own PMDs, we are trying to push them to the upper (rte_ethdev) layer.

Konstantin

> 
> 
> Solution 2:
> Implement the policy in the function rte_eth_tx_burst() at the ethdev lay-
> er in a more consistent batching way. Make best effort to send *nb_pkts*
> packets with bursts of no more than 32 by default since many DPDK TX PMDs
> are using this max TX burst size(32). In addition, one data member which
> defines the max TX burst size such as "uint16_t max_tx_burst_pkts;"will be
> added to rte_eth_dev_data, which drivers can override if they work with
> bursts of 64 or other NB(thanks for Bruce <bruce.richardson@intel.com>'s
> suggestion). This can reduce the performance impacting to the lowest limit.
> 
> > I see no noticeable difference in performance, so don't mind if this is finally choosen.
> > Just be sure that you update all PMDs to set reasonable default values, or may be
> > even better, set UINT16_MAX in generic place - 0 is a bad default here.
> > (Lost few seconds wondering why nothing is sent and cannot stop)
> 
> 
> I prefer the latter between the 2 solutions because it makes DPDK code more
> consistent and easier and avoids to write too much duplicate logic in DPDK
> source code. In addition, I think no or a little performance drop is
> brought by solution 2. But ABI change will be introduced.
> 
> In fact, the current rte_eth_rx_burst() function is using the similar
> mechanism and faces the same problem as rte_eth_tx_burst().
> 
> static inline uint16_t
> rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
> 
> Applications are responsible of implementing the policy "retrieve as many
> received packets as possible", and check this specific case and keep
> invoking the rte_eth_rx_burst() function until a value less than *nb_pkts*
> is returned.
> 
> The patch proposes to apply the above method to rte_eth_rx_burst() as well.
> 
> In summary, The purpose of the RFC makes the job easier and more simple for
> driver writers and avoids to write too much duplicate code at the applica-
> tion level.
> 


* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
  2017-01-20 11:24  0%     ` Ananyev, Konstantin
@ 2017-01-20 11:48  0%       ` Bruce Richardson
  2017-01-23 16:36  0%         ` Adrien Mazarguil
  2017-01-21  4:07  0%       ` Yang, Zhiyong
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2017-01-20 11:48 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Andrew Rybchenko, Yang, Zhiyong, dev, thomas.monjalon

On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > 
> > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > Sent: Friday, January 20, 2017 10:26 AM
> > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> > Cc: thomas.monjalon@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>
> > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
> > 
> > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to
> > transmit output packets on the output queue for DPDK applications as
> > follows.
> > 
> > static inline uint16_t
> > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> >                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> > 
> > Note: The fourth parameter nb_pkts: The number of packets to transmit.
> > The rte_eth_tx_burst() function returns the number of packets it actually
> > sent. The return value equal to *nb_pkts* means that all packets have been
> > sent, and this is likely to signify that other output packets could be
> > immediately transmitted again. Applications that implement a "send as many
> > packets to transmit as possible" policy can check this specific case and
> > keep invoking the rte_eth_tx_burst() function until a value less than
> > *nb_pkts* is returned.
> > 
> > When you call TX only once in rte_eth_tx_burst, you may get different
> > behaviors from different PMDs. One problem that every DPDK user has to
> > face is that they need to take the policy into consideration at the app-
> > lication level when using any specific PMD to send the packets whether or
> > not it is necessary, which brings usage complexities and makes DPDK users
> > easily confused since they have to learn the details on TX function limit
> > of specific PMDs and have to handle the different return value: the number
> > of packets transmitted successfully for various PMDs. Some PMDs Tx func-
> > tions have a limit of sending at most 32 packets for every invoking, some
> > PMDs have another limit of at most 64 packets once, another ones have imp-
> > lemented to send as many packets to transmit as possible, etc. This will
> > easily cause wrong usage for DPDK users.
> > 
> > This patch proposes to implement the above policy in DPDK lib in order to
> > simplify the application implementation and avoid the incorrect invoking
> > as well. So, DPDK Users don't need to consider the implementation policy
> > and to write duplicated code at the application level again when sending
> > packets. In addition to it, the users don't need to know the difference of
> > specific PMD TX and can transmit the arbitrary number of packets as they
> > expect when invoking TX API rte_eth_tx_burst, then check the return value
> > to get the number of packets actually sent.
> > 
> > How to implement the policy in DPDK lib? Two solutions are proposed below.
> > 
> > Solution 1:
> > Implement the wrapper functions to remove some limits for each specific
> > PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple do like that.
> > 
> > > IMHO, the solution is a bit better since it:
> > > 1. Does not affect other PMDs at all
> > > 2. Could be a bit faster for the PMDs which require it since has no indirect
> > >    function call on each iteration
> > > 3. No ABI change
> 
> I also would prefer solution number 1 for the reasons outlined by Andrew above.
> Also, IMO current limitation for number of packets to TX in some Intel PMD TX routines
> are sort of artificial:
> - they are not caused by any real HW limitations
> - avoiding them at PMD level shouldn't cause any performance or functional degradation.
> So I don't see any good reason why instead of fixing these limitations in
> our own PMDs we are trying to push them to the upper (rte_ethdev) layer.
> 
> Konstantin
> 
The main advantage I see is that it should make it a bit easier for
driver writers, since they have a tighter set of constraints to work
with for packet RX and TX. The routines only have to handle requests for
packets in the range 0-N, rather than having no upper bound on the
request. It also avoids code duplication from having multiple drivers
carry the same outer-loop code for handling arbitrarily large requests.

No big deal to me either way though.

/Bruce


* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter
  @ 2017-01-20 14:57  4%     ` Thomas Monjalon
  2017-02-14  3:19  4%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-01-20 14:57 UTC (permalink / raw)
  To: Lu, Wenzhuo, Adrien Mazarguil; +Cc: Liu, Yong, dev

2017-01-20 02:14, Lu, Wenzhuo:
> Hi Adrien, Thomas, Yong,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > Sent: Friday, January 20, 2017 2:46 AM
> > To: Thomas Monjalon
> > Cc: Liu, Yong; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter
> > 
> > On Thu, Jan 19, 2017 at 10:06:34AM +0100, Thomas Monjalon wrote:
> > > 2017-01-19 13:34, Yong Liu:
> > > > +* ABI changes are planned for 17.05: structure
> > > > +``rte_eth_tunnel_filter_conf``
> > > > +  will be extended with a new member ``vf_id`` in order to enable
> > > > +cloud filter
> > > > +  on VF device.
> > >
> > > I think we should stop rely on this API, and migrate to rte_flow instead.
> > > Adrien any thought?
> > 
> > I'm all for using rte_flow in any case. I've already documented an approach to
> > convert TUNNEL filter rules to rte_flow rules [1], although it may be
> > incomplete due to my limited experience with this filter type. We already
> > know several tunnel item types must be added (currently only VXLAN is
> > defined).
> > 
> > I understand ixgbe/i40e currently map rte_flow on top of the legacy
> > framework, therefore extending this structure might still be needed in the
> > meantime. Not sure we should prevent this change as long as such rules can be
> > configured through rte_flow as well.
> > 
> > [1] http://dpdk.org/doc/guides/prog_guide/rte_flow.html#tunnel-to-eth-ipv4-
> > ipv6-vxlan-or-other-queue
> The problem is we haven't finished transferring all the functions from the regular filters to the generic filters.
> For example, igb, fm10k and enic don't support generic filters yet. Ixgbe and i40e have supported the basic functions, but some advanced features have not been transferred to generic filters yet.
> It seems it's not the time to remove the regular filters. Yong, I suggest supporting both the generic filter and the regular filter in parallel.
> So, we need to announce the ABI change for the regular filter, until someday we remove the regular filter API.

I disagree.
There is a new API framework (rte_flow) and we must focus on this transition.
It means we must stop any work on the legacy API.


* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
  2017-01-20 11:24  0%     ` Ananyev, Konstantin
  2017-01-20 11:48  0%       ` Bruce Richardson
@ 2017-01-21  4:07  0%       ` Yang, Zhiyong
  1 sibling, 0 replies; 200+ results
From: Yang, Zhiyong @ 2017-01-21  4:07 UTC (permalink / raw)
  To: Ananyev, Konstantin, Andrew Rybchenko, dev
  Cc: thomas.monjalon, Richardson, Bruce



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, January 20, 2017 7:25 PM
> To: Andrew Rybchenko <arybchenko@solarflare.com>; Yang, Zhiyong
> <zhiyong.yang@intel.com>; dev@dpdk.org
> Cc: thomas.monjalon@6wind.com; Richardson, Bruce
> <bruce.richardson@intel.com>
> Subject: RE: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching
> behavior
> 
> >
> > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > Sent: Friday, January 20, 2017 10:26 AM
> > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> > Cc: thomas.monjalon@6wind.com; Richardson, Bruce
> > <bruce.richardson@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>
> > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD
> > batching behavior
> >
> > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to
> > transmit output packets on the output queue for DPDK applications as
> > follows.
> >
> > static inline uint16_t
> > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> >                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> >
> > Note: The fourth parameter nb_pkts: The number of packets to transmit.
> > The rte_eth_tx_burst() function returns the number of packets it
> > actually sent. The return value equal to *nb_pkts* means that all
> > packets have been sent, and this is likely to signify that other
> > output packets could be immediately transmitted again. Applications
> > that implement a "send as many packets to transmit as possible" policy
> > can check this specific case and keep invoking the rte_eth_tx_burst()
> > function until a value less than
> > *nb_pkts* is returned.
> >
> > When you call TX only once in rte_eth_tx_burst, you may get different
> > behaviors from different PMDs. One problem that every DPDK user has to
> > face is that they need to take the policy into consideration at the
> > app- lication level when using any specific PMD to send the packets
> > whether or not it is necessary, which brings usage complexities and
> > makes DPDK users easily confused since they have to learn the details
> > on TX function limit of specific PMDs and have to handle the different
> > return value: the number of packets transmitted successfully for
> > various PMDs. Some PMDs Tx func- tions have a limit of sending at most
> > 32 packets for every invoking, some PMDs have another limit of at most
> > 64 packets once, another ones have imp- lemented to send as many
> > packets to transmit as possible, etc. This will easily cause wrong usage for
> DPDK users.
> >
> > This patch proposes to implement the above policy in DPDK lib in order
> > to simplify the application implementation and avoid the incorrect
> > invoking as well. So, DPDK Users don't need to consider the
> > implementation policy and to write duplicated code at the application
> > level again when sending packets. In addition to it, the users don't
> > need to know the difference of specific PMD TX and can transmit the
> > arbitrary number of packets as they expect when invoking TX API
> > rte_eth_tx_burst, then check the return value to get the number of
> packets actually sent.
> >
> > How to implement the policy in DPDK lib? Two solutions are proposed
> below.
> >
> > Solution 1:
> > Implement the wrapper functions to remove some limits for each
> > specific PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple do
> like that.
> >
> > > IMHO, the solution is a bit better since it:
> > > 1. Does not affect other PMDs at all
> > > 2. Could be a bit faster for the PMDs which require it since has no
> > >indirect
> > >    function call on each iteration
> > > 3. No ABI change
> 
> I also would prefer solution number 1 for the reasons outlined by Andrew
> above.
> Also, IMO current limitation for number of packets to TX in some Intel PMD
> TX routines are sort of artificial:
> - they are not caused by any real HW limitations
> - avoiding them at PMD level shouldn't cause any performance or functional
> degradation.
> So I don't see any good reason why instead of fixing these limitations in our
> own PMDs we are trying to push them to the upper (rte_ethdev) layer.
> 
> Konstantin

 Solution 1 indeed has advantages as Andrew and Konstantin said. 

Zhiyong 


* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
  2017-01-20 10:26  3% ` Andrew Rybchenko
       [not found]       ` <2601191342CEEE43887BDE71AB9772583F108924@irsmsx105.ger.corp.intel.com>
@ 2017-01-21  4:13  2%   ` Yang, Zhiyong
  1 sibling, 0 replies; 200+ results
From: Yang, Zhiyong @ 2017-01-21  4:13 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: thomas.monjalon, Richardson, Bruce, Ananyev, Konstantin



From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
Sent: Friday, January 20, 2017 6:26 PM
To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
Cc: thomas.monjalon@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>
Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior

> On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to
> transmit output packets on the output queue for DPDK applications as
> follows.
>
> static inline uint16_t
> rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
>
> Note: The fourth parameter nb_pkts: The number of packets to transmit.
> The rte_eth_tx_burst() function returns the number of packets it actually
> sent. The return value equal to *nb_pkts* means that all packets have been
> sent, and this is likely to signify that other output packets could be
> immediately transmitted again. Applications that implement a "send as many
> packets to transmit as possible" policy can check this specific case and
> keep invoking the rte_eth_tx_burst() function until a value less than
> *nb_pkts* is returned.
>
> When you call TX only once in rte_eth_tx_burst, you may get different
> behaviors from different PMDs. One problem that every DPDK user has to
> face is that they need to take the policy into consideration at the
> application level when using any specific PMD to send the packets whether
> or not it is necessary, which brings usage complexities and makes DPDK
> users easily confused since they have to learn the details on TX function
> limit of specific PMDs and have to handle the different return value: the
> number of packets transmitted successfully for various PMDs. Some PMDs Tx
> functions have a limit of sending at most 32 packets for every invoking,
> some PMDs have another limit of at most 64 packets once, another ones have
> implemented to send as many packets to transmit as possible, etc. This
> will easily cause wrong usage for DPDK users.
>
> This patch proposes to implement the above policy in DPDK lib in order to
> simplify the application implementation and avoid the incorrect invoking
> as well. So, DPDK Users don't need to consider the implementation policy
> and to write duplicated code at the application level again when sending
> packets. In addition to it, the users don't need to know the difference of
> specific PMD TX and can transmit the arbitrary number of packets as they
> expect when invoking TX API rte_eth_tx_burst, then check the return value
> to get the number of packets actually sent.
>
> How to implement the policy in DPDK lib? Two solutions are proposed below.
>
> Solution 1:
> Implement the wrapper functions to remove some limits for each specific
> PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple do like that.
>
> > IMHO, the solution is a bit better since it:
> >  1. Does not affect other PMDs at all
> >  2. Could be a bit faster for the PMDs which require it since has no
> >     indirect function call on each iteration
> >  3. No ABI change
>
> Solution 2:
> Implement the policy in the function rte_eth_tx_burst() at the ethdev lay-
> er in a more consistent batching way. Make best effort to send *nb_pkts*
> packets with bursts of no more than 32 by default since many DPDK TX PMDs
> are using this max TX burst size(32). In addition, one data member which
> defines the max TX burst size such as "uint16_t max_tx_burst_pkts;"will be
> added to rte_eth_dev_data, which drivers can override if they work with
> bursts of 64 or other NB(thanks for Bruce <bruce.richardson@intel.com>'s
> suggestion). This can reduce the performance impacting to the lowest limit.
>
> > I see no noticeable difference in performance, so don't mind if this is
> > finally choosen.
> > Just be sure that you update all PMDs to set reasonable default values,
> > or may be even better, set UINT16_MAX in generic place - 0 is a bad
> > default here.
> > (Lost few seconds wondering why nothing is sent and cannot stop)

I agree with you, 0 is not a good default value. I recommend 32 by default
here; of course, driver writers can configure it as they expect before
starting to send packets.

Zhiyong


* [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
@ 2017-01-23 13:04 12% Yuanhan Liu
  2017-02-13 18:02  4% ` Thomas Monjalon
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Yuanhan Liu @ 2017-01-23 13:04 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon, Maxime Coquelin, John McNamara, Yuanhan Liu

I made a vhost ABI/API refactoring at v16.04, meant to avoid such issues
forever. Well, apparently, I lied.

People are looking for more vhost-user options nowadays, other than
vhost-user net only. For example, SPDK (Storage Performance Development
Kit) is looking for the chance of vhost-user SCSI and vhost-user block.

Apparently, they also need a vhost-user backend; since DPDK already
has a (mature enough) backend, they don't want to implement it again
from scratch. They want to leverage the one DPDK provides.

However, the last refactoring didn't get that right; at least it's
not friendly for extending vhost-user to support more device types.
For example, each virtio device has its own feature set, while
APIs like rte_vhost_feature_disable(feature_mask) have no option to
tell the device type. Thus, a more proper API should look like:

    rte_vhost_feature_disable(device_type, feature_mask);
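
Spelled out as a prototype, the device-aware direction might look like the
sketch below; the enum and its values are placeholders for illustration,
not a settled API (today the single-argument prototype lives in
rte_virtio_net.h):

#include <stdint.h>

/* Hypothetical device-type selector; names are placeholders. */
enum rte_vhost_dev_type {
        RTE_VHOST_DEV_NET,
        RTE_VHOST_DEV_SCSI,
        RTE_VHOST_DEV_BLK,
};

/* Proposed shape: the same feature knobs, but per device type. */
int rte_vhost_feature_disable(enum rte_vhost_dev_type device_type,
                              uint64_t feature_mask);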

Besides that, a few public files and structures should be renamed, so
that they are not bound to virtio-net. Specifically, they are:

- virtio_net_device_ops --> vhost_device_ops
- rte_virtio_net.h      --> rte_vhost.h

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---

I intended to send out the code first (I have already finished the
better part of it), but I'm starting vacation tomorrow, which leaves
me no time for that.
---
 doc/guides/rel_notes/deprecation.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 755dc65..5d6e9b6 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -62,3 +62,12 @@ Deprecation Notices
   PMDs that implement the latter.
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
+
+* vhost: API/ABI changes are planned for 17.05, for making DPDK vhost library
+  generic enough so that applications can build different vhost-user drivers
+  (instead of vhost-user net only) on top of that.
+  Specifically, ``virtio_net_device_ops`` will be renamed to ``vhost_device_ops``.
+  Correspondingly, some API's parameter need be changed. Few more functions also
+  need be reworked to let it be device aware. For example, different virtio device
+  has different feature set, meaning functions like ``rte_vhost_feature_disable``
+  need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
-- 
1.9.0


* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
  2017-01-20 11:48  0%       ` Bruce Richardson
@ 2017-01-23 16:36  0%         ` Adrien Mazarguil
  2017-02-07  7:50  0%           ` Yang, Zhiyong
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2017-01-23 16:36 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Ananyev, Konstantin, Andrew Rybchenko, Yang, Zhiyong, dev,
	thomas.monjalon

On Fri, Jan 20, 2017 at 11:48:22AM +0000, Bruce Richardson wrote:
> On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > > 
> > > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > > Sent: Friday, January 20, 2017 10:26 AM
> > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
> > > 
> > > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > > The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to
> > > transmit output packets on the output queue for DPDK applications as
> > > follows.
> > > 
> > > static inline uint16_t
> > > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> > >                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> > > 
> > > Note: The fourth parameter nb_pkts: The number of packets to transmit.
> > > The rte_eth_tx_burst() function returns the number of packets it actually
> > > sent. The return value equal to *nb_pkts* means that all packets have been
> > > sent, and this is likely to signify that other output packets could be
> > > immediately transmitted again. Applications that implement a "send as many
> > > packets to transmit as possible" policy can check this specific case and
> > > keep invoking the rte_eth_tx_burst() function until a value less than
> > > *nb_pkts* is returned.
> > > 
> > > When you call TX only once in rte_eth_tx_burst, you may get different
> > > behaviors from different PMDs. One problem that every DPDK user has to
> > > face is that they need to take the policy into consideration at the app-
> > > lication level when using any specific PMD to send the packets whether or
> > > not it is necessary, which brings usage complexities and makes DPDK users
> > > easily confused since they have to learn the details on TX function limit
> > > of specific PMDs and have to handle the different return value: the number
> > > of packets transmitted successfully for various PMDs. Some PMDs Tx func-
> > > tions have a limit of sending at most 32 packets for every invoking, some
> > > PMDs have another limit of at most 64 packets once, another ones have imp-
> > > lemented to send as many packets to transmit as possible, etc. This will
> > > easily cause wrong usage for DPDK users.
> > > 
> > > This patch proposes to implement the above policy in DPDK lib in order to
> > > simplify the application implementation and avoid the incorrect invoking
> > > as well. So, DPDK Users don't need to consider the implementation policy
> > > and to write duplicated code at the application level again when sending
> > > packets. In addition to it, the users don't need to know the difference of
> > > specific PMD TX and can transmit the arbitrary number of packets as they
> > > expect when invoking TX API rte_eth_tx_burst, then check the return value
> > > to get the number of packets actually sent.
> > > 
> > > How to implement the policy in DPDK lib? Two solutions are proposed below.
> > > 
> > > Solution 1:
> > > Implement the wrapper functions to remove some limits for each specific
> > > PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple do like that.
> > > 
> > > > IMHO, the solution is a bit better since it:
> > > > 1. Does not affect other PMDs at all
> > > > 2. Could be a bit faster for the PMDs which require it since has no indirect
> > > >    function call on each iteration
> > > > 3. No ABI change
> > 
> > I also would prefer solution number 1 for the reasons outlined by Andrew above.
> > Also, IMO current limitation for number of packets to TX in some Intel PMD TX routines
> > are sort of artificial:
> > - they are not caused by any real HW limitations
> > - avoiding them at PMD level shouldn't cause any performance or functional degradation.
> > So I don't see any good reason why instead of fixing these limitations in
> > our own PMDs we are trying to push them to the upper (rte_ethdev) layer.

For what it's worth, I agree with Konstantin. Wrappers should be as thin as
possible on top of PMD functions; they are not helpers. We could define a
set of higher level functions for this purpose though.

In the meantime, exposing and documenting PMD limitations seems safe enough.

We could assert that RX/TX burst requests larger than the size of the target
queue are unlikely to be fully met (i.e. PMDs usually do not check for
completions in the middle of a TX burst).

> > Konstantin
> > 
> The main advantage I see is that it should make it a bit easier for
> driver writers, since they have a tighter set of constraints to work
> with for packet RX and Tx. The routines only have to handle requests for
> packets in the range 0-N, rather than not having an upper bound on the
> request. It also then saves code duplicating with having multiple
> drivers having the same outer-loop code for handling arbitrarily large
> requests.
> 
> No big deal to me either way though.
> 
> /Bruce

Right, but there is a cost in doing so, as unlikely() as the additional code
is. We should leave that choice to applications.

-- 
Adrien Mazarguil
6WIND


* [dpdk-dev] [PATCH 0/3] doc upates
@ 2017-01-24  7:34  4% Jianfeng Tan
  2017-01-24  7:34 17% ` [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio Jianfeng Tan
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Jianfeng Tan @ 2017-01-24  7:34 UTC (permalink / raw)
  To: dev; +Cc: john.mcnamara, yuanhan.liu, stephen, Jianfeng Tan

Patch 1: howto doc of virtio_user for container networking.
Patch 2: howto doc of virtio_user as exceptional path.
Patch 3: remove ABI changes in igb_uio

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Jianfeng Tan (3):
  doc: add guide to use virtio_user for container networking
  doc: add guide to use virtio_user as exceptional path
  doc: remove ABI changes in igb_uio

 .../use_models_for_running_dpdk_in_containers.svg  | 3280 ++++++++++++++++++++
 .../howto/img/virtio_user_as_exceptional_path.svg  | 1260 ++++++++
 .../img/virtio_user_for_container_networking.svg   | 1654 ++++++++++
 doc/guides/howto/index.rst                         |    2 +
 .../howto/virtio_user_as_exceptional_path.rst      |  142 +
 .../howto/virtio_user_for_container_networking.rst |  142 +
 doc/guides/rel_notes/deprecation.rst               |    5 -
 7 files changed, 6480 insertions(+), 5 deletions(-)
 create mode 100644 doc/guides/howto/img/use_models_for_running_dpdk_in_containers.svg
 create mode 100644 doc/guides/howto/img/virtio_user_as_exceptional_path.svg
 create mode 100644 doc/guides/howto/img/virtio_user_for_container_networking.svg
 create mode 100644 doc/guides/howto/virtio_user_as_exceptional_path.rst
 create mode 100644 doc/guides/howto/virtio_user_for_container_networking.rst

-- 
2.7.4


* [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio
  2017-01-24  7:34  4% [dpdk-dev] [PATCH 0/3] doc upates Jianfeng Tan
@ 2017-01-24  7:34 17% ` Jianfeng Tan
  2017-01-24 13:35  4%   ` Ferruh Yigit
  2017-02-09 14:45  0% ` [dpdk-dev] [PATCH 0/3] doc upates Thomas Monjalon
  2017-02-09 16:06  4% ` [dpdk-dev] [PATCH v2 " Jianfeng Tan
  2 siblings, 1 reply; 200+ results
From: Jianfeng Tan @ 2017-01-24  7:34 UTC (permalink / raw)
  To: dev; +Cc: john.mcnamara, yuanhan.liu, stephen, Jianfeng Tan

We announced ABI changes to remove the iomem and ioport mapping in
igb_uio. But it has a potential backward compatibility issue: an old
version of DPDK cannot run on the modified igb_uio.

The purpose of this change was to fix a bug: when a DPDK app crashes,
the devices bound to igb_uio are not stopped by either the DPDK PMD
or the igb_uio driver. We need to figure out a new way to fix this bug.

Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 755dc65..0f039dd 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,11 +8,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* igb_uio: iomem mapping and sysfs files created for iomem and ioport in
-  igb_uio will be removed, because we are able to detect these from what Linux
-  has exposed, like the way we have done with uio-pci-generic. This change
-  targets release 17.02.
-
 * ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
   impacted because of introduction of a new ``rte_bus`` hierarchy. This would
   also impact the way devices are identified by EAL. A bus-device-driver model
-- 
2.7.4


* [dpdk-dev] [PATCH RFCv2 1/4] ring: create common ring files
  @ 2017-01-24 10:39  2% ` Bruce Richardson
  2017-01-24 10:39  1% ` [dpdk-dev] [PATCH RFCv2 2/4] ring: separate common and rte_ring specific functions Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2017-01-24 10:39 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Create rte_common_ring.[ch] files which will be modified to contain
generic ring implementation code to be shared across multiple ring
implementations for different sizes and types of data.
For now, these are exact copies of the original rte_ring files.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_ring/Makefile                          |    5 +-
 lib/librte_ring/{rte_ring.c => rte_common_ring.c} |    0
 lib/librte_ring/rte_common_ring.h                 | 1269 ++++++++++++++++++++
 lib/librte_ring/rte_ring.h                        | 1270 +--------------------
 4 files changed, 1273 insertions(+), 1271 deletions(-)
 rename lib/librte_ring/{rte_ring.c => rte_common_ring.c} (100%)
 create mode 100644 lib/librte_ring/rte_common_ring.h
 mode change 100644 => 120000 lib/librte_ring/rte_ring.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 4b1112e..1e2396e 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -41,10 +41,11 @@ EXPORT_MAP := rte_ring_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
+SRCS-$(CONFIG_RTE_LIBRTE_RING) += rte_common_ring.c
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include += rte_ring.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include += rte_common_ring.h
 
 DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal
 
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_common_ring.c
similarity index 100%
rename from lib/librte_ring/rte_ring.c
rename to lib/librte_ring/rte_common_ring.c
diff --git a/lib/librte_ring/rte_common_ring.h b/lib/librte_ring/rte_common_ring.h
new file mode 100644
index 0000000..e359aff
--- /dev/null
+++ b/lib/librte_ring/rte_common_ring.h
@@ -0,0 +1,1269 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * Derived from FreeBSD's bufring.h
+ *
+ **************************************************************************
+ *
+ * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *    this list of conditions and the following disclaimer.
+ *
+ * 2. The name of Kip Macy nor the names of other
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ ***************************************************************************/
+
+#ifndef _RTE_RING_H_
+#define _RTE_RING_H_
+
+/**
+ * @file
+ * RTE Ring
+ *
+ * The Ring Manager is a fixed-size queue, implemented as a table of
+ * pointers. Head and tail pointers are modified atomically, allowing
+ * concurrent access to it. It has the following features:
+ *
+ * - FIFO (First In First Out)
+ * - Maximum size is fixed; the pointers are stored in a table.
+ * - Lockless implementation.
+ * - Multi- or single-consumer dequeue.
+ * - Multi- or single-producer enqueue.
+ * - Bulk dequeue.
+ * - Bulk enqueue.
+ *
+ * Note: the ring implementation is not preemptable. A lcore must not
+ * be interrupted by another task that uses the same ring.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdio.h>
+#include <stdint.h>
+#include <sys/queue.h>
+#include <errno.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_branch_prediction.h>
+#include <rte_memzone.h>
+
+#define RTE_TAILQ_RING_NAME "RTE_RING"
+
+enum rte_ring_queue_behavior {
+	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
+	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
+};
+
+#ifdef RTE_LIBRTE_RING_DEBUG
+/**
+ * A structure that stores the ring statistics (per-lcore).
+ */
+struct rte_ring_debug_stats {
+	uint64_t enq_success_bulk; /**< Successful enqueues number. */
+	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
+	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
+	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
+	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
+	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
+	uint64_t deq_success_bulk; /**< Successful dequeues number. */
+	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
+	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
+	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
+} __rte_cache_aligned;
+#endif
+
+#define RTE_RING_MZ_PREFIX "RG_"
+/**< The maximum length of a ring name. */
+#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
+			   sizeof(RTE_RING_MZ_PREFIX) + 1)
+
+#ifndef RTE_RING_PAUSE_REP_COUNT
+#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
+                                    *   if RTE_RING_PAUSE_REP not defined. */
+#endif
+
+struct rte_memzone; /* forward declaration, so as not to require memzone.h */
+
+/**
+ * An RTE ring structure.
+ *
+ * The producer and the consumer have a head and a tail index. The particularity
+ * of these index is that they are not between 0 and size(ring). These indexes
+ * are between 0 and 2^32, and we mask their value when we access the ring[]
+ * field. Thanks to this assumption, we can do subtractions between 2 index
+ * values in a modulo-32bit base: that's why the overflow of the indexes is not
+ * a problem.
+ */
+struct rte_ring {
+	/*
+	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
+	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
+	 * next time the ABI changes
+	 */
+	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
+	int flags;                       /**< Flags supplied at creation. */
+	const struct rte_memzone *memzone;
+			/**< Memzone, if any, containing the rte_ring */
+
+	/** Ring producer status. */
+	struct prod {
+		uint32_t watermark;      /**< Maximum items before EDQUOT. */
+		uint32_t sp_enqueue;     /**< True, if single producer. */
+		uint32_t size;           /**< Size of ring. */
+		uint32_t mask;           /**< Mask (size-1) of ring. */
+		volatile uint32_t head;  /**< Producer head. */
+		volatile uint32_t tail;  /**< Producer tail. */
+	} prod __rte_cache_aligned;
+
+	/** Ring consumer status. */
+	struct cons {
+		uint32_t sc_dequeue;     /**< True, if single consumer. */
+		uint32_t size;           /**< Size of the ring. */
+		uint32_t mask;           /**< Mask (size-1) of ring. */
+		volatile uint32_t head;  /**< Consumer head. */
+		volatile uint32_t tail;  /**< Consumer tail. */
+#ifdef RTE_RING_SPLIT_PROD_CONS
+	} cons __rte_cache_aligned;
+#else
+	} cons;
+#endif
+
+#ifdef RTE_LIBRTE_RING_DEBUG
+	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
+#endif
+
+	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
+	                                     * not volatile so need to be careful
+	                                     * about compiler re-ordering */
+};
+
+#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
+#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
+#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
+#define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
+
+/**
+ * @internal When debug is enabled, store ring statistics.
+ * @param r
+ *   A pointer to the ring.
+ * @param name
+ *   The name of the statistics field to increment in the ring.
+ * @param n
+ *   The number to add to the object-oriented statistics.
+ */
+#ifdef RTE_LIBRTE_RING_DEBUG
+#define __RING_STAT_ADD(r, name, n) do {                        \
+		unsigned __lcore_id = rte_lcore_id();           \
+		if (__lcore_id < RTE_MAX_LCORE) {               \
+			r->stats[__lcore_id].name##_objs += n;  \
+			r->stats[__lcore_id].name##_bulk += 1;  \
+		}                                               \
+	} while(0)
+#else
+#define __RING_STAT_ADD(r, name, n) do {} while(0)
+#endif
+
+/**
+ * Calculate the memory size needed for a ring
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it. This value is the sum of the size of
+ * the structure rte_ring and the size of the memory needed by the
+ * objects pointers. The value is aligned to a cache line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+ssize_t rte_ring_get_memsize(unsigned count);
+
+/**
+ * Initialize a ring structure.
+ *
+ * Initialize a ring structure in memory pointed by "r". The size of the
+ * memory area must be large enough to store the ring structure and the
+ * object table. It is advised to use rte_ring_get_memsize() to get the
+ * appropriate size.
+ *
+ * The ring size is set to *count*, which must be a power of two. Water
+ * marking is disabled by default. The real usable ring size is
+ * *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is not added in RTE_TAILQ_RING global list. Indeed, the
+ * memory given by the caller may not be shareable among dpdk
+ * processes.
+ *
+ * @param r
+ *   The pointer to the ring structure followed by the objects table.
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   0 on success, or a negative value on error.
+ */
+int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+	unsigned flags);
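
For illustration, a minimal sketch of pairing rte_ring_get_memsize() with rte_ring_init(); the allocation through rte_zmalloc() and the ring name "app_ring" are assumptions for the example, not part of this patch:

    /* Allocate and initialize a private (non-listed) 1024-entry ring. */
    ssize_t sz = rte_ring_get_memsize(1024);
    if (sz < 0)
        return -1;                  /* count was not a power of 2 */
    struct rte_ring *r = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
    if (r == NULL ||
        rte_ring_init(r, "app_ring", 1024,
                      RING_F_SP_ENQ | RING_F_SC_DEQ) != 0)
        return -1;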
+
+/**
+ * Create a new ring named *name* in memory.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a full ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The size of the ring (must be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the newly allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_ring *rte_ring_create(const char *name, unsigned count,
+				 int socket_id, unsigned flags);
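
A typical call, again only as a sketch (the name "rx_to_worker" and the size of 4096 are assumptions):

    struct rte_ring *r = rte_ring_create("rx_to_worker", 4096,
                                         rte_socket_id(),
                                         RING_F_SP_ENQ | RING_F_SC_DEQ);
    if (r == NULL)
        rte_exit(EXIT_FAILURE, "cannot create ring: %s\n",
                 rte_strerror(rte_errno));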
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free
+ */
+void rte_ring_free(struct rte_ring *r);
+
+/**
+ * Change the high water mark.
+ *
+ * If *count* is 0, water marking is disabled. Otherwise, it is set to the
+ * *count* value. The *count* value must be greater than 0 and less
+ * than the ring size.
+ *
+ * This function can be called at any time (not necessarily at
+ * initialization).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param count
+ *   The new water mark value.
+ * @return
+ *   - 0: Success; water mark changed.
+ *   - -EINVAL: Invalid water mark value.
+ */
+int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
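
A hedged example of watermark use on the 4096-entry ring created above (the 3072 threshold is an arbitrary choice):

    /* Report (via -EDQUOT or RTE_RING_QUOT_EXCEED on enqueue) once the
     * ring is more than ~75% full; passing 0 disables the watermark. */
    if (rte_ring_set_water_mark(r, 3072) != 0)
        printf("invalid watermark value\n");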
+
+/**
+ * Dump the status of the ring to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param r
+ *   A pointer to the ring structure.
+ */
+void rte_ring_dump(FILE *f, const struct rte_ring *r);
+
+/* the actual enqueue of pointers on the ring.
+ * Placed here since identical code is needed in both
+ * the single- and multi-producer enqueue functions */
+#define ENQUEUE_PTRS() do { \
+	const uint32_t size = r->prod.size; \
+	uint32_t idx = prod_head & mask; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
+			r->ring[idx] = obj_table[i]; \
+			r->ring[idx+1] = obj_table[i+1]; \
+			r->ring[idx+2] = obj_table[i+2]; \
+			r->ring[idx+3] = obj_table[i+3]; \
+		} \
+		switch (n & 0x3) { \
+			case 3: r->ring[idx++] = obj_table[i++]; \
+			case 2: r->ring[idx++] = obj_table[i++]; \
+			case 1: r->ring[idx++] = obj_table[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++)\
+			r->ring[idx] = obj_table[i]; \
+		for (idx = 0; i < n; i++, idx++) \
+			r->ring[idx] = obj_table[i]; \
+	} \
+} while(0)
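
ENQUEUE_PTRS() takes the unrolled fast path when the n objects fit before the end of the array and otherwise splits the copy at the wrap point. A behaviourally equivalent but unoptimized form, shown only to clarify what the macro does:

    for (i = 0; i < n; i++)
        r->ring[(prod_head + i) & mask] = obj_table[i];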
+
+/* the actual copy of pointers on the ring to obj_table.
+ * Placed here since identical code is needed in both
+ * the single- and multi-consumer dequeue functions */
+#define DEQUEUE_PTRS() do { \
+	uint32_t idx = cons_head & mask; \
+	const uint32_t size = r->cons.size; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
+			obj_table[i] = r->ring[idx]; \
+			obj_table[i+1] = r->ring[idx+1]; \
+			obj_table[i+2] = r->ring[idx+2]; \
+			obj_table[i+3] = r->ring[idx+3]; \
+		} \
+		switch (n & 0x3) { \
+			case 3: obj_table[i++] = r->ring[idx++]; \
+			case 2: obj_table[i++] = r->ring[idx++]; \
+			case 1: obj_table[i++] = r->ring[idx++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj_table[i] = r->ring[idx]; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj_table[i] = r->ring[idx]; \
+	} \
+} while (0)
+
+/**
+ * @internal Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @return
+ *   Depends on the *behavior* value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects enqueued.
+ */
+static inline int __attribute__((always_inline))
+__rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+			 unsigned n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t prod_head, prod_next;
+	uint32_t cons_tail, free_entries;
+	const unsigned max = n;
+	int success;
+	unsigned i, rep = 0;
+	uint32_t mask = r->prod.mask;
+	int ret;
+
+	/* Avoid the unnecessary cmpset operation below, which is also
+	 * potentially harmful when n equals 0. */
+	if (n == 0)
+		return 0;
+
+	/* move prod.head atomically */
+	do {
+		/* Reset n to the initial burst count */
+		n = max;
+
+		prod_head = r->prod.head;
+		cons_tail = r->cons.tail;
+		/* The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * prod_head > cons_tail). So 'free_entries' is always between 0
+		 * and size(ring)-1. */
+		free_entries = (mask + cons_tail - prod_head);
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > free_entries)) {
+			if (behavior == RTE_RING_QUEUE_FIXED) {
+				__RING_STAT_ADD(r, enq_fail, n);
+				return -ENOBUFS;
+			}
+			else {
+				/* No free entry available */
+				if (unlikely(free_entries == 0)) {
+					__RING_STAT_ADD(r, enq_fail, n);
+					return 0;
+				}
+
+				n = free_entries;
+			}
+		}
+
+		prod_next = prod_head + n;
+		success = rte_atomic32_cmpset(&r->prod.head, prod_head,
+					      prod_next);
+	} while (unlikely(success == 0));
+
+	/* write entries in ring */
+	ENQUEUE_PTRS();
+	rte_smp_wmb();
+
+	/* if we exceed the watermark */
+	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
+				(int)(n | RTE_RING_QUOT_EXCEED);
+		__RING_STAT_ADD(r, enq_quota, n);
+	}
+	else {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+		__RING_STAT_ADD(r, enq_success, n);
+	}
+
+	/*
+	 * If there are other enqueues in progress that preceded us,
+	 * we need to wait for them to complete
+	 */
+	while (unlikely(r->prod.tail != prod_head)) {
+		rte_pause();
+
+		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spinning too long
+		 * waiting for other threads to finish. It gives a pre-empted
+		 * thread a chance to proceed and finish the ring enqueue
+		 * operation. */
+		if (RTE_RING_PAUSE_REP_COUNT &&
+		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
+			rep = 0;
+			sched_yield();
+		}
+	}
+	r->prod.tail = prod_next;
+	return ret;
+}
+
+/**
+ * @internal Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @return
+ *   Depends on the *behavior* value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects enqueued.
+ */
+static inline int __attribute__((always_inline))
+__rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+			 unsigned n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t prod_head, cons_tail;
+	uint32_t prod_next, free_entries;
+	unsigned i;
+	uint32_t mask = r->prod.mask;
+	int ret;
+
+	prod_head = r->prod.head;
+	cons_tail = r->cons.tail;
+	/* The subtraction is done between two unsigned 32bits value
+	 * (the result is always modulo 32 bits even if we have
+	 * prod_head > cons_tail). So 'free_entries' is always between 0
+	 * and size(ring)-1. */
+	free_entries = mask + cons_tail - prod_head;
+
+	/* check that we have enough room in ring */
+	if (unlikely(n > free_entries)) {
+		if (behavior == RTE_RING_QUEUE_FIXED) {
+			__RING_STAT_ADD(r, enq_fail, n);
+			return -ENOBUFS;
+		}
+		else {
+			/* No free entry available */
+			if (unlikely(free_entries == 0)) {
+				__RING_STAT_ADD(r, enq_fail, n);
+				return 0;
+			}
+
+			n = free_entries;
+		}
+	}
+
+	prod_next = prod_head + n;
+	r->prod.head = prod_next;
+
+	/* write entries in ring */
+	ENQUEUE_PTRS();
+	rte_smp_wmb();
+
+	/* if we exceed the watermark */
+	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
+			(int)(n | RTE_RING_QUOT_EXCEED);
+		__RING_STAT_ADD(r, enq_quota, n);
+	}
+	else {
+		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+		__RING_STAT_ADD(r, enq_success, n);
+	}
+
+	r->prod.tail = prod_next;
+	return ret;
+}
+
+/**
+ * @internal Dequeue several objects from a ring (multi-consumers safe). When
+ * the requested number of objects exceeds what is available, dequeue only
+ * the available objects.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @return
+ *   Depends on the *behavior* value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects dequeued.
+ */
+
+static inline int __attribute__((always_inline))
+__rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
+		 unsigned n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t cons_head, prod_tail;
+	uint32_t cons_next, entries;
+	const unsigned max = n;
+	int success;
+	unsigned i, rep = 0;
+	uint32_t mask = r->prod.mask;
+
+	/* Avoid the unnecessary cmpset operation below, which is also
+	 * potentially harmful when n equals 0. */
+	if (n == 0)
+		return 0;
+
+	/* move cons.head atomically */
+	do {
+		/* Restore n as it may change every loop */
+		n = max;
+
+		cons_head = r->cons.head;
+		prod_tail = r->prod.tail;
+		/* The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * cons_head > prod_tail). So 'entries' is always between 0
+		 * and size(ring)-1. */
+		entries = (prod_tail - cons_head);
+
+		/* Set the actual entries for dequeue */
+		if (n > entries) {
+			if (behavior == RTE_RING_QUEUE_FIXED) {
+				__RING_STAT_ADD(r, deq_fail, n);
+				return -ENOENT;
+			}
+			else {
+				if (unlikely(entries == 0)){
+					__RING_STAT_ADD(r, deq_fail, n);
+					return 0;
+				}
+
+				n = entries;
+			}
+		}
+
+		cons_next = cons_head + n;
+		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
+					      cons_next);
+	} while (unlikely(success == 0));
+
+	/* copy in table */
+	DEQUEUE_PTRS();
+	rte_smp_rmb();
+
+	/*
+	 * If there are other dequeues in progress that preceded us,
+	 * we need to wait for them to complete
+	 */
+	while (unlikely(r->cons.tail != cons_head)) {
+		rte_pause();
+
+		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spinning too long
+		 * waiting for other threads to finish. It gives a pre-empted
+		 * thread a chance to proceed and finish the ring dequeue
+		 * operation. */
+		if (RTE_RING_PAUSE_REP_COUNT &&
+		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
+			rep = 0;
+			sched_yield();
+		}
+	}
+	__RING_STAT_ADD(r, deq_success, n);
+	r->cons.tail = cons_next;
+
+	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+}
+
+/**
+ * @internal Dequeue several objects from a ring (NOT multi-consumers safe).
+ * When the requested number of objects exceeds what is available, dequeue
+ * only the available objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @return
+ *   Depends on the *behavior* value
+ *   if behavior = RTE_RING_QUEUE_FIXED
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ *   if behavior = RTE_RING_QUEUE_VARIABLE
+ *   - n: Actual number of objects dequeued.
+ */
+static inline int __attribute__((always_inline))
+__rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
+		 unsigned n, enum rte_ring_queue_behavior behavior)
+{
+	uint32_t cons_head, prod_tail;
+	uint32_t cons_next, entries;
+	unsigned i;
+	uint32_t mask = r->prod.mask;
+
+	cons_head = r->cons.head;
+	prod_tail = r->prod.tail;
+	/* The subtraction is done between two unsigned 32bits value
+	 * (the result is always modulo 32 bits even if we have
+	 * cons_head > prod_tail). So 'entries' is always between 0
+	 * and size(ring)-1. */
+	entries = prod_tail - cons_head;
+
+	if (n > entries) {
+		if (behavior == RTE_RING_QUEUE_FIXED) {
+			__RING_STAT_ADD(r, deq_fail, n);
+			return -ENOENT;
+		}
+		else {
+			if (unlikely(entries == 0)){
+				__RING_STAT_ADD(r, deq_fail, n);
+				return 0;
+			}
+
+			n = entries;
+		}
+	}
+
+	cons_next = cons_head + n;
+	r->cons.head = cons_next;
+
+	/* copy in table */
+	DEQUEUE_PTRS();
+	rte_smp_rmb();
+
+	__RING_STAT_ADD(r, deq_success, n);
+	r->cons.tail = cons_next;
+	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+}
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+		      unsigned n)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n);
+	else
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n);
+}
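
For illustration, how an application might check the fixed-size (bulk) return codes; the burst[] contents and its size of 32 are assumptions:

    void *burst[32];
    /* ... burst[] filled with 32 object pointers ... */
    int rc = rte_ring_enqueue_bulk(r, burst, 32);
    if (rc == -ENOBUFS) {
        /* nothing was enqueued: retry later or drop the burst */
    } else if (rc == -EDQUOT) {
        /* all 32 were enqueued, but the watermark is now exceeded */
    }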
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
+{
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
+{
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_enqueue(struct rte_ring *r, void *obj)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue(r, obj);
+	else
+		return rte_ring_mp_enqueue(r, obj);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+	else
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+}
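
And the dequeue side, again just a sketch (the table size of 16 is an assumption):

    void *objs[16];
    if (rte_ring_dequeue_bulk(r, objs, 16) == 0) {
        /* exactly 16 objects are now in objs[] */
    } else {
        /* -ENOENT: fewer than 16 entries were available,
         * nothing was dequeued */
    }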
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
+{
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success, objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_dequeue(struct rte_ring *r, void **obj_p)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue(r, obj_p);
+	else
+		return rte_ring_mc_dequeue(r, obj_p);
+}
+
+/**
+ * Test if a ring is full.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is full.
+ *   - 0: The ring is not full.
+ */
+static inline int
+rte_ring_full(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+}
+
+/**
+ * Test if a ring is empty.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is empty.
+ *   - 0: The ring is not empty.
+ */
+static inline int
+rte_ring_empty(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return !!(cons_tail == prod_tail);
+}
+
+/**
+ * Return the number of entries in a ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of entries in the ring.
+ */
+static inline unsigned
+rte_ring_count(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return (prod_tail - cons_tail) & r->prod.mask;
+}
+
+/**
+ * Return the number of free entries in a ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of free entries in the ring.
+ */
+static inline unsigned
+rte_ring_free_count(const struct rte_ring *r)
+{
+	uint32_t prod_tail = r->prod.tail;
+	uint32_t cons_tail = r->cons.tail;
+	return (cons_tail - prod_tail - 1) & r->prod.mask;
+}
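
A small illustrative snapshot; on a quiescent ring the two values add up to size-1, since one slot is always kept free to tell a full ring from an empty one:

    unsigned used = rte_ring_count(r);
    unsigned avail = rte_ring_free_count(r);
    printf("ring %s: %u used, %u free\n", r->name, used, avail);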
+
+/**
+ * Dump the status of all rings on the console
+ *
+ * @param f
+ *   A pointer to a file for output
+ */
+void rte_ring_list_dump(FILE *f);
+
+/**
+ * Search a ring from its name
+ *
+ * @param name
+ *   The name of the ring.
+ * @return
+ *   The pointer to the ring matching the name, or NULL if not found,
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - ENOENT - required entry not available to return.
+ */
+struct rte_ring *rte_ring_lookup(const char *name);
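
For example, a secondary process could attach to a ring that the primary created with rte_ring_create(); "rx_to_worker" is the assumed name from the earlier sketch:

    struct rte_ring *r = rte_ring_lookup("rx_to_worker");
    if (r == NULL)
        rte_exit(EXIT_FAILURE, "ring not found: %s\n",
                 rte_strerror(rte_errno));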
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+		      unsigned n)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue_burst(r, obj_table, n);
+	else
+		return rte_ring_mp_enqueue_burst(r, obj_table, n);
+}
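
The burst variants return the count actually enqueued, so the "send as many as possible" policy discussed in this thread becomes a simple retry loop at the application level; a sketch, with objs[] and nb assumed to be provided by the caller:

    unsigned sent = 0;
    while (sent < nb)    /* a real application would bound the retries */
        sent += rte_ring_enqueue_burst(r, &objs[sent], nb - sent);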
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When the
+ * requested number of objects exceeds what is available, dequeue only the
+ * available objects.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When the
+ * requested number of objects exceeds what is available, dequeue only the
+ * available objects.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - Number of objects dequeued
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+	else
+		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+}
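
Correspondingly, a consumer can drain the ring in bounded chunks; process() here is a hypothetical application callback:

    void *objs[32];
    unsigned nb;
    while ((nb = rte_ring_dequeue_burst(r, objs, 32)) > 0) {
        unsigned i;
        for (i = 0; i < nb; i++)
            process(objs[i]);
    }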
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_H_ */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
deleted file mode 100644
index e359aff..0000000
--- a/lib/librte_ring/rte_ring.h
+++ /dev/null
@@ -1,1269 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-/*
- * Derived from FreeBSD's bufring.h
- *
- **************************************************************************
- *
- * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are met:
- *
- * 1. Redistributions of source code must retain the above copyright notice,
- *    this list of conditions and the following disclaimer.
- *
- * 2. The name of Kip Macy nor the names of other
- *    contributors may be used to endorse or promote products derived from
- *    this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
- *
- ***************************************************************************/
-
-#ifndef _RTE_RING_H_
-#define _RTE_RING_H_
-
-/**
- * @file
- * RTE Ring
- *
- * The Ring Manager is a fixed-size queue, implemented as a table of
- * pointers. Head and tail pointers are modified atomically, allowing
- * concurrent access to it. It has the following features:
- *
- * - FIFO (First In First Out)
- * - Maximum size is fixed; the pointers are stored in a table.
- * - Lockless implementation.
- * - Multi- or single-consumer dequeue.
- * - Multi- or single-producer enqueue.
- * - Bulk dequeue.
- * - Bulk enqueue.
- *
- * Note: the ring implementation is not preemptable. A lcore must not
- * be interrupted by another task that uses the same ring.
- *
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#include <stdio.h>
-#include <stdint.h>
-#include <sys/queue.h>
-#include <errno.h>
-#include <rte_common.h>
-#include <rte_memory.h>
-#include <rte_lcore.h>
-#include <rte_atomic.h>
-#include <rte_branch_prediction.h>
-#include <rte_memzone.h>
-
-#define RTE_TAILQ_RING_NAME "RTE_RING"
-
-enum rte_ring_queue_behavior {
-	RTE_RING_QUEUE_FIXED = 0, /* Enq/Deq a fixed number of items from a ring */
-	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
-};
-
-#ifdef RTE_LIBRTE_RING_DEBUG
-/**
- * A structure that stores the ring statistics (per-lcore).
- */
-struct rte_ring_debug_stats {
-	uint64_t enq_success_bulk; /**< Successful enqueues number. */
-	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
-	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
-	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
-	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
-	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
-	uint64_t deq_success_bulk; /**< Successful dequeues number. */
-	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
-	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
-	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
-#endif
-
-#define RTE_RING_MZ_PREFIX "RG_"
-/**< The maximum length of a ring name. */
-#define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
-			   sizeof(RTE_RING_MZ_PREFIX) + 1)
-
-#ifndef RTE_RING_PAUSE_REP_COUNT
-#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-                                    *   if RTE_RING_PAUSE_REP not defined. */
-#endif
-
-struct rte_memzone; /* forward declaration, so as not to require memzone.h */
-
-/**
- * An RTE ring structure.
- *
- * The producer and the consumer have a head and a tail index. The particularity
- * of these index is that they are not between 0 and size(ring). These indexes
- * are between 0 and 2^32, and we mask their value when we access the ring[]
- * field. Thanks to this assumption, we can do subtractions between 2 index
- * values in a modulo-32bit base: that's why the overflow of the indexes is not
- * a problem.
- */
-struct rte_ring {
-	/*
-	 * Note: this field kept the RTE_MEMZONE_NAMESIZE size due to ABI
-	 * compatibility requirements, it could be changed to RTE_RING_NAMESIZE
-	 * next time the ABI changes
-	 */
-	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
-	const struct rte_memzone *memzone;
-			/**< Memzone, if any, containing the rte_ring */
-
-	/** Ring producer status. */
-	struct prod {
-		uint32_t watermark;      /**< Maximum items before EDQUOT. */
-		uint32_t sp_enqueue;     /**< True, if single producer. */
-		uint32_t size;           /**< Size of ring. */
-		uint32_t mask;           /**< Mask (size-1) of ring. */
-		volatile uint32_t head;  /**< Producer head. */
-		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
-
-	/** Ring consumer status. */
-	struct cons {
-		uint32_t sc_dequeue;     /**< True, if single consumer. */
-		uint32_t size;           /**< Size of the ring. */
-		uint32_t mask;           /**< Mask (size-1) of ring. */
-		volatile uint32_t head;  /**< Consumer head. */
-		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
-
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
-
-	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
-	                                     * not volatile so need to be careful
-	                                     * about compiler re-ordering */
-};
-
-#define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
-#define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
-#define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
-
-/**
- * @internal When debug is enabled, store ring statistics.
- * @param r
- *   A pointer to the ring.
- * @param name
- *   The name of the statistics field to increment in the ring.
- * @param n
- *   The number to add to the object-oriented statistics.
- */
-#ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {                        \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			r->stats[__lcore_id].name##_objs += n;  \
-			r->stats[__lcore_id].name##_bulk += 1;  \
-		}                                               \
-	} while(0)
-#else
-#define __RING_STAT_ADD(r, name, n) do {} while(0)
-#endif
-
-/**
- * Calculate the memory size needed for a ring
- *
- * This function returns the number of bytes needed for a ring, given
- * the number of elements in it. This value is the sum of the size of
- * the structure rte_ring and the size of the memory needed by the
- * objects pointers. The value is aligned to a cache line size.
- *
- * @param count
- *   The number of elements in the ring (must be a power of 2).
- * @return
- *   - The memory size needed for the ring on success.
- *   - -EINVAL if count is not a power of 2.
- */
-ssize_t rte_ring_get_memsize(unsigned count);
-
-/**
- * Initialize a ring structure.
- *
- * Initialize a ring structure in memory pointed by "r". The size of the
- * memory area must be large enough to store the ring structure and the
- * object table. It is advised to use rte_ring_get_memsize() to get the
- * appropriate size.
- *
- * The ring size is set to *count*, which must be a power of two. Water
- * marking is disabled by default. The real usable ring size is
- * *count-1* instead of *count* to differentiate a free ring from an
- * empty ring.
- *
- * The ring is not added in RTE_TAILQ_RING global list. Indeed, the
- * memory given by the caller may not be shareable among dpdk
- * processes.
- *
- * @param r
- *   The pointer to the ring structure followed by the objects table.
- * @param name
- *   The name of the ring.
- * @param count
- *   The number of elements in the ring (must be a power of 2).
- * @param flags
- *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
- * @return
- *   0 on success, or a negative value on error.
- */
-int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
-	unsigned flags);
-
-/**
- * Create a new ring named *name* in memory.
- *
- * This function uses ``memzone_reserve()`` to allocate memory. Then it
- * calls rte_ring_init() to initialize an empty ring.
- *
- * The new ring size is set to *count*, which must be a power of
- * two. Water marking is disabled by default. The real usable ring size
- * is *count-1* instead of *count* to differentiate a free ring from an
- * empty ring.
- *
- * The ring is added in RTE_TAILQ_RING list.
- *
- * @param name
- *   The name of the ring.
- * @param count
- *   The size of the ring (must be a power of 2).
- * @param socket_id
- *   The *socket_id* argument is the socket identifier in case of
- *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
- *   constraint for the reserved zone.
- * @param flags
- *   An OR of the following:
- *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
- *      is "single-producer". Otherwise, it is "multi-producers".
- *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
- *      is "single-consumer". Otherwise, it is "multi-consumers".
- * @return
- *   On success, the pointer to the new allocated ring. NULL on error with
- *    rte_errno set appropriately. Possible errno values include:
- *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
- *    - E_RTE_SECONDARY - function was called from a secondary process instance
- *    - EINVAL - count provided is not a power of 2
- *    - ENOSPC - the maximum number of memzones has already been allocated
- *    - EEXIST - a memzone with the same name already exists
- *    - ENOMEM - no appropriate memory area found in which to create memzone
- */
-struct rte_ring *rte_ring_create(const char *name, unsigned count,
-				 int socket_id, unsigned flags);
-/**
- * De-allocate all memory used by the ring.
- *
- * @param r
- *   Ring to free
- */
-void rte_ring_free(struct rte_ring *r);
-
-/**
- * Change the high water mark.
- *
- * If *count* is 0, water marking is disabled. Otherwise, it is set to the
- * *count* value. The *count* value must be greater than 0 and less
- * than the ring size.
- *
- * This function can be called at any time (not necessarily at
- * initialization).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param count
- *   The new water mark value.
- * @return
- *   - 0: Success; water mark changed.
- *   - -EINVAL: Invalid water mark value.
- */
-int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
-
-/**
- * Dump the status of the ring to a file.
- *
- * @param f
- *   A pointer to a file for output
- * @param r
- *   A pointer to the ring structure.
- */
-void rte_ring_dump(FILE *f, const struct rte_ring *r);
-
-/* the actual enqueue of pointers on the ring.
- * Placed here since identical code needed in both
- * single and multi producer enqueue functions */
-#define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
-	uint32_t idx = prod_head & mask; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
-			r->ring[idx] = obj_table[i]; \
-			r->ring[idx+1] = obj_table[i+1]; \
-			r->ring[idx+2] = obj_table[i+2]; \
-			r->ring[idx+3] = obj_table[i+3]; \
-		} \
-		switch (n & 0x3) { \
-			case 3: r->ring[idx++] = obj_table[i++]; \
-			case 2: r->ring[idx++] = obj_table[i++]; \
-			case 1: r->ring[idx++] = obj_table[i++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			r->ring[idx] = obj_table[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			r->ring[idx] = obj_table[i]; \
-	} \
-} while(0)
-
-/* the actual copy of pointers on the ring to obj_table.
- * Placed here since identical code needed in both
- * single and multi consumer dequeue functions */
-#define DEQUEUE_PTRS() do { \
-	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
-			obj_table[i] = r->ring[idx]; \
-			obj_table[i+1] = r->ring[idx+1]; \
-			obj_table[i+2] = r->ring[idx+2]; \
-			obj_table[i+3] = r->ring[idx+3]; \
-		} \
-		switch (n & 0x3) { \
-			case 3: obj_table[i++] = r->ring[idx++]; \
-			case 2: obj_table[i++] = r->ring[idx++]; \
-			case 1: obj_table[i++] = r->ring[idx++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj_table[i] = r->ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj_table[i] = r->ring[idx]; \
-	} \
-} while (0)
-
-/**
- * @internal Enqueue several objects on the ring (multi-producers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * producer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
- * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
- */
-static inline int __attribute__((always_inline))
-__rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
-			 unsigned n, enum rte_ring_queue_behavior behavior)
-{
-	uint32_t prod_head, prod_next;
-	uint32_t cons_tail, free_entries;
-	const unsigned max = n;
-	int success;
-	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
-	int ret;
-
-	/* Avoid the unnecessary cmpset operation below, which is also
-	 * potentially harmful when n equals 0. */
-	if (n == 0)
-		return 0;
-
-	/* move prod.head atomically */
-	do {
-		/* Reset n to the initial burst count */
-		n = max;
-
-		prod_head = r->prod.head;
-		cons_tail = r->cons.tail;
-		/* The subtraction is done between two unsigned 32bits value
-		 * (the result is always modulo 32 bits even if we have
-		 * prod_head > cons_tail). So 'free_entries' is always between 0
-		 * and size(ring)-1. */
-		free_entries = (mask + cons_tail - prod_head);
-
-		/* check that we have enough room in ring */
-		if (unlikely(n > free_entries)) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, enq_fail, n);
-				return -ENOBUFS;
-			}
-			else {
-				/* No free entry available */
-				if (unlikely(free_entries == 0)) {
-					__RING_STAT_ADD(r, enq_fail, n);
-					return 0;
-				}
-
-				n = free_entries;
-			}
-		}
-
-		prod_next = prod_head + n;
-		success = rte_atomic32_cmpset(&r->prod.head, prod_head,
-					      prod_next);
-	} while (unlikely(success == 0));
-
-	/* write entries in ring */
-	ENQUEUE_PTRS();
-	rte_smp_wmb();
-
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-				(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
-
-	/*
-	 * If there are other enqueues in progress that preceded us,
-	 * we need to wait for them to complete
-	 */
-	while (unlikely(r->prod.tail != prod_head)) {
-		rte_pause();
-
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
-	r->prod.tail = prod_next;
-	return ret;
-}
-
-/**
- * @internal Enqueue several objects on a ring (NOT multi-producers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
- * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
- */
-static inline int __attribute__((always_inline))
-__rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
-			 unsigned n, enum rte_ring_queue_behavior behavior)
-{
-	uint32_t prod_head, cons_tail;
-	uint32_t prod_next, free_entries;
-	unsigned i;
-	uint32_t mask = r->prod.mask;
-	int ret;
-
-	prod_head = r->prod.head;
-	cons_tail = r->cons.tail;
-	/* The subtraction is done between two unsigned 32bits value
-	 * (the result is always modulo 32 bits even if we have
-	 * prod_head > cons_tail). So 'free_entries' is always between 0
-	 * and size(ring)-1. */
-	free_entries = mask + cons_tail - prod_head;
-
-	/* check that we have enough room in ring */
-	if (unlikely(n > free_entries)) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, enq_fail, n);
-			return -ENOBUFS;
-		}
-		else {
-			/* No free entry available */
-			if (unlikely(free_entries == 0)) {
-				__RING_STAT_ADD(r, enq_fail, n);
-				return 0;
-			}
-
-			n = free_entries;
-		}
-	}
-
-	prod_next = prod_head + n;
-	r->prod.head = prod_next;
-
-	/* write entries in ring */
-	ENQUEUE_PTRS();
-	rte_smp_wmb();
-
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-			(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
-
-	r->prod.tail = prod_next;
-	return ret;
-}
-
-/**
- * @internal Dequeue several objects from a ring (multi-consumers safe). When
- * the request objects are more than the available objects, only dequeue the
- * actual number of objects
- *
- * This function uses a "compare and set" instruction to move the
- * consumer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
- * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
- */
-
-static inline int __attribute__((always_inline))
-__rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
-{
-	uint32_t cons_head, prod_tail;
-	uint32_t cons_next, entries;
-	const unsigned max = n;
-	int success;
-	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
-
-	/* Avoid the unnecessary cmpset operation below, which is also
-	 * potentially harmful when n equals 0. */
-	if (n == 0)
-		return 0;
-
-	/* move cons.head atomically */
-	do {
-		/* Restore n as it may change every loop */
-		n = max;
-
-		cons_head = r->cons.head;
-		prod_tail = r->prod.tail;
-		/* The subtraction is done between two unsigned 32bits value
-		 * (the result is always modulo 32 bits even if we have
-		 * cons_head > prod_tail). So 'entries' is always between 0
-		 * and size(ring)-1. */
-		entries = (prod_tail - cons_head);
-
-		/* Set the actual entries for dequeue */
-		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, deq_fail, n);
-				return -ENOENT;
-			}
-			else {
-				if (unlikely(entries == 0)){
-					__RING_STAT_ADD(r, deq_fail, n);
-					return 0;
-				}
-
-				n = entries;
-			}
-		}
-
-		cons_next = cons_head + n;
-		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
-					      cons_next);
-	} while (unlikely(success == 0));
-
-	/* copy in table */
-	DEQUEUE_PTRS();
-	rte_smp_rmb();
-
-	/*
-	 * If there are other dequeues in progress that preceded us,
-	 * we need to wait for them to complete
-	 */
-	while (unlikely(r->cons.tail != cons_head)) {
-		rte_pause();
-
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
-	__RING_STAT_ADD(r, deq_success, n);
-	r->cons.tail = cons_next;
-
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
-}
-
-/**
- * @internal Dequeue several objects from a ring (NOT multi-consumers safe).
- * When the request objects are more than the available objects, only dequeue
- * the actual number of objects
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
- * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
- */
-static inline int __attribute__((always_inline))
-__rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
-{
-	uint32_t cons_head, prod_tail;
-	uint32_t cons_next, entries;
-	unsigned i;
-	uint32_t mask = r->prod.mask;
-
-	cons_head = r->cons.head;
-	prod_tail = r->prod.tail;
-	/* The subtraction is done between two unsigned 32bits value
-	 * (the result is always modulo 32 bits even if we have
-	 * cons_head > prod_tail). So 'entries' is always between 0
-	 * and size(ring)-1. */
-	entries = prod_tail - cons_head;
-
-	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, deq_fail, n);
-			return -ENOENT;
-		}
-		else {
-			if (unlikely(entries == 0)){
-				__RING_STAT_ADD(r, deq_fail, n);
-				return 0;
-			}
-
-			n = entries;
-		}
-	}
-
-	cons_next = cons_head + n;
-	r->cons.head = cons_next;
-
-	/* copy in table */
-	DEQUEUE_PTRS();
-	rte_smp_rmb();
-
-	__RING_STAT_ADD(r, deq_success, n);
-	r->cons.tail = cons_next;
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
-}
-
-/**
- * Enqueue several objects on the ring (multi-producers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * producer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Enqueue several objects on a ring (NOT multi-producers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Enqueue several objects on a ring.
- *
- * This function calls the multi-producer or the single-producer
- * version depending on the default behavior that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
-		      unsigned n)
-{
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue_bulk(r, obj_table, n);
-	else
-		return rte_ring_mp_enqueue_bulk(r, obj_table, n);
-}
-
-/**
- * Enqueue one object on a ring (multi-producers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * producer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj
- *   A pointer to the object to be added.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
-{
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
-}
-
-/**
- * Enqueue one object on a ring (NOT multi-producers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj
- *   A pointer to the object to be added.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
-{
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
-}
-
-/**
- * Enqueue one object on a ring.
- *
- * This function calls the multi-producer or the single-producer
- * version, depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj
- *   A pointer to the object to be added.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_enqueue(struct rte_ring *r, void *obj)
-{
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue(r, obj);
-	else
-		return rte_ring_mp_enqueue(r, obj);
-}
-
-/**
- * Dequeue several objects from a ring (multi-consumers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * consumer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Dequeue several objects from a ring (NOT multi-consumers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table,
- *   must be strictly positive.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Dequeue several objects from a ring.
- *
- * This function calls the multi-consumers or the single-consumer
- * version, depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
-	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
-}
-
-/**
- * Dequeue one object from a ring (multi-consumers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * consumer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
-{
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
-}
-
-/**
- * Dequeue one object from a ring (NOT multi-consumers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
-{
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
-}
-
-/**
- * Dequeue one object from a ring.
- *
- * This function calls the multi-consumers or the single-consumer
- * version depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
- * @return
- *   - 0: Success, objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_dequeue(struct rte_ring *r, void **obj_p)
-{
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue(r, obj_p);
-	else
-		return rte_ring_mc_dequeue(r, obj_p);
-}
-
-/**
- * Test if a ring is full.
- *
- * @param r
- *   A pointer to the ring structure.
- * @return
- *   - 1: The ring is full.
- *   - 0: The ring is not full.
- */
-static inline int
-rte_ring_full(const struct rte_ring *r)
-{
-	uint32_t prod_tail = r->prod.tail;
-	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
-}
-
-/**
- * Test if a ring is empty.
- *
- * @param r
- *   A pointer to the ring structure.
- * @return
- *   - 1: The ring is empty.
- *   - 0: The ring is not empty.
- */
-static inline int
-rte_ring_empty(const struct rte_ring *r)
-{
-	uint32_t prod_tail = r->prod.tail;
-	uint32_t cons_tail = r->cons.tail;
-	return !!(cons_tail == prod_tail);
-}
-
-/**
- * Return the number of entries in a ring.
- *
- * @param r
- *   A pointer to the ring structure.
- * @return
- *   The number of entries in the ring.
- */
-static inline unsigned
-rte_ring_count(const struct rte_ring *r)
-{
-	uint32_t prod_tail = r->prod.tail;
-	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
-}
-
-/**
- * Return the number of free entries in a ring.
- *
- * @param r
- *   A pointer to the ring structure.
- * @return
- *   The number of free entries in the ring.
- */
-static inline unsigned
-rte_ring_free_count(const struct rte_ring *r)
-{
-	uint32_t prod_tail = r->prod.tail;
-	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
-}
-
-/**
- * Dump the status of all rings on the console
- *
- * @param f
- *   A pointer to a file for output
- */
-void rte_ring_list_dump(FILE *f);
-
-/**
- * Search a ring from its name
- *
- * @param name
- *   The name of the ring.
- * @return
- *   The pointer to the ring matching the name, or NULL if not found,
- *   with rte_errno set appropriately. Possible rte_errno values include:
- *    - ENOENT - required entry not available to return.
- */
-struct rte_ring *rte_ring_lookup(const char *name);
-
-/**
- * Enqueue several objects on the ring (multi-producers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * producer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - n: Actual number of objects enqueued.
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Enqueue several objects on a ring (NOT multi-producers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - n: Actual number of objects enqueued.
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Enqueue several objects on a ring.
- *
- * This function calls the multi-producer or the single-producer
- * version depending on the default behavior that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - n: Actual number of objects enqueued.
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
-		      unsigned n)
-{
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue_burst(r, obj_table, n);
-	else
-		return rte_ring_mp_enqueue_burst(r, obj_table, n);
-}
-
-/**
- * Dequeue several objects from a ring (multi-consumers safe). When the request
- * objects are more than the available objects, only dequeue the actual number
- * of objects
- *
- * This function uses a "compare and set" instruction to move the
- * consumer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - n: Actual number of objects dequeued, 0 if ring is empty
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Dequeue several objects from a ring (NOT multi-consumers safe).When the
- * request objects are more than the available objects, only dequeue the
- * actual number of objects
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - n: Actual number of objects dequeued, 0 if ring is empty
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Dequeue multiple objects from a ring up to a maximum number.
- *
- * This function calls the multi-consumers or the single-consumer
- * version, depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - Number of objects dequeued
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
-	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
-}
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_RING_H_ */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
new file mode 120000
index 0000000..54dad23
--- /dev/null
+++ b/lib/librte_ring/rte_ring.h
@@ -0,0 +1 @@
+rte_common_ring.h
\ No newline at end of file
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH RFCv2 2/4] ring: separate common and rte_ring specific functions
    2017-01-24 10:39  2% ` [dpdk-dev] [PATCH RFCv2 1/4] ring: create common ring files Bruce Richardson
@ 2017-01-24 10:39  1% ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2017-01-24 10:39 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Provide a separate rte_ring implementation which just calls into the
common ring code. This allows us to generalise the common ring code
without affecting the API/ABI of the rte_ring. The common functions
are now all renamed to have an rte_common_ring prefix.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_ring/Makefile          |   1 +
 lib/librte_ring/rte_common_ring.c |  57 ++--
 lib/librte_ring/rte_common_ring.h | 463 ++-----------------------
 lib/librte_ring/rte_ring.c        |  86 +++++
 lib/librte_ring/rte_ring.h        | 692 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 832 insertions(+), 467 deletions(-)
 create mode 100644 lib/librte_ring/rte_ring.c
 mode change 120000 => 100644 lib/librte_ring/rte_ring.h

diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index 1e2396e..5cebb29 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -42,6 +42,7 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_RING) += rte_common_ring.c
+SRCS-$(CONFIG_RTE_LIBRTE_RING) += rte_ring.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include += rte_ring.h
diff --git a/lib/librte_ring/rte_common_ring.c b/lib/librte_ring/rte_common_ring.c
index ca0a108..a0c4b5a 100644
--- a/lib/librte_ring/rte_common_ring.c
+++ b/lib/librte_ring/rte_common_ring.c
@@ -89,19 +89,19 @@
 
 #include "rte_ring.h"
 
-TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
+TAILQ_HEAD(rte_common_ring_list, rte_tailq_entry);
 
-static struct rte_tailq_elem rte_ring_tailq = {
+static struct rte_tailq_elem rte_common_ring_tailq = {
 	.name = RTE_TAILQ_RING_NAME,
 };
-EAL_REGISTER_TAILQ(rte_ring_tailq)
+EAL_REGISTER_TAILQ(rte_common_ring_tailq)
 
 /* true if x is a power of 2 */
 #define POWEROF2(x) ((((x)-1) & (x)) == 0)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_common_ring_get_memsize(unsigned count)
 {
 	ssize_t sz;
 
@@ -119,7 +119,7 @@ rte_ring_get_memsize(unsigned count)
 }
 
 int
-rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+rte_common_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	unsigned flags)
 {
 	int ret;
@@ -134,7 +134,7 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_LIBRTE_RING_DEBUG
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring_debug_stats) &
+	RTE_BUILD_BUG_ON((sizeof(struct rte_common_ring_debug_stats) &
 			  RTE_CACHE_LINE_MASK) != 0);
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, stats) &
 			  RTE_CACHE_LINE_MASK) != 0);
@@ -159,7 +159,7 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 
 /* create the ring */
 struct rte_ring *
-rte_ring_create(const char *name, unsigned count, int socket_id,
+rte_common_ring_create(const char *name, unsigned count, int socket_id,
 		unsigned flags)
 {
 	char mz_name[RTE_MEMZONE_NAMESIZE];
@@ -168,12 +168,12 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 	const struct rte_memzone *mz;
 	ssize_t ring_size;
 	int mz_flags = 0;
-	struct rte_ring_list* ring_list = NULL;
+	struct rte_common_ring_list* ring_list = NULL;
 	int ret;
 
-	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+	ring_list = RTE_TAILQ_CAST(rte_common_ring_tailq.head, rte_common_ring_list);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_common_ring_get_memsize(count);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -203,7 +203,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 		r = mz->addr;
 		/* no need to check return value here, we already checked the
 		 * arguments above */
-		rte_ring_init(r, name, count, flags);
+		rte_common_ring_init(r, name, count, flags);
 
 		te->data = (void *) r;
 		r->memzone = mz;
@@ -221,20 +221,20 @@ rte_ring_create(const char *name, unsigned count, int socket_id,
 
 /* free the ring */
 void
-rte_ring_free(struct rte_ring *r)
+rte_common_ring_free(struct rte_ring *r)
 {
-	struct rte_ring_list *ring_list = NULL;
+	struct rte_common_ring_list *ring_list = NULL;
 	struct rte_tailq_entry *te;
 
 	if (r == NULL)
 		return;
 
 	/*
-	 * Ring was not created with rte_ring_create,
+	 * Ring was not created with rte_common_ring_create,
 	 * therefore, there is no memzone to free.
 	 */
 	if (r->memzone == NULL) {
-		RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_ring_create()");
+		RTE_LOG(ERR, RING, "Cannot free ring (not created with rte_common_ring_create()");
 		return;
 	}
 
@@ -243,7 +243,7 @@ rte_ring_free(struct rte_ring *r)
 		return;
 	}
 
-	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+	ring_list = RTE_TAILQ_CAST(rte_common_ring_tailq.head, rte_common_ring_list);
 	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
 
 	/* find out tailq entry */
@@ -269,7 +269,7 @@ rte_ring_free(struct rte_ring *r)
  * disabled
  */
 int
-rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
+rte_common_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
 	if (count >= r->prod.size)
 		return -EINVAL;
@@ -284,10 +284,10 @@ rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 
 /* dump the status of the ring on the console */
 void
-rte_ring_dump(FILE *f, const struct rte_ring *r)
+rte_common_ring_dump(FILE *f, const struct rte_ring *r)
 {
 #ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats sum;
+	struct rte_common_ring_debug_stats sum;
 	unsigned lcore_id;
 #endif
 
@@ -298,8 +298,8 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
-	fprintf(f, "  used=%u\n", rte_ring_count(r));
-	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
+	fprintf(f, "  used=%u\n", rte_common_ring_count(r));
+	fprintf(f, "  avail=%u\n", rte_common_ring_free_count(r));
 	if (r->prod.watermark == r->prod.size)
 		fprintf(f, "  watermark=0\n");
 	else
@@ -338,17 +338,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 /* dump the status of all rings on the console */
 void
-rte_ring_list_dump(FILE *f)
+rte_common_ring_list_dump(FILE *f)
 {
 	const struct rte_tailq_entry *te;
-	struct rte_ring_list *ring_list;
+	struct rte_common_ring_list *ring_list;
 
-	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+	ring_list = RTE_TAILQ_CAST(rte_common_ring_tailq.head, rte_common_ring_list);
 
 	rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
 
 	TAILQ_FOREACH(te, ring_list, next) {
-		rte_ring_dump(f, (struct rte_ring *) te->data);
+		rte_common_ring_dump(f, (struct rte_ring *) te->data);
 	}
 
 	rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK);
@@ -356,13 +356,14 @@ rte_ring_list_dump(FILE *f)
 
 /* search a ring from its name */
 struct rte_ring *
-rte_ring_lookup(const char *name)
+rte_common_ring_lookup(const char *name)
 {
 	struct rte_tailq_entry *te;
 	struct rte_ring *r = NULL;
-	struct rte_ring_list *ring_list;
+	struct rte_common_ring_list *ring_list;
 
-	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
+	ring_list = RTE_TAILQ_CAST(rte_common_ring_tailq.head,
+		rte_common_ring_list);
 
 	rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK);
 
diff --git a/lib/librte_ring/rte_common_ring.h b/lib/librte_ring/rte_common_ring.h
index e359aff..f2c1c46 100644
--- a/lib/librte_ring/rte_common_ring.h
+++ b/lib/librte_ring/rte_common_ring.h
@@ -63,8 +63,8 @@
  *
  ***************************************************************************/
 
-#ifndef _RTE_RING_H_
-#define _RTE_RING_H_
+#ifndef _RTE_COMMON_RING_H_
+#define _RTE_COMMON_RING_H_
 
 /**
  * @file
@@ -232,14 +232,14 @@ struct rte_ring {
  *   - The memory size needed for the ring on success.
  *   - -EINVAL if count is not a power of 2.
  */
-ssize_t rte_ring_get_memsize(unsigned count);
+ssize_t rte_common_ring_get_memsize(unsigned count);
 
 /**
  * Initialize a ring structure.
  *
  * Initialize a ring structure in memory pointed by "r". The size of the
  * memory area must be large enough to store the ring structure and the
- * object table. It is advised to use rte_ring_get_memsize() to get the
+ * object table. It is advised to use rte_common_ring_get_memsize() to get the
  * appropriate size.
  *
  * The ring size is set to *count*, which must be a power of two. Water
@@ -260,22 +260,22 @@ ssize_t rte_ring_get_memsize(unsigned count);
  * @param flags
  *   An OR of the following:
  *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      using ``rte_common_ring_enqueue()`` or ``rte_common_ring_enqueue_bulk()``
  *      is "single-producer". Otherwise, it is "multi-producers".
  *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      using ``rte_common_ring_dequeue()`` or ``rte_common_ring_dequeue_bulk()``
  *      is "single-consumer". Otherwise, it is "multi-consumers".
  * @return
  *   0 on success, or a negative value on error.
  */
-int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+int rte_common_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	unsigned flags);
 
 /**
  * Create a new ring named *name* in memory.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Then it
- * calls rte_ring_init() to initialize an empty ring.
+ * calls rte_common_ring_init() to initialize an empty ring.
  *
  * The new ring size is set to *count*, which must be a power of
  * two. Water marking is disabled by default. The real usable ring size
@@ -295,10 +295,10 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  * @param flags
  *   An OR of the following:
  *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
- *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      using ``rte_common_ring_enqueue()`` or ``rte_common_ring_enqueue_bulk()``
  *      is "single-producer". Otherwise, it is "multi-producers".
  *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
- *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      using ``rte_common_ring_dequeue()`` or ``rte_common_ring_dequeue_bulk()``
  *      is "single-consumer". Otherwise, it is "multi-consumers".
  * @return
  *   On success, the pointer to the new allocated ring. NULL on error with
@@ -310,7 +310,7 @@ int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *    - EEXIST - a memzone with the same name already exists
  *    - ENOMEM - no appropriate memory area found in which to create memzone
  */
-struct rte_ring *rte_ring_create(const char *name, unsigned count,
+struct rte_ring *rte_common_ring_create(const char *name, unsigned count,
 				 int socket_id, unsigned flags);
 /**
  * De-allocate all memory used by the ring.
@@ -318,7 +318,7 @@ struct rte_ring *rte_ring_create(const char *name, unsigned count,
  * @param r
  *   Ring to free
  */
-void rte_ring_free(struct rte_ring *r);
+void rte_common_ring_free(struct rte_ring *r);
 
 /**
  * Change the high water mark.
@@ -338,7 +338,7 @@ void rte_ring_free(struct rte_ring *r);
  *   - 0: Success; water mark changed.
  *   - -EINVAL: Invalid water mark value.
  */
-int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
+int rte_common_ring_set_water_mark(struct rte_ring *r, unsigned count);
 
 /**
  * Dump the status of the ring to a file.
@@ -348,7 +348,7 @@ int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
  * @param r
  *   A pointer to the ring structure.
  */
-void rte_ring_dump(FILE *f, const struct rte_ring *r);
+void rte_common_ring_dump(FILE *f, const struct rte_ring *r);
 
 /* the actual enqueue of pointers on the ring.
  * Placed here since identical code needed in both
@@ -428,7 +428,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   - n: Actual number of objects enqueued.
  */
 static inline int __attribute__((always_inline))
-__rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+__rte_common_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t prod_head, prod_next;
@@ -537,7 +537,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   - n: Actual number of objects enqueued.
  */
 static inline int __attribute__((always_inline))
-__rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
+__rte_common_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t prod_head, cons_tail;
@@ -621,7 +621,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  */
 
 static inline int __attribute__((always_inline))
-__rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
+__rte_common_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t cons_head, prod_tail;
@@ -720,7 +720,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   - n: Actual number of objects dequeued.
  */
 static inline int __attribute__((always_inline))
-__rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
+__rte_common_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
 	uint32_t cons_head, prod_tail;
@@ -764,284 +764,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 }
 
 /**
- * Enqueue several objects on the ring (multi-producers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * producer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Enqueue several objects on a ring (NOT multi-producers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Enqueue several objects on a ring.
- *
- * This function calls the multi-producer or the single-producer
- * version depending on the default behavior that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
-		      unsigned n)
-{
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue_bulk(r, obj_table, n);
-	else
-		return rte_ring_mp_enqueue_bulk(r, obj_table, n);
-}
-
-/**
- * Enqueue one object on a ring (multi-producers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * producer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj
- *   A pointer to the object to be added.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
-{
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
-}
-
-/**
- * Enqueue one object on a ring (NOT multi-producers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj
- *   A pointer to the object to be added.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
-{
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
-}
-
-/**
- * Enqueue one object on a ring.
- *
- * This function calls the multi-producer or the single-producer
- * version, depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj
- *   A pointer to the object to be added.
- * @return
- *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_enqueue(struct rte_ring *r, void *obj)
-{
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue(r, obj);
-	else
-		return rte_ring_mp_enqueue(r, obj);
-}
-
-/**
- * Dequeue several objects from a ring (multi-consumers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * consumer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Dequeue several objects from a ring (NOT multi-consumers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table,
- *   must be strictly positive.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
-}
-
-/**
- * Dequeue several objects from a ring.
- *
- * This function calls the multi-consumers or the single-consumer
- * version, depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
-	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
-}
-
-/**
- * Dequeue one object from a ring (multi-consumers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * consumer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
-{
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
-}
-
-/**
- * Dequeue one object from a ring (NOT multi-consumers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
- * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
-{
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
-}
-
-/**
- * Dequeue one object from a ring.
- *
- * This function calls the multi-consumers or the single-consumer
- * version depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_p
- *   A pointer to a void * pointer (object) that will be filled.
- * @return
- *   - 0: Success, objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
- */
-static inline int __attribute__((always_inline))
-rte_ring_dequeue(struct rte_ring *r, void **obj_p)
-{
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue(r, obj_p);
-	else
-		return rte_ring_mc_dequeue(r, obj_p);
-}
-
-/**
  * Test if a ring is full.
  *
  * @param r
@@ -1051,7 +773,7 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
  *   - 0: The ring is not full.
  */
 static inline int
-rte_ring_full(const struct rte_ring *r)
+rte_common_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -1068,7 +790,7 @@ rte_ring_full(const struct rte_ring *r)
  *   - 0: The ring is not empty.
  */
 static inline int
-rte_ring_empty(const struct rte_ring *r)
+rte_common_ring_empty(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -1084,7 +806,7 @@ rte_ring_empty(const struct rte_ring *r)
  *   The number of entries in the ring.
  */
 static inline unsigned
-rte_ring_count(const struct rte_ring *r)
+rte_common_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -1100,7 +822,7 @@ rte_ring_count(const struct rte_ring *r)
  *   The number of free entries in the ring.
  */
 static inline unsigned
-rte_ring_free_count(const struct rte_ring *r)
+rte_common_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
@@ -1113,7 +835,7 @@ rte_ring_free_count(const struct rte_ring *r)
  * @param f
  *   A pointer to a file for output
  */
-void rte_ring_list_dump(FILE *f);
+void rte_common_ring_list_dump(FILE *f);
 
 /**
  * Search a ring from its name
@@ -1125,145 +847,10 @@ void rte_ring_list_dump(FILE *f);
  *   with rte_errno set appropriately. Possible rte_errno values include:
  *    - ENOENT - required entry not available to return.
  */
-struct rte_ring *rte_ring_lookup(const char *name);
-
-/**
- * Enqueue several objects on the ring (multi-producers safe).
- *
- * This function uses a "compare and set" instruction to move the
- * producer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - n: Actual number of objects enqueued.
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Enqueue several objects on a ring (NOT multi-producers safe).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - n: Actual number of objects enqueued.
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
-			 unsigned n)
-{
-	return __rte_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Enqueue several objects on a ring.
- *
- * This function calls the multi-producer or the single-producer
- * version depending on the default behavior that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @return
- *   - n: Actual number of objects enqueued.
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
-		      unsigned n)
-{
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue_burst(r, obj_table, n);
-	else
-		return rte_ring_mp_enqueue_burst(r, obj_table, n);
-}
-
-/**
- * Dequeue several objects from a ring (multi-consumers safe). When the request
- * objects are more than the available objects, only dequeue the actual number
- * of objects
- *
- * This function uses a "compare and set" instruction to move the
- * consumer index atomically.
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - n: Actual number of objects dequeued, 0 if ring is empty
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Dequeue several objects from a ring (NOT multi-consumers safe).When the
- * request objects are more than the available objects, only dequeue the
- * actual number of objects
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - n: Actual number of objects dequeued, 0 if ring is empty
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
-}
-
-/**
- * Dequeue multiple objects from a ring up to a maximum number.
- *
- * This function calls the multi-consumers or the single-consumer
- * version, depending on the default behaviour that was specified at
- * ring creation time (see flags).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects) that will be filled.
- * @param n
- *   The number of objects to dequeue from the ring to the obj_table.
- * @return
- *   - Number of objects dequeued
- */
-static inline unsigned __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
-{
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
-	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
-}
+struct rte_ring *rte_common_ring_lookup(const char *name);
 
 #ifdef __cplusplus
 }
 #endif
 
-#endif /* _RTE_RING_H_ */
+#endif /* _RTE_COMMON_RING_H_ */
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
new file mode 100644
index 0000000..16ddc39
--- /dev/null
+++ b/lib/librte_ring/rte_ring.c
@@ -0,0 +1,86 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "rte_ring.h"
+
+ssize_t
+rte_ring_get_memsize(unsigned count)
+{
+	return rte_common_ring_get_memsize(count);
+}
+
+int
+rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+	unsigned flags)
+{
+	return rte_common_ring_init(r, name, count, flags);
+}
+
+
+struct rte_ring *
+rte_ring_create(const char *name, unsigned count, int socket_id,
+		unsigned flags)
+{
+	return rte_common_ring_create(name, count, socket_id, flags);
+}
+
+void
+rte_ring_free(struct rte_ring *r)
+{
+	return rte_common_ring_free(r);
+}
+
+int
+rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
+{
+	return rte_common_ring_set_water_mark(r, count);
+}
+
+void
+rte_ring_dump(FILE *f, const struct rte_ring *r)
+{
+	return rte_common_ring_dump(f, r);
+}
+
+void
+rte_ring_list_dump(FILE *f)
+{
+	rte_common_ring_list_dump(f);
+}
+
+struct rte_ring *
+rte_ring_lookup(const char *name)
+{
+	return rte_common_ring_lookup(name);
+}
+
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
deleted file mode 120000
index 54dad23..0000000
--- a/lib/librte_ring/rte_ring.h
+++ /dev/null
@@ -1 +0,0 @@
-rte_common_ring.h
\ No newline at end of file
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
new file mode 100644
index 0000000..993796f
--- /dev/null
+++ b/lib/librte_ring/rte_ring.h
@@ -0,0 +1,691 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_RING_H_
+#define _RTE_RING_H_
+
+/**
+ * @file
+ * RTE Ring
+ *
+ * The Ring Manager is a fixed-size queue, implemented as a table of
+ * pointers. Head and tail pointers are modified atomically, allowing
+ * concurrent access to it. It has the following features:
+ *
+ * - FIFO (First In First Out)
+ * - Maximum size is fixed; the pointers are stored in a table.
+ * - Lockless implementation.
+ * - Multi- or single-consumer dequeue.
+ * - Multi- or single-producer enqueue.
+ * - Bulk dequeue.
+ * - Bulk enqueue.
+ *
+ * Note: the ring implementation is not preemptable. A lcore must not
+ * be interrupted by another task that uses the same ring.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_common_ring.h"
+
+/**
+ * Calculate the memory size needed for a ring
+ *
+ * This function returns the number of bytes needed for a ring, given
+ * the number of elements in it. This value is the sum of the size of
+ * the structure rte_ring and the size of the memory needed by the
+ * objects pointers. The value is aligned to a cache line size.
+ *
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @return
+ *   - The memory size needed for the ring on success.
+ *   - -EINVAL if count is not a power of 2.
+ */
+ssize_t rte_ring_get_memsize(unsigned count);
+
+/**
+ * Initialize a ring structure.
+ *
+ * Initialize a ring structure in memory pointed by "r". The size of the
+ * memory area must be large enough to store the ring structure and the
+ * object table. It is advised to use rte_ring_get_memsize() to get the
+ * appropriate size.
+ *
+ * The ring size is set to *count*, which must be a power of two. Water
+ * marking is disabled by default. The real usable ring size is
+ * *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is not added in RTE_TAILQ_RING global list. Indeed, the
+ * memory given by the caller may not be shareable among dpdk
+ * processes.
+ *
+ * @param r
+ *   The pointer to the ring structure followed by the objects table.
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The number of elements in the ring (must be a power of 2).
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   0 on success, or a negative value on error.
+ */
+int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
+	unsigned flags);
+
+/**
+ * Create a new ring named *name* in memory.
+ *
+ * This function uses ``memzone_reserve()`` to allocate memory. Then it
+ * calls rte_ring_init() to initialize an empty ring.
+ *
+ * The new ring size is set to *count*, which must be a power of
+ * two. Water marking is disabled by default. The real usable ring size
+ * is *count-1* instead of *count* to differentiate a free ring from an
+ * empty ring.
+ *
+ * The ring is added in RTE_TAILQ_RING list.
+ *
+ * @param name
+ *   The name of the ring.
+ * @param count
+ *   The size of the ring (must be a power of 2).
+ * @param socket_id
+ *   The *socket_id* argument is the socket identifier in case of
+ *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
+ *   constraint for the reserved zone.
+ * @param flags
+ *   An OR of the following:
+ *    - RING_F_SP_ENQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_enqueue()`` or ``rte_ring_enqueue_bulk()``
+ *      is "single-producer". Otherwise, it is "multi-producers".
+ *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
+ *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
+ *      is "single-consumer". Otherwise, it is "multi-consumers".
+ * @return
+ *   On success, the pointer to the new allocated ring. NULL on error with
+ *    rte_errno set appropriately. Possible errno values include:
+ *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
+ *    - E_RTE_SECONDARY - function was called from a secondary process instance
+ *    - EINVAL - count provided is not a power of 2
+ *    - ENOSPC - the maximum number of memzones has already been allocated
+ *    - EEXIST - a memzone with the same name already exists
+ *    - ENOMEM - no appropriate memory area found in which to create memzone
+ */
+struct rte_ring *rte_ring_create(const char *name, unsigned count,
+				 int socket_id, unsigned flags);
+/**
+ * De-allocate all memory used by the ring.
+ *
+ * @param r
+ *   Ring to free
+ */
+void rte_ring_free(struct rte_ring *r);
+
+/**
+ * Change the high water mark.
+ *
+ * If *count* is 0, water marking is disabled. Otherwise, it is set to the
+ * *count* value. The *count* value must be greater than 0 and less
+ * than the ring size.
+ *
+ * This function can be called at any time (not necessarily at
+ * initialization).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param count
+ *   The new water mark value.
+ * @return
+ *   - 0: Success; water mark changed.
+ *   - -EINVAL: Invalid water mark value.
+ */
+int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
+
+/**
+ * Dump the status of the ring to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param r
+ *   A pointer to the ring structure.
+ */
+void rte_ring_dump(FILE *f, const struct rte_ring *r);
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_common_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_common_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ * @return
+ *   - 0: Success; objects enqueued.
+ *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
+		      unsigned n)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue_bulk(r, obj_table, n);
+	else
+		return rte_ring_mp_enqueue_bulk(r, obj_table, n);
+}
+
+/**
+ * Enqueue one object on a ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; object enqueued.
+ *   - -EDQUOT: Quota exceeded. The object has been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
+{
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+}
+
+/**
+ * Enqueue one object on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; object enqueued.
+ *   - -EDQUOT: Quota exceeded. The object has been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
+{
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+}
+
+/**
+ * Enqueue one object on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj
+ *   A pointer to the object to be added.
+ * @return
+ *   - 0: Success; object enqueued.
+ *   - -EDQUOT: Quota exceeded. The object has been enqueued, but the
+ *     high water mark is exceeded.
+ *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_enqueue(struct rte_ring *r, void *obj)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue(r, obj);
+	else
+		return rte_ring_mp_enqueue(r, obj);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_common_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table,
+ *   must be strictly positive.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_common_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+}
+
+/**
+ * Dequeue several objects from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - 0: Success; objects dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+	else
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+}
+
+/**
+ * Dequeue one object from a ring (multi-consumers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success; object dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
+{
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+}
+
+/**
+ * Dequeue one object from a ring (NOT multi-consumers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success; object dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
+{
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+}
+
+/**
+ * Dequeue one object from a ring.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_p
+ *   A pointer to a void * pointer (object) that will be filled.
+ * @return
+ *   - 0: Success; object dequeued.
+ *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
+ *     dequeued.
+ */
+static inline int __attribute__((always_inline))
+rte_ring_dequeue(struct rte_ring *r, void **obj_p)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue(r, obj_p);
+	else
+		return rte_ring_mc_dequeue(r, obj_p);
+}
+
+/**
+ * Test if a ring is full.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is full.
+ *   - 0: The ring is not full.
+ */
+static inline int
+rte_ring_full(const struct rte_ring *r)
+{
+	return rte_common_ring_full(r);
+}
+
+/**
+ * Test if a ring is empty.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   - 1: The ring is empty.
+ *   - 0: The ring is not empty.
+ */
+static inline int
+rte_ring_empty(const struct rte_ring *r)
+{
+	return rte_common_ring_empty(r);
+}
+
+/**
+ * Return the number of entries in a ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of entries in the ring.
+ */
+static inline unsigned
+rte_ring_count(const struct rte_ring *r)
+{
+	return rte_common_ring_count(r);
+}
+
+/**
+ * Return the number of free entries in a ring.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @return
+ *   The number of free entries in the ring.
+ */
+static inline unsigned
+rte_ring_free_count(const struct rte_ring *r)
+{
+	return rte_common_ring_free_count(r);
+}
+
+/**
+ * Dump the status of all rings on the console
+ *
+ * @param f
+ *   A pointer to a file for output
+ */
+void rte_ring_list_dump(FILE *f);
+
+/**
+ * Search a ring from its name
+ *
+ * @param name
+ *   The name of the ring.
+ * @return
+ *   The pointer to the ring matching the name, or NULL if not found,
+ *   with rte_errno set appropriately. Possible rte_errno values include:
+ *    - ENOENT - required entry not available to return.
+ */
+struct rte_ring *rte_ring_lookup(const char *name);
+
+/**
+ * Enqueue several objects on the ring (multi-producers safe).
+ *
+ * This function uses a "compare and set" instruction to move the
+ * producer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_common_ring_mp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Enqueue several objects on a ring (NOT multi-producers safe).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+			 unsigned n)
+{
+	return __rte_common_ring_sp_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Enqueue several objects on a ring.
+ *
+ * This function calls the multi-producer or the single-producer
+ * version depending on the default behavior that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add to the ring from the obj_table.
+ * @return
+ *   - n: Actual number of objects enqueued.
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
+		      unsigned n)
+{
+	if (r->prod.sp_enqueue)
+		return rte_ring_sp_enqueue_burst(r, obj_table, n);
+	else
+		return rte_ring_mp_enqueue_burst(r, obj_table, n);
+}
+
+/**
+ * Dequeue several objects from a ring (multi-consumers safe). When the
+ * requested number of objects exceeds what is available, only the available
+ * objects are dequeued.
+ *
+ * This function uses a "compare and set" instruction to move the
+ * consumer index atomically.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_common_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Dequeue several objects from a ring (NOT multi-consumers safe). When the
+ * requested number of objects exceeds what is available, only the available
+ * objects are dequeued.
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - n: Actual number of objects dequeued, 0 if ring is empty
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	return __rte_common_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+}
+
+/**
+ * Dequeue multiple objects from a ring up to a maximum number.
+ *
+ * This function calls the multi-consumers or the single-consumer
+ * version, depending on the default behaviour that was specified at
+ * ring creation time (see flags).
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects) that will be filled.
+ * @param n
+ *   The number of objects to dequeue from the ring to the obj_table.
+ * @return
+ *   - Number of objects dequeued
+ */
+static inline unsigned __attribute__((always_inline))
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+{
+	if (r->cons.sc_dequeue)
+		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+	else
+		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_RING_H_ */
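
A minimal usage sketch of the API documented above (assumes the EAL is already
initialised; the ring name, size and payload below are arbitrary, and error
handling is trimmed to the essentials):

#include <rte_ring.h>
#include <rte_errno.h>

static int
ring_example(void)
{
	static int payload[8];
	void *objs[8];
	void *out[8];
	struct rte_ring *r;
	unsigned i, nb;

	for (i = 0; i < 8; i++)
		objs[i] = &payload[i];

	r = rte_ring_create("example", 1024, SOCKET_ID_ANY,
			    RING_F_SP_ENQ | RING_F_SC_DEQ);
	if (r == NULL)
		return -1;	/* the cause is reported through rte_errno */

	/* bulk enqueue is all-or-nothing: 0 on success, negative on error */
	if (rte_ring_enqueue_bulk(r, objs, 8) != 0)
		return -1;

	/* burst dequeue is best-effort: returns the number actually dequeued */
	nb = rte_ring_dequeue_burst(r, out, 8);

	return (int)nb;
}
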
-- 
2.9.3

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio
  2017-01-24  7:34 17% ` [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio Jianfeng Tan
@ 2017-01-24 13:35  4%   ` Ferruh Yigit
  2017-01-30 17:52  4%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-01-24 13:35 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: john.mcnamara, yuanhan.liu, stephen, Thomas Monjalon

On 1/24/2017 7:34 AM, Jianfeng Tan wrote:
> We announced ABI changes to remove iomem and ioport mapping in
> igb_uio. But it has a potential backward compatibility issue: an old
> version of DPDK cannot run on the modified igb_uio.
> 
> The purpose of this change was to fix a bug: when a DPDK app crashes,
> devices bound to igb_uio are not stopped by either the DPDK PMD or the
> igb_uio driver. We need to figure out a new way to fix this bug.

Hi Jianfeng,

I believe it would be good to fix this potential defect.

Is "remove iomem and ioport" a must for that fix? If so, I suggest
re-think about it.

If I see correctly, dpdk1.8 and older use the igb_uio iomem files, so
backward compatibility is only a possible issue for dpdk1.8 and older.
Since v1.8 is two years old, I would prefer fixing the defect instead of
keeping that backward compatibility.

Jianfeng, Thomas,

What do you think about postponing this deprecation notice to the next
release, instead of removing it, and discussing it further?


And overall, if "remove iomem and ioport" is not a must for this fix,
there is no problem with removing the deprecation notice.

Thanks,
ferruh


> 
> Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")
> 
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 755dc65..0f039dd 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,11 +8,6 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
>  
> -* igb_uio: iomem mapping and sysfs files created for iomem and ioport in
> -  igb_uio will be removed, because we are able to detect these from what Linux
> -  has exposed, like the way we have done with uio-pci-generic. This change
> -  targets release 17.02.
> -
>  * ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
>    impacted because of introduction of a new ``rte_bus`` hierarchy. This would
>    also impact the way devices are identified by EAL. A bus-device-driver model
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency stats
  @ 2017-01-24 15:24  3%               ` Olivier MATZ
  0 siblings, 0 replies; 200+ results
From: Olivier MATZ @ 2017-01-24 15:24 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Jerin Jacob, Mcnamara, John, Horton, Remy, dev, Pattan, Reshma,
	Thomas Monjalon, Richardson, Bruce

On Wed, 18 Jan 2017 21:11:28 +0100, Olivier Matz
<olivier.matz@6wind.com> wrote:
> Hi guys,
> 
> On Tue, 17 Jan 2017 21:55:16 +0530, Jerin Jacob
> > Oliver,
> > 
> > Could you please suggest how to proceed further?
> >   
> 
> Sorry for the lack of response. I know people are waiting for
> me, but these days I have too many things to do at the same time, and
> it's difficult to find time.
> 
> In few words (I'll provide more detailed answers to the thread by
> friday): I expected to post the mbuf rework patchset for this release,
> which includes the structure changes (Jerin's patch for arm access,
> timestamp, port, nb_segs, refcnt changes). But the patchset is clearly
> not ready yet, it needs a rebase, and it lacks test.
> 
> Jerin, I know that you submitted your patch a long time ago, and I'm
> the only one to blame, please do not see any vendor preference in it.
> 
> I'll check friday what's the effective state of the patchset in my
> workspace. If I can extract a minimal patch that only change the
> structure, I'll send it for discussion. But from what I remember, the
> mbuf structure rework depends on changing the way we access the
> refcnt, so it can be moved to the 2nd cache line.
> 
> If that's not possible, I'll try propose some alternatives.

I just posted a mbuf RFC patchset [1]. I think it contains most
things that were mentioned on the ML. As checked with Thomas, it's too
late to have it included in 17.02.

I tend to agree with John that having the timestamp in the mbuf for
latency is not an ABI break, since it is added at the end of the
structure. So I won't oppose adding this field to the mbuf structure
for this release.

The mbuf rearm patch was not forgotten, but it clearly took too long to
be integrated. With the benefit of hindsight, it should have been
pushed without waiting for the mbuf rework. Again, apologies for that; I
understand it's quite frustrating.

Anyway, tests or comments on my RFC patchset are welcome, so we can
integrate it at the beginning of the 17.05 cycle.

Regards,
Olivier

[1] http://dpdk.org/ml/archives/dev/2017-January/056187.html

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  @ 2017-01-25 13:20  3% ` Olivier MATZ
  2017-01-25 13:54  0%   ` Bruce Richardson
  2017-02-07 14:12  2% ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
  2017-02-07 14:12  3% ` [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields Bruce Richardson
  2 siblings, 1 reply; 200+ results
From: Olivier MATZ @ 2017-01-25 13:20 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Wed, 25 Jan 2017 12:14:56 +0000, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> Hi all,
> 
> while looking at the rte_ring code, I'm wondering if we can simplify
> that a bit by removing some of the code in it that may not be used.
> Specifically:
>
> * Does anyone use the NIC stats functionality for debugging? I've
>   certainly never seen it used, and its presence makes the rest less
>   readable. Can it be dropped?

What do you call NIC stats? The stats that are enabled with
RTE_LIBRTE_RING_DEBUG?

If yes, I was recently thinking almost the same about mempool stats. The
need to enable stats at compile time makes them less usable. On the
other hand, I feel the mempool/ring stats may be useful, for instance
to check if mbufs are used from the mempool cache rather than from the
common pool.

For mempool, my conclusion was:
- Enabling stats (debug) changes the ABI, because it adds a field in
  the structure, which is bad
- Enabling stats is not the same as enabling debug; we should have 2
  different ifdefs
- If statistics don't cost a lot, they should be enabled by default,
  because they are a good debug tool (e.g. a stat for each access to the
  common pool)

For the ring, in my opinion, the stats could be fully removed.


> * RTE_RING_PAUSE_REP_COUNT is set to be disabled at build time, and
>   so does anyone actually use this? Can it be dropped?

This option looks like a hack to use the ring in conditions where it
should not be used (preemptible threads). And having a compile-time
option for this kind of stuff is not in vogue ;)


> * Who uses the watermarks feature as is? I know we have a sample app
>   that uses it, but there are better ways I think to achieve the same
>   goal while simplifying the ring implementation. Rather than have a
> set watermark on enqueue, have both enqueue and dequeue functions
> return the number of free or used slots available in the ring (in
> case of enqueue, how many free there are, in case of dequeue, how
> many items are available). Easier to implement and far more useful to
> the app.

+1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-25 13:20  3% ` Olivier MATZ
@ 2017-01-25 13:54  0%   ` Bruce Richardson
    0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-01-25 13:54 UTC (permalink / raw)
  To: Olivier MATZ; +Cc: dev

On Wed, Jan 25, 2017 at 02:20:52PM +0100, Olivier MATZ wrote:
> On Wed, 25 Jan 2017 12:14:56 +0000, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > Hi all,
> > 
> > while looking at the rte_ring code, I'm wondering if we can simplify
> > that a bit by removing some of the code it in that may not be used.
> > Specifically:
> > 
> > * Does anyone use the NIC stats functionality for debugging? I've
> >   certainly never seen it used, and it's presence makes the rest less
> >   readable. Can it be dropped?
> 
> What do you call NIC stats? The stats that are enabled with
> RTE_LIBRTE_RING_DEBUG?

Yes. By NIC I meant ring. :-(
> 
> If yes, I was recently thinking almost the same about mempool stats. The
> need to enable stats at compilation makes them less usable. On the
> other hand, I feel the mempool/ring stats may be useful, for instance
> to check if mbufs are used from mempool cache, and not from common pool.
> 
> For mempool, my conclusion was:
> - Enabling stats (debug) changes the ABI, because it adds a field in
>   the structure, this is bad
> - enabling stats is not the same than enabling debug, we should have 2
>   different ifdefs
> - if statistics don't cost a lot, they should be enabled by default,
>   because it's a good debug tool (ex: have a stats for each access to
>   common pool)
> 
> For the ring, in my opinion, the stats could be fully removed.

That is my thinking too. For mempool, I'd wait to see the potential
performance hits before deciding whether or not to enable by default.
Having them run-time enabled may also be an option - if the branches
get predicted properly, there should be little to no impact as we avoid
all the writes to the stats, which is likely to be where the biggest hit
is.
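
As a purely illustrative sketch of that run-time toggle idea (the stats_enabled
field and the stats layout below are invented for the example, they are not
existing mempool code):

	/* inside e.g. the mempool put path; mp is the mempool */
	if (unlikely(mp->stats_enabled)) {
		unsigned lcore_id = rte_lcore_id();

		mp->stats[lcore_id].put_bulk += 1;
		mp->stats[lcore_id].put_objs += n;
	}

When stats_enabled stays 0, the branch should predict well and the per-lcore
stores - the expensive part - are skipped entirely.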

> 
> 
> > * RTE_RING_PAUSE_REP_COUNT is set to be disabled at build time, and
> >   so does anyone actually use this? Can it be dropped?
> 
> This option looks like a hack to use the ring in conditions where it
> should no be used (preemptable threads). And having a compile-time
> option for this kind of stuff is not in vogue ;)

Definitely agree. As well as being a compile time option, I also think
that it's the wrong way to solve the problem. If we want to break out of
a loop like that early, then we should look at doing a non-blocking version
of the APIs with a subsequent tail update call. That way an app can
decide per-ring when to sleep or context switch, or can even do other
work while it waits.
However, I wouldn't be in a rush to implement that without a compelling
use-case.
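
For illustration only, a split enqueue along those lines might look roughly
like this (both function names are hypothetical; nothing like them exists
today):

	/* reserve slots and copy the objects, but do not publish them yet */
	n = rte_ring_mp_enqueue_prepare(r, objs, n);
	if (n != 0) {
		/* do other work, or yield, while earlier producers finish */
		rte_ring_mp_enqueue_commit(r, n); /* tail update publishes */
	}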

> 
> 
> > * Who uses the watermarks feature as is? I know we have a sample app
> >   that uses it, but there are better ways I think to achieve the same
> >   goal while simplifying the ring implementation. Rather than have a
> > set watermark on enqueue, have both enqueue and dequeue functions
> > return the number of free or used slots available in the ring (in
> > case of enqueue, how many free there are, in case of dequeue, how
> > many items are available). Easier to implement and far more useful to
> > the app.
> 
> +1
> 
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  @ 2017-01-25 15:59  3%       ` Wiles, Keith
  2017-01-25 16:57  3%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Wiles, Keith @ 2017-01-25 15:59 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: Olivier MATZ, dev



Sent from my iPhone

> On Jan 25, 2017, at 7:48 AM, Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
>> On Wed, Jan 25, 2017 at 01:54:04PM +0000, Bruce Richardson wrote:
>>> On Wed, Jan 25, 2017 at 02:20:52PM +0100, Olivier MATZ wrote:
>>> On Wed, 25 Jan 2017 12:14:56 +0000, Bruce Richardson
>>> <bruce.richardson@intel.com> wrote:
>>>> Hi all,
>>>> 
>>>> while looking at the rte_ring code, I'm wondering if we can simplify
>>>> that a bit by removing some of the code it in that may not be used.
>>>> Specifically:
>>>> 
>>>> * Does anyone use the NIC stats functionality for debugging? I've
>>>>  certainly never seen it used, and it's presence makes the rest less
>>>>  readable. Can it be dropped?
>>> 
>>> What do you call NIC stats? The stats that are enabled with
>>> RTE_LIBRTE_RING_DEBUG?
>> 
>> Yes. By NIC I meant ring. :-(
>>> 
> <snip>
>>> For the ring, in my opinion, the stats could be fully removed.
>> 
>> That is my thinking too. For mempool, I'd wait to see the potential
>> performance hits before deciding whether or not to enable by default.
>> Having them run-time enabled may also be an option too - if the branches
>> get predicted properly, there should be little to no impact as we avoid
>> all the writes to the stats, which is likely to be where the biggest hit
>> is.
>> 
>>> 
>>> 
>>>> * RTE_RING_PAUSE_REP_COUNT is set to be disabled at build time, and
>>>>  so does anyone actually use this? Can it be dropped?
>>> 
>>> This option looks like a hack to use the ring in conditions where it
>>> should no be used (preemptable threads). And having a compile-time
>>> option for this kind of stuff is not in vogue ;)
>> 
> <snip>
>>> 
>>> 
>>>> * Who uses the watermarks feature as is? I know we have a sample app
>>>>  that uses it, but there are better ways I think to achieve the same
>>>>  goal while simplifying the ring implementation. Rather than have a
>>>> set watermark on enqueue, have both enqueue and dequeue functions
>>>> return the number of free or used slots available in the ring (in
>>>> case of enqueue, how many free there are, in case of dequeue, how
>>>> many items are available). Easier to implement and far more useful to
>>>> the app.
>>> 
>>> +1
>>> 
> Bonus question:
> * Do we know how widely used the enq_bulk/deq_bulk functions are? They
>  are useful for unit tests, so they do have uses, but I think it would
>  be good if we harmonized the return values between bulk and burst
>  functions. Right now:
>    enq_bulk  - only enqueues all elements or none. Returns 0 for all, or
>                negative error for none.
>    enq_burst - enqueues as many elements as possible. Returns the number
>                enqueued.

I do use the APIs in pktgen, and the difference in return values has caught me once. Making them common would be great, but the problem is backward compatibility with old versions: I would need to have an ifdef in pktgen now. So it seems like we have moved the problem to the application.

I would like to see the old API kept and a new API added with the new behavior. I know it adds another API, but one of the APIs would be nothing more than a wrapper function, if not a macro.

Would that be more reasonable than changing the ABI?

>  I think it would be better if bulk and burst both returned the number
>  enqueued, and only differed in the case of the behaviour when not all
>  elements could be enqueued.
> 
>  That would mean an API change for enq_bulk, where it would return only
>  0 or N, rather than 0 or negative. While we can map one set of return
>  values to another inside the rte_ring library, I'm not sure I see a
>  good reason to keep the old behaviour except for backward compatibility.
>  Changing it makes it easier to switch between the two functions in
>  code, and avoids confusion as to what the return values could be. Is
>  it worth doing so? [My opinion is yes!]
> 
> 
> Regards,
> /Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-25 15:59  3%       ` Wiles, Keith
@ 2017-01-25 16:57  3%         ` Bruce Richardson
  2017-01-25 17:29  0%           ` Ananyev, Konstantin
  2017-01-25 22:27  0%           ` Wiles, Keith
  0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2017-01-25 16:57 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: Olivier MATZ, dev

On Wed, Jan 25, 2017 at 03:59:55PM +0000, Wiles, Keith wrote:
> 
> 
> Sent from my iPhone
> 
> > On Jan 25, 2017, at 7:48 AM, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > 
> >> On Wed, Jan 25, 2017 at 01:54:04PM +0000, Bruce Richardson wrote:
> >>> On Wed, Jan 25, 2017 at 02:20:52PM +0100, Olivier MATZ wrote:
> >>> On Wed, 25 Jan 2017 12:14:56 +0000, Bruce Richardson
> >>> <bruce.richardson@intel.com> wrote:
> >>>> Hi all,
> >>>> 
> >>>> while looking at the rte_ring code, I'm wondering if we can simplify
> >>>> that a bit by removing some of the code it in that may not be used.
> >>>> Specifically:
> >>>> 
> >>>> * Does anyone use the NIC stats functionality for debugging? I've
> >>>>  certainly never seen it used, and it's presence makes the rest less
> >>>>  readable. Can it be dropped?
> >>> 
> >>> What do you call NIC stats? The stats that are enabled with
> >>> RTE_LIBRTE_RING_DEBUG?
> >> 
> >> Yes. By NIC I meant ring. :-(
> >>> 
> > <snip>
> >>> For the ring, in my opinion, the stats could be fully removed.
> >> 
> >> That is my thinking too. For mempool, I'd wait to see the potential
> >> performance hits before deciding whether or not to enable by default.
> >> Having them run-time enabled may also be an option too - if the branches
> >> get predicted properly, there should be little to no impact as we avoid
> >> all the writes to the stats, which is likely to be where the biggest hit
> >> is.
> >> 
> >>> 
> >>> 
> >>>> * RTE_RING_PAUSE_REP_COUNT is set to be disabled at build time, and
> >>>>  so does anyone actually use this? Can it be dropped?
> >>> 
> >>> This option looks like a hack to use the ring in conditions where it
> >>> should no be used (preemptable threads). And having a compile-time
> >>> option for this kind of stuff is not in vogue ;)
> >> 
> > <snip>
> >>> 
> >>> 
> >>>> * Who uses the watermarks feature as is? I know we have a sample app
> >>>>  that uses it, but there are better ways I think to achieve the same
> >>>>  goal while simplifying the ring implementation. Rather than have a
> >>>> set watermark on enqueue, have both enqueue and dequeue functions
> >>>> return the number of free or used slots available in the ring (in
> >>>> case of enqueue, how many free there are, in case of dequeue, how
> >>>> many items are available). Easier to implement and far more useful to
> >>>> the app.
> >>> 
> >>> +1
> >>> 
> > Bonus question:
> > * Do we know how widely used the enq_bulk/deq_bulk functions are? They
> >  are useful for unit tests, so they do have uses, but I think it would
> >  be good if we harmonized the return values between bulk and burst
> >  functions. Right now:
> >    enq_bulk  - only enqueues all elements or none. Returns 0 for all, or
> >                negative error for none.
> >    enq_burst - enqueues as many elements as possible. Returns the number
> >                enqueued.
> 
> I do use the apis in pktgen and the difference in return values has got me once. Making them common would be great,  but the problem is backward compat to old versions I would need to have an ifdef in pktgen now. So it seems like we moved the problem to the application.
> 

Yes, an ifdef would be needed, but how many versions of DPDK back do you
support? Could the ifdef be removed again after, say, 6 months?

> I would like to see the old API kept and a new API with the new behavior. I know it adds another API but one of the API would be nothing more than wrapper function if not a macro. 
> 
> Would that be more reasonable then changing the ABI?

Technically, this would be an API rather than ABI change, since the
functions are inlined in the code. However, it's not the only API change
I'm looking to make here - I'd like to have all the functions start
returning details of the state of the ring, rather than have the
watermarks facility. If we add all new functions for this and keep the
old ones around, we are just increasing our maintenance burden.
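
As a rough sketch of that direction (the extra out-parameter is hypothetical at
this stage):

	static inline unsigned
	rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
			       unsigned n, unsigned *free_space);

where *free_space would be filled with the room left in the ring after the
enqueue, so an application can do its own watermark-style checks.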

I'd like other opinions here. Do we see increasing the API surface as
the best solution, or are we ok to change the APIs of a key library like
the rings one?

/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-25 16:57  3%         ` Bruce Richardson
@ 2017-01-25 17:29  0%           ` Ananyev, Konstantin
  2017-01-31 10:53  0%             ` Olivier Matz
  2017-01-25 22:27  0%           ` Wiles, Keith
  1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2017-01-25 17:29 UTC (permalink / raw)
  To: Richardson, Bruce, Wiles, Keith; +Cc: Olivier MATZ, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, January 25, 2017 4:58 PM
> To: Wiles, Keith <keith.wiles@intel.com>
> Cc: Olivier MATZ <olivier.matz@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] rte_ring features in use (or not)
> 
> On Wed, Jan 25, 2017 at 03:59:55PM +0000, Wiles, Keith wrote:
> >
> >
> > Sent from my iPhone
> >
> > > On Jan 25, 2017, at 7:48 AM, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >
> > >> On Wed, Jan 25, 2017 at 01:54:04PM +0000, Bruce Richardson wrote:
> > >>> On Wed, Jan 25, 2017 at 02:20:52PM +0100, Olivier MATZ wrote:
> > >>> On Wed, 25 Jan 2017 12:14:56 +0000, Bruce Richardson
> > >>> <bruce.richardson@intel.com> wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> while looking at the rte_ring code, I'm wondering if we can simplify
> > >>>> that a bit by removing some of the code it in that may not be used.
> > >>>> Specifically:
> > >>>>
> > >>>> * Does anyone use the NIC stats functionality for debugging? I've
> > >>>>  certainly never seen it used, and it's presence makes the rest less
> > >>>>  readable. Can it be dropped?
> > >>>
> > >>> What do you call NIC stats? The stats that are enabled with
> > >>> RTE_LIBRTE_RING_DEBUG?
> > >>
> > >> Yes. By NIC I meant ring. :-(
> > >>>
> > > <snip>
> > >>> For the ring, in my opinion, the stats could be fully removed.
> > >>
> > >> That is my thinking too. For mempool, I'd wait to see the potential
> > >> performance hits before deciding whether or not to enable by default.
> > >> Having them run-time enabled may also be an option too - if the branches
> > >> get predicted properly, there should be little to no impact as we avoid
> > >> all the writes to the stats, which is likely to be where the biggest hit
> > >> is.
> > >>
> > >>>
> > >>>
> > >>>> * RTE_RING_PAUSE_REP_COUNT is set to be disabled at build time, and
> > >>>>  so does anyone actually use this? Can it be dropped?
> > >>>
> > >>> This option looks like a hack to use the ring in conditions where it
> > >>> should no be used (preemptable threads). And having a compile-time
> > >>> option for this kind of stuff is not in vogue ;)
> > >>
> > > <snip>
> > >>>
> > >>>
> > >>>> * Who uses the watermarks feature as is? I know we have a sample app
> > >>>>  that uses it, but there are better ways I think to achieve the same
> > >>>>  goal while simplifying the ring implementation. Rather than have a
> > >>>> set watermark on enqueue, have both enqueue and dequeue functions
> > >>>> return the number of free or used slots available in the ring (in
> > >>>> case of enqueue, how many free there are, in case of dequeue, how
> > >>>> many items are available). Easier to implement and far more useful to
> > >>>> the app.
> > >>>
> > >>> +1
> > >>>
> > > Bonus question:
> > > * Do we know how widely used the enq_bulk/deq_bulk functions are? They
> > >  are useful for unit tests, so they do have uses, but I think it would
> > >  be good if we harmonized the return values between bulk and burst
> > >  functions. Right now:
> > >    enq_bulk  - only enqueues all elements or none. Returns 0 for all, or
> > >                negative error for none.
> > >    enq_burst - enqueues as many elements as possible. Returns the number
> > >                enqueued.
> >
> > I do use the apis in pktgen and the difference in return values has got me once. Making them common would be great,  but the problem is
> backward compat to old versions I would need to have an ifdef in pktgen now. So it seems like we moved the problem to the application.
> >
> 
> Yes, an ifdef would be needed, but how many versions of DPDK back do you
> support? Could the ifdef be removed again after say, 6 months?
> 
> > I would like to see the old API kept and a new API with the new behavior. I know it adds another API but one of the API would be nothing
> more than wrapper function if not a macro.
> >
> > Would that be more reasonable then changing the ABI?
> 
> Technically, this would be an API rather than ABI change, since the
> functions are inlined in the code. However, it's not the only API change
> I'm looking to make here - I'd like to have all the functions start
> returning details of the state of the ring, rather than have the
> watermarks facility. If we add all new functions for this and keep the
> old ones around, we are just increasing our maintenance burden.
> 
> I'd like other opinions here. Do we see increasing the API surface as
> the best solution, or are we ok to change the APIs of a key library like
> the rings one?

I am ok with changing the API to make both _bulk and _burst return the same thing.
Konstantin 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-25 16:57  3%         ` Bruce Richardson
  2017-01-25 17:29  0%           ` Ananyev, Konstantin
@ 2017-01-25 22:27  0%           ` Wiles, Keith
  1 sibling, 0 replies; 200+ results
From: Wiles, Keith @ 2017-01-25 22:27 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: Olivier MATZ, dev


> On Jan 25, 2017, at 9:57 AM, Richardson, Bruce <bruce.richardson@intel.com> wrote:
> 
> On Wed, Jan 25, 2017 at 03:59:55PM +0000, Wiles, Keith wrote:
>> 
>> 
>> Sent from my iPhone
>> 
>>> On Jan 25, 2017, at 7:48 AM, Bruce Richardson <bruce.richardson@intel.com> wrote:
>>> 
>>>> On Wed, Jan 25, 2017 at 01:54:04PM +0000, Bruce Richardson wrote:
>>>>> On Wed, Jan 25, 2017 at 02:20:52PM +0100, Olivier MATZ wrote:
>>>>> On Wed, 25 Jan 2017 12:14:56 +0000, Bruce Richardson
>>>>> <bruce.richardson@intel.com> wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> while looking at the rte_ring code, I'm wondering if we can simplify
>>>>>> that a bit by removing some of the code it in that may not be used.
>>>>>> Specifically:
>>>>>> 
>>>>>> * Does anyone use the NIC stats functionality for debugging? I've
>>>>>> certainly never seen it used, and it's presence makes the rest less
>>>>>> readable. Can it be dropped?
>>>>> 
>>>>> What do you call NIC stats? The stats that are enabled with
>>>>> RTE_LIBRTE_RING_DEBUG?
>>>> 
>>>> Yes. By NIC I meant ring. :-(
>>>>> 
>>> <snip>
>>>>> For the ring, in my opinion, the stats could be fully removed.
>>>> 
>>>> That is my thinking too. For mempool, I'd wait to see the potential
>>>> performance hits before deciding whether or not to enable by default.
>>>> Having them run-time enabled may also be an option too - if the branches
>>>> get predicted properly, there should be little to no impact as we avoid
>>>> all the writes to the stats, which is likely to be where the biggest hit
>>>> is.
>>>> 
>>>>> 
>>>>> 
>>>>>> * RTE_RING_PAUSE_REP_COUNT is set to be disabled at build time, and
>>>>>> so does anyone actually use this? Can it be dropped?
>>>>> 
>>>>> This option looks like a hack to use the ring in conditions where it
>>>>> should no be used (preemptable threads). And having a compile-time
>>>>> option for this kind of stuff is not in vogue ;)
>>>> 
>>> <snip>
>>>>> 
>>>>> 
>>>>>> * Who uses the watermarks feature as is? I know we have a sample app
>>>>>> that uses it, but there are better ways I think to achieve the same
>>>>>> goal while simplifying the ring implementation. Rather than have a
>>>>>> set watermark on enqueue, have both enqueue and dequeue functions
>>>>>> return the number of free or used slots available in the ring (in
>>>>>> case of enqueue, how many free there are, in case of dequeue, how
>>>>>> many items are available). Easier to implement and far more useful to
>>>>>> the app.
>>>>> 
>>>>> +1
>>>>> 
>>> Bonus question:
>>> * Do we know how widely used the enq_bulk/deq_bulk functions are? They
>>> are useful for unit tests, so they do have uses, but I think it would
>>> be good if we harmonized the return values between bulk and burst
>>> functions. Right now:
>>>   enq_bulk  - only enqueues all elements or none. Returns 0 for all, or
>>>               negative error for none.
>>>   enq_burst - enqueues as many elements as possible. Returns the number
>>>               enqueued.
>> 
>> I do use the apis in pktgen and the difference in return values has got me once. Making them common would be great,  but the problem is backward compat to old versions I would need to have an ifdef in pktgen now. So it seems like we moved the problem to the application.
>> 
> 
> Yes, an ifdef would be needed, but how many versions of DPDK back do you
> support? Could the ifdef be removed again after say, 6 months?

I have people trying to run 2.1 and 2.2 versions of Pktgen. I can cut them off, but I would prefer not to.
> 
>> I would like to see the old API kept and a new API with the new behavior. I know it adds another API but one of the API would be nothing more than wrapper function if not a macro. 
>> 
>> Would that be more reasonable then changing the ABI?
> 
> Technically, this would be an API rather than ABI change, since the
> functions are inlined in the code. However, it's not the only API change
> I'm looking to make here - I'd like to have all the functions start
> returning details of the state of the ring, rather than have the
> watermarks facility. If we add all new functions for this and keep the
> old ones around, we are just increasing our maintenance burden.
> 
> I'd like other opinions here. Do we see increasing the API surface as
> the best solution, or are we ok to change the APIs of a key library like
> the rings one?
> 
> /Bruce

Regards,
Keith

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: add PMD specific API
@ 2017-01-27 12:27  4% Ferruh Yigit
  2017-01-30 17:57  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-01-27 12:27 UTC (permalink / raw)
  To: John McNamara, Helin Zhang; +Cc: dev, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/api/doxy-api-index.md | 4 ++++
 doc/api/doxy-api.conf     | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index f9958c4..525d2e1 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -153,3 +153,7 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [version]            (@ref rte_version.h)
+
+- **PMD specific**:
+  [ixgbe]              (@ref rte_pmd_ixgbe.h),
+  [i40e]               (@ref rte_pmd_i40e.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 6892315..b8a5fd8 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -32,6 +32,8 @@ PROJECT_NAME            = DPDK
 INPUT                   = doc/api/doxy-api-index.md \
                           doc/api/examples.dox \
                           drivers/net/bonding \
+                          drivers/net/i40e \
+                          drivers/net/ixgbe \
                           lib/librte_eal/common/include \
                           lib/librte_eal/common/include/generic \
                           lib/librte_acl \
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  @ 2017-01-27 16:33  3%   ` Stephen Hemminger
  2017-01-27 16:47  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2017-01-27 16:33 UTC (permalink / raw)
  To: Aaron Conole; +Cc: dev

On Fri, 27 Jan 2017 09:57:03 -0500
Aaron Conole <aconole@redhat.com> wrote:

> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index 03fee50..46e427f 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
>   *     function call and should not be further interpreted by the
>   *     application.  The EAL does not take any ownership of the memory used
>   *     for either the argv array, or its members.
> - *   - On failure, a negative error value.
> + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> + *     for failure.
> + *
> + *   The error codes returned via rte_errno:
> + *     EACCES indicates a permissions issue.
> + *
> + *     EAGAIN indicates either a bus or system resource was not available,
> + *            try again.
> + *
> + *     EALREADY indicates that the rte_eal_init function has already been
> + *              called, and cannot be called again.
> + *
> + *     EINVAL indicates invalid parameters were passed as argv/argc.
> + *
> + *     EIO indicates failure to setup the logging handlers.  This is usually
> + *         caused by an out-of-memory condition.
> + *
> + *     ENODEV indicates memory setup issues.
> + *
> + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> + *
> + *     EUNATCH indicates that the PCI bus is either not present, or is not
> + *             readable by the eal.
>   */
>  int rte_eal_init(int argc, char **argv);

Why use rte_errno?
Most DPDK calls just return a negative value on error which corresponds to an error number.
Are you trying to keep ABI compatibility? That doesn't make sense, because before, all these
errors were panics; no working application is going to care.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 16:33  3%   ` Stephen Hemminger
@ 2017-01-27 16:47  0%     ` Bruce Richardson
  2017-01-27 17:37  0%       ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-01-27 16:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Aaron Conole, dev

On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> On Fri, 27 Jan 2017 09:57:03 -0500
> Aaron Conole <aconole@redhat.com> wrote:
> 
> > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > index 03fee50..46e427f 100644
> > --- a/lib/librte_eal/common/include/rte_eal.h
> > +++ b/lib/librte_eal/common/include/rte_eal.h
> > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> >   *     function call and should not be further interpreted by the
> >   *     application.  The EAL does not take any ownership of the memory used
> >   *     for either the argv array, or its members.
> > - *   - On failure, a negative error value.
> > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > + *     for failure.
> > + *
> > + *   The error codes returned via rte_errno:
> > + *     EACCES indicates a permissions issue.
> > + *
> > + *     EAGAIN indicates either a bus or system resource was not available,
> > + *            try again.
> > + *
> > + *     EALREADY indicates that the rte_eal_init function has already been
> > + *              called, and cannot be called again.
> > + *
> > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > + *
> > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > + *         caused by an out-of-memory condition.
> > + *
> > + *     ENODEV indicates memory setup issues.
> > + *
> > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > + *
> > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > + *             readable by the eal.
> >   */
> >  int rte_eal_init(int argc, char **argv);
> 
> Why use rte_errno?
> Most DPDK calls just return negative value on error which corresponds to error number.
> Are you trying to keep ABI compatibility? Doesn't make sense because before all these
> errors were panic's no working application is going to care.

Either will work, but I actually prefer this way. I view using rte_errno
to be better as it can work in just about all cases, including with
functions which return pointers. This allows you to have a standard
method across all functions for returning error codes, and it only
requires a single sentinel value to indicate error, rather than using a
whole range of values.
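
For example, a minimal sketch of an application's init path under this
convention would be:

	#include <stdio.h>
	#include <rte_eal.h>
	#include <rte_errno.h>

	int
	main(int argc, char **argv)
	{
		if (rte_eal_init(argc, argv) < 0) {
			fprintf(stderr, "EAL init failed: %s\n",
				rte_strerror(rte_errno));
			return 1;
		}
		/* ... rest of the application ... */
		return 0;
	}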

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 16:47  0%     ` Bruce Richardson
@ 2017-01-27 17:37  0%       ` Stephen Hemminger
  2017-01-30 18:38  0%         ` Aaron Conole
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2017-01-27 17:37 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Aaron Conole, dev

On Fri, 27 Jan 2017 16:47:40 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> > On Fri, 27 Jan 2017 09:57:03 -0500
> > Aaron Conole <aconole@redhat.com> wrote:
> >   
> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > > index 03fee50..46e427f 100644
> > > --- a/lib/librte_eal/common/include/rte_eal.h
> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> > >   *     function call and should not be further interpreted by the
> > >   *     application.  The EAL does not take any ownership of the memory used
> > >   *     for either the argv array, or its members.
> > > - *   - On failure, a negative error value.
> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > > + *     for failure.
> > > + *
> > > + *   The error codes returned via rte_errno:
> > > + *     EACCES indicates a permissions issue.
> > > + *
> > > + *     EAGAIN indicates either a bus or system resource was not available,
> > > + *            try again.
> > > + *
> > > + *     EALREADY indicates that the rte_eal_init function has already been
> > > + *              called, and cannot be called again.
> > > + *
> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > > + *
> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > > + *         caused by an out-of-memory condition.
> > > + *
> > > + *     ENODEV indicates memory setup issues.
> > > + *
> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > > + *
> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > > + *             readable by the eal.
> > >   */
> > >  int rte_eal_init(int argc, char **argv);  
> > 
> > Why use rte_errno?
> > Most DPDK calls just return negative value on error which corresponds to error number.
> > Are you trying to keep ABI compatibility? Doesn't make sense because before all these
> > errors were panic's no working application is going to care.  
> 
> Either will work, but I actually prefer this way. I view using rte_errno
> to be better as it can work in just about all cases, including with
> functions which return pointers. This allows you to have a standard
> method across all functions for returning error codes, and it only
> requires a single sentinal value to indicate error, rather than using a
> whole range of values.

The problem is that DPDK is getting more inconsistent on how this is done.
As long as error returns are always the same as kernel/glibc errnos, it really doesn't
matter much which way the value is returned from a technical point of view,
but the inconsistency is sure to be a usability problem and a source of errors.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 1/7] lib: add information metrics library
  @ 2017-01-30 15:50  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-01-30 15:50 UTC (permalink / raw)
  To: Remy Horton; +Cc: dev

Hi Remy,

> This patch adds a new information metric library that allows other
> modules to register named metrics and update their values. It is
> intended to be independent of ethdev, rather than mixing ethdev
> and non-ethdev information in xstats.

I'm still not convinced by this library, and this introduction does
not help a lot.

I would like to thank Harry for the review of this series.
If we had more opinions or enthusiasm about this patch, it would
be easier to accept this new library and assert it is the way to go.

It could be a matter of technical board discussion if we had a clear
explanation of the needs, the pros and cons of this design.

The overview for using this library should be given in the prog guide.


2017-01-18 15:05, Remy Horton:
> --- a/config/common_base
> +++ b/config/common_base
> @@ -593,3 +593,8 @@ CONFIG_RTE_APP_TEST_RESOURCE_TAR=n
>  CONFIG_RTE_TEST_PMD=y
>  CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES=n
>  CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
> +
> +#
> +# Compile the device metrics library
> +#
> +CONFIG_RTE_LIBRTE_METRICS=y

I know the config file is not so well sorted.
However it would be a bit more logical below CONFIG_RTE_LIBRTE_JOBSTATS.

> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> index 72d59b2..94f0f69 100644
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -150,4 +150,5 @@ There are many libraries, so their headers may be grouped by topics:
>    [common]             (@ref rte_common.h),
>    [ABI compat]         (@ref rte_compat.h),
>    [keepalive]          (@ref rte_keepalive.h),
> +  [Device Metrics]     (@ref rte_metrics.h),

No first letter uppercase in this list.

> --- a/doc/guides/rel_notes/release_17_02.rst
> +++ b/doc/guides/rel_notes/release_17_02.rst
> @@ -34,6 +34,12 @@ New Features
>  
>       Refer to the previous release notes for examples.
>  
> +   * **Added information metric library.**
> +
> +     A library that allows information metrics to be added and update. It is

update -> updated

added and updated by who?

> +     intended to provide a reporting mechanism that is independent of the
> +     ethdev library.

and independent of the cryptodev library?
Does it apply to other types of devices (cryptodev/eventdev)?

> +
>       This section is a comment. do not overwrite or remove it.
>       Also, make sure to start the actual text at the margin.
>       =========================================================

Your text should start below this line, and indented at the margin.

> @@ -205,6 +211,7 @@ The libraries prepended with a plus sign were incremented in this version.
>  .. code-block:: diff
>  
>       librte_acl.so.2
> +   + librte_bitratestats.so.1

not part of this patch

> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -58,6 +58,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
>  DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
>  DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
>  DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
> +DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics

inserting it below librte_jobstats would be a better choice

> --- /dev/null
> +++ b/lib/librte_metrics/rte_metrics.c
> +/**
> + * Internal stats metadata and value entry.
> + *
> + * @internal
> + * @param name
> + *   Name of metric
> + * @param value
> + *   Current value for metric
> + * @param idx_next_set
> + *   Index of next root element (zero for none)
> + * @param idx_next_metric
> + *   Index of next metric in set (zero for none)
> + *
> + * Only the root of each set needs idx_next_set but since it has to be
> + * assumed that number of sets could equal total number of metrics,
> + * having a separate set metadata table doesn't save any memory.
> + */
> +struct rte_metrics_meta_s {
> +	char name[RTE_METRICS_MAX_NAME_LEN];
> +	uint64_t value[RTE_MAX_ETHPORTS];
> +	uint64_t nonport_value;
> +	uint16_t idx_next_set;
> +	uint16_t idx_next_stat;
> +};

It would be a lot easier to read with comments near each field.
It would help avoid forgetting fields like nonport_value in this struct.
You do not need to use doxygen syntax in a .c file.

> --- /dev/null
> +++ b/lib/librte_metrics/rte_metrics.h
> +/**
> + * @file
> + *
> + * RTE Metrics module

RTE is not meaningful here.
Please prefer DPDK.

> + *
> + * Metric information is populated using a push model, where the
> + * information provider calls an update function on the relevant
> + * metrics. Currently only bulk querying of metrics is supported.
> + */

This description should explain who the provider is (drivers?) and who
the reader is (a registered thread?).
What do you mean by "push model"? Is a callback used?

> +/**
> + * Global (rather than port-specific) metric.

It does not say what kind of constant it is. A special metric id?

> + *
> + * When used instead of port number by rte_metrics_update_metric()
> + * or rte_metrics_update_metric(), the global metrics, which are
> + * not associated with any specific port, are updated.
> + */
> +#define RTE_METRICS_GLOBAL -1

I thought you agreed that "port" is not really a good wording.

> +/**
> + * A name-key lookup for metrics.
> + *
> + * An array of this structure is returned by rte_metrics_get_names().
> + * The struct rte_eth_stats references these names via their array index.

rte_eth_stats?

> + */
> +struct rte_metric_name {
> +	/** String describing metric */
> +	char name[RTE_METRICS_MAX_NAME_LEN];
> +};
[...]
> +/**
> + * Metric value structure.
> + *
> + * This structure is used by rte_metrics_get_values() to return metrics,
> + * which are statistics that are not generated by PMDs. It maps a name key,

Here we have a definition of what a metric is:
"statistics that are not generated by PMDs"
This could help in the introduction.

> + * which corresponds to an index in the array returned by
> + * rte_metrics_get_names().
> + */
> +struct rte_metric_value {
> +	/** Numeric identifier of metric. */
> +	uint16_t key;
> +	/** Value for metric */
> +	uint64_t value;
> +};
> +
> +
> +/**
> + * Initializes metric module. This function must be called from
> + * a primary process before metrics are used.

Why not integrate it into the global init?
Are there performance drawbacks?

> + *
> + * @param socket_id
> + *   Socket to use for shared memory allocation.
> + */
> +void rte_metrics_init(int socket_id);
> +
> +/**
> + * Register a metric, making it available as a reporting parameter.
> + *
> + * Registering a metric is the way third-parties declare a parameter

third-party? You mean the provider?

> + * that they wish to be reported. Once registered, the associated
> + * numeric key can be obtained via rte_metrics_get_names(), which
> + * is required for updating said metric's value.
> + *
> + * @param name
> + *   Metric name
> + *
> + * @return
> + *  - Zero or positive: Success (index key of new metric)
> + *  - \b -EIO: Error, unable to access metrics shared memory
> + *    (rte_metrics_init() not called)
> + *  - \b -EINVAL: Error, invalid parameters
> + *  - \b -ENOMEM: Error, maximum metrics reached

Please, no extra formatting in doxygen.

> + */
> +int rte_metrics_reg_name(const char *name);
> +
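
For readers following the review, a minimal usage sketch based on the
prototypes quoted above; the update call is named as in the quoted header
comment, and its (port, key, value) signature is an assumption since it is
not shown in this excerpt:

 #include <stdint.h>
 #include <rte_lcore.h>
 #include <rte_metrics.h>

 static int my_metric_key;

 /* Register once, typically from the primary process at init time. */
 static void my_metrics_setup(void)
 {
 	rte_metrics_init(rte_socket_id());
 	my_metric_key = rte_metrics_reg_name("my_counter");
 	if (my_metric_key < 0)
 		return; /* -EIO, -EINVAL or -ENOMEM per the doxygen quoted above */
 }

 /* Push an updated value; RTE_METRICS_GLOBAL instead of a port id would
  * update the non-port copy. The signature below is assumed. */
 static void my_metrics_push(uint8_t port_id, uint64_t value)
 {
 	rte_metrics_update_metric(port_id, my_metric_key, value);
 }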

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio
  2017-01-24 13:35  4%   ` Ferruh Yigit
@ 2017-01-30 17:52  4%     ` Thomas Monjalon
  2017-02-01  7:24  4%       ` Tan, Jianfeng
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-01-30 17:52 UTC (permalink / raw)
  To: Ferruh Yigit, Jianfeng Tan; +Cc: dev, john.mcnamara, yuanhan.liu, stephen

2017-01-24 13:35, Ferruh Yigit:
> On 1/24/2017 7:34 AM, Jianfeng Tan wrote:
> > We announced ABI changes to remove iomem and ioport mapping in
> > igb_uio. But it has potential backward compatibility issue: cannot
> > run old version DPDK on modified igb_uio.
> > 
> > The purpose of this changes was to fix a bug: when DPDK app crashes,
> > those devices by igb_uio are not stopped either DPDK PMD driver or
> > igb_uio driver. We need to figure out new way to fix this bug.
> 
> Hi Jianfeng,
> 
> I believe it would be good to fix this potential defect.
> 
> Is "remove iomem and ioport" a must for that fix? If so, I suggest
> re-think about it.
> 
> If I see correctly, dpdk1.8 and older uses igb_uio iomem files. So
> backward compatibility is the possible issue for dpdk1.8 and older.
> Since v1.8 two years old, I would prefer fixing defect instead of
> keeping that backward compatibility.
> 
> Jianfeng, Thomas,
> 
> What do you think postponing this deprecation notice to next release,
> instead of removing it, and discuss more?
> 
> 
> And overall, if "remove iomem and ioport" is not a must for this fix, no
> problem to remove deprecation notice.

I have no strong opinion here.
Jianfeng, do you agree with Ferruh?

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add PMD specific API
  2017-01-27 12:27  4% [dpdk-dev] [PATCH] doc: add PMD specific API Ferruh Yigit
@ 2017-01-30 17:57  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-01-30 17:57 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: dev, John McNamara, Helin Zhang, Konstantin Ananyev, Jingjing Wu

2017-01-27 12:27, Ferruh Yigit:
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -153,3 +153,7 @@ There are many libraries, so their headers may be grouped by topics:
>    [ABI compat]         (@ref rte_compat.h),
>    [keepalive]          (@ref rte_keepalive.h),
>    [version]            (@ref rte_version.h)
> +
> +- **PMD specific**:
> +  [ixgbe]              (@ref rte_pmd_ixgbe.h),
> +  [i40e]               (@ref rte_pmd_i40e.h)

They could be grouped with bonding, vhost and KNI
in a device-specific section, below the "device" section.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-27 17:37  0%       ` Stephen Hemminger
@ 2017-01-30 18:38  0%         ` Aaron Conole
  2017-01-30 20:19  0%           ` Thomas Monjalon
  2017-01-31  9:33  0%           ` Bruce Richardson
  0 siblings, 2 replies; 200+ results
From: Aaron Conole @ 2017-01-30 18:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Bruce Richardson, dev

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Fri, 27 Jan 2017 16:47:40 +0000
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
>> > On Fri, 27 Jan 2017 09:57:03 -0500
>> > Aaron Conole <aconole@redhat.com> wrote:
>> >   
>> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
>> > > index 03fee50..46e427f 100644
>> > > --- a/lib/librte_eal/common/include/rte_eal.h
>> > > +++ b/lib/librte_eal/common/include/rte_eal.h
>> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
>> > >   *     function call and should not be further interpreted by the
>> > >   *     application.  The EAL does not take any ownership of the memory used
>> > >   *     for either the argv array, or its members.
>> > > - *   - On failure, a negative error value.
>> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
>> > > + *     for failure.
>> > > + *
>> > > + *   The error codes returned via rte_errno:
>> > > + *     EACCES indicates a permissions issue.
>> > > + *
>> > > + *     EAGAIN indicates either a bus or system resource was not available,
>> > > + *            try again.
>> > > + *
>> > > + *     EALREADY indicates that the rte_eal_init function has already been
>> > > + *              called, and cannot be called again.
>> > > + *
>> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
>> > > + *
>> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
>> > > + *         caused by an out-of-memory condition.
>> > > + *
>> > > + *     ENODEV indicates memory setup issues.
>> > > + *
>> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
>> > > + *
>> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
>> > > + *             readable by the eal.
>> > >   */
>> > >  int rte_eal_init(int argc, char **argv);  
>> > 
>> > Why use rte_errno?
>> > Most DPDK calls just return negative value on error which
>> > corresponds to error number.
>> > Are you trying to keep ABI compatibility? Doesn't make sense
>> > because before all these
>> > errors were panic's no working application is going to care.  
>> 
>> Either will work, but I actually prefer this way. I view using rte_errno
>> to be better as it can work in just about all cases, including with
>> functions which return pointers. This allows you to have a standard
>> method across all functions for returning error codes, and it only
>> requires a single sentinal value to indicate error, rather than using a
>> whole range of values.
>
> The problem is DPDK is getting more inconsistent on how this is done.
> As long as error returns are always same as kernel/glibc errno's it really doesn't
> matter much which way the value is returned from a technical point of view
> but the inconsistency is sure to be a usability problem and source of errors.

I am using rte_errno here because I assumed it was the preferred
method.  In fact, looking at some recently contributed modules (for
instance pdump), it seems that folks are using it.

I'm not really sure of the purpose of having rte_errno if it isn't used, so
it'd be helpful to know if there's some consensus on reflecting errors
via this variable, or on returning error codes.  Whichever is more
consistent with the way the DPDK project does things, I'm game :).

Thanks for the thoughts, and review.
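
For illustration, a minimal sketch of how an application would consume the
convention documented in the hunk above (-1 on failure with the cause in
rte_errno); only two of the listed codes are handled explicitly here, the
rest fall through to rte_strerror():

 #include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <rte_eal.h>
 #include <rte_errno.h>

 int main(int argc, char **argv)
 {
 	if (rte_eal_init(argc, argv) < 0) {
 		switch (rte_errno) {
 		case EINVAL:
 			fprintf(stderr, "invalid argc/argv parameters\n");
 			break;
 		case EALREADY:
 			fprintf(stderr, "EAL already initialized\n");
 			break;
 		default:
 			fprintf(stderr, "EAL init failed: %s\n",
 				rte_strerror(rte_errno));
 			break;
 		}
 		exit(EXIT_FAILURE);
 	}
 	/* ... rest of the application ... */
 	return 0;
 }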

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-30 18:38  0%         ` Aaron Conole
@ 2017-01-30 20:19  0%           ` Thomas Monjalon
  2017-02-01 10:54  3%             ` Adrien Mazarguil
  2017-01-31  9:33  0%           ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-01-30 20:19 UTC (permalink / raw)
  To: Aaron Conole, adrien.mazarguil; +Cc: dev, Stephen Hemminger, Bruce Richardson

2017-01-30 13:38, Aaron Conole:
> Stephen Hemminger <stephen@networkplumber.org> writes:
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > Why use rte_errno?
> >> > Most DPDK calls just return negative value on error which
> >> > corresponds to error number.
> >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > because before all these
> >> > errors were panic's no working application is going to care.  
> >> 
> >> Either will work, but I actually prefer this way. I view using rte_errno
> >> to be better as it can work in just about all cases, including with
> >> functions which return pointers. This allows you to have a standard
> >> method across all functions for returning error codes, and it only
> >> requires a single sentinal value to indicate error, rather than using a
> >> whole range of values.
> >
> > The problem is DPDK is getting more inconsistent on how this is done.
> > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > matter much which way the value is returned from a technical point of view
> > but the inconsistency is sure to be a usability problem and source of errors.
> 
> I am using rte_errno here because I assumed it was the preferred
> method.  In fact, looking at some recently contributed modules (for
> instance pdump), it seems that folks are using it.
> 
> I'm not really sure the purpose of having rte_errno if it isn't used, so
> it'd be helpful to know if there's some consensus on reflecting errors
> via this variable, or on returning error codes.  Whichever is the more
> consistent with the way the DPDK project does things, I'm game :).

I think we can use both the return value and rte_errno.
We could try to enforce rte_errno as mandatory everywhere.

Adrien did the recent rte_flow API.
Please Adrien, could you give us your thoughts?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-30 18:38  0%         ` Aaron Conole
  2017-01-30 20:19  0%           ` Thomas Monjalon
@ 2017-01-31  9:33  0%           ` Bruce Richardson
  2017-01-31 16:56  0%             ` Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2017-01-31  9:33 UTC (permalink / raw)
  To: Aaron Conole; +Cc: Stephen Hemminger, dev

On Mon, Jan 30, 2017 at 01:38:00PM -0500, Aaron Conole wrote:
> Stephen Hemminger <stephen@networkplumber.org> writes:
> 
> > On Fri, 27 Jan 2017 16:47:40 +0000
> > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >
> >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > On Fri, 27 Jan 2017 09:57:03 -0500
> >> > Aaron Conole <aconole@redhat.com> wrote:
> >> >   
> >> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> >> > > index 03fee50..46e427f 100644
> >> > > --- a/lib/librte_eal/common/include/rte_eal.h
> >> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> >> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> >> > >   *     function call and should not be further interpreted by the
> >> > >   *     application.  The EAL does not take any ownership of the memory used
> >> > >   *     for either the argv array, or its members.
> >> > > - *   - On failure, a negative error value.
> >> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> >> > > + *     for failure.
> >> > > + *
> >> > > + *   The error codes returned via rte_errno:
> >> > > + *     EACCES indicates a permissions issue.
> >> > > + *
> >> > > + *     EAGAIN indicates either a bus or system resource was not available,
> >> > > + *            try again.
> >> > > + *
> >> > > + *     EALREADY indicates that the rte_eal_init function has already been
> >> > > + *              called, and cannot be called again.
> >> > > + *
> >> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> >> > > + *
> >> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> >> > > + *         caused by an out-of-memory condition.
> >> > > + *
> >> > > + *     ENODEV indicates memory setup issues.
> >> > > + *
> >> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> >> > > + *
> >> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> >> > > + *             readable by the eal.
> >> > >   */
> >> > >  int rte_eal_init(int argc, char **argv);  
> >> > 
> >> > Why use rte_errno?
> >> > Most DPDK calls just return negative value on error which
> >> > corresponds to error number.
> >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > because before all these
> >> > errors were panic's no working application is going to care.  
> >> 
> >> Either will work, but I actually prefer this way. I view using rte_errno
> >> to be better as it can work in just about all cases, including with
> >> functions which return pointers. This allows you to have a standard
> >> method across all functions for returning error codes, and it only
> >> requires a single sentinal value to indicate error, rather than using a
> >> whole range of values.
> >
> > The problem is DPDK is getting more inconsistent on how this is done.
> > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > matter much which way the value is returned from a technical point of view
> > but the inconsistency is sure to be a usability problem and source of errors.
> 
> I am using rte_errno here because I assumed it was the preferred
> method.  In fact, looking at some recently contributed modules (for
> instance pdump), it seems that folks are using it.
> 
> I'm not really sure the purpose of having rte_errno if it isn't used, so
> it'd be helpful to know if there's some consensus on reflecting errors
> via this variable, or on returning error codes.  Whichever is the more
> consistent with the way the DPDK project does things, I'm game :).
> 
Unfortunately, this is one area where DPDK is inconsistent, and both
schemes are widely used. I much prefer using the rte_errno method, but
returning error codes directly is also common in DPDK.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-25 17:29  0%           ` Ananyev, Konstantin
@ 2017-01-31 10:53  0%             ` Olivier Matz
  2017-01-31 11:41  0%               ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-01-31 10:53 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: Richardson, Bruce, Wiles, Keith, dev

On Wed, 25 Jan 2017 17:29:18 +0000, "Ananyev, Konstantin"
<konstantin.ananyev@intel.com> wrote:
> > > > Bonus question:
> > > > * Do we know how widely used the enq_bulk/deq_bulk functions
> > > > are? They are useful for unit tests, so they do have uses, but
> > > > I think it would be good if we harmonized the return values
> > > > between bulk and burst functions. Right now:
> > > >    enq_bulk  - only enqueues all elements or none. Returns 0
> > > > for all, or negative error for none.
> > > >    enq_burst - enqueues as many elements as possible. Returns
> > > > the number enqueued.  
> > >
> > > I do use the apis in pktgen and the difference in return values
> > > has got me once. Making them common would be great,  but the
> > > problem is  
> > backward compat to old versions I would need to have an ifdef in
> > pktgen now. So it seems like we moved the problem to the
> > application.  
> > >  
> > 
> > Yes, an ifdef would be needed, but how many versions of DPDK back
> > do you support? Could the ifdef be removed again after say, 6
> > months? 
> > > I would like to see the old API kept and a new API with the new
> > > behavior. I know it adds another API but one of the API would be
> > > nothing  
> > more than wrapper function if not a macro.  
> > >
> > > Would that be more reasonable then changing the ABI?  
> > 
> > Technically, this would be an API rather than ABI change, since the
> > functions are inlined in the code. However, it's not the only API
> > change I'm looking to make here - I'd like to have all the
> > functions start returning details of the state of the ring, rather
> > than have the watermarks facility. If we add all new functions for
> > this and keep the old ones around, we are just increasing our
> > maintenance burden.
> > 
> > I'd like other opinions here. Do we see increasing the API surface
> > as the best solution, or are we ok to change the APIs of a key
> > library like the rings one?  
> 
> I am ok with changing API to make both _bulk and _burst return the
> same thing. Konstantin 

I agree that the _bulk() functions returning 0 or -err can be confusing.
But it has at least one advantage: it makes explicit that if the user asks
for N enqueues/dequeues, they will either get N or 0, nothing in
between.

Changing the API of the existing _bulk() functions looks a bit
dangerous to me. There's probably a lot of code relying on the ring
API, and changing its behavior may break it.

I'd prefer to deprecate the old _bulk and _burst functions, and
introduce a new API, maybe something like:

  rte_ring_generic_dequeue(ring, objs, n, behavior, flags)
  -> return nb_objs or -err
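
Purely as an illustration, a hypothetical sketch of such a wrapper built on
top of the current bulk/burst semantics; the name and the behavior enum are
made up, and the flags argument is omitted:

 #include <errno.h>
 #include <rte_ring.h>

 enum ring_queue_behavior {
 	RING_QUEUE_FIXED,    /* all or nothing, like the current _bulk */
 	RING_QUEUE_VARIABLE, /* as many as possible, like the current _burst */
 };

 static inline int
 ring_generic_dequeue(struct rte_ring *r, void **objs, unsigned int n,
 		     enum ring_queue_behavior behavior)
 {
 	if (behavior == RING_QUEUE_FIXED)
 		/* current _bulk semantics: 0 on success, negative errno on failure */
 		return rte_ring_dequeue_bulk(r, objs, n) == 0 ? (int)n : -ENOENT;
 	/* current _burst semantics: number of objects actually dequeued */
 	return (int)rte_ring_dequeue_burst(r, objs, n);
 }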


Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-31 10:53  0%             ` Olivier Matz
@ 2017-01-31 11:41  0%               ` Bruce Richardson
  2017-01-31 12:10  0%                 ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-01-31 11:41 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Ananyev, Konstantin, Wiles, Keith, dev

On Tue, Jan 31, 2017 at 11:53:49AM +0100, Olivier Matz wrote:
> On Wed, 25 Jan 2017 17:29:18 +0000, "Ananyev, Konstantin"
> <konstantin.ananyev@intel.com> wrote:
> > > > > Bonus question:
> > > > > * Do we know how widely used the enq_bulk/deq_bulk functions
> > > > > are? They are useful for unit tests, so they do have uses, but
> > > > > I think it would be good if we harmonized the return values
> > > > > between bulk and burst functions. Right now:
> > > > >    enq_bulk  - only enqueues all elements or none. Returns 0
> > > > > for all, or negative error for none.
> > > > >    enq_burst - enqueues as many elements as possible. Returns
> > > > > the number enqueued.  
> > > >
> > > > I do use the apis in pktgen and the difference in return values
> > > > has got me once. Making them common would be great,  but the
> > > > problem is  
> > > backward compat to old versions I would need to have an ifdef in
> > > pktgen now. So it seems like we moved the problem to the
> > > application.  
> > > >  
> > > 
> > > Yes, an ifdef would be needed, but how many versions of DPDK back
> > > do you support? Could the ifdef be removed again after say, 6
> > > months? 
> > > > I would like to see the old API kept and a new API with the new
> > > > behavior. I know it adds another API but one of the API would be
> > > > nothing  
> > > more than wrapper function if not a macro.  
> > > >
> > > > Would that be more reasonable then changing the ABI?  
> > > 
> > > Technically, this would be an API rather than ABI change, since the
> > > functions are inlined in the code. However, it's not the only API
> > > change I'm looking to make here - I'd like to have all the
> > > functions start returning details of the state of the ring, rather
> > > than have the watermarks facility. If we add all new functions for
> > > this and keep the old ones around, we are just increasing our
> > > maintenance burden.
> > > 
> > > I'd like other opinions here. Do we see increasing the API surface
> > > as the best solution, or are we ok to change the APIs of a key
> > > library like the rings one?  
> > 
> > I am ok with changing API to make both _bulk and _burst return the
> > same thing. Konstantin 
> 
> I agree that the _bulk() functions returning 0 or -err can be confusing.
> But it has at least one advantage: it explicitly shows that if user ask
> for N enqueues/dequeues, it will either get N or 0, not something
> between.
> 
> Changing the API of the existing _bulk() functions looks a bit
> dangerous to me. There's probably a lot of code relying on the ring
> API, and changing its behavior may break it.
> 
> I'd prefer to deprecate the old _bulk and _burst functions, and
> introduce a new api, maybe something like:
> 
>   rte_ring_generic_dequeue(ring, objs, n, behavior, flags)
>   -> return nb_objs or -err
> 
I don't like the -err, since it's not a valid value that can be used e.g.
in simple loops where the user doesn't care about the exact reason for
the error. I prefer having zero returned on error, with rte_errno
set appropriately, since then it is trivial for apps to ignore error
values they don't care about.
It also makes the APIs in the ring library consistent in that all of them
set rte_errno on error, rather than returning the error code. It's not right
for rte_ring_create and rte_ring_lookup to return an error code since
they return pointers, not integer values.
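
For instance, a minimal sketch of the pointer-returning case (illustrative
only; see the ring API documentation for the exact rte_errno codes set on
failure):

 #include <stdio.h>
 #include <rte_errno.h>
 #include <rte_lcore.h>
 #include <rte_ring.h>

 static struct rte_ring *
 example_ring_setup(void)
 {
 	struct rte_ring *r = rte_ring_create("example_ring", 1024,
 					     rte_socket_id(),
 					     RING_F_SP_ENQ | RING_F_SC_DEQ);
 	if (r == NULL)
 		/* NULL is the single sentinel; the cause is reported via rte_errno */
 		fprintf(stderr, "ring creation failed: %s\n",
 			rte_strerror(rte_errno));
 	return r;
 }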

As for deprecating the functions - I'm not sure about that. I think the
names of the existing functions are ok, and should be kept. I've a new
patchset of cleanups for rte_rings in the works. Let me try and finish
that and send it out as an RFC and we'll see what you think then.

Regards,
/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-31 11:41  0%               ` Bruce Richardson
@ 2017-01-31 12:10  0%                 ` Bruce Richardson
  2017-01-31 13:27  0%                   ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-01-31 12:10 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Ananyev, Konstantin, Wiles, Keith, dev

On Tue, Jan 31, 2017 at 11:41:42AM +0000, Bruce Richardson wrote:
> On Tue, Jan 31, 2017 at 11:53:49AM +0100, Olivier Matz wrote:
> > On Wed, 25 Jan 2017 17:29:18 +0000, "Ananyev, Konstantin"
> > <konstantin.ananyev@intel.com> wrote:
> > > > > > Bonus question:
> > > > > > * Do we know how widely used the enq_bulk/deq_bulk functions
> > > > > > are? They are useful for unit tests, so they do have uses, but
> > > > > > I think it would be good if we harmonized the return values
> > > > > > between bulk and burst functions. Right now:
> > > > > >    enq_bulk  - only enqueues all elements or none. Returns 0
> > > > > > for all, or negative error for none.
> > > > > >    enq_burst - enqueues as many elements as possible. Returns
> > > > > > the number enqueued.  
> > > > >
> > > > > I do use the apis in pktgen and the difference in return values
> > > > > has got me once. Making them common would be great,  but the
> > > > > problem is  
> > > > backward compat to old versions I would need to have an ifdef in
> > > > pktgen now. So it seems like we moved the problem to the
> > > > application.  
> > > > >  
> > > > 
> > > > Yes, an ifdef would be needed, but how many versions of DPDK back
> > > > do you support? Could the ifdef be removed again after say, 6
> > > > months? 
> > > > > I would like to see the old API kept and a new API with the new
> > > > > behavior. I know it adds another API but one of the API would be
> > > > > nothing  
> > > > more than wrapper function if not a macro.  
> > > > >
> > > > > Would that be more reasonable then changing the ABI?  
> > > > 
> > > > Technically, this would be an API rather than ABI change, since the
> > > > functions are inlined in the code. However, it's not the only API
> > > > change I'm looking to make here - I'd like to have all the
> > > > functions start returning details of the state of the ring, rather
> > > > than have the watermarks facility. If we add all new functions for
> > > > this and keep the old ones around, we are just increasing our
> > > > maintenance burden.
> > > > 
> > > > I'd like other opinions here. Do we see increasing the API surface
> > > > as the best solution, or are we ok to change the APIs of a key
> > > > library like the rings one?  
> > > 
> > > I am ok with changing API to make both _bulk and _burst return the
> > > same thing. Konstantin 
> > 
> > I agree that the _bulk() functions returning 0 or -err can be confusing.
> > But it has at least one advantage: it explicitly shows that if user ask
> > for N enqueues/dequeues, it will either get N or 0, not something
> > between.
> > 
> > Changing the API of the existing _bulk() functions looks a bit
> > dangerous to me. There's probably a lot of code relying on the ring
> > API, and changing its behavior may break it.
> > 
> > I'd prefer to deprecate the old _bulk and _burst functions, and
> > introduce a new api, maybe something like:
> > 
> >   rte_ring_generic_dequeue(ring, objs, n, behavior, flags)
> >   -> return nb_objs or -err
> > 
> Don't like the -err, since it's not a valid value that can be used e.g.
> in simple loops in the case that the user doesn't care about the exact
> reason for error. I prefer having zero returned on error, with rte_errno
> set appropriately, since then it is trivial for apps to ignore error
> values they don't care about.
> It also makes the APIs in a ring library consistent in that all will set
> rte_errno on error, rather than returning the error code. It's not right
> for rte_ring_create and rte_ring_lookup to return an error code since
> they return pointers, not integer values.
> 
> As for deprecating the functions - I'm not sure about that. I think the
> names of the existing functions are ok, and should be kept. I've a new
> patchset of cleanups for rte_rings in the works. Let me try and finish
> that and send it out as an RFC and we'll see what you think then.
> 
Sorry, I realised on re-reading that this reply seemed overly negative.
I can actually see the case for deprecating both sets of
functions to allow us to "start afresh". If we do so, might we be better off
just replacing the whole library with a new one, e.g. rte_fifo, which
would allow us the freedom to keep e.g. functions with "burst" in the
name if we so wish? It might also allow an easier transition.

Regards,
/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-31 12:10  0%                 ` Bruce Richardson
@ 2017-01-31 13:27  0%                   ` Olivier Matz
  2017-01-31 13:46  0%                     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-01-31 13:27 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Ananyev, Konstantin, Wiles, Keith, dev

On Tue, 31 Jan 2017 12:10:50 +0000, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> On Tue, Jan 31, 2017 at 11:41:42AM +0000, Bruce Richardson wrote:
> > On Tue, Jan 31, 2017 at 11:53:49AM +0100, Olivier Matz wrote:  
> > > On Wed, 25 Jan 2017 17:29:18 +0000, "Ananyev, Konstantin"
> > > <konstantin.ananyev@intel.com> wrote:  
> > > > > > > Bonus question:
> > > > > > > * Do we know how widely used the enq_bulk/deq_bulk
> > > > > > > functions are? They are useful for unit tests, so they do
> > > > > > > have uses, but I think it would be good if we harmonized
> > > > > > > the return values between bulk and burst functions. Right
> > > > > > > now: enq_bulk  - only enqueues all elements or none.
> > > > > > > Returns 0 for all, or negative error for none.
> > > > > > >    enq_burst - enqueues as many elements as possible.
> > > > > > > Returns the number enqueued.    
> > > > > >
> > > > > > I do use the apis in pktgen and the difference in return
> > > > > > values has got me once. Making them common would be great,
> > > > > > but the problem is    
> > > > > backward compat to old versions I would need to have an ifdef
> > > > > in pktgen now. So it seems like we moved the problem to the
> > > > > application.    
> > > > > >    
> > > > > 
> > > > > Yes, an ifdef would be needed, but how many versions of DPDK
> > > > > back do you support? Could the ifdef be removed again after
> > > > > say, 6 months?   
> > > > > > I would like to see the old API kept and a new API with the
> > > > > > new behavior. I know it adds another API but one of the API
> > > > > > would be nothing    
> > > > > more than wrapper function if not a macro.    
> > > > > >
> > > > > > Would that be more reasonable then changing the ABI?    
> > > > > 
> > > > > Technically, this would be an API rather than ABI change,
> > > > > since the functions are inlined in the code. However, it's
> > > > > not the only API change I'm looking to make here - I'd like
> > > > > to have all the functions start returning details of the
> > > > > state of the ring, rather than have the watermarks facility.
> > > > > If we add all new functions for this and keep the old ones
> > > > > around, we are just increasing our maintenance burden.
> > > > > 
> > > > > I'd like other opinions here. Do we see increasing the API
> > > > > surface as the best solution, or are we ok to change the APIs
> > > > > of a key library like the rings one?    
> > > > 
> > > > I am ok with changing API to make both _bulk and _burst return
> > > > the same thing. Konstantin   
> > > 
> > > I agree that the _bulk() functions returning 0 or -err can be
> > > confusing. But it has at least one advantage: it explicitly shows
> > > that if user ask for N enqueues/dequeues, it will either get N or
> > > 0, not something between.
> > > 
> > > Changing the API of the existing _bulk() functions looks a bit
> > > dangerous to me. There's probably a lot of code relying on the
> > > ring API, and changing its behavior may break it.
> > > 
> > > I'd prefer to deprecate the old _bulk and _burst functions, and
> > > introduce a new api, maybe something like:
> > > 
> > >   rte_ring_generic_dequeue(ring, objs, n, behavior, flags)  
> > >   -> return nb_objs or -err  
> > >   
> > Don't like the -err, since it's not a valid value that can be used
> > e.g. in simple loops in the case that the user doesn't care about
> > the exact reason for error. I prefer having zero returned on error,
> > with rte_errno set appropriately, since then it is trivial for apps
> > to ignore error values they don't care about.
> > It also makes the APIs in a ring library consistent in that all
> > will set rte_errno on error, rather than returning the error code.
> > It's not right for rte_ring_create and rte_ring_lookup to return an
> > error code since they return pointers, not integer values.

My assumption was that functions returning an int should return an
error code instead of using rte_errno. By the way, it's actually the same
debate as http://dpdk.org/ml/archives/dev/2017-January/056546.html

In that particular case, I'm not convinced that this code:

	ret = ring_dequeue(r, objs, n);
	if (ret == 0) {
		/* handle error in rte_errno */
		return;
	}
	do_stuff_with_elements(objs, ret);

Is better/faster/clearer than this one:

	ret = ring_dequeue(r, objs, n);
	if (ret <= 0) {
		/* handle error in ret */
		return;
	}
	do_stuff_with_elements(objs, ret);


In the first case, you could argue that the "if (ret)" part could be
stripped if the app does not care about errors, but I think it's not
efficient to call the next function with 0 objects. Also, this if() does
not necessarily add a test since ring_dequeue() is inline.

In the first case, ring_dequeue needs to write rte_errno in memory on
error (because it's a global variable), even if the caller does not
look at it. In the second case, it can stay in a register.


> > 
> > As for deprecating the functions - I'm not sure about that. I think
> > the names of the existing functions are ok, and should be kept.
> > I've a new patchset of cleanups for rte_rings in the works. Let me
> > try and finish that and send it out as an RFC and we'll see what
> > you think then. 
> Sorry, I realised on re-reading this reply seemed overly negative,
> sorry.

haha, no problem :)


> I can actually see the case for deprecating both sets of
> functions to allow us to "start afresh". If we do so, are we as well
> to just replace the whole library with a new one, e.g. rte_fifo, which
> would allow us the freedom to keep e.g. functions with "burst" in the
> name if we so wish? If might also allow an easier transition.

Yes, that's also an option.

My fear is about changing the API of such widely used functions
without triggering any compilation error, because the prototypes stay
the same.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] rte_ring features in use (or not)
  2017-01-31 13:27  0%                   ` Olivier Matz
@ 2017-01-31 13:46  0%                     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-01-31 13:46 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Ananyev, Konstantin, Wiles, Keith, dev

On Tue, Jan 31, 2017 at 02:27:18PM +0100, Olivier Matz wrote:
> On Tue, 31 Jan 2017 12:10:50 +0000, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > On Tue, Jan 31, 2017 at 11:41:42AM +0000, Bruce Richardson wrote:
> > > On Tue, Jan 31, 2017 at 11:53:49AM +0100, Olivier Matz wrote:  
> > > > On Wed, 25 Jan 2017 17:29:18 +0000, "Ananyev, Konstantin"
> > > > <konstantin.ananyev@intel.com> wrote:  
> > > > > > > > Bonus question:
> > > > > > > > * Do we know how widely used the enq_bulk/deq_bulk
> > > > > > > > functions are? They are useful for unit tests, so they do
> > > > > > > > have uses, but I think it would be good if we harmonized
> > > > > > > > the return values between bulk and burst functions. Right
> > > > > > > > now: enq_bulk  - only enqueues all elements or none.
> > > > > > > > Returns 0 for all, or negative error for none.
> > > > > > > >    enq_burst - enqueues as many elements as possible.
> > > > > > > > Returns the number enqueued.    
> > > > > > >
> > > > > > > I do use the apis in pktgen and the difference in return
> > > > > > > values has got me once. Making them common would be great,
> > > > > > > but the problem is    
> > > > > > backward compat to old versions I would need to have an ifdef
> > > > > > in pktgen now. So it seems like we moved the problem to the
> > > > > > application.    
> > > > > > >    
> > > > > > 
> > > > > > Yes, an ifdef would be needed, but how many versions of DPDK
> > > > > > back do you support? Could the ifdef be removed again after
> > > > > > say, 6 months?   
> > > > > > > I would like to see the old API kept and a new API with the
> > > > > > > new behavior. I know it adds another API but one of the API
> > > > > > > would be nothing    
> > > > > > more than wrapper function if not a macro.    
> > > > > > >
> > > > > > > Would that be more reasonable then changing the ABI?    
> > > > > > 
> > > > > > Technically, this would be an API rather than ABI change,
> > > > > > since the functions are inlined in the code. However, it's
> > > > > > not the only API change I'm looking to make here - I'd like
> > > > > > to have all the functions start returning details of the
> > > > > > state of the ring, rather than have the watermarks facility.
> > > > > > If we add all new functions for this and keep the old ones
> > > > > > around, we are just increasing our maintenance burden.
> > > > > > 
> > > > > > I'd like other opinions here. Do we see increasing the API
> > > > > > surface as the best solution, or are we ok to change the APIs
> > > > > > of a key library like the rings one?    
> > > > > 
> > > > > I am ok with changing API to make both _bulk and _burst return
> > > > > the same thing. Konstantin   
> > > > 
> > > > I agree that the _bulk() functions returning 0 or -err can be
> > > > confusing. But it has at least one advantage: it explicitly shows
> > > > that if user ask for N enqueues/dequeues, it will either get N or
> > > > 0, not something between.
> > > > 
> > > > Changing the API of the existing _bulk() functions looks a bit
> > > > dangerous to me. There's probably a lot of code relying on the
> > > > ring API, and changing its behavior may break it.
> > > > 
> > > > I'd prefer to deprecate the old _bulk and _burst functions, and
> > > > introduce a new api, maybe something like:
> > > > 
> > > >   rte_ring_generic_dequeue(ring, objs, n, behavior, flags)  
> > > >   -> return nb_objs or -err  
> > > >   
> > > Don't like the -err, since it's not a valid value that can be used
> > > e.g. in simple loops in the case that the user doesn't care about
> > > the exact reason for error. I prefer having zero returned on error,
> > > with rte_errno set appropriately, since then it is trivial for apps
> > > to ignore error values they don't care about.
> > > It also makes the APIs in a ring library consistent in that all
> > > will set rte_errno on error, rather than returning the error code.
> > > It's not right for rte_ring_create and rte_ring_lookup to return an
> > > error code since they return pointers, not integer values.
> 
> My assumption was that functions returning an int should return an
> error instead of rte_errno. By the way, it's actually the same debate
> than http://dpdk.org/ml/archives/dev/2017-January/056546.html
> 
> In that particular case, I'm not convinced that this code:
> 
> 	ret = ring_dequeue(r, objs, n);
> 	if (ret == 0) {
> 		/* handle error in rte_errno */
> 		return;
> 	}
> 	do_stuff_with_elements(objs, ret);
> 
> Is better/faster/clearer than this one:
> 
> 	ret = ring_dequeue(r, objs, n);
> 	if (ret <= 0) {
> 		/* handle error in ret */
> 		return;
> 	}
> 	do_stuff_with_elements(objs, ret);
> 
> 
> In the first case, you could argue that the "if (ret)" part could be
> stripped if the app does not care about errors, but I think it's not
> efficient to call the next function with 0 object. Also, this if() does
> not necessarily adds a test since ring_dequeue() is inline.
> 
> In the first case, ring_dequeue needs to write rte_errno in memory on
> error (because it's a global variable), even if the caller does not
> look at it. In the second case, it can stay in a register.
> 

I agree that in many cases there is not a lot to choose between the two
methods.

However, I prefer the errno approach for 3 reasons:

1. Firstly, and primarily, it works in all cases, including for use with
  functions that return pointers. That allows a library like rte_ring to
  use it across all functions, rather than having some functions use an
  errno variable, or extra return value, and other functions return the
  error code directly.
2. It's how unix system calls work, so everyone is familiar with the
  scheme.
3. It allows the return value to be always in the valid domain of return
  values for the type. You can have dequeue functions that always return
  unsigned values, you can have functions that return enums etc. This
  means you can track stats and chain function calls without having to
  do error checking if you like: for example, moving packets between
  rings:
  	rte_ring_enqueue_burst(r2, objs, rte_ring_dequeue_burst(r1, objs, RTE_DIM(objs)));
  This is for me the least compelling of the reasons, but is still worth
  considering, and I do admit to liking the functional style of
  programming it allows.

> 
> > > 
> > > As for deprecating the functions - I'm not sure about that. I think
> > > the names of the existing functions are ok, and should be kept.
> > > I've a new patchset of cleanups for rte_rings in the works. Let me
> > > try and finish that and send it out as an RFC and we'll see what
> > > you think then. 
> > Sorry, I realised on re-reading this reply seemed overly negative,
> > sorry.
> 
> haha, no problem :)
> 
> 
> > I can actually see the case for deprecating both sets of
> > functions to allow us to "start afresh". If we do so, are we as well
> > to just replace the whole library with a new one, e.g. rte_fifo, which
> > would allow us the freedom to keep e.g. functions with "burst" in the
> > name if we so wish? If might also allow an easier transition.
> 
> Yes, that's also an option.
> 
> My fear is about changing the API of such widely used functions,
> without triggering any compilation error because the prototypes stays
> the same.
>
Don't worry, I also plan to change the prototypes! I want to add in an
extra parameter to each call to optionally return the amount of space in
the ring. This can lead to large cycle savings - by avoiding extra
calls to rte_ring_count() or rte_ring_free_count() - in cases where it
is useful. We found that this lead to significant performance
improvements in the SW eventdev, as we had less ping-ponging of
cachelines between cores. Since we already know the amount of free space
in an enqueue call we can return that for free, while calling a separate
free_space API can lead to huge stalls if the cachelines have been
adjusted by another core in the meantime. [Yes, this does mean that the
value returned from an enqueue call would be inaccurate, but it can
still, in the case of SP rings, provide a guarantee that any number of
objects up to that number can be enqueued without error, since any
changes will only increase the number that can be enqueued]

The new APIs would therefore be:

	rte_ring_enqueue_bulk(r, objs, n, &free_space);

	rte_ring_dequeue_bulk(r, objs, n, &avail_objs);

This would also allow us to remove the watermarks feature, as the app
can itself check the free_space value against as many watermark
thresholds as it likes.
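
As a hypothetical usage sketch (APP_WATERMARK and app_throttle_producer()
are made-up application names, and the proposed enqueue call is assumed to
return the number of objects enqueued):

 /* hypothetical application-side names */
 #define APP_WATERMARK 64
 static void app_throttle_producer(void);

 static void
 app_send(struct rte_ring *r, void **objs, unsigned int n)
 {
 	unsigned int free_space;

 	/* the proposed prototype is assumed to return the number of
 	 * objects enqueued and to report the remaining space */
 	if (rte_ring_enqueue_bulk(r, objs, n, &free_space) == n &&
 	    free_space < APP_WATERMARK)
 		/* the ring is filling up: react as the old watermark
 		 * feature would have signalled */
 		app_throttle_producer();
 }
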
Hopefully I'll get an RFC with this change ready in the next few days
and we can base the discussion more on working code.

Regards,
/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-31  9:33  0%           ` Bruce Richardson
@ 2017-01-31 16:56  0%             ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2017-01-31 16:56 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Aaron Conole, dev

On Tue, 31 Jan 2017 09:33:45 +0000
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Jan 30, 2017 at 01:38:00PM -0500, Aaron Conole wrote:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >   
> > > On Fri, 27 Jan 2017 16:47:40 +0000
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >  
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:  
> > >> > On Fri, 27 Jan 2017 09:57:03 -0500
> > >> > Aaron Conole <aconole@redhat.com> wrote:
> > >> >     
> > >> > > diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> > >> > > index 03fee50..46e427f 100644
> > >> > > --- a/lib/librte_eal/common/include/rte_eal.h
> > >> > > +++ b/lib/librte_eal/common/include/rte_eal.h
> > >> > > @@ -159,7 +159,29 @@ int rte_eal_iopl_init(void);
> > >> > >   *     function call and should not be further interpreted by the
> > >> > >   *     application.  The EAL does not take any ownership of the memory used
> > >> > >   *     for either the argv array, or its members.
> > >> > > - *   - On failure, a negative error value.
> > >> > > + *   - On failure, -1 and rte_errno is set to a value indicating the cause
> > >> > > + *     for failure.
> > >> > > + *
> > >> > > + *   The error codes returned via rte_errno:
> > >> > > + *     EACCES indicates a permissions issue.
> > >> > > + *
> > >> > > + *     EAGAIN indicates either a bus or system resource was not available,
> > >> > > + *            try again.
> > >> > > + *
> > >> > > + *     EALREADY indicates that the rte_eal_init function has already been
> > >> > > + *              called, and cannot be called again.
> > >> > > + *
> > >> > > + *     EINVAL indicates invalid parameters were passed as argv/argc.
> > >> > > + *
> > >> > > + *     EIO indicates failure to setup the logging handlers.  This is usually
> > >> > > + *         caused by an out-of-memory condition.
> > >> > > + *
> > >> > > + *     ENODEV indicates memory setup issues.
> > >> > > + *
> > >> > > + *     ENOTSUP indicates that the EAL cannot initialize on this system.
> > >> > > + *
> > >> > > + *     EUNATCH indicates that the PCI bus is either not present, or is not
> > >> > > + *             readable by the eal.
> > >> > >   */
> > >> > >  int rte_eal_init(int argc, char **argv);    
> > >> > 
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panic's no working application is going to care.    
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinal value to indicate error, rather than using a
> > >> whole range of values.  
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.  
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> >   
> Unfortunately, this is one area where DPDK is inconsistent, and both
> schemes are widely used. I much prefer using the rte_errno method, but
> returning error codes directly is also common in DPDK.

One argument in favor of returning error codes directly is that it is safer
for an application when a user function passes an error code
back up through its internal call tree.

Also, the API does not really do a good job of distinguishing between normal
(no data present) and exceptional (NIC has died).  At least it doesn't depend
on something like Structured Exception handling...

Feel free to clean the stables on this one.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio
  2017-01-30 17:52  4%     ` Thomas Monjalon
@ 2017-02-01  7:24  4%       ` Tan, Jianfeng
  0 siblings, 0 replies; 200+ results
From: Tan, Jianfeng @ 2017-02-01  7:24 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit; +Cc: dev, john.mcnamara, yuanhan.liu, stephen



On 1/31/2017 1:52 AM, Thomas Monjalon wrote:
> 2017-01-24 13:35, Ferruh Yigit:
>> On 1/24/2017 7:34 AM, Jianfeng Tan wrote:
>>> We announced ABI changes to remove iomem and ioport mapping in
>>> igb_uio. But it has potential backward compatibility issue: cannot
>>> run old version DPDK on modified igb_uio.
>>>
>>> The purpose of this changes was to fix a bug: when DPDK app crashes,
>>> those devices by igb_uio are not stopped either DPDK PMD driver or
>>> igb_uio driver. We need to figure out new way to fix this bug.
>> Hi Jianfeng,
>>
>> I believe it would be good to fix this potential defect.
>>
>> Is "remove iomem and ioport" a must for that fix? If so, I suggest
>> re-think about it.
>>
>> If I see correctly, dpdk1.8 and older uses igb_uio iomem files. So
>> backward compatibility is the possible issue for dpdk1.8 and older.
>> Since v1.8 two years old, I would prefer fixing defect instead of
>> keeping that backward compatibility.
>>
>> Jianfeng, Thomas,
>>
>> What do you think postponing this deprecation notice to next release,
>> instead of removing it, and discuss more?
>>
>>
>> And overall, if "remove iomem and ioport" is not a must for this fix, no
>> problem to remove deprecation notice.
> I have no strong opinion here.
> Jianfeng, do you agree with Ferruh?

Hi Ferruh & Thomas,

I agree with Ferruh to postpone this deprecation notice.

In another thread, we discussed the possibility of fixing this problem
without the deprecation, but I have no time to verify it in this release
cycle. Let's postpone it then.

Thanks,
Jianfeng

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-01-30 20:19  0%           ` Thomas Monjalon
@ 2017-02-01 10:54  3%             ` Adrien Mazarguil
  2017-02-01 12:06  0%               ` Jan Blunck
  0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2017-02-01 10:54 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Aaron Conole, dev, Stephen Hemminger, Bruce Richardson

On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> 2017-01-30 13:38, Aaron Conole:
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> > >> > Why use rte_errno?
> > >> > Most DPDK calls just return negative value on error which
> > >> > corresponds to error number.
> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> > >> > because before all these
> > >> > errors were panic's no working application is going to care.  
> > >> 
> > >> Either will work, but I actually prefer this way. I view using rte_errno
> > >> to be better as it can work in just about all cases, including with
> > >> functions which return pointers. This allows you to have a standard
> > >> method across all functions for returning error codes, and it only
> > >> requires a single sentinal value to indicate error, rather than using a
> > >> whole range of values.
> > >
> > > The problem is DPDK is getting more inconsistent on how this is done.
> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> > > matter much which way the value is returned from a technical point of view
> > > but the inconsistency is sure to be a usability problem and source of errors.
> > 
> > I am using rte_errno here because I assumed it was the preferred
> > method.  In fact, looking at some recently contributed modules (for
> > instance pdump), it seems that folks are using it.
> > 
> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> > it'd be helpful to know if there's some consensus on reflecting errors
> > via this variable, or on returning error codes.  Whichever is the more
> > consistent with the way the DPDK project does things, I'm game :).
> 
> I think we can use both return value and rte_errno.
> We could try to enforce rte_errno as mandatory everywhere.
> 
> Adrien did the recent rte_flow API.
> Please Adrien, could you give your thought?

Sure, actually as already pointed out in this thread, both approaches have
pros and cons depending on the use-case.

Through return value:

Pros
----

- The most common approach used in DPDK today.
- Used internally by the Linux kernel (negative errno) and in the pthreads
  library (positive errno).
- Avoids the need to access an external, global variable requiring its own
  thread-local storage.
- Inherently thread-safe and reentrant (i.e. safe with signal handlers).
- Returned value is also the error code, two facts reported at once.

Cons
----

- Difficult to use with functions returning anything other than signed
  integers with negative values having no other meaning.
- Most of the time, the returned value must be assigned to a local variable
  so that it is not discarded and can be processed later.
- All function calls must be tested for errors.

Through rte_errno:

Pros
----

- errno-like, well known behavior defined by the C standard and used
  everywhere in the C library.
- Testing return values is not mandatory, e.g. rte_errno can be initialized
  to zero before calling a group of functions and its value checked
  afterward (rte_errno is only updated in case of error).
- Assigning a local variable to store its value is not necessary as long as
  another function that may affect rte_errno is not called.

Cons
----

- Not fully reentrant: thread-safety is fine for most purposes, but signal
  handlers that affect it still cause undefined behavior (they must at least
  save and restore its value in case they modify it; see the sketch after
  this list).
- Accessing non-local storage may affect CPU cycle-sensitive functions such
  as TX/RX burst.
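
A minimal sketch of the save/restore mentioned above, for a signal handler
(the handler body is a placeholder):

 #include <rte_errno.h>

 static void
 app_signal_handler(int signum)
 {
 	/* preserve rte_errno so the interrupted code path is unaffected */
 	int saved = rte_errno;

 	(void)signum;
 	/* ... handler work that may call functions setting rte_errno ... */

 	rte_errno = saved;
 }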

My opinion is that rte_errno is best for control path operations, while using
the return value makes more sense in the data path. The major issue is
that functions returning anything other than int (e.g. TX/RX burst) cannot
describe any kind of error to the application.

I went with both in rte_flow (return + rte_errno) mostly due to the return
type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
consistent while maintaining compatibility with other DPDK APIs. Note there
is little overhead for API functions to set rte_errno _and_ return its
value, it's mostly free.

I think using both is best, also because it leaves applications the choice of
error-handling method; however, if I had to pick one I'd go with rte_errno
and standardize on -1 as the default error value (as in the C library).

Below are a bunch of use-case examples to illustrate how rte_errno could
be convenient to applications.

Easily creating many flow rules during init in an all-or-nothing fashion:

 rte_errno = 0;
 for (i = 0; i != num; ++i)
     rule[i] = rte_flow_create(port, ...);
 if (unlikely(rte_errno)) {
     rte_flow_flush(port);
     return -1;
 }

Complete TX packet burst failure with explanation (could also detect partial
failures by initializing rte_errno to 0):

 sent = rte_eth_tx_burst(...);
 if (unlikely(!sent)) {
     switch (rte_errno) {
         case E2BIG:
             // too many packets in burst
         ...
         case EMSGSIZE:
             // first packet is too large
         ...
         case ENOBUFS:
             // TX queue is full
         ...
     }
 }
 
TX burst functions in PMDs could be modified as follows with minimal impact
on their performance and no ABI change:

     uint16_t sent = 0;
     int error; // new variable
 
     [process burst]
     if (unlikely([something went wrong])) { // this check already exists
         error = EPROBLEM; // new assignment
         goto error; // instead of "return sent"
     }
     [process burst]
     return sent;
 error:
     rte_errno = error;
     return sent;

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-02-01 10:54  3%             ` Adrien Mazarguil
@ 2017-02-01 12:06  0%               ` Jan Blunck
  2017-02-01 14:18  0%                 ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-01 12:06 UTC (permalink / raw)
  To: Adrien Mazarguil
  Cc: Thomas Monjalon, Aaron Conole, dev, Stephen Hemminger, Bruce Richardson

On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
<adrien.mazarguil@6wind.com> wrote:
> On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
>> 2017-01-30 13:38, Aaron Conole:
>> > Stephen Hemminger <stephen@networkplumber.org> writes:
>> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
>> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
>> > >> > Why use rte_errno?
>> > >> > Most DPDK calls just return negative value on error which
>> > >> > corresponds to error number.
>> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
>> > >> > because before all these
>> > >> > errors were panics; no working application is going to care.
>> > >>
>> > >> Either will work, but I actually prefer this way. I view using rte_errno
>> > >> to be better as it can work in just about all cases, including with
>> > >> functions which return pointers. This allows you to have a standard
>> > >> method across all functions for returning error codes, and it only
>> > >> requires a single sentinel value to indicate error, rather than using a
>> > >> whole range of values.
>> > >
>> > > The problem is DPDK is getting more inconsistent on how this is done.
>> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
>> > > matter much which way the value is returned from a technical point of view
>> > > but the inconsistency is sure to be a usability problem and source of errors.
>> >
>> > I am using rte_errno here because I assumed it was the preferred
>> > method.  In fact, looking at some recently contributed modules (for
>> > instance pdump), it seems that folks are using it.
>> >
>> > I'm not really sure the purpose of having rte_errno if it isn't used, so
>> > it'd be helpful to know if there's some consensus on reflecting errors
>> > via this variable, or on returning error codes.  Whichever is the more
>> > consistent with the way the DPDK project does things, I'm game :).
>>
>> I think we can use both return value and rte_errno.
>> We could try to enforce rte_errno as mandatory everywhere.
>>
>> Adrien did the recent rte_flow API.
>> Please Adrien, could you give your thought?
>
> Sure, actually as already pointed out in this thread, both approaches have
> pros and cons depending on the use-case.
>
> Through return value:
>
> Pros
> ----
>
> - Most common approach used in DPDK today.
> - Used internally by the Linux kernel (negative errno) and in the pthreads
>   library (positive errno).
> - Avoids the need to access an external, global variable requiring its own
>   thread-local storage.
> - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> - Returned value is also the error code, two facts reported at once.

Caller can decide to ignore return value if no error handling is wanted.

>
> Cons
> ----
>
> - Difficult to use with functions returning anything other than signed
>   integers with negative values having no other meaning.
> - The returned value must be assigned to a local variable in order not to
>   discard it and process it later most of the time.

I believe this is a Pro, since with rte_errno the value likewise has to be
assigned to a thread-local variable anyway.

> - All function calls must be tested for errors.

The rte_errno approach needs to do this too, in order to decide whether a
value needs to be assigned to rte_errno.

>
> Through rte_errno:
>
> Pros
> ----
>
> - errno-like, well known behavior defined by the C standard and used
>   everywhere in the C library.
> - Testing return values is not mandatory, e.g. rte_errno can be initialized
>   to zero before calling a group of functions and checking its value
>   afterward (rte_errno is only updated in case of error).
> - Assigning a local variable to store its value is not necessary as long as
>   another function that may affect rte_errno is not called.
>
> Cons
> ----
>
> - Not fully reentrant, thread-safety is fine for most purposes but signal
>   handlers affecting it still cause undefined behavior (they must at least
>   save and restore its value in case they modify it).
> - Accessing non-local storage may affect CPU cycle-sensitive functions such
>   as TX/RX burst.

Actually, testing for errors means you also have to reset the rte_errno
variable beforehand. That also means you have to access thread-local
storage twice.

Besides that the problem of rte_errno is that you do error handling
twice because the implementation still needs to check for the error
condition before assigning a meaningful error value to rte_errno.
After that again the user code needs to check for the return value to
decide if looking at rte_errno makes any sense.
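
Sketched out (do_control_path_op() is a hypothetical function that performs
its own error check and sets rte_errno on failure):

 rte_errno = 0;               /* first thread-local access: reset */
 do_control_path_op(port_id);
 if (rte_errno != 0)          /* second thread-local access: test */
         return -1;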


> My opinion is that rte_errno is best for control path operations while using
> the return value makes more sense in the data path. The major issue being
> that function returning anything other than int (e.g. TX/RX burst) cannot
> describe any kind of error to the application.
>
> I went with both in rte_flow (return + rte_errno) mostly due to the return
> type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> consistent while maintaining compatibility with other DPDK APIs. Note there
> is little overhead for API functions to set rte_errno _and_ return its
> value, it's mostly free.
>
> I think using both is best also because it leaves applications the choice of
> error-handling method, however if I had to pick one I'd go with rte_errno
> and standardize on -1 as the default error value (as in the C library).
>
> Below are a bunch of use-case examples to illustrate how rte_errno could
> be convenient to applications.
>
> Easily creating many flow rules during init in an all-or-nothing fashion:
>
>  rte_errno = 0;
>  for (i = 0; i != num; ++i)
>      rule[i] = rte_flow_create(port, ...);
>  if (unlikely(rte_errno)) {
>      rte_flow_flush(port);
>      return -1;
>  }
>
> Complete TX packet burst failure with explanation (could also detect partial
> failures by initializing rte_errno to 0):
>
>  sent = rte_eth_tx_burst(...);
>  if (unlikely(!sent)) {
>      switch (rte_errno) {
>          case E2BIG:
>              // too many packets in burst
>          ...
>          case EMSGSIZE:
>              // first packet is too large
>          ...
>          case ENOBUFS:
>              // TX queue is full
>          ...
>      }
>  }
>
> TX burst functions in PMDs could be modified as follows with minimal impact
> on their performance and no ABI change:
>
>      uint16_t sent = 0;
>      int error; // new variable
>
>      [process burst]
>      if (unlikely([something went wrong])) { // this check already exists
>          error = EPROBLEM; // new assignment
>          goto error; // instead of "return sent"
>      }
>      [process burst]
>      return sent;
>  error:
>      rte_errno = error;
>      return sent;
>
> --
> Adrien Mazarguil
> 6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes
  2017-02-01 12:06  0%               ` Jan Blunck
@ 2017-02-01 14:18  0%                 ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-01 14:18 UTC (permalink / raw)
  To: Jan Blunck
  Cc: Adrien Mazarguil, Thomas Monjalon, Aaron Conole, dev, Stephen Hemminger

On Wed, Feb 01, 2017 at 01:06:03PM +0100, Jan Blunck wrote:
> On Wed, Feb 1, 2017 at 11:54 AM, Adrien Mazarguil
> <adrien.mazarguil@6wind.com> wrote:
> > On Mon, Jan 30, 2017 at 09:19:29PM +0100, Thomas Monjalon wrote:
> >> 2017-01-30 13:38, Aaron Conole:
> >> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >> > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> > >> On Fri, Jan 27, 2017 at 08:33:46AM -0800, Stephen Hemminger wrote:
> >> > >> > Why use rte_errno?
> >> > >> > Most DPDK calls just return negative value on error which
> >> > >> > corresponds to error number.
> >> > >> > Are you trying to keep ABI compatibility? Doesn't make sense
> >> > >> > because before all these
> >> > >> > errors were panics; no working application is going to care.
> >> > >>
> >> > >> Either will work, but I actually prefer this way. I view using rte_errno
> >> > >> to be better as it can work in just about all cases, including with
> >> > >> functions which return pointers. This allows you to have a standard
> >> > >> method across all functions for returning error codes, and it only
> >> > >> requires a single sentinel value to indicate error, rather than using a
> >> > >> whole range of values.
> >> > >
> >> > > The problem is DPDK is getting more inconsistent on how this is done.
> >> > > As long as error returns are always same as kernel/glibc errno's it really doesn't
> >> > > matter much which way the value is returned from a technical point of view
> >> > > but the inconsistency is sure to be a usability problem and source of errors.
> >> >
> >> > I am using rte_errno here because I assumed it was the preferred
> >> > method.  In fact, looking at some recently contributed modules (for
> >> > instance pdump), it seems that folks are using it.
> >> >
> >> > I'm not really sure the purpose of having rte_errno if it isn't used, so
> >> > it'd be helpful to know if there's some consensus on reflecting errors
> >> > via this variable, or on returning error codes.  Whichever is the more
> >> > consistent with the way the DPDK project does things, I'm game :).
> >>
> >> I think we can use both return value and rte_errno.
> >> We could try to enforce rte_errno as mandatory everywhere.
> >>
> >> Adrien did the recent rte_flow API.
> >> Please Adrien, could you give your thought?
> >
> > Sure, actually as already pointed out in this thread, both approaches have
> > pros and cons depending on the use-case.
> >
> > Through return value:
> >
> > Pros
> > ----
> >
> > - Most common approach used in DPDK today.
> > - Used internally by the Linux kernel (negative errno) and in the pthreads
> >   library (positive errno).
> > - Avoids the need to access an external, global variable requiring its own
> >   thread-local storage.
> > - Inherently thread-safe and reentrant (i.e. safe with signal handlers).
> > - Returned value is also the error code, two facts reported at once.
> 
> Caller can decide to ignore return value if no error handling is wanted.
>
Not always the case. For an rx or tx burst call, a negative error value
would have to be checked for, or clamped to zero, in some cases to make
other logic in the path work sanely, e.g. updating an array of stats
using the return value.
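
To sketch that (purely hypothetical, since the current burst API returns an
unsigned count and cannot go negative):

 int nb_rx = rte_eth_rx_burst(port, queue, pkts, MAX_BURST);

 if (nb_rx < 0)      /* hypothetical negative-error convention */
         nb_rx = 0;  /* clamp so the stats logic below stays sane */
 stats[port] += nb_rx;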

> >
> > Cons
> > ----
> >
> > - Difficult to use with functions returning anything other than signed
> >   integers with negative values having no other meaning.
> > - The returned value must be assigned to a local variable in order not to
> >   discard it and process it later most of the time.
> 
> I believe this is a Pro, since with rte_errno the value likewise has to be
> assigned to a thread-local variable anyway.

No, it's a con, since for errno the value will be preserved in the
absence of other errors. The application can delay handling the error as
long as it wants, in the absence of anything causing a subsequent error.

> 
> > - All function calls must be tested for errors.
> 
> The rte_errno approach needs to do this too, in order to decide whether a
> value needs to be assigned to rte_errno.
> 
That's inside the called function, not the application. See my earlier
comment above about having to check that your return value is in the valid
"logical range" expected from the call. Having a negative number of
packets received does not make logical sense, so you have to check the
return value when updating stats etc.


> >
> > Through rte_errno:
> >
> > Pros
> > ----
> >
> > - errno-like, well known behavior defined by the C standard and used
> >   everywhere in the C library.
> > - Testing return values is not mandatory, e.g. rte_errno can be initialized
> >   to zero before calling a group of functions and checking its value
> >   afterward (rte_errno is only updated in case of error).
> > - Assigning a local variable to store its value is not necessary as long as
> >   another function that may affect rte_errno is not called.
> >
> > Cons
> > ----
> >
> > - Not fully reentrant, thread-safety is fine for most purposes but signal
> >   handlers affecting it still cause undefined behavior (they must at least
> >   save and restore its value in case they modify it).
> > - Accessing non-local storage may affect CPU cycle-sensitive functions such
> >   as TX/RX burst.
> 
> Actually, testing for errors means you also have to reset the rte_errno
> variable beforehand. That also means you have to access thread-local
> storage twice.
> 
Not true. Your return value still indicates an error via a single
sentinel value. Only in that case do you (the app) access the global value,
to find out the exact error reason.
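
For example (a sketch based on the rte_flow calls discussed in this thread;
handle_error() is a hypothetical application callback):

 flow = rte_flow_create(port_id, &attr, pattern, actions, &err);
 if (flow == NULL)                /* single sentinel checked every time */
         handle_error(rte_errno); /* thread-local storage read only on failure */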

> Besides that the problem of rte_errno is that you do error handling
> twice because the implementation still needs to check for the error
> condition before assigning a meaningful error value to rte_errno.
> After that again the user code needs to check for the return value to
> decide if looking at rte_errno makes any sense.
> 
Yes, in the case of an error occurring there will be an extra write to a
global variable, and a subsequent read of that value (which should not
be a problem, as the write will have occurred in the same thread).
However, this is irrelevant to normal path processing. Errors should be
the exception, not the rule.

> 
> > My opinion is that rte_errno is best for control path operations while using
> > the return value makes more sense in the data path. The major issue being
> > that function returning anything other than int (e.g. TX/RX burst) cannot
> > describe any kind of error to the application.
> >
> > I went with both in rte_flow (return + rte_errno) mostly due to the return
> > type of a few functions (e.g. rte_flow_create()) and wanted to keep the API
> > consistent while maintaining compatibility with other DPDK APIs. Note there
> > is little overhead for API functions to set rte_errno _and_ return its
> > value, it's mostly free.

+1, and error cases should be rare, even if there is a small cost.
> >
> > I think using both is best also because it leaves applications the choice of
> > error-handling method, however if I had to pick one I'd go with rte_errno
> > and standardize on -1 as the default error value (as in the C library).
> >
+1
though I think the sentinel value will vary depending on each case. I would
look to keep the standard packet rx/tx functions and ones like them
returning a zero on any error, to simplify programming logic, and also
because in many cases the only real causes of error they can produce are
bad parameters.
Functions returning pointers obviously will use NULL as the error value.


> > Below are a bunch of use-case examples to illustrate how rte_errno could
> > be convenient to applications.
> >
> > Easily creating many flow rules during init in an all-or-nothing fashion:
> >
> >  rte_errno = 0;
> >  for (i = 0; i != num; ++i)
> >      rule[i] = rte_flow_create(port, ...);
> >  if (unlikely(rte_errno)) {
> >      rte_flow_flush(port);
> >      return -1;
> >  }
> >
> > Complete TX packet burst failure with explanation (could also detect partial
> > failures by initializing rte_errno to 0):
> >
> >  sent = rte_eth_tx_burst(...);
> >  if (unlikely(!sent)) {
> >      switch (rte_errno) {
> >          case E2BIG:
> >              // too many packets in burst
> >          ...
> >          case EMSGSIZE:
> >              // first packet is too large
> >          ...
> >          case ENOBUFS:
> >              // TX queue is full
> >          ...
> >      }
> >  }
> >
> > TX burst functions in PMDs could be modified as follows with minimal impact
> > on their performance and no ABI change:
> >
> >      uint16_t sent = 0;
> >      int error; // new variable
> >
> >      [process burst]
> >      if (unlikely([something went wrong])) { // this check already exists
> >          error = EPROBLEM; // new assignment
> >          goto error; // instead of "return sent"
> >      }
> >      [process burst]
> >      return sent;
> >  error:
> >      rte_errno = error;
> >      return sent;
> >
> > --
> > Adrien Mazarguil
> > 6WIND

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get
@ 2017-02-01 16:53  3% Stephen Hemminger
  2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
  2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2017-02-01 16:53 UTC (permalink / raw)
  To: Slawomir Mrozowicz, Declan Doherty; +Cc: dev

The function rte_cryptodev_devices_get has several issues. I was just going to
fix it, but think it needs to be explained.
 
One potentially serious one (reported by coverity) is:

*** CID 141067:    (BAD_COMPARE)
/lib/librte_cryptodev/rte_cryptodev.c: 503 in rte_cryptodev_devices_get()
497     				&& (*devs + i)->attached ==
498     						RTE_CRYPTODEV_ATTACHED) {
499     
500     			dev = (*devs + i)->device;
501     
502     			if (dev)
>>>     CID 141067:    (BAD_COMPARE)
>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0, or 1.  
503     				cmp = strncmp(dev->driver->name,
504     						dev_name,
505     						strlen(dev_name));
506     			else
507     				cmp = strncmp((*devs + i)->data->name,
508     						dev_name,
/lib/librte_cryptodev/rte_cryptodev.c: 507 in rte_cryptodev_devices_get()
501     
502     			if (dev)
503     				cmp = strncmp(dev->driver->name,
504     						dev_name,
505     						strlen(dev_name));
506     			else
>>>     CID 141067:    (BAD_COMPARE)
>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0, or 1.  
507     				cmp = strncmp((*devs + i)->data->name,
508     						dev_name,
509     						strlen(dev_name));
510     
511     			if (cmp == 0)
512     				devices[count++] = (*devs + i)->data->dev_id;


But also:

1. Incorrect function signature:
    * function returns int but never a negative value; should be unsigned.
    * devices argument is not modified, so should be const.

2. Original ABI seems short-sighted with its limit of 256 cryptodevs
    * this seems like an 8-bit mindset; should really use unsigned int instead
      of uint8_t for the number of devices.

3. Wacky indentation of the if statement.

4. Make variables local to the block they are used (cmp, dev)

5. Use array instead of pointer:
     ie. instead of *devs + i use devs[i]


The overall code in question is:


int
rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
	uint8_t nb_devices)
{
	uint8_t i, cmp, count = 0;
	struct rte_cryptodev **devs = &rte_cryptodev_globals->devs;
	struct rte_device *dev;

	for (i = 0; i < rte_cryptodev_globals->max_devs && count < nb_devices;
			i++) {

		if ((*devs + i)
				&& (*devs + i)->attached ==
						RTE_CRYPTODEV_ATTACHED) {

			dev = (*devs + i)->device;

			if (dev)
				cmp = strncmp(dev->driver->name,
						dev_name,
						strlen(dev_name));
			else
				cmp = strncmp((*devs + i)->data->name,
						dev_name,
						strlen(dev_name));

			if (cmp == 0)
				devices[count++] = (*devs + i)->data->dev_id;
		}
	}

	return count;
}
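
For illustration only, roughly how it could look with points 3, 4 and 5
applied and the strncmp() result kept in an int (an untested sketch, which
assumes rte_cryptodev_globals->devs points at a contiguous array):

int
rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
	uint8_t nb_devices)
{
	struct rte_cryptodev *devs = rte_cryptodev_globals->devs;
	uint8_t i, count = 0;

	for (i = 0; i < rte_cryptodev_globals->max_devs && count < nb_devices;
			i++) {
		const struct rte_device *dev;
		int cmp;	/* int, so the strncmp() result is not truncated */

		if (devs[i].attached != RTE_CRYPTODEV_ATTACHED)
			continue;

		dev = devs[i].device;
		if (dev)
			cmp = strncmp(dev->driver->name, dev_name,
					strlen(dev_name));
		else
			cmp = strncmp(devs[i].data->name, dev_name,
					strlen(dev_name));

		if (cmp == 0)
			devices[count++] = devs[i].data->dev_id;
	}

	return count;
}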

Please fix it.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get
  2017-02-01 16:53  3% [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get Stephen Hemminger
@ 2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
  2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
  1 sibling, 0 replies; 200+ results
From: Mrozowicz, SlawomirX @ 2017-02-02 13:55 UTC (permalink / raw)
  To: Stephen Hemminger, Doherty, Declan; +Cc: dev



>-----Original Message-----
>From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>Sent: Wednesday, February 1, 2017 5:54 PM
>To: Mrozowicz, SlawomirX <slawomirx.mrozowicz@intel.com>; Doherty,
>Declan <declan.doherty@intel.com>
>Cc: dev@dpdk.org
>Subject: bugs and glitches in rte_cryptodev_devices_get
>
>The function rte_cryptodev_devices_get has several issues. I was just going
>to fix it, but think it need to be explained.
>
>One potentially serious one (reported by coverity) is:
>
>*** CID 141067:    (BAD_COMPARE)
>/lib/librte_cryptodev/rte_cryptodev.c: 503 in rte_cryptodev_devices_get()
>497     				&& (*devs + i)->attached ==
>498     						RTE_CRYPTODEV_ATTACHED)
>{
>499
>500     			dev = (*devs + i)->device;
>501
>502     			if (dev)
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>/lib/librte_cryptodev/rte_cryptodev.c: 507 in rte_cryptodev_devices_get()
>501
>502     			if (dev)
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>509     						strlen(dev_name));
>510
>511     			if (cmp == 0)
>512     				devices[count++] = (*devs + i)->data->dev_id;
>
>
>But also:
>
>1. Incorrect function signature:
>    * function returns int but never a negative value; should be unsigned.
>    * devices argument is not modified, so should be const.

[SM] Ok. To be changed.

>
>2. Original ABI seems short-sighted with its limit of 256 cryptodevs
>    * this seems like an 8-bit mindset; should really use unsigned int instead
>      of uint8_t for the number of devices.

[SM] Ok. To be changed to uint8_t.

>
>3. Wacky indentation of the if statement.

[SM] To be changed.

>
>4. Make variables local to the block they are used (cmp, dev)

[SM] Ok. To be changed.

>
>5. Use array instead of pointer:
>     ie. instead of *devs + i use devs[i]

[SM] We can't change it like this. devs[i] provides a wrong address (null) for i>0.

>
>
>The overall code in question is:
>
>
>int
>rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
>	uint8_t nb_devices)
>{
>	uint8_t i, cmp, count = 0;
>	struct rte_cryptodev **devs = &rte_cryptodev_globals->devs;
>	struct rte_device *dev;
>
>	for (i = 0; i < rte_cryptodev_globals->max_devs && count <
>nb_devices;
>			i++) {
>
>		if ((*devs + i)
>				&& (*devs + i)->attached ==
>						RTE_CRYPTODEV_ATTACHED)
>{
>
>			dev = (*devs + i)->device;
>
>			if (dev)
>				cmp = strncmp(dev->driver->name,
>						dev_name,
>						strlen(dev_name));
>			else
>				cmp = strncmp((*devs + i)->data->name,
>						dev_name,
>						strlen(dev_name));
>
>			if (cmp == 0)
>				devices[count++] = (*devs + i)->data->dev_id;
>		}
>	}
>
>	return count;
>}
>
>Please fix it.
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v10 1/7] lib: add information metrics library
  @ 2017-02-03 10:33  1% ` Remy Horton
  2017-02-03 10:33  2% ` [dpdk-dev] [PATCH v10 3/7] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-02-03 10:33 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a new information metrics library. This Metrics
library implements a mechanism by which producers can publish
numeric information for later querying by consumers. Metrics
themselves are statistics that are not generated by PMDs, and
hence are not reported via ethdev extended statistics.

Metric information is populated using a push model, where
producers update the values contained within the metric
library by calling an update function on the relevant metrics.
Consumers receive metric information by querying the central
metric data, which is held in shared memory.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   4 +
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/prog_guide/index.rst            |   1 +
 doc/guides/prog_guide/metrics_lib.rst      | 180 +++++++++++++++++
 doc/guides/rel_notes/release_17_02.rst     |   9 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 299 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 240 +++++++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 13 files changed, 807 insertions(+)
 create mode 100644 doc/guides/prog_guide/metrics_lib.rst
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 27f999b..eceebaa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -627,6 +627,10 @@ F: lib/librte_jobstats/
 F: examples/l2fwd-jobstats/
 F: doc/guides/sample_app_ug/l2_forward_job_stats.rst
 
+Metrics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_metrics/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 71a4fcb..b819932 100644
--- a/config/common_base
+++ b/config/common_base
@@ -501,6 +501,11 @@ CONFIG_RTE_LIBRTE_EFD=y
 CONFIG_RTE_LIBRTE_JOBSTATS=y
 
 #
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index eb39f69..26a26b7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -156,4 +156,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [device metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index b8a5fd8..e2e070f 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -53,6 +53,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_mbuf \
                           lib/librte_mempool \
                           lib/librte_meter \
+                          lib/librte_metrics \
                           lib/librte_net \
                           lib/librte_pdump \
                           lib/librte_pipeline \
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 7f825cb..fea651c 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -62,6 +62,7 @@ Programmer's Guide
     packet_classif_access_ctrl
     packet_framework
     vhost_lib
+    metrics_lib
     port_hotplug_framework
     source_org
     dev_kit_build_system
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
new file mode 100644
index 0000000..87f806d
--- /dev/null
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -0,0 +1,180 @@
+..  BSD LICENSE
+    Copyright(c) 2017 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Metrics_Library:
+
+Metrics Library
+===============
+
+The Metrics library implements a mechanism by which *producers* can
+publish numeric information for later querying by *consumers*. In
+practice producers will typically be other libraries or primary
+processes, whereas consumers will typically be applications.
+
+Metrics themselves are statistics that are not generated by PMDs. Metric
+information is populated using a push model, where producers update the
+values contained within the metric library by calling an update function
+on the relevant metrics. Consumers receive metric information by querying
+the central metric data, which is held in shared memory.
+
+For each metric, a separate value is maintained for each port id, and
+when publishing metric values the producers need to specify which port is
+being updated. In addition there is a special id ``RTE_METRICS_GLOBAL``
+that is intended for global statistics that are not associated with any
+individual device. Since the metrics library is self-contained, the only
+restriction on port numbers is that they are less than ``RTE_MAX_ETHPORTS``
+- there is no requirement for the ports to actually exist.
+
+Initialising the library
+------------------------
+
+Before the library can be used, it has to be initialized by calling
+``rte_metrics_init()`` which sets up the metric store in shared memory.
+This is where producers will publish metric information to, and where
+consumers will query it from.
+
+.. code-block:: c
+
+    rte_metrics_init(rte_socket_id());
+
+This function **must** be called from a primary process, but otherwise
+producers and consumers can be in either primary or secondary processes.
+
+Registering metrics
+-------------------
+
+Metrics must first be *registered*, which is the way producers declare
+the names of the metrics they will be publishing. Registration can either
+be done individually, or a set of metrics can be registered as a group.
+Individual registration is done using ``rte_metrics_reg_name()``:
+
+.. code-block:: c
+
+    id_1 = rte_metrics_reg_name("mean_bits_in");
+    id_2 = rte_metrics_reg_name("mean_bits_out");
+    id_3 = rte_metrics_reg_name("peak_bits_in");
+    id_4 = rte_metrics_reg_name("peak_bits_out");
+
+or alternatively, a set of metrics can be registered together using
+``rte_metrics_reg_names()``:
+
+.. code-block:: c
+
+    const char * const names[] = {
+        "mean_bits_in", "mean_bits_out",
+        "peak_bits_in", "peak_bits_out",
+    };
+    id_set = rte_metrics_reg_names(&names[0], 4);
+
+If the return value is negative, it means registration failed. Otherwise
+the return value is the *key* for the metric, which is used when updating
+values. A table mapping together these key values and the metrics' names
+can be obtained using ``rte_metrics_get_names()``.
+
+Updating metric values
+----------------------
+
+Once registered, producers can update the metric for a given port using
+the ``rte_metrics_update_value()`` function. This uses the metric key
+that is returned when registering the metric, and can also be looked up
+using ``rte_metrics_get_names()``.
+
+.. code-block:: c
+
+    rte_metrics_update_value(port_id, id_1, values[0]);
+    rte_metrics_update_value(port_id, id_2, values[1]);
+    rte_metrics_update_value(port_id, id_3, values[2]);
+    rte_metrics_update_value(port_id, id_4, values[3]);
+
+If metrics were registered as a single set, they can either be updated
+individually using ``rte_metrics_update_value()``, or updated together
+using the ``rte_metrics_update_values()`` function:
+
+.. code-block:: c
+
+    rte_metrics_update_value(port_id, id_set, values[0]);
+    rte_metrics_update_value(port_id, id_set + 1, values[1]);
+    rte_metrics_update_value(port_id, id_set + 2, values[2]);
+    rte_metrics_update_value(port_id, id_set + 3, values[3]);
+
+    rte_metrics_update_values(port_id, id_set, values, 4);
+
+Note that ``rte_metrics_update_values()`` cannot be used to update
+metric values from *multiple* *sets*, as there is no guarantee two
+sets registered one after the other have contiguous id values.
+
+Querying metrics
+----------------
+
+Consumers can obtain metric values by querying the metrics library using
+the ``rte_metrics_get_values()`` function that return an array of
+``struct rte_metric_value``. Each entry within this array contains a metric
+value and its associated key. A key-name mapping can be obtained using the
+``rte_metrics_get_names()`` function that returns an array of
+``struct rte_metric_name`` that is indexed by the key. The following will
+print out all metrics for a given port:
+
+.. code-block:: c
+
+    void print_metrics() {
+        struct rte_metric_name *names;
+        int len;
+
+        len = rte_metrics_get_names(NULL, 0);
+        if (len < 0) {
+            printf("Cannot get metrics count\n");
+            return;
+        }
+        if (len == 0) {
+            printf("No metrics to display (none have been registered)\n");
+            return;
+        }
+        metrics = malloc(sizeof(struct rte_metric_value) * len);
+        names =  malloc(sizeof(struct rte_metric_name) * len);
+        if (metrics == NULL || names == NULL) {
+            printf("Cannot allocate memory\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        ret = rte_metrics_get_values(port_id, metrics, len);
+        if (ret < 0 || ret > len) {
+            printf("Cannot get metrics values\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        printf("Metrics for port %i:\n", port_id);
+        for (i = 0; i < len; i++)
+            printf("  %s: %"PRIu64"\n",
+                names[metrics[i].key].name, metrics[i].value);
+        free(metrics);
+        free(names);
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 83519dc..68581e4 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -38,6 +38,14 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added information metric library.**
+
+  A library that allows information metrics to be added and updated
+  by producers, typically other libraries, for later retrieval by
+  consumers such as applications. It is intended to provide a
+  reporting mechanism that is independent of other libraries such
+  as ethdev.
+
 * **Added generic EAL API for I/O device memory read/write operations.**
 
   This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
@@ -355,6 +363,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_mbuf.so.2
      librte_mempool.so.2
      librte_meter.so.1
+   + librte_metrics.so.1
      librte_net.so.1
      librte_pdump.so.1
      librte_pipeline.so.3
diff --git a/lib/Makefile b/lib/Makefile
index 4178325..29f6a81 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..889d377
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,299 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ */
+struct rte_metrics_meta_s {
+	/** Name of metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	/** Current value for metric */
+	uint64_t value[RTE_MAX_ETHPORTS];
+	/** Used for global metrics */
+	uint64_t nonport_value;
+	/** Index of next root element (zero for none) */
+	uint16_t idx_next_set;
+	/** Index of next metric in set (zero for none) */
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	/**   Index of last metadata entry with valid data.
+	 * This value is not valid if cnt_stats is zero.
+	 */
+	uint16_t idx_last_set;
+	/**   Number of metrics. */
+	uint16_t cnt_stats;
+	/** Metric data memory block. */
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	/** Metric data access lock */
+	rte_spinlock_t lock;
+};
+
+void
+rte_metrics_init(int socket_id)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), socket_id, 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+int
+rte_metrics_reg_name(const char *name)
+{
+	const char * const list_names[] = {name};
+
+	return rte_metrics_reg_names(list_names, 1);
+}
+
+int
+rte_metrics_reg_names(const char * const *names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually first set.. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+int
+rte_metrics_update_value(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_values(port_id, key, &value, 1);
+}
+
+int
+rte_metrics_update_values(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_GLOBAL)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].nonport_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		if (port_id == RTE_METRICS_GLOBAL)
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->nonport_value;
+			}
+		else
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->value[port_id];
+			}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..71c57c6
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,240 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * DPDK Metrics module
+ *
+ * Metrics are statistics that are not generated by PMDs, and hence
+ * are better reported through a mechanism that is independent from
+ * the ethdev-based extended statistics. Providers will typically
+ * be other libraries and consumers will typically be applications.
+ *
+ * Metric information is populated using a push model, where producers
+ * update the values contained within the metric library by calling
+ * an update function on the relevant metrics. Consumers receive
+ * metric information by querying the central metric data, which is
+ * held in shared memory. Currently only bulk querying of metrics
+ * by consumers is supported.
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of metric name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/**
+ * Global (rather than port-specific) metric special id.
+ *
+ * When used for the port_id parameter when calling
+ * rte_metrics_update_value() or rte_metrics_update_values(),
+ * the global metrics, which are not associated with any specific
+ * port (i.e. device), are updated.
+ */
+#define RTE_METRICS_GLOBAL -1
+
+
+/**
+ * A name-key lookup for metrics.
+ *
+ * An array of this structure is returned by rte_metrics_get_names().
+ * The struct rte_metric_value references these names via their array index.
+ */
+struct rte_metric_name {
+	/** String describing metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Metric value structure.
+ *
+ * This structure is used by rte_metrics_get_values() to return metrics,
+ * which are statistics that are not generated by PMDs. It maps a name key,
+ * which corresponds to an index in the array returned by
+ * rte_metrics_get_names().
+ */
+struct rte_metric_value {
+	/** Numeric identifier of metric. */
+	uint16_t key;
+	/** Value for metric */
+	uint64_t value;
+};
+
+
+/**
+ * Initializes metric module. This function must be called from
+ * a primary process before metrics are used.
+ *
+ * @param socket_id
+ *   Socket to use for shared memory allocation.
+ */
+void rte_metrics_init(int socket_id);
+
+/**
+ * Register a metric, making it available as a reporting parameter.
+ *
+ * Registering a metric is the way producers declare a parameter
+ * that they wish to be reported. Once registered, the associated
+ * numeric key can be obtained via rte_metrics_get_names(), which
+ * is required for updating said metric's value.
+ *
+ * @param name
+ *   Metric name
+ *
+ * @return
+ *  - Zero or positive: Success (index key of new metric)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_name(const char *name);
+
+/**
+ * Register a set of metrics.
+ *
+ * This is a bulk version of rte_metrics_reg_name() and aside from
+ * handling multiple keys at once is functionally identical.
+ *
+ * @param names
+ *   List of metric names
+ *
+ * @param cnt_names
+ *   Number of metrics in set
+ *
+ * @return
+ *  - Zero or positive: Success (index key of start of set)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_names(const char * const *names, uint16_t cnt_names);
+
+/**
+ * Get metric name-key lookup table.
+ *
+ * @param names
+ *   A struct rte_metric_name array of at least *capacity* in size to
+ *   receive key names. If this is NULL, function returns the required
+ *   number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_name array.
+ *   Disregarded if names is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *names* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Get metric value table.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   A struct rte_metric_value array of at least *capacity* in size to
+ *   receive metric ids and values. If this is NULL, function returns
+ *   the required number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_value array.
+ *   Disregarded if values is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *values* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a metric
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Id of metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_value(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Base id of metrics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds metric set size
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_values(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
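
A minimal usage sketch of the API declared above (the metric names, values
and the port number are illustrative only):

    #include <rte_lcore.h>
    #include <rte_metrics.h>

    /* Example metric names; any producer-chosen strings work. */
    static const char * const names[] = { "pkts_seen", "pkts_dropped" };

    void metrics_example(void)
    {
            uint64_t vals[2] = { 100, 3 };
            int key;

            rte_metrics_init(rte_socket_id());      /* primary process only */
            key = rte_metrics_reg_names(names, 2);  /* key of first metric in set */
            if (key < 0)
                    return;
            /* Update both metrics of the set for port 0 in one call. */
            rte_metrics_update_values(0, key, vals, 2);
    }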
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..ee28fa0
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_17.02 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_name;
+	rte_metrics_reg_names;
+	rte_metrics_update_value;
+	rte_metrics_update_values;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 0d0a970..46de3d3 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
-- 
2.5.5

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v10 3/7] lib: add bitrate statistics library
    2017-02-03 10:33  1% ` [dpdk-dev] [PATCH v10 1/7] lib: add information metrics library Remy Horton
@ 2017-02-03 10:33  2% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-02-03 10:33 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a library that calculates peak and average data-rate
statistics for ethernet devices. These statistics are reported using
the metrics library.

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/prog_guide/metrics_lib.rst              |  63 ++++++++++
 doc/guides/rel_notes/release_17_02.rst             |   6 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 +++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 132 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 +++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 12 files changed, 356 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index eceebaa..375adc9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -631,6 +631,10 @@ Metrics
 M: Remy Horton <remy.horton@intel.com>
 F: lib/librte_metrics/
 
+Bit-rate statistics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_bitratestats/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index b819932..e7b0e5c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -633,3 +633,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the crypto performance application
 #
 CONFIG_RTE_APP_CRYPTO_PERF=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 26a26b7..8492bce 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -157,4 +157,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [device metrics]     (@ref rte_metrics.h),
+  [bitrate statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index e2e070f..4010340 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -37,6 +37,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_eal/common/include \
                           lib/librte_eal/common/include/generic \
                           lib/librte_acl \
+                          lib/librte_bitratestats \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
index 87f806d..c06023c 100644
--- a/doc/guides/prog_guide/metrics_lib.rst
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -178,3 +178,66 @@ print out all metrics for a given port:
         free(metrics);
         free(names);
     }
+
+
+Bit-rate statistics library
+---------------------------
+
+The bit-rate library calculates the exponentially-weighted moving
+average and peak bit-rates for each active port (i.e. network device).
+These statistics are reported via the metrics library using the
+following names:
+
+    - ``mean_bits_in``: Average inbound bit-rate
+    - ``mean_bits_out``: Average outbound bit-rate
+    - ``peak_bits_in``: Peak inbound bit-rate
+    - ``peak_bits_out``: Peak outbound bit-rate
+
+Once initialised and clocked at the appropriate frequency, these
+statistics can be obtained by querying the metrics library.
+
+Initialization
+~~~~~~~~~~~~~~
+
+Before it is used the bit-rate statistics library has to be initialised
+by calling ``rte_stats_bitrate_create()``, which will return a bit-rate
+calculation object. Since the bit-rate library uses the metrics library
+to report the calculated statistics, the bit-rate library then needs to
+register the calculated statistics with the metrics library. This is
+done using the helper function ``rte_stats_bitrate_reg()``.
+
+.. code-block:: c
+
+    struct rte_stats_bitrates *bitrate_data;
+
+    bitrate_data = rte_stats_bitrate_create();
+    if (bitrate_data == NULL)
+        rte_exit(EXIT_FAILURE, "Could not allocate bit-rate data.\n");
+    rte_stats_bitrate_reg(bitrate_data);
+
+Controlling the sampling rate
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since the library works by periodic sampling but does not use an
+internal thread, the application has to periodically call
+``rte_stats_bitrate_calc()``. The frequency at which this function
+is called should be the intended sampling rate required for the
+calculated statistics. For instance if per-second statistics are
+desired, this function should be called once a second.
+
+.. code-block:: c
+
+    tics_datum = rte_rdtsc();
+    tics_per_1sec = rte_get_timer_hz();
+
+    while (1) {
+        /* ... */
+        tics_current = rte_rdtsc();
+        if (tics_current - tics_datum >= tics_per_1sec) {
+            /* Periodic bitrate calculation */
+            for (idx_port = 0; idx_port < cnt_ports; idx_port++)
+                rte_stats_bitrate_calc(bitrate_data, idx_port);
+            tics_datum = tics_current;
+        }
+        /* ... */
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 68581e4..98729e8 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -46,6 +46,11 @@ New Features
   reporting mechanism that is independent of other libraries such
   as ethdev.
 
+* **Added bit-rate calculation library.**
+
+  A library that can be used to calculate device bit-rates. Calculated
+  bitrates are reported using the metrics library.
+
 * **Added generic EAL API for I/O device memory read/write operations.**
 
   This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
@@ -348,6 +353,7 @@ The libraries prepended with a plus sign were incremented in this version.
 .. code-block:: diff
 
      librte_acl.so.2
+   + librte_bitratestats.so.1
      librte_cfgfile.so.2
      librte_cmdline.so.2
      librte_cryptodev.so.2
diff --git a/lib/Makefile b/lib/Makefile
index 29f6a81..ecc54c0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -50,6 +50,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..743b62c
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..2c20272
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,132 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates {
+	struct rte_stats_bitrate port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+struct rte_stats_bitrates *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates),
+		RTE_CACHE_LINE_SIZE);
+}
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data)
+{
+	const char * const names[] = {
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_names(&names[0], 4);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[4];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming bitrate. This is an iteratively calculated EWMA
+	 * (Exponentially Weighted Moving Average) that uses a
+	 * weighting factor of alpha_percent.
+	 */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +-50 fixes integer rounding during division */
+	if (delta > 0)
+		delta = (delta * alpha_percent + 50) / 100;
+	else
+		delta = (delta * alpha_percent - 50) / 100;
+	port_data->ewma_ibits += delta;
+
+	/* Outgoing bitrate (also EWMA) */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_obits += delta;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->peak_ibits;
+	values[3] = port_data->peak_obits;
+	rte_metrics_update_values(port_id, bitrate_data->id_stats_set,
+		values, 4);
+	return 0;
+}
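
For clarity, the EWMA recurrence used above, extracted into a standalone
sketch (alpha_percent is 20, and the +/-50 term gives round-to-nearest on
the integer division by 100):

    /* new = prev + 0.2 * (sample - prev), i.e. 0.8 * prev + 0.2 * sample */
    static uint64_t
    ewma_update(uint64_t prev, uint64_t sample)
    {
            int64_t delta = (int64_t)sample - (int64_t)prev;

            if (delta > 0)
                    delta = (delta * 20 + 50) / 100;
            else
                    delta = (delta * 20 - 50) / 100;
            return prev + delta;
    }

For example, with prev = 1000 and sample = 2000 the delta applied is 200,
giving a new average of 1200.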
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..564e4f7
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+/**
+ *  Bitrate statistics data structure.
+ *  This data structure is intentionally opaque.
+ */
+struct rte_stats_bitrates;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics with the metric library.
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_bitrate_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period with which
+ * this function is called should be the intended sampling window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..66f232f
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_17.02 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 46de3d3..8f1f8d7 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -100,6 +100,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get
  2017-02-01 16:53  3% [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get Stephen Hemminger
  2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
@ 2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
  1 sibling, 0 replies; 200+ results
From: Mrozowicz, SlawomirX @ 2017-02-03 12:26 UTC (permalink / raw)
  To: Stephen Hemminger, Doherty, Declan; +Cc: dev, De Lara Guarch, Pablo



>-----Original Message-----
>From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>Sent: Wednesday, February 1, 2017 5:54 PM
>To: Mrozowicz, SlawomirX <slawomirx.mrozowicz@intel.com>; Doherty,
>Declan <declan.doherty@intel.com>
>Cc: dev@dpdk.org
>Subject: bugs and glitches in rte_cryptodev_devices_get
>
>The function rte_cryptodev_devices_get has several issues. I was just going
>to fix it, but think it need to be explained.
>
>One potentially serious one (reported by coverity) is:
>
>*** CID 141067:    (BAD_COMPARE)
>/lib/librte_cryptodev/rte_cryptodev.c: 503 in rte_cryptodev_devices_get()
>497     				&& (*devs + i)->attached ==
>498     						RTE_CRYPTODEV_ATTACHED)
>{
>499
>500     			dev = (*devs + i)->device;
>501
>502     			if (dev)
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>/lib/librte_cryptodev/rte_cryptodev.c: 507 in rte_cryptodev_devices_get()
>501
>502     			if (dev)
>503     				cmp = strncmp(dev->driver->name,
>504     						dev_name,
>505     						strlen(dev_name));
>506     			else
>>>>     CID 141067:    (BAD_COMPARE)
>>>>     Truncating the result of "strncmp" to "unsigned char" may cause it to be
>misinterpreted as 0. Note that "strncmp" may return an integer besides -1, 0,
>or 1.
>507     				cmp = strncmp((*devs + i)->data->name,
>508     						dev_name,
>509     						strlen(dev_name));
>510
>511     			if (cmp == 0)
>512     				devices[count++] = (*devs + i)->data->dev_id;
>
>
>But also:
>
>1. Incorrect function signature:
>    * function returns int but never a negative value. should be unsigned.
>    * devices argument is not modified should be const.
>
>2. Original ABI seems short sighted with limit of 256 cryptodevs
>    * this seems like 8 bit mindset,  should really use unsigned int instead
>      of uint8_t for number of devices.
>
>3. Wacky indention of the if statement.
>
>4. Make variables local to the block they are used (cmp, dev)
>
>5. Use array instead of pointer:
>     ie. instead of *devs + i use devs[i]
>
We reconsidered your suggestions and addressed all of them except adding const to the devices argument, since in our opinion it is not necessary.

>
>The overall code in question is:
>
>
>int
>rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
>	uint8_t nb_devices)
>{
>	uint8_t i, cmp, count = 0;
>	struct rte_cryptodev **devs = &rte_cryptodev_globals->devs;
>	struct rte_device *dev;
>
>	for (i = 0; i < rte_cryptodev_globals->max_devs && count <
>nb_devices;
>			i++) {
>
>		if ((*devs + i)
>				&& (*devs + i)->attached ==
>						RTE_CRYPTODEV_ATTACHED)
>{
>
>			dev = (*devs + i)->device;
>
>			if (dev)
>				cmp = strncmp(dev->driver->name,
>						dev_name,
>						strlen(dev_name));
>			else
>				cmp = strncmp((*devs + i)->data->name,
>						dev_name,
>						strlen(dev_name));
>
>			if (cmp == 0)
>				devices[count++] = (*devs + i)->data->dev_id;
>		}
>	}
>
>	return count;
>}
>
>Please fix it.
>

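For reference, a sketch of the loop with the points above addressed (cmp as
a block-local int so the strncmp result is not truncated, array indexing,
and the always-true pointer check dropped); this is a sketch, not the
committed fix:

	int
	rte_cryptodev_devices_get(const char *dev_name, uint8_t *devices,
		uint8_t nb_devices)
	{
		uint8_t i, count = 0;
		struct rte_cryptodev *devs = rte_cryptodev_globals->devs;

		for (i = 0; i < rte_cryptodev_globals->max_devs &&
				count < nb_devices; i++) {
			struct rte_device *dev;
			int cmp;

			if (devs[i].attached != RTE_CRYPTODEV_ATTACHED)
				continue;

			dev = devs[i].device;
			if (dev)
				cmp = strncmp(dev->driver->name, dev_name,
						strlen(dev_name));
			else
				cmp = strncmp(devs[i].data->name, dev_name,
						strlen(dev_name));

			if (cmp == 0)
				devices[count++] = devs[i].data->dev_id;
		}

		return count;
	}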
^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
  2017-01-23 16:36  0%         ` Adrien Mazarguil
@ 2017-02-07  7:50  0%           ` Yang, Zhiyong
  0 siblings, 0 replies; 200+ results
From: Yang, Zhiyong @ 2017-02-07  7:50 UTC (permalink / raw)
  To: Adrien Mazarguil, Richardson, Bruce
  Cc: Ananyev, Konstantin, Andrew Rybchenko, dev, thomas.monjalon

Hi, Adrien:

	Sorry for the late reply due to the Chinese New Year.

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, January 24, 2017 12:36 AM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Andrew
> Rybchenko <arybchenko@solarflare.com>; Yang, Zhiyong
> <zhiyong.yang@intel.com>; dev@dpdk.org; thomas.monjalon@6wind.com
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching
> behavior
> 
> On Fri, Jan 20, 2017 at 11:48:22AM +0000, Bruce Richardson wrote:
> > On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > > >
> > > > From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> > > > Sent: Friday, January 20, 2017 10:26 AM
> > > > To: Yang, Zhiyong <zhiyong.yang@intel.com>; dev@dpdk.org
> > > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com>
> > > > Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD
> > > > batching behavior
> > > >
> > > > On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> > > > The rte_eth_tx_burst() function in the file Rte_ethdev.h is
> > > > invoked to transmit output packets on the output queue for DPDK
> > > > applications as follows.
> > > >
> > > > static inline uint16_t
> > > > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
> > > >                  struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> > > >
> > > > Note: The fourth parameter nb_pkts: The number of packets to
> transmit.
> > > > The rte_eth_tx_burst() function returns the number of packets it
> > > > actually sent. The return value equal to *nb_pkts* means that all
> > > > packets have been sent, and this is likely to signify that other
> > > > output packets could be immediately transmitted again.
> > > > Applications that implement a "send as many packets to transmit as
> > > > possible" policy can check this specific case and keep invoking
> > > > the rte_eth_tx_burst() function until a value less than
> > > > *nb_pkts* is returned.
> > > >
> > > > When you call TX only once in rte_eth_tx_burst, you may get
> > > > different behaviors from different PMDs. One problem that every
> > > > DPDK user has to face is that they need to take the policy into
> > > > consideration at the app- lication level when using any specific
> > > > PMD to send the packets whether or not it is necessary, which
> > > > brings usage complexities and makes DPDK users easily confused
> > > > since they have to learn the details on TX function limit of
> > > > specific PMDs and have to handle the different return value: the
> > > > number of packets transmitted successfully for various PMDs. Some
> > > > PMDs Tx func- tions have a limit of sending at most 32 packets for
> > > > every invoking, some PMDs have another limit of at most 64 packets
> > > > once, another ones have imp- lemented to send as many packets to
> transmit as possible, etc. This will easily cause wrong usage for DPDK users.
> > > >
> > > > This patch proposes to implement the above policy in DPDK lib in
> > > > order to simplify the application implementation and avoid the
> > > > incorrect invoking as well. So, DPDK Users don't need to consider
> > > > the implementation policy and to write duplicated code at the
> > > > application level again when sending packets. In addition to it,
> > > > the users don't need to know the difference of specific PMD TX and
> > > > can transmit the arbitrary number of packets as they expect when
> > > > invoking TX API rte_eth_tx_burst, then check the return value to get
> the number of packets actually sent.
> > > >
> > > > How to implement the policy in DPDK lib? Two solutions are proposed
> below.
> > > >
> > > > Solution 1:
> > > > Implement the wrapper functions to remove some limits for each
> > > > specific PMDs as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple
> do like that.
> > > >
> > > > > IMHO, the solution is a bit better since it:
> > > > > 1. Does not affect other PMDs at all
> > > > > 2. Could be a bit faster for the PMDs which require it since has
> > > > >no indirect
> > > > >    function call on each iteration
> > > > > 3. No ABI change
> > >
> > > I also would prefer solution number 1 for the reasons outlined by Andrew
> above.
> > > Also, IMO current limitation for number of packets to TX in some
> > > Intel PMD TX routines are sort of artificial:
> > > - they are not caused by any real HW limitations
> > > - avoiding them at PMD level shouldn't cause any performance or
> functional degradation.
> > > So I don't see any good reason why instead of fixing these
> > > limitations in our own PMDs we are trying to push them to the upper
> (rte_ethdev) layer.
> 
> For what it's worth, I agree with Konstantin. Wrappers should be as thin as
> possible on top of PMD functions, they are not helpers. We could define a
> set of higher level functions for this purpose though.
> 
> In the meantime, exposing and documenting PMD limitations seems safe
> enough.
> 
> We could assert that RX/TX burst requests larger than the size of the target
> queue are unlikely to be fully met (i.e. PMDs usually do not check for
> completions in the middle of a TX burst).

As a tool, it is very important that its users can easily consume it and make it
work the right way. These sorts of artificial limits make things look a little
confusing and will probably get some users into trouble when writing drivers.
Why not correct this and make it easier? :)

Zhiyong

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization
    2017-01-25 13:20  3% ` Olivier MATZ
@ 2017-02-07 14:12  2% ` Bruce Richardson
  2017-02-14  8:32  3%   ` Olivier Matz
  2017-02-07 14:12  3% ` [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields Bruce Richardson
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-07 14:12 UTC (permalink / raw)
  To: olivier.matz
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev,
	Bruce Richardson

This patchset make a set of, sometimes non-backward compatible, cleanup
changes to the rte_ring code in order to improve it. The resulting code is
shorter*, since the existing functions are restructured to reduce code
duplication, as well as being more consistent in behaviour. The specific
changes made are explained in each patch which makes that change.

Key incompatibilities:
* The biggest, and probably most controversial, change is that to the
  enqueue and dequeue APIs. The enqueue/dequeue burst and bulk functions
  have their function prototypes changed so that they all return an
  additional value via an extra parameter, indicating the size of the
  next call which is guaranteed to succeed. In the case of enqueue, this
  is the number of available slots on the ring, and in the case of
  dequeue, it is the number of objects which can be pulled. As well as
  this, the return values from the bulk functions have been changed to
  make them compatible with the burst functions. In all cases, the
  functions to enqueue/dequeue a set of objects now return the number of
  objects processed: 0 or N for the bulk functions, and 0, N or any
  value in between for the burst ones (see the sketch after this list).
  [Due to the extra parameter, the compiler will flag all instances of
  the function to allow the user to also change the return value logic
  at the same time]
* The parameters to the single object enq/deq functions have not been 
  changed. Because of that, the return value is also unmodified - as the
  compiler cannot automatically flag this to the user.
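
A sketch of the resulting call pattern, with the function name and exact
prototype assumed from the description above rather than taken from the
patches themselves (r, pkts and n as in a typical burst-TX loop):

    unsigned int sent, free_space;

    /* try to enqueue n mbufs; 'sent' may be anything from 0 to n */
    sent = rte_ring_enqueue_burst(r, (void **)pkts, n, &free_space);
    if (sent < n) {
        /* 'free_space' slots are guaranteed to be available on the next
         * enqueue call, so the caller can size that call accordingly
         * instead of probing the ring by retrying. */
    }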

Potential further cleanups:
* To a certain extent the rte_ring structure has gone from being a whole
  ring structure, including a "ring" element itself, to just being a
  header which can be reused, along with the head/tail update functions
  to create new rings. For now, the enqueue code works by assuming
  that the ring data goes immediately after the header, but that can
  be changed to allow specialised ring implementations to put additional
  metadata of their own after the ring header. I didn't see this as being
  needed right now, but it may be worth considering for a V1 patchset.
* There are 9 enqueue functions and 9 dequeue functions in rte_ring.h. I
  suspect not all of those are used, so personally I would consider
  dropping the functions to enqueue/dequeue a single value using single
  or multi semantics, i.e. drop 
    rte_ring_sp_enqueue
    rte_ring_mp_enqueue
    rte_ring_sc_dequeue
    rte_ring_mc_dequeue
  That would still leave a single enqueue and dequeue function for working
  with a single object at a time.
* It should be possible to merge the head update code for enqueue and
  dequeue into a single function. The key difference between the two is
  the calculation of how far the index can be moved. I felt that the
  functions for moving the head index are sufficiently complicated with
  many parameters to them already, that trying to merge in more code would
  impede readability. However, if so desired this change can be made at a
  later stage without affecting ABI or API.

PERFORMANCE:
I've run performance autotests on a couple of (Intel) platforms. Looking
particularly at the core-2-core results, which I expect are the main ones
of interest, the performance after this patchset is a few cycles per packet
faster in my testing. I'm hoping it should be at least neutral perf-wise.

REQUEST FOR FEEDBACK:
* Are all of these changes worth making?
* Should they be made in existing ring code, or do we look to provide a 
  new fifo library to completely replace the ring one?
* How does the implementation of new ring types using this code compare vs
  that of the previous RFCs?

[*] LOC original rte_ring.h: 462. After patchset: 363. [Numbers generated
using David A. Wheeler's 'SLOCCount'.]

Bruce Richardson (19):
  app/pdump: fix duplicate macro definition
  ring: remove split cacheline build setting
  ring: create common structure for prod and cons metadata
  ring: add a function to return the ring size
  crypto/null: use ring size function
  ring: eliminate duplication of size and mask fields
  ring: remove debug setting
  ring: remove the yield when waiting for tail update
  ring: remove watermark support
  ring: make bulk and burst fn return vals consistent
  ring: allow enq fns to return free space value
  examples/quota_watermark: use ring space for watermarks
  ring: allow dequeue fns to return remaining entry count
  ring: reduce scope of local variables
  ring: separate out head index manipulation for enq/deq
  ring: create common function for updating tail idx
  ring: allow macros to work with any type of object
  ring: add object size parameter to memory size calculation
  ring: add event ring implementation

 app/pdump/main.c                                   |   3 +-
 app/test-pipeline/pipeline_hash.c                  |   5 +-
 app/test-pipeline/runtime.c                        |  19 +-
 app/test/Makefile                                  |   1 +
 app/test/commands.c                                |  52 --
 app/test/test_event_ring.c                         |  85 +++
 app/test/test_link_bonding_mode4.c                 |   6 +-
 app/test/test_pmd_ring_perf.c                      |  12 +-
 app/test/test_ring.c                               | 704 ++-----------------
 app/test/test_ring_perf.c                          |  36 +-
 app/test/test_table_acl.c                          |   2 +-
 app/test/test_table_pipeline.c                     |   2 +-
 app/test/test_table_ports.c                        |  12 +-
 app/test/virtual_pmd.c                             |   8 +-
 config/common_base                                 |   3 -
 doc/guides/prog_guide/env_abstraction_layer.rst    |   5 -
 doc/guides/prog_guide/ring_lib.rst                 |   7 -
 doc/guides/sample_app_ug/server_node_efd.rst       |   2 +-
 drivers/crypto/null/null_crypto_pmd.c              |   2 +-
 drivers/crypto/null/null_crypto_pmd_ops.c          |   2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |   3 +-
 drivers/net/ring/rte_eth_ring.c                    |   4 +-
 examples/distributor/main.c                        |   5 +-
 examples/load_balancer/runtime.c                   |  34 +-
 .../client_server_mp/mp_client/client.c            |   9 +-
 .../client_server_mp/mp_server/main.c              |   2 +-
 examples/packet_ordering/main.c                    |  13 +-
 examples/qos_sched/app_thread.c                    |  14 +-
 examples/quota_watermark/qw/init.c                 |   5 +-
 examples/quota_watermark/qw/main.c                 |  15 +-
 examples/quota_watermark/qw/main.h                 |   1 +
 examples/quota_watermark/qwctl/commands.c          |   2 +-
 examples/quota_watermark/qwctl/qwctl.c             |   2 +
 examples/quota_watermark/qwctl/qwctl.h             |   1 +
 examples/server_node_efd/node/node.c               |   2 +-
 examples/server_node_efd/server/main.c             |   2 +-
 lib/librte_hash/rte_cuckoo_hash.c                  |   5 +-
 lib/librte_mempool/rte_mempool_ring.c              |  12 +-
 lib/librte_pdump/rte_pdump.c                       |   2 +-
 lib/librte_port/rte_port_frag.c                    |   3 +-
 lib/librte_port/rte_port_ras.c                     |   2 +-
 lib/librte_port/rte_port_ring.c                    |  34 +-
 lib/librte_ring/Makefile                           |   2 +
 lib/librte_ring/rte_event_ring.c                   | 220 ++++++
 lib/librte_ring/rte_event_ring.h                   | 507 ++++++++++++++
 lib/librte_ring/rte_ring.c                         |  82 +--
 lib/librte_ring/rte_ring.h                         | 762 ++++++++-------------
 47 files changed, 1340 insertions(+), 1373 deletions(-)
 create mode 100644 app/test/test_event_ring.c
 create mode 100644 lib/librte_ring/rte_event_ring.c
 create mode 100644 lib/librte_ring/rte_event_ring.h

-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields
    2017-01-25 13:20  3% ` Olivier MATZ
  2017-02-07 14:12  2% ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
@ 2017-02-07 14:12  3% ` Bruce Richardson
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-07 14:12 UTC (permalink / raw)
  To: olivier.matz
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev,
	Bruce Richardson

The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out into the top-level structure
so they are not duplicated.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_ring.c       |  6 +++---
 lib/librte_ring/rte_ring.c | 20 ++++++++++----------
 lib/librte_ring/rte_ring.h | 32 ++++++++++++++++----------------
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index ebcb896..af74e7d 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -148,7 +148,7 @@ check_live_watermark_change(__attribute__((unused)) void *dummy)
 		}
 
 		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->prod.watermark;
+		watermark = r->watermark;
 		if (watermark != watermark_old &&
 		    (watermark_old != 16 || watermark != 32)) {
 			printf("Bad watermark change %u -> %u\n", watermark_old,
@@ -213,7 +213,7 @@ test_set_watermark( void ){
 		printf( " ring lookup failed\n" );
 		goto error;
 	}
-	count = r->prod.size*2;
+	count = r->size*2;
 	setwm = rte_ring_set_water_mark(r, count);
 	if (setwm != -EINVAL){
 		printf("Test failed to detect invalid watermark count value\n");
@@ -222,7 +222,7 @@ test_set_watermark( void ){
 
 	count = 0;
 	rte_ring_set_water_mark(r, count);
-	if (r->prod.watermark != r->prod.size) {
+	if (r->watermark != r->size) {
 		printf("Test failed to detect invalid watermark count value\n");
 		goto error;
 	}
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4bc6da1..183594f 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -144,11 +144,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.watermark = count;
+	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-	r->prod.size = r->cons.size = count;
-	r->prod.mask = r->cons.mask = count-1;
+	r->size = count;
+	r->mask = count-1;
 	r->prod.head = r->cons.head = 0;
 	r->prod.tail = r->cons.tail = 0;
 
@@ -269,14 +269,14 @@ rte_ring_free(struct rte_ring *r)
 int
 rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
-	if (count >= r->prod.size)
+	if (count >= r->size)
 		return -EINVAL;
 
 	/* if count is 0, disable the watermarking */
 	if (count == 0)
-		count = r->prod.size;
+		count = r->size;
 
-	r->prod.watermark = count;
+	r->watermark = count;
 	return 0;
 }
 
@@ -291,17 +291,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->prod.watermark == r->prod.size)
+	if (r->watermark == r->size)
 		fprintf(f, "  watermark=0\n");
 	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->prod.watermark);
+		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_RING_DEBUG
@@ -318,7 +318,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
 		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
 	}
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
 	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
 	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 75bbcc1..1e4b8ad 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -143,13 +143,10 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 struct rte_ring_ht_ptr {
 	volatile uint32_t head;  /**< Prod/consumer head. */
 	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
 	union {
 		uint32_t sp_enqueue; /**< True, if single producer. */
 		uint32_t sc_dequeue; /**< True, if single consumer. */
 	};
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 };
 
 /**
@@ -169,9 +166,12 @@ struct rte_ring {
 	 * next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
+	int flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_ht_ptr prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
@@ -350,7 +350,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * Placed here since identical code needed in both
  * single and multi producer enqueue functions */
 #define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
+	const uint32_t size = r->size; \
 	uint32_t idx = prod_head & mask; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
@@ -377,7 +377,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * single and multi consumer dequeue functions */
 #define DEQUEUE_PTRS() do { \
 	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
+	const uint32_t size = r->size; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
 			obj_table[i] = r->ring[idx]; \
@@ -432,7 +432,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -480,7 +480,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -539,7 +539,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	prod_head = r->prod.head;
@@ -575,7 +575,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -625,7 +625,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -722,7 +722,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
 	prod_tail = r->prod.tail;
@@ -1051,7 +1051,7 @@ rte_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
 }
 
 /**
@@ -1084,7 +1084,7 @@ rte_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
+	return (prod_tail - cons_tail) & r->mask;
 }
 
 /**
@@ -1100,7 +1100,7 @@ rte_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
+	return (cons_tail - prod_tail - 1) & r->mask;
 }
 
 /**
@@ -1114,7 +1114,7 @@ rte_ring_free_count(const struct rte_ring *r)
 static inline unsigned int
 rte_ring_get_size(struct rte_ring *r)
 {
-	return r->prod.size;
+	return r->size;
 }
 
 /**
-- 
2.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Kill off PCI dependencies
@ 2017-02-08 22:56  3% Stephen Hemminger
  2017-02-09 16:26  3% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2017-02-08 22:56 UTC (permalink / raw)
  To: dev

I am trying to make DPDK more agnostic about bus type. The existing API still
has hardwired into it the assumption that ethernet devices are either PCI or
not PCI (i.e. pci_dev == NULL).
Jan, Jerin, and Shreyansh started the process but it hasn't gone far enough.

It would make more sense if the existing generic device was used everywhere
including rte_ethdev, rte_ethdev_info, etc.

The ABI breakage is not catastrophic. Just change pci_dev to a device pointer.
One option would be to use NEXT_ABI and/or two different calls and data structures.
Messy but compatible. Something like
    rte_dev_info_get returns rte_dev_info but is marked deprecated
    rte_device_info_get returns rte_device_info
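
On the application side that could then look roughly like this (all names
here are hypothetical, assuming rte_pci_device keeps an embedded generic
rte_device as the bus rework introduced):

    struct rte_device_info info;            /* hypothetical generic struct */
    struct rte_pci_device *pci_dev = NULL;

    rte_device_info_get(port_id, &info);    /* hypothetical, as above */
    if (info.device != NULL && device_is_pci(info.device)) /* bus check, hypothetical */
            pci_dev = container_of(info.device,
                            struct rte_pci_device, device);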

One fallout is that the existing testpmd code makes lots of assumptions that
it is working with a PCI device. Things like the ability to get/set PCI registers.
I suspect this is already broken if one tries to run it on a virtual device like TAP.

Can we just turn off that functionality?

Also, KNI has more dependencies on the assumption that ethernet devices are PCI.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 0/3] doc upates
  2017-01-24  7:34  4% [dpdk-dev] [PATCH 0/3] doc upates Jianfeng Tan
  2017-01-24  7:34 17% ` [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio Jianfeng Tan
@ 2017-02-09 14:45  0% ` Thomas Monjalon
  2017-02-09 16:06  4% ` [dpdk-dev] [PATCH v2 " Jianfeng Tan
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-09 14:45 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, john.mcnamara, yuanhan.liu, stephen

2017-01-24 07:34, Jianfeng Tan:
> Patch 1: howto doc of virtio_user for container networking.
> Patch 2: howto doc of virtio_user as exceptional path.
> Patch 3: remove ABI changes in igb_uio

For the patch 3, we are waiting a new revision postponing the notice.

For the first 2 patches, the SVG files are embedding some PNG pictures.
Please try to convert it to a full SVG. By the way it fails to apply,
because of the PNG part.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 0/3] doc upates
  2017-01-24  7:34  4% [dpdk-dev] [PATCH 0/3] doc upates Jianfeng Tan
  2017-01-24  7:34 17% ` [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio Jianfeng Tan
  2017-02-09 14:45  0% ` [dpdk-dev] [PATCH 0/3] doc upates Thomas Monjalon
@ 2017-02-09 16:06  4% ` Jianfeng Tan
  2017-02-09 16:06 12%   ` [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio Jianfeng Tan
  2 siblings, 1 reply; 200+ results
From: Jianfeng Tan @ 2017-02-09 16:06 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, john.mcnamara, yuanhan.liu, stephen, Jianfeng Tan

v2:
  - Change svg files.
  - Postpone instead of remove ABI changes in igb_uio.

Patch 1: howto doc of virtio_user for container networking.
Patch 2: howto doc of virtio_user as exceptional path.
Patch 3: postpone ABI changes in igb_uio

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Jianfeng Tan (3):
  doc: add guide to use virtio_user for container networking
  doc: add guide to use virtio_user as exceptional path
  doc: postpone ABI changes in igb_uio

 .../use_models_for_running_dpdk_in_containers.svg  | 574 ++++++++++++++++++
 .../howto/img/virtio_user_as_exceptional_path.svg  | 386 +++++++++++++
 .../img/virtio_user_for_container_networking.svg   | 638 +++++++++++++++++++++
 doc/guides/howto/index.rst                         |   2 +
 .../howto/virtio_user_as_exceptional_path.rst      | 142 +++++
 .../howto/virtio_user_for_container_networking.rst | 142 +++++
 doc/guides/rel_notes/deprecation.rst               |   2 +-
 7 files changed, 1885 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/howto/img/use_models_for_running_dpdk_in_containers.svg
 create mode 100644 doc/guides/howto/img/virtio_user_as_exceptional_path.svg
 create mode 100644 doc/guides/howto/img/virtio_user_for_container_networking.svg
 create mode 100644 doc/guides/howto/virtio_user_as_exceptional_path.rst
 create mode 100644 doc/guides/howto/virtio_user_for_container_networking.rst

-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-09 16:06  4% ` [dpdk-dev] [PATCH v2 " Jianfeng Tan
@ 2017-02-09 16:06 12%   ` Jianfeng Tan
  2017-02-09 17:40  4%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Jianfeng Tan @ 2017-02-09 16:06 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, john.mcnamara, yuanhan.liu, stephen, Jianfeng Tan

This ABI change is to remove the iomem and ioport mapping in igb_uio. The
purpose of this change was to fix a bug: when a DPDK app crashes, the
devices bound to igb_uio are not stopped by either the DPDK PMD driver
or the igb_uio driver.

Then it was pointed out by Stephen Hemminger that it has a backward
compatibility issue: an old DPDK version cannot run on the modified
igb_uio.

However, we still have not figured out a new way to fix this bug
without this change. Let's postpone this deprecation announcement
in case this change cannot be avoided.

Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 755dc65..b49e0a0 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,7 +11,7 @@ Deprecation Notices
 * igb_uio: iomem mapping and sysfs files created for iomem and ioport in
   igb_uio will be removed, because we are able to detect these from what Linux
   has exposed, like the way we have done with uio-pci-generic. This change
-  targets release 17.02.
+  targets release 17.05.
 
 * ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
   impacted because of introduction of a new ``rte_bus`` hierarchy. This would
-- 
2.7.4

^ permalink raw reply	[relevance 12%]

* Re: [dpdk-dev] Kill off PCI dependencies
  2017-02-08 22:56  3% [dpdk-dev] Kill off PCI dependencies Stephen Hemminger
@ 2017-02-09 16:26  3% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-09 16:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2017-02-08 14:56, Stephen Hemminger:
> I am trying to make DPDK more agnostic about bus type. The existing API still
> has hardwired into that ethernet devices are either PCI or not PCI (ie pci_dev == NULL).
> Jan, Jerin, and Shreyansh started the process but it hasn't gone far enough.
> 
> It would make more sense if the existing generic device was used everywhere
> including rte_ethdev, rte_ethdev_info, etc.

Yes

> The ABI breakage is not catastrophic. Just change pci_dev to a device pointer.
> One option would be to use NEXT_ABI and/or two different calls and data structures.
> Messy but compatible. Something like
>     rte_dev_info_get returns rte_dev_info but is marked deprecated
>     rte_device_info_get returns rte_device_info

Or we can break the ABI to avoid messy code.

> One fallout is that the existing testpmd code makes lots of assumptions that
> is working with a PCI device. Things like ability to get/set PCI registers.
> I suspect this is already broken if one tries to run it on a virtual device like TAP.
> 
> Can we just turn off that functionality?

Which functionality exactly?

> Also KNI has more dependencies that ethernet devices are PCI.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-09 16:06 12%   ` [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio Jianfeng Tan
@ 2017-02-09 17:40  4%     ` Ferruh Yigit
  2017-02-10 10:44  4%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-09 17:40 UTC (permalink / raw)
  To: Jianfeng Tan, dev; +Cc: thomas.monjalon, john.mcnamara, yuanhan.liu, stephen

On 2/9/2017 4:06 PM, Jianfeng Tan wrote:
> This ABI changes to remove iomem and ioport mapping in igb_uio. The
> purpose of this changes was to fix a bug: when DPDK app crashes,
> those devices by igb_uio are not stopped either DPDK PMD driver or
> igb_uio driver.
> 
> Then it has been pointed out by Stephen Hemminger that it has
> backward compatibility issue: cannot run old version DPDK on
> modified igb_uio.
> 
> However, we still have not figure out a new way to fix this bug
> without this change. Let's postpone this deprecation announcement
> in case this change cannot be avoided.
> 
> Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")
> 
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-09 17:40  4%     ` Ferruh Yigit
@ 2017-02-10 10:44  4%       ` Thomas Monjalon
  2017-02-10 11:20  4%         ` Tan, Jianfeng
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-10 10:44 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: Ferruh Yigit, dev, john.mcnamara, yuanhan.liu, stephen

2017-02-09 17:40, Ferruh Yigit:
> On 2/9/2017 4:06 PM, Jianfeng Tan wrote:
> > This ABI change removes the iomem and ioport mapping in igb_uio. The
> > purpose of this change was to fix a bug: when a DPDK app crashes, the
> > devices bound to igb_uio are not stopped by either the DPDK PMD or the
> > igb_uio driver.
> > 
> > Then it has been pointed out by Stephen Hemminger that it has a
> > backward compatibility issue: an old DPDK version cannot run on the
> > modified igb_uio.
> > 
> > However, we still have not figured out a new way to fix this bug
> > without this change. Let's postpone this deprecation announcement
> > in case this change cannot be avoided.
> > 
> > Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from igb_uio")
> > 
> > Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> > Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied, thanks

The images are not real vector images and are almost unreadable.
Please make the effort to use inkscape in order to have images
we can update.

I did some changes: s/virtio_user/virtio-user/ in order to be consistent.
Like for vhost-user, we use the underscore only in code.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
  2017-02-10 10:44  4%       ` Thomas Monjalon
@ 2017-02-10 11:20  4%         ` Tan, Jianfeng
  0 siblings, 0 replies; 200+ results
From: Tan, Jianfeng @ 2017-02-10 11:20 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Yigit, Ferruh, dev, Mcnamara, John, yuanhan.liu, stephen



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Friday, February 10, 2017 6:44 PM
> To: Tan, Jianfeng
> Cc: Yigit, Ferruh; dev@dpdk.org; Mcnamara, John;
> yuanhan.liu@linux.intel.com; stephen@networkplumber.org
> Subject: Re: [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio
> 
> 2017-02-09 17:40, Ferruh Yigit:
> > On 2/9/2017 4:06 PM, Jianfeng Tan wrote:
> > > This ABI change removes the iomem and ioport mapping in igb_uio. The
> > > purpose of this change was to fix a bug: when a DPDK app crashes, the
> > > devices bound to igb_uio are not stopped by either the DPDK PMD or the
> > > igb_uio driver.
> > >
> > > Then it has been pointed out by Stephen Hemminger that it has a
> > > backward compatibility issue: an old DPDK version cannot run on the
> > > modified igb_uio.
> > >
> > > However, we still have not figured out a new way to fix this bug
> > > without this change. Let's postpone this deprecation announcement
> > > in case this change cannot be avoided.
> > >
> > > Fixes: 3bac1dbc1ed ("doc: announce iomem and ioport removal from
> igb_uio")
> > >
> > > Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
> > > Suggested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > > Suggested-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> > > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Applied, thanks
> 
> The images are not real vector images and are almost unreadable.
> Please make the effort to use inkscape in order to have images
> we can update.

Apologies for that. I've submitted a patch to change the images. And thank you for the solution.

> 
> I did some changes: s/virtio_user/virtio-user/ in order to be consistent.
> Like for vhost-user, we use the underscore only in code.

Thank you for that.

Regards,
Jianfeng

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure
@ 2017-02-10 11:39  9% Fan Zhang
  2017-02-10 13:59  4% ` Trahe, Fiona
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
  0 siblings, 2 replies; 200+ results
From: Fan Zhang @ 2017-02-10 11:39 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 755dc65..564d93a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -62,3 +62,7 @@ Deprecation Notices
   PMDs that implement the latter.
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
+
+* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
+  The field ``cryptodev_configure_t`` function prototype will be added a
+  parameter of a struct rte_cryptodev_config type pointer.
-- 
2.7.4

^ permalink raw reply	[relevance 9%]
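
For reference, a sketch of the prototype change the notice above describes
(the 17.02 prototype is quoted from memory and may differ; the new parameter
name is an assumption, only the deprecation notice itself is authoritative):

/* Current prototype (17.02): */
typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev);

/* After the announced 17.05 change: */
typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev,
		struct rte_cryptodev_config *config);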

* Re: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure
  2017-02-10 11:39  9% [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure Fan Zhang
@ 2017-02-10 13:59  4% ` Trahe, Fiona
  2017-02-13 16:07  7%   ` Zhang, Roy Fan
  2017-02-14  0:21  4%   ` Hemant Agrawal
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
  1 sibling, 2 replies; 200+ results
From: Trahe, Fiona @ 2017-02-10 13:59 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo, Trahe, Fiona

Hi Fan,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> Sent: Friday, February 10, 2017 11:39 AM
> To: dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> structure
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 755dc65..564d93a 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -62,3 +62,7 @@ Deprecation Notices
>    PMDs that implement the latter.
>    Target release for removal of the legacy API will be defined once most
>    PMDs have switched to rte_flow.
> +
> +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
> +  The field ``cryptodev_configure_t`` function prototype will be added a
> +  parameter of a struct rte_cryptodev_config type pointer.
> --
> 2.7.4

Can you fix the grammar here please? I'm not sure what the change is.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API
  @ 2017-02-10 14:05  1% ` Cristian Dumitrescu
  2017-02-21 10:35  0%   ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Cristian Dumitrescu @ 2017-02-10 14:05 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, jerin.jacob, hemant.agrawal

This patch introduces the generic ethdev API for the hierarchical scheduler
capability.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow approach)
- Capability query API per port and per hierarchy node
- Scheduling algorithms: strict priority (SP), Weighted Fair Queuing (WFQ),
  Weighted Round Robin (WRR)
- Traffic shaping: single/dual rate, private (per node) and shared (by multiple
  nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Changes since RFC [1]:
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception, see the long list below, hopefully
  nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated object
      IDs. IMO the choice to have application-generated object IDs adds marginal
      complexity to the driver (search ID function required), but it provides
      huge simplification for the application. The app does not need to worry
      about building & managing tree-like structure for storing driver-generated
      object IDs, the app can use its own convention for node IDs depending on
      the specific hierarchy that it needs. Trivial example: identify all
      level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
      on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
      310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
      112, 113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log for
      the other related simplification that was implemented: leaf nodes now have
      predefined IDs that are the same as their Ethernet TX queue ID (therefore
      no translation is required for leaf nodes; a usage sketch follows this list).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the shaper
  profile as part of node API (no shaper ID needed for private shapers), while
  the shared shapers are configured outside of the node API using shaper profile
  and communicated to the node using shared shaper ID. So there is no
  configuration overhead for shared shapers if the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet TX
  queue ID (therefore no translation is required for leaf nodes). This is also
  used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause (same
  as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
  based on IP packet bytes)
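
A usage sketch for the application-generated node ID convention above
(illustration only, not part of the patch): it adds one shaper profile, a
small non-leaf sub-tree and two leaf nodes, then commits the hierarchy.
Attaching top-level nodes under RTE_SCHEDDEV_ROOT_NODE_ID, and the specific
IDs and rates used, are assumptions of the sketch; <rte_scheddev.h> is
assumed to be included and port_id to be a configured port with two TX queues.

static int
app_scheddev_setup(uint8_t port_id)
{
	struct rte_scheddev_error err;

	/* Single rate shaper profile: committed bucket disabled (rate = 0),
	 * peak rate 100 Mbytes/s with a 64 KB burst; the Ethernet framing
	 * overhead is counted against the shaper as well.
	 */
	struct rte_scheddev_shaper_params sp = {
		.committed = { .rate = 0, .size = 0 },
		.peak = { .rate = 100000000, .size = 65536 },
		.pkt_length_adjust = RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD,
	};

	struct rte_scheddev_node_params np = {
		.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
	};

	rte_scheddev_shaper_profile_add(port_id, 1, &sp, &err);

	/* Non-leaf nodes: IDs follow the application's own convention
	 * (100 = level-2 node, 110 = its level-3 child).
	 */
	np.shaper_profile_id = 1; /* private shaper taken from profile 1 */
	rte_scheddev_node_add(port_id, 100, RTE_SCHEDDEV_ROOT_NODE_ID,
		0 /* priority */, 1 /* weight */, &np, &err);

	np.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE;
	rte_scheddev_node_add(port_id, 110, 100, 0, 1, &np, &err);

	/* Leaf nodes: node ID == TX queue ID (0 .. N-1), no translation. */
	rte_scheddev_node_add(port_id, 0, 110, 0, 1, &np, &err);
	rte_scheddev_node_add(port_id, 1, 110, 0, 1, &np, &err);

	/* Freeze and commit the hierarchy; clear it if the commit fails.
	 * On error, err.type and err.message give the precise cause.
	 */
	return rte_scheddev_hierarchy_set(port_id, 1 /* clear_on_fail */, &err);
}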

Next steps:
- SW fallback based on librte_sched library (to be later introduced by
  standalone patch set)

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant's feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_scheddev.c        |  790 ++++++++++++++++++++
 lib/librte_ether/rte_scheddev.h        | 1273 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_scheddev_driver.h |  374 ++++++++++
 6 files changed, 2475 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_scheddev.c
 create mode 100644 lib/librte_ether/rte_scheddev.h
 create mode 100644 lib/librte_ether/rte_scheddev_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index cc3bf98..666931d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -247,6 +247,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+SchedDev API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_scheddev*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 1d095a9..7e0527f 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_scheddev.c
 
 #
 # Export include files
@@ -54,6 +55,8 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_scheddev.h
+SYMLINK-y-include += rte_scheddev_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index d00cb5c..6b3c84f 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -159,5 +159,35 @@ DPDK_17.05 {
 	global:
 
 	rte_eth_dev_capability_control;
+	rte_scheddev_capabilities_get;
+	rte_scheddev_node_capabilities_get;
+	rte_scheddev_wred_profile_add;
+	rte_scheddev_wred_profile_delete;
+	rte_scheddev_shared_wred_context_add_update;
+	rte_scheddev_shared_wred_context_delete;
+	rte_scheddev_shaper_profile_add;
+	rte_scheddev_shaper_profile_delete;
+	rte_scheddev_shared_shaper_add_update;
+	rte_scheddev_shared_shaper_delete;
+	rte_scheddev_node_add;
+	rte_scheddev_node_delete;
+	rte_scheddev_node_suspend;
+	rte_scheddev_node_resume;
+	rte_scheddev_hierarchy_set;
+	rte_scheddev_node_parent_update;
+	rte_scheddev_node_shaper_update;
+	rte_scheddev_node_shared_shaper_update;
+	rte_scheddev_node_scheduling_mode_update;
+	rte_scheddev_node_cman_update;
+	rte_scheddev_node_wred_context_update;
+	rte_scheddev_node_shared_wred_context_update;
+	rte_scheddev_mark_vlan_dei;
+	rte_scheddev_mark_ip_ecn;
+	rte_scheddev_mark_ip_dscp;
+	rte_scheddev_stats_get_enabled;
+	rte_scheddev_stats_enable;
+	rte_scheddev_node_stats_get_enabled;
+	rte_scheddev_node_stats_enable;
+	rte_scheddev_node_stats_read;
 
 } DPDK_17.02;
diff --git a/lib/librte_ether/rte_scheddev.c b/lib/librte_ether/rte_scheddev.c
new file mode 100644
index 0000000..679a22d
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.c
@@ -0,0 +1,790 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev_driver.h"
+#include "rte_scheddev.h"
+
+/* Get generic scheduler operations structure from a port. */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_scheddev_error_set(error,
+			ENODEV,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->cap_ctrl == NULL) ||
+		dev->dev_ops->cap_ctrl(dev, RTE_ETH_CAPABILITY_SCHED, &ops) ||
+		(ops == NULL)) {
+		rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+/* Get capabilities */
+int rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->capabilities_get(dev, cap, error);
+}
+
+/* Get node capabilities */
+int rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_capabilities_get(dev, node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_add(dev, wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_delete(dev, wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_add_update(dev, shared_wred_context_id,
+		wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_delete(dev, shared_wred_context_id,
+		error);
+}
+
+/* Add shaper profile */
+int rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_add(dev, shaper_profile_id, profile, error);
+}
+
+/* Delete shaper profile */
+int rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_delete(dev, shaper_profile_id, error);
+}
+
+/* Add/update shared shaper */
+int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_add_update(dev, shared_shaper_id,
+		shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_delete(dev, shared_shaper_id, error);
+}
+
+/* Add node to port scheduler hierarchy */
+int rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_add(dev, node_id, parent_node_id, priority, weight,
+		params, error);
+}
+
+/* Delete node from scheduler hierarchy */
+int rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_delete(dev, node_id, error);
+}
+
+/* Suspend node */
+int rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_suspend == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_suspend(dev, node_id, error);
+}
+
+/* Resume node */
+int rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_resume == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_resume(dev, node_id, error);
+}
+
+/* Set the initial port scheduler hierarchy */
+int rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->hierarchy_set == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->hierarchy_set(dev, clear_on_fail, error);
+}
+
+/* Update node parent  */
+int rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_parent_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_parent_update(dev, node_id, parent_node_id, priority,
+		weight, error);
+}
+
+/* Update node private shaper */
+int rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shaper_update(dev, node_id, shaper_profile_id,
+		error);
+}
+
+/* Update node shared shapers */
+int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_shaper_update(dev, node_id, shared_shaper_id,
+		add, error);
+}
+
+/* Update scheduling mode */
+int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_scheduling_mode_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_scheduling_mode_update(dev, node_id,
+		scheduling_mode_per_priority, n_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_cman_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_cman_update(dev, node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_wred_context_update(dev, node_id, wred_profile_id,
+		error);
+}
+
+/* Update node shared WRED context */
+int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_wred_context_update(dev, node_id,
+		shared_wred_context_id, add, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_vlan_dei == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_vlan_dei(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_ecn == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_ecn(dev, mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_dscp == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_dscp(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
+
+/* Get set of stats counter types currently enabled for all nodes */
+int rte_scheddev_stats_get_enabled(uint8_t port_id,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->stats_get_enabled == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->stats_get_enabled(dev,
+		nonleaf_node_capability_stats_mask,
+		nonleaf_node_enabled_stats_mask,
+		leaf_node_capability_stats_mask,
+		leaf_node_enabled_stats_mask,
+		error);
+}
+
+/* Enable specified set of stats counter types for all nodes */
+int rte_scheddev_stats_enable(uint8_t port_id,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->stats_enable == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->stats_enable(dev,
+		nonleaf_node_enabled_stats_mask,
+		leaf_node_enabled_stats_mask,
+		error);
+}
+
+/* Get set of stats counter types currently enabled for specific node */
+int rte_scheddev_node_stats_get_enabled(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_get_enabled == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_get_enabled(dev,
+		node_id,
+		capability_stats_mask,
+		enabled_stats_mask,
+		error);
+}
+
+/* Enable specified set of stats counter types for specific node */
+int rte_scheddev_node_stats_enable(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_enable == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_enable(dev, node_id, enabled_stats_mask, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	int clear,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_read == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_read(dev, node_id, stats, clear, error);
+}
diff --git a/lib/librte_ether/rte_scheddev.h b/lib/librte_ether/rte_scheddev.h
new file mode 100644
index 0000000..fed3df2
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.h
@@ -0,0 +1,1273 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_H__
+#define __INCLUDE_RTE_SCHEDDEV_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API
+ *
+ * This interface provides the ability to configure the hierarchical scheduler
+ * feature in a generic way.
+ */
+
+#include <stdint.h>
+
+#include <rte_red.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Ethernet framing overhead
+  *
+  * Overhead fields per Ethernet frame:
+  * 1. Preamble:                                            7 bytes;
+  * 2. Start of Frame Delimiter (SFD):                      1 byte;
+  * 3. Inter-Frame Gap (IFG):                              12 bytes.
+  */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD                  20
+
+/**
+  * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
+  * is generated and added at the end of the Ethernet frame on TX side without
+  * any SW intervention.
+  */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**< Invalid WRED profile ID */
+#define RTE_SCHEDDEV_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**< Invalid shaper profile ID */
+#define RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**< Scheduler hierarchy root node ID */
+#define RTE_SCHEDDEV_ROOT_NODE_ID                          UINT32_MAX
+
+
+/**
+  * Scheduler node capabilities
+  */
+struct rte_scheddev_node_capabilities {
+	/**< Private shaper support. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper. Valid only when
+	 * private shaper is supported.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of supported shared shapers. The value of zero
+	 * indicates that shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Items valid only for non-leaf nodes. */
+	struct {
+		/**< Maximum number of children nodes. */
+		uint32_t n_children_max;
+
+		/**< Lowest priority supported. The value of 1 indicates that
+		 * only priority 0 is supported, which essentially means that
+		 * Strict Priority (SP) algorithm is not supported.
+		 */
+		uint32_t sp_priority_min;
+
+		/**< Maximum number of sibling nodes that can have the same
+		 * priority at any given time. When equal to *n_children_max*,
+		 * it indicates that WFQ/WRR algorithms are not supported.
+		 */
+		uint32_t sp_n_children_max;
+
+		/**< WFQ algorithm support. */
+		int scheduling_wfq_supported;
+
+		/**< WRR algorithm support. */
+		int scheduling_wrr_supported;
+
+		/**< Maximum WFQ/WRR weight. */
+		uint32_t scheduling_wfq_wrr_weight_max;
+	} nonleaf;
+
+	/**< Items valid only for leaf nodes. */
+	struct {
+		/**< Head drop algorithm support. */
+		int cman_head_drop_supported;
+
+		/**< Private WRED context support. */
+		int cman_wred_context_private_supported;
+
+		/**< Maximum number of shared WRED contexts supported. The value
+		 * of zero indicates that shared WRED contexts are not
+		 * supported.
+		 */
+		uint32_t cman_wred_context_shared_n_max;
+	} leaf;
+};
+
+/**
+  * Scheduler capabilities
+  */
+struct rte_scheddev_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resource between private and
+	 * shared shapers, it is typically equal to the sum between
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have the private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	  * shared shapers are not supported.
+	  */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper. Only
+	  * valid when shared shapers are supported.
+	  */
+	uint32_t shaper_shared_n_nodes_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	  * rate shaping. The value of zero indicates that dual rate shaping
+	  * support is not available for shared shapers.
+	  */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for shared
+	  * shapers. Only valid when shared shapers are supported.
+	  */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for shared
+	  * shaper. Only valid when shared shapers are supported.
+	  */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for
+	  * private/shared shapers.
+	  */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for
+	  * private/shared shapers.
+	  */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of WRED contexts. */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	  * number of leaf nodes that can concurrently have the private WRED
+	  * context enabled.
+	  */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero indicates
+	  * that shared WRED contexts are not supported.
+	  */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	  * Only valid when shared WRED contexts are supported.
+	  */
+	uint32_t cman_wred_context_shared_n_nodes_max;
+
+	/**< Support for VLAN DEI packet marking. */
+	int mark_vlan_dei_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets. */
+	int mark_ip_ecn_tcp_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets. */
+	int mark_ip_ecn_sctp_supported;
+
+	/**< Support for IPv4/IPv6 DSCP packet marking. */
+	int mark_ip_dscp_supported;
+
+	/**< Summary of node-level capabilities across all nodes. */
+	struct rte_scheddev_node_capabilities node;
+};
+
+/**
+  * Congestion management (CMAN) mode
+  *
+  * This is used for controlling the admission of packets into a packet queue or
+  * group of packet queues on congestion. When a new packet needs to be written
+  * into a queue that is already full, the *tail drop* algorithm drops the new
+  * packet and leaves the queue unmodified, as opposed to the *head
+  * drop* algorithm, which drops the packet at the head of the queue (the oldest
+  * packet waiting in the queue) and admits the new packet at the tail of the
+  * queue.
+  *
+  * The *Random Early Detection (RED)* algorithm works by proactively dropping
+  * more and more input packets as the queue occupancy builds up. When the queue
+  * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+  * RED* algorithm uses a separate set of RED thresholds for each packet color.
+  */
+enum rte_scheddev_cman_mode {
+	RTE_SCHEDDEV_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_SCHEDDEV_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_SCHEDDEV_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+  * Color
+  */
+enum rte_scheddev_color {
+	e_RTE_SCHEDDEV_GREEN = 0, /**< Green */
+	e_RTE_SCHEDDEV_YELLOW,    /**< Yellow */
+	e_RTE_SCHEDDEV_RED,       /**< Red */
+	e_RTE_SCHEDDEV_COLORS     /**< Number of colors */
+};
+
+/**
+  * WRED profile
+  */
+struct rte_scheddev_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_red_params red_params[e_RTE_SCHEDDEV_COLORS];
+};
+
+/**
+  * Token bucket
+  */
+struct rte_scheddev_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+  * Shaper (rate limiter) profile
+  *
+  * Multiple shaper instances can share the same shaper profile. Each node has
+  * zero or one private shaper (only one node using it) and/or zero, one or
+  * several shared shapers (multiple nodes use the same shaper instance).
+  *
+  * Single rate shapers use a single token bucket. A single rate shaper can be
+  * configured by setting the rate of the committed bucket to zero, which
+  * effectively disables this bucket. The peak bucket is used to limit the rate
+  * and the burst size for the current shaper.
+  *
+  * Dual rate shapers use both the committed and the peak token buckets. The
+  * rate of the committed bucket has to be less than or equal to the rate of the
+  * peak bucket.
+  */
+struct rte_scheddev_shaper_params {
+	/**< Committed token bucket */
+	struct rte_scheddev_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_scheddev_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
+
+/**
+  * Node parameters
+  *
+  * Each scheduler hierarchy node has multiple inputs (children nodes of the
+  * current parent node) and a single output (which is input to its parent
+  * node). The current node arbitrates its inputs using Strict Priority (SP),
+  * Weighted Fair Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to
+  * schedule input packets on its output while observing its shaping (rate
+  * limiting) constraints.
+  *
+  * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc. are considered
+  * approximations of the ideal WFQ and are assimilated to WFQ, although an
+  * associated implementation-dependent trade-off on accuracy, performance
+  * and resource usage might exist.
+  *
+  * Children nodes with different priorities are scheduled using the SP
+  * algorithm, based on their priority, with zero (0) as the highest priority.
+  * Children with same priority are scheduled using the WFQ or WRR algorithm,
+  * based on their weight, which is relative to the sum of the weights of all
+  * siblings with same priority, with one (1) as the lowest weight.
+  *
+  * Each leaf node sits on top of a TX queue of the current Ethernet port.
+  * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
+  * where N is the number of TX queues configured for the current Ethernet port.
+  * The non-leaf nodes have their IDs generated by the application.
+  */
+struct rte_scheddev_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< For each priority, indicates whether the children
+			 * nodes sharing the same priority are to be scheduled
+			 * by WFQ or by WRR. When NULL, it indicates that WFQ
+			 * is to be used for all priorities. When non-NULL, it
+			 * points to a pre-allocated array of *n_priorities*
+			 * elements, with a non-zero value element indicating
+			 * WFQ and a zero value element for WRR.
+			 */
+			int *scheduling_mode_per_priority;
+
+			/**< Number of priorities. */
+			uint32_t n_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_scheddev_cman_mode cman;
+
+			/**< WRED parameters (valid when *cman* is WRED). */
+			struct {
+				/**< WRED profile for private WRED context. */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of shared WRED context
+				 * IDs. The absence of a private WRED context
+				 * for current leaf node is indicated by value
+				 * RTE_SCHEDDEV_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of shared WRED context IDs in the
+				 * *shared_wred_context_id* array.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+};
+
+/**
+  * Node statistics counter type
+  */
+enum rte_scheddev_stats_counter {
+	/**< Number of packets scheduled from current node. */
+	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES = 1 << 1,
+
+	/**< Number of packets dropped by current node.  */
+	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_DROPPED = 1 << 2,
+
+	/**< Number of bytes dropped by current node.  */
+	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_DROPPED = 1 << 3,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_QUEUED = 1 << 4,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_QUEUED = 1 << 5,
+};
+
+/**
+  * Node statistics counters
+  */
+struct rte_scheddev_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node. */
+		uint64_t n_pkts_dropped;
+
+		/**< Number of bytes dropped by current leaf node. */
+		uint64_t n_bytes_dropped;
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_scheddev_error::cause.
+ */
+enum rte_scheddev_error_type {
+	RTE_SCHEDDEV_ERROR_TYPE_NONE, /**< No error. */
+	RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PARENT_NODE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PRIORITY,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_WEIGHT,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_CMAN,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs, the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_scheddev_error {
+	enum rte_scheddev_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Scheduler capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Scheduler capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param cap
+ *   Scheduler node capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is currently
+ * at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several scheduler hierarchy
+ * leaf nodes configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy leaf node) of this
+ * shared WRED context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is no
+ * longer using the shaper profile previously assigned to it and is updated to
+ * use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy node) of this shared
+ * shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node add
+ *
+ * When *node_id* is not a valid node ID, a new node with this ID is created and
+ * connected as child to the existing node identified by *parent_node_id*.
+ *
+ * When *node_id* is a valid node ID, this node is disconnected from its current
+ * parent and connected as child to another existing node identified by
+ * *parent_node_id*.
+ *
+ * This function can be called during port initialization phase (before the
+ * Ethernet port is started) for building the scheduler start-up hierarchy.
+ * Subject to the specific Ethernet port supporting on-the-fly scheduler
+ * hierarchy updates, this function can also be called during run-time (after
+ * the Ethernet port is started).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has at
+ * least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node suspend
+ *
+ * Suspend an existing node.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node resume
+ *
+ * Resume an existing node that was previously suspended.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the scheduler start-up hierarchy.
+ *
+ * This function fails when the currently configured scheduler hierarchy is not
+ * supported by the Ethernet port, in which case the user can abort or try out
+ * another hierarchy configuration (e.g. a hierarchy with fewer leaf nodes),
+ * which can be built from scratch (when *clear_on_fail* is enabled) or by
+ * modifying the existing hierarchy configuration (when *clear_on_fail* is
+ * disabled).
+ *
+ * Note that, even when the configured scheduler hierarchy is supported (so this
+ * function is successful), the Ethernet port start might still fail, for
+ * example due to not enough memory being available in the system.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node parent update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private shaper update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared shapers update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared shaper to the current node or to
+ *   zero to delete this shared shaper from the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node scheduling mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param scheduling_mode_per_priority
+ *   For each priority, indicates whether the child nodes sharing the same
+ *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates that
+ *   WFQ is to be used for all priorities. When non-NULL, it points to a
+ *   pre-allocated array of *n_priorities* elements, with a non-zero value
+ *   element indicating WFQ and a zero value element for WRR.
+ * @param n_priorities
+ *   Number of priorities.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node congestion management mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_SCHEDDEV_WRED_PROFILE_ID_NONE, with
+ *   the latter disabling the private WRED context of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared WRED context to the current node
+ *   or to zero to delete this shared WRED context from the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion Notification
+ * (ECN) field (2 bits). The DSCP field is typically used to encode the traffic
+ * class and/or drop priority (RFC 2597), while the ECN field is used by RFC
+ * 3168 to implement a congestion notification mechanism to be leveraged by
+ * transport layer protocols such as TCP and SCTP that have congestion control
+ * mechanisms.
+ *
+ * When congestion is experienced, as alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10 (values
+ * indicating that source endpoint is ECN-capable) to 2'b11 (meaning that
+ * congestion is experienced). The destination endpoint can use the ECN-Echo
+ * (ECE) TCP flag to relay the congestion indication back to the source
+ * endpoint, which acknowledges it back to the destination endpoint with the
+ * Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color; otherwise the ECN field is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2, as
+ * well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler get statistics counter types enabled for all nodes
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param nonleaf_node_capability_stats_mask
+ *   Statistics counter types available per node for all non-leaf nodes. Needs
+ *   to be pre-allocated.
+ * @param nonleaf_node_enabled_stats_mask
+ *   Statistics counter types currently enabled per node for each non-leaf node.
+ *   This is a subset of *nonleaf_node_capability_stats_mask*. Needs to be
+ *   pre-allocated.
+ * @param leaf_node_capability_stats_mask
+ *   Statistics counter types available per node for all leaf nodes. Needs to
+ *   be pre-allocated.
+ * @param leaf_node_enabled_stats_mask
+ *   Statistics counter types currently enabled for each leaf node. This is
+ *   a subset of *leaf_node_capability_stats_mask*. Needs to be pre-allocated.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_stats_get_enabled(uint8_t port_id,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler enable selected statistics counters for all nodes
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param nonleaf_node_enabled_stats_mask
+ *   Statistics counter types to be enabled per node for each non-leaf node.
+ *   This needs to be a subset of the statistics counter types available per
+ *   node for all non-leaf nodes. Any statistics counter type not included in
+ *   this set is to be disabled for all non-leaf nodes.
+ * @param leaf_node_enabled_stats_mask
+ *   Statistics counter types to be enabled per node for each leaf node. This
+ *   needs to be a subset of the statistics counter types available per node for
+ *   all leaf nodes. Any statistics counter type not included in this set is to
+ *   be disabled for all leaf nodes.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_stats_enable(uint8_t port_id,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler get statistics counter types enabled for current node
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param capability_stats_mask
+ *   Statistics counter types available for the current node. Needs to be
+ *   pre-allocated.
+ * @param enabled_stats_mask
+ *   Statistics counter types currently enabled for the current node. This is
+ *   a subset of *capability_stats_mask*. Needs to be pre-allocated.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_stats_get_enabled(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler enable selected statistics counters for current node
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param enabled_stats_mask
+ *   Statistics counter types to be enabled for the current node. This needs to
+ *   be a subset of the statistics counter types available for the current node.
+ *   Any statistics counter type not included in this set is to be disabled for
+ *   the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_stats_enable(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read, otherwise
+ *   the statistics counters are left untouched.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	int clear,
+	struct rte_scheddev_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_H__ */
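
For illustration, here is a minimal application-side sketch of the intended
call flow (not part of the patch): it assumes a trivial two-node hierarchy, a
zero-filled rte_scheddev_node_params (member setup omitted), and a locally
defined "no parent" marker for the root node, since the actual root/parent
convention is defined elsewhere in this header.

/* Illustrative sketch only, not part of the patch. */
#include <stdio.h>
#include <stdint.h>

#include <rte_scheddev.h>

/* Assumed "no parent" marker for the root node; the real convention is
 * defined by the scheddev header itself. */
#define APP_NO_PARENT UINT32_MAX

static int
app_sched_hierarchy_init(uint8_t port_id)
{
	struct rte_scheddev_node_params np = {0}; /* member setup omitted */
	struct rte_scheddev_error error = {0};
	int ret;

	/* Root node (ID 0): SP priority 0, weight 1, no parent. */
	ret = rte_scheddev_node_add(port_id, 0, APP_NO_PARENT, 0, 1,
		&np, &error);
	if (ret)
		goto fail;

	/* One leaf node (ID 1), connected as child of the root. */
	ret = rte_scheddev_node_add(port_id, 1, 0, 0, 1, &np, &error);
	if (ret)
		goto fail;

	/* Freeze the start-up hierarchy before starting the port. */
	ret = rte_scheddev_hierarchy_set(port_id, 1 /* clear_on_fail */,
		&error);
	if (ret)
		goto fail;

	return 0;

fail:
	printf("scheddev error type %d: %s\n", (int)error.type,
		error.message ? error.message : "(no message)");
	return ret;
}

The same pattern (check the return value, then inspect the rte_scheddev_error
structure when it is non-zero) applies to every function declared above.
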
diff --git a/lib/librte_ether/rte_scheddev_driver.h b/lib/librte_ether/rte_scheddev_driver.h
new file mode 100644
index 0000000..c0a0321
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev_driver.h
@@ -0,0 +1,374 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+#define __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs. They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_scheddev_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler capabilities get */
+
+typedef int (*rte_scheddev_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node capabilities get */
+
+typedef int (*rte_scheddev_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile add */
+
+typedef int (*rte_scheddev_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile delete */
+
+typedef int (*rte_scheddev_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context add/update */
+
+typedef int (*rte_scheddev_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context delete */
+
+typedef int (*rte_scheddev_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile add */
+
+typedef int (*rte_scheddev_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile delete */
+
+typedef int (*rte_scheddev_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper add/update */
+
+typedef int (*rte_scheddev_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper delete */
+
+typedef int (*rte_scheddev_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node add */
+
+typedef int (*rte_scheddev_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node delete */
+
+typedef int (*rte_scheddev_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node suspend */
+
+typedef int (*rte_scheddev_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node resume */
+
+typedef int (*rte_scheddev_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler hierarchy set */
+
+typedef int (*rte_scheddev_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node parent update */
+
+typedef int (*rte_scheddev_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shaper update */
+
+typedef int (*rte_scheddev_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared shaper update */
+
+typedef int (*rte_scheddev_node_scheduling_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node scheduling mode update */
+
+typedef int (*rte_scheddev_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node congestion management mode update */
+
+typedef int (*rte_scheddev_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node WRED context update */
+
+typedef int (*rte_scheddev_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared WRED context update */
+
+typedef int (*rte_scheddev_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - VLAN DEI */
+
+typedef int (*rte_scheddev_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_scheddev_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 DSCP */
+
+typedef int (*rte_scheddev_stats_get_enabled_t)(struct rte_eth_dev *dev,
+	uint64_t *nonleaf_node_capability_stats_mask,
+	uint64_t *nonleaf_node_enabled_stats_mask,
+	uint64_t *leaf_node_capability_stats_mask,
+	uint64_t *leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler get set of stats counters enabled for all nodes */
+
+typedef int (*rte_scheddev_stats_enable_t)(struct rte_eth_dev *dev,
+	uint64_t nonleaf_node_enabled_stats_mask,
+	uint64_t leaf_node_enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler enable selected stats counters for all nodes */
+
+typedef int (*rte_scheddev_node_stats_get_enabled_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t *capability_stats_mask,
+	uint64_t *enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler get set of stats counters enabled for specific node */
+
+typedef int (*rte_scheddev_node_stats_enable_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t enabled_stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler enable selected stats counters for specific node */
+
+typedef int (*rte_scheddev_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	int clear,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler read stats counters for specific node */
+
+struct rte_scheddev_ops {
+	/** Scheduler capabilities get */
+	rte_scheddev_capabilities_get_t capabilities_get;
+	/** Scheduler node capabilities get */
+	rte_scheddev_node_capabilities_get_t node_capabilities_get;
+
+	/** Scheduler WRED profile add */
+	rte_scheddev_wred_profile_add_t wred_profile_add;
+	/** Scheduler WRED profile delete */
+	rte_scheddev_wred_profile_delete_t wred_profile_delete;
+	/** Scheduler shared WRED context add/update */
+	rte_scheddev_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Scheduler shared WRED context delete */
+	rte_scheddev_shared_wred_context_delete_t
+		shared_wred_context_delete;
+	/** Scheduler shaper profile add */
+	rte_scheddev_shaper_profile_add_t shaper_profile_add;
+	/** Scheduler shaper profile delete */
+	rte_scheddev_shaper_profile_delete_t shaper_profile_delete;
+	/** Scheduler shared shaper add/update */
+	rte_scheddev_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Scheduler shared shaper delete */
+	rte_scheddev_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Scheduler node add */
+	rte_scheddev_node_add_t node_add;
+	/** Scheduler node delete */
+	rte_scheddev_node_delete_t node_delete;
+	/** Scheduler node suspend */
+	rte_scheddev_node_suspend_t node_suspend;
+	/** Scheduler node resume */
+	rte_scheddev_node_resume_t node_resume;
+	/** Scheduler hierarchy set */
+	rte_scheddev_hierarchy_set_t hierarchy_set;
+
+	/** Scheduler node parent update */
+	rte_scheddev_node_parent_update_t node_parent_update;
+	/** Scheduler node shaper update */
+	rte_scheddev_node_shaper_update_t node_shaper_update;
+	/** Scheduler node shared shaper update */
+	rte_scheddev_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Scheduler node scheduling mode update */
+	rte_scheddev_node_scheduling_mode_update_t node_scheduling_mode_update;
+	/** Scheduler node congestion management mode update */
+	rte_scheddev_node_cman_update_t node_cman_update;
+	/** Scheduler node WRED context update */
+	rte_scheddev_node_wred_context_update_t node_wred_context_update;
+	/** Scheduler node shared WRED context update */
+	rte_scheddev_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+
+	/** Scheduler packet marking - VLAN DEI */
+	rte_scheddev_mark_vlan_dei_t mark_vlan_dei;
+	/** Scheduler packet marking - IPv4/IPv6 ECN */
+	rte_scheddev_mark_ip_ecn_t mark_ip_ecn;
+	/** Scheduler packet marking - IPv4/IPv6 DSCP */
+	rte_scheddev_mark_ip_dscp_t mark_ip_dscp;
+
+	/** Scheduler get statistics counter type enabled for all nodes */
+	rte_scheddev_stats_get_enabled_t stats_get_enabled;
+	/** Scheduler enable selected statistics counters for all nodes */
+	rte_scheddev_stats_enable_t stats_enable;
+	/** Scheduler get statistics counter type enabled for current node */
+	rte_scheddev_node_stats_get_enabled_t node_stats_get_enabled;
+	/** Scheduler enable selected statistics counters for current node */
+	rte_scheddev_node_stats_enable_t node_stats_enable;
+	/** Scheduler read statistics counters for current node */
+	rte_scheddev_node_stats_read_t node_stats_read;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param error
+ *   Pointer to error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error type.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_scheddev_error_set(struct rte_scheddev_error *error,
+		   int code,
+		   enum rte_scheddev_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_scheddev_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic hierarchical scheduler operations structure from a port
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param error
+ *   Error details
+ *
+ * @return
+ *   The hierarchical scheduler operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_DRIVER_H__ */
-- 
2.5.0
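
As a rough sketch of the driver side (again, not part of the patch), a PMD
would fill in an rte_scheddev_ops instance with its callbacks and use
rte_scheddev_error_set() to report failures. The "mypmd_" names and the
RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED enumerator below are assumptions for
illustration only, and hooking the ops structure into the ethdev so that
rte_scheddev_ops_get() can return it is left out.

/* Illustrative sketch only, not part of the patch. */
#include <errno.h>

#include <rte_scheddev_driver.h>

static int
mypmd_node_add(struct rte_eth_dev *dev, uint32_t node_id,
	uint32_t parent_node_id, uint32_t priority, uint32_t weight,
	struct rte_scheddev_node_params *params,
	struct rte_scheddev_error *error)
{
	(void)dev;
	(void)node_id;
	(void)parent_node_id;
	(void)priority;
	(void)weight;

	if (params == NULL)
		return -rte_scheddev_error_set(error, EINVAL,
			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED, /* assumed enumerator */
			NULL, "node params are mandatory");

	/* ... program the device-specific hierarchy here ... */
	return 0;
}

static const struct rte_scheddev_ops mypmd_scheddev_ops = {
	.node_add = mypmd_node_add,
	/* remaining callbacks omitted for brevity */
};
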

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH] doc: remove announce of Tx preparation
@ 2017-02-13 10:56  9% Thomas Monjalon
  2017-02-13 14:22  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 10:56 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

The feature is part of 17.02, so the ABI changes notice can be removed.

Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..326fde4 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -23,13 +23,6 @@ Deprecation Notices
   provide a way to handle device initialization currently being done in
   ``eth_driver``.
 
-* In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
-  extended with new function pointer ``tx_pkt_prepare`` allowing verification
-  and processing of packet burst to meet HW specific requirements before
-  transmit. Also new fields will be added to the ``rte_eth_desc_lim`` structure:
-  ``nb_seg_max`` and ``nb_mtu_seg_max`` providing information about number of
-  segments limit to be transmitted by device for TSO/non-TSO packets.
-
 * ethdev: an API change is planned for 17.02 for the function
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
-- 
2.7.0

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05
@ 2017-02-13 11:05 19% Olivier Matz
  2017-02-13 14:21  4% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-02-13 11:05 UTC (permalink / raw)
  To: dev, john.mcnamara, thomas.monjalon

Postpone the ABI changes for mempool and mbuf that were planned
for 17.02 to 17.05.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..9d01e86 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -34,7 +34,7 @@ Deprecation Notices
   ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
 
-* ABI changes are planned for 17.02 in the ``rte_mbuf`` structure: some fields
+* ABI changes are planned for 17.05 in the ``rte_mbuf`` structure: some fields
   may be reordered to facilitate the writing of ``data_off``, ``refcnt``, and
   ``nb_segs`` in one operation, because some platforms have an overhead if the
   store address is not naturally aligned. Other mbuf fields, such as the
@@ -44,15 +44,15 @@ Deprecation Notices
 * The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
   are respectively replaced by PKT_RX_VLAN_STRIPPED and
   PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
-  their behavior will be kept until 16.11 and will be removed in 17.02.
+  their behavior will be kept until 17.02 and will be removed in 17.05.
 
 * mempool: The functions ``rte_mempool_count`` and ``rte_mempool_free_count``
-  will be removed in 17.02.
+  will be removed in 17.05.
   They are replaced by ``rte_mempool_avail_count`` and
   ``rte_mempool_in_use_count`` respectively.
 
 * mempool: The functions for single/multi producer/consumer are deprecated
-  and will be removed in 17.02.
+  and will be removed in 17.05.
   It is replaced by ``rte_mempool_generic_get/put`` functions.
 
 * ethdev: the legacy filter API, including
-- 
2.8.1

^ permalink raw reply	[relevance 19%]

* [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
@ 2017-02-13 11:55  9% Shreyansh Jain
  2017-02-13 12:00  0% ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2017-02-13 11:55 UTC (permalink / raw)
  To: dev; +Cc: nhorman, thomas.monjalon, Shreyansh Jain

EAL PCI layer is planned to be restructured in 17.05 to unlink it from
generic structures like eth_driver, rte_cryptodev_driver, and also move
it into a PCI Bus.

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index fbe2fcb..b12d435 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -13,10 +13,14 @@ Deprecation Notices
   has exposed, like the way we have done with uio-pci-generic. This change
   targets release 17.05.
 
-* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
-  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
-  provide a way to handle device initialization currently being done in
-  ``eth_driver``.
+* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
+  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
+  bus.
+
+* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in 17.05.
+  This is to unlink the ethernet driver from PCI dependencies.
+  Similarly, ``rte_pci_driver`` is planned to be removed from
+  ``rte_cryptodev_driver`` in 17.05.
 
 * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
   extended with new function pointer ``tx_pkt_prepare`` allowing verification
-- 
2.7.4

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus
@ 2017-02-13 11:55  5% Shreyansh Jain
  2017-02-13 14:36  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2017-02-13 11:55 UTC (permalink / raw)
  To: dev; +Cc: nhorman, thomas.monjalon, Shreyansh Jain

Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..fbe2fcb 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -13,11 +13,6 @@ Deprecation Notices
   has exposed, like the way we have done with uio-pci-generic. This change
   targets release 17.05.
 
-* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
-  impacted because of introduction of a new ``rte_bus`` hierarchy. This would
-  also impact the way devices are identified by EAL. A bus-device-driver model
-  will be introduced providing a hierarchical view of devices.
-
 * ``eth_driver`` is planned to be removed in 17.02. This currently serves as
   a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
   provide a way to handle device initialization currently being done in
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 11:55  9% [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL Shreyansh Jain
@ 2017-02-13 12:00  0% ` Shreyansh Jain
  2017-02-13 14:44  0%   ` Thomas Monjalon
  2017-02-13 21:56  0%   ` Jan Blunck
  0 siblings, 2 replies; 200+ results
From: Shreyansh Jain @ 2017-02-13 12:00 UTC (permalink / raw)
  To: dev; +Cc: nhorman, thomas.monjalon

On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
> EAL PCI layer is planned to be restructured in 17.05 to unlink it from
> generic structures like eth_driver, rte_cryptodev_driver, and also move
> it into a PCI Bus.
>
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index fbe2fcb..b12d435 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -13,10 +13,14 @@ Deprecation Notices
>    has exposed, like the way we have done with uio-pci-generic. This change
>    targets release 17.05.
>
> -* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
> -  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
> -  provide a way to handle device initialization currently being done in
> -  ``eth_driver``.

Just to highlight, the above statement was added by me in 16.11.
As of now I plan to work on removing rte_pci_driver from eth_driver,
rather than removing eth_driver altogether (which was probably the
better idea).
If someone still wishes to work on its complete removal, we can keep
the above (and probably remove the below).

> +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
> +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
> +  bus.
> +
> +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in 17.05.
> +  This is to unlink the ethernet driver from PCI dependencies.
> > +  Similarly, ``rte_pci_driver`` is planned to be removed from
> +  ``rte_cryptodev_driver`` in 17.05.
>
>  * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
>    extended with new function pointer ``tx_pkt_prepare`` allowing verification
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05
  2017-02-13 11:05 19% [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05 Olivier Matz
@ 2017-02-13 14:21  4% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:21 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, john.mcnamara

2017-02-13 12:05, Olivier Matz:
> Postpone the ABI changes for mempool and mbuf that were planned
> for 17.02 to 17.05.
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Applied, thanks

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: remove announce of Tx preparation
  2017-02-13 10:56  9% [dpdk-dev] [PATCH] doc: remove announce of Tx preparation Thomas Monjalon
@ 2017-02-13 14:22  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:22 UTC (permalink / raw)
  To: Tomasz Kulasek; +Cc: dev

2017-02-13 11:56, Thomas Monjalon:
> The feature is part of 17.02, so the ABI changes notice can be removed.
> 
> Fixes: 4fb7e803eb1a ("ethdev: add Tx preparation")
> 
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Applied

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: postpone API change in ethdev
@ 2017-02-13 14:26  4% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:26 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev

The change of _rte_eth_dev_callback_process has not been done in 17.02.
Let's postpone to 17.05.

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 3d72241..6532482 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -23,8 +23,8 @@ Deprecation Notices
   provide a way to handle device initialization currently being done in
   ``eth_driver``.
 
-* ethdev: an API change is planned for 17.02 for the function
-  ``_rte_eth_dev_callback_process``. In 17.02 the function will return an ``int``
+* ethdev: an API change is planned for 17.05 for the function
+  ``_rte_eth_dev_callback_process``. In 17.05 the function will return an ``int``
   instead of ``void`` and a fourth parameter ``void *ret_param`` will be added.
 
 * ABI changes are planned for 17.05 in the ``rte_mbuf`` structure: some fields
-- 
2.7.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus
  2017-02-13 11:55  5% [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus Shreyansh Jain
@ 2017-02-13 14:36  0% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:36 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev

2017-02-13 17:25, Shreyansh Jain:
> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> ---
> -* ABI/API changes are planned for 17.02: ``rte_device``, ``rte_driver`` will be
> -  impacted because of introduction of a new ``rte_bus`` hierarchy. This would
> -  also impact the way devices are identified by EAL. A bus-device-driver model
> -  will be introduced providing a hierarchical view of devices.

Applied, thanks

rte_device/rte_driver have not been impacted and should not be when implementing
the buses.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 12:00  0% ` Shreyansh Jain
@ 2017-02-13 14:44  0%   ` Thomas Monjalon
  2017-02-13 21:56  0%   ` Jan Blunck
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 14:44 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, Jan Blunck, Stephen Hemminger

2017-02-13 17:30, Shreyansh Jain:
> On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
> > EAL PCI layer is planned to be restructured in 17.05 to unlink it from
> > generic structures like eth_driver, rte_cryptodev_driver, and also move
> > it into a PCI Bus.
> >
> > Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index fbe2fcb..b12d435 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -13,10 +13,14 @@ Deprecation Notices
> >    has exposed, like the way we have done with uio-pci-generic. This change
> >    targets release 17.05.
> >
> > -* ``eth_driver`` is planned to be removed in 17.02. This currently serves as
> > -  a placeholder for PMDs to register themselves. Changes for ``rte_bus`` will
> > -  provide a way to handle device initialization currently being done in
> > -  ``eth_driver``.
> 
> Just to highlight, above statement was added by me in 16.11.
> As of now I plan to work on removing rte_pci_driver from eth_driver,
> rather than removing eth_driver all together (which, probably, was
> better idea).
> If someone still wishes to work on its complete removal, we can keep
> the above. (and probably remove the below).

Yes I think we should keep the original idea.
I will work on it with Jan Blunck and Stephen Hemminger I think.

> > +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
> > +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
> > +  bus.
> > +
> > +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in 17.05.
> > +  This is to unlink the ethernet driver from PCI dependencies.
> > +  Similarly, ``rte_pci_driver`` in planned to be removed from
> > +  ``rte_cryptodev_driver`` in 17.05.

I am going to reword it in a v2.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] cryptodev - Session and queue pair relationship
  @ 2017-02-13 15:09  3%       ` Trahe, Fiona
  0 siblings, 0 replies; 200+ results
From: Trahe, Fiona @ 2017-02-13 15:09 UTC (permalink / raw)
  To: Akhil Goyal, Doherty, Declan, dev, De Lara Guarch, Pablo, Jain, Deepak K
  Cc: hemant.agrawal, Trahe, Fiona

Hi Akhil, 

> -----Original Message-----
> From: Trahe, Fiona
> Sent: Monday, February 13, 2017 2:45 PM
> To: Akhil Goyal <akhil.goyal@nxp.com>; Doherty, Declan
> <declan.doherty@intel.com>; dev@dpdk.org; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Jain, Deepak K <deepak.k.jain@intel.com>
> Cc: hemant.agrawal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
> Subject: RE: cryptodev - Session and queue pair relationship
> 
> Hi Akhil
> 
> > -----Original Message-----
> > From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> > Sent: Monday, February 13, 2017 2:39 PM
> > To: Doherty, Declan <declan.doherty@intel.com>; dev@dpdk.org; De Lara
> > Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Jain, Deepak K
> > <deepak.k.jain@intel.com>
> > Cc: hemant.agrawal@nxp.com; Trahe, Fiona <fiona.trahe@intel.com>
> > Subject: Re: cryptodev - Session and queue pair relationship
> >
> > On 2/8/2017 2:22 AM, Declan Doherty wrote:
> > > On 06/02/17 13:35, Akhil Goyal wrote:
> > >> Hi,
> > >>
> > > Hey Akhil, see my thoughts inline
> > >
> > >> I have some issues w.r.t the mapping sessions and queue pairs.
> > >>
> > >> As per my understanding:
> > >> - Number of sessions may be large - they are independent of number of
> > >> queue pairs
> > >
> > > Yes, cryptodev assumes no implicit connection between sessions and
> > > queue pairs, the current PMDs just use the crypto session to store the
> > > immutable data (keys etc) for a particular crypto transform or chain of
> > > transforms in a format specific to that PMD with no statefull information.
> > >
> > >> - Queue pairs are L-core specific
> > >
> > > Not exactly, queue pairs like ethdev queues are not thread safe, so we
> > > assume that only a single l-core will be using a queue pair at any time
> > > unless the application layer has introduce a locking mechanism to
> > > provide thread safety.
> > >
> > >> - Depending on the implementation, one queue pair can be mapped to
> > many
> > >> sessions. Or, Only one queue pair for every session- especially in the
> > >> systems having large number of queues (hw).
> > >
> > > Currently none of the software crypto PMDs or Intel QuickAssist hardware
> > > accelerated PMD make any assumptions regarding coupling/mapping of
> > > sessions to queue pairs, so today a users could freely change the queue
> > > pair which a session is processed on, or even go as far using the  ame
> > > session for processing on different queue simultaneously as the sessions
> > > are stateless, obviously this could introduce issues for statefull
> > > higher level protocol using the cryptodev PMD service but the cryptodev
> > > API doesn't prohibit this usage model.
> > >
> > >
> > >> - Sessions can be created on the fly - typical rekeying use-cases.
> > >> Generally done by the control threads.
> > >>
> > >
> > > Sure, there is no restriction on session creation other than an element
> > > being free in the mempool which the session is being created on.
> > >
> > >> There seems to be no straight way for the underlying driver
> > >> implementation to know, what all sessions are mapped to a particular
> > >> queue pair. The session and queue pair information is first time exposed
> > >> in the enqueue command.
> > >>
> > >> One of the NXP Crypto Hardware drivers uses per session data structures
> > >> (descriptors) which need to be configured for hardware queues.  Though
> > >> this information can be extracted from the first enqueue command for a
> > >> particular session, it will add checks in the data path. Also, it will
> > >> bring down the connection setup rate.
> > >
> > > We haven't had to support this model of coupling sessions to queue pairs
> > > in any PMDs before. If I understand correctly, in the hardware model you
> > > need to support a queue pair can only be configured to support the
> > > processing of a single session at any one time and it only supports that
> > > session until it is reconfigured, is this correct? So if a session needs
> > > to be re-keyed the queue pair would need to be reconfigured?
> > yes it is correct.
> > >
> > >>
> > >> In the API rte_cryptodev_sym_session_create(), we create session on a
> > >> particular device, but there is no information of queue pair being
> > >> shared.
> > >>
> > >> 1. We want to propose to change the session create/config API to also
> > >> take queue pair id as argument.
> > >> struct rte_cryptodev_sym_session *
> > >> rte_cryptodev_sym_session_create(uint8_t dev_id,
> > >>                               struct rte_crypto_sym_xform *xform) to
> > >> also take "uint16_t qp;"
> > >>
> > >> This will also return "in-use" error, if the underlying hardware only
> > >> support 1 session/descriptor per qp.
> > >
> > > I my mind the idea of coupling the session_create function to the queue
> > > pair of a device doesn't feel right as it would certainly put
> > > unnecessary constraint on all existing PMDs queue pairs.
> > >
> > > One possible approach would be to extend the the queue_pair_setup
> > > function to take an opaque parameter which would allow you to pass a
> > > session through and would be  an approach more in keeping with the
> > > cryptodev current model, but you would then still need to verify that
> > > the operations being enqueued have the same session as the configured
> > > device, assuming that the packet are being enqueued from the host.
> > >
> > > If you need to re-key or change the session you could re-initialize the
> > > queue pair while the device is still active, but stopping the queue pair.
> > >
> > > Following a sequence something like:
> > > stop_qp()
> > > setup_qp()
> > > start_qp()
> > >
> > >
> > > Another option Fiona suggested would be to add 2 new APIs
> > >
> > >
> >
> rte_cryptodev_queue_pair_attach_sym_session/queue_pair_detach_sym_sess
> > ion this
> > > would allow dynamic attaching of one or more sessions to device if it
> > > supported this sort of static mapping of sessions to queue pairs.
> > >
> > >
> > >>
> > >> 2. Currently the application configures the *nb_descriptors* in the
> > >> *rte_cryptodev_queue_pair_setup*. Should we add the queue pair
> > >> capability API?
> > >>
> > >
> > > Regarding capabilities, I think this should be just propagated through
> > > the device capabilities, something like a max number of session mapped
> > > per queue pair, which would be zero for all/most current devices, and
> > > could be 1 or greater for your device. This is assuming that all queue
> > > pairs can all support the same crypto transforms capabilities and that
> > > different queue pairs have different capabilities which could get very
> > > messy to discover.
> > >
> > >>
> > >> Please share your feedback, I will submit the patch accordingly.
> > >>
> > >> Regards,
> > >> Akhil
> > >>
> > >>
> > >>
> > >
> > >
> > Thanks for your feedback Declan,
> > The suggestion from Fiona looks good. Should I send the patch for this
> > or is it already in discussion in some different thread?
> 
> No, it's not under discussion in any other thread that I'm aware of.
> Go ahead and send it.

It may be useful to add max_nb_sessions_per_qp to
struct rte_cryptodev_info.sym.
I'm assuming that where there is a limit it would be the same for all qps on the device?
0 would mean unlimited, >0 limited to that number.
The application could use this to know whether it needs to use the attach API or not.
This will cause an ABI breakage, so it must be flagged in a deprecation notice before the change.
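For illustration, a rough sketch of how an application could combine such a capability
field with the proposed attach API. This is a hedged sketch only: max_nb_sessions_per_qp
and the attach call are proposals from this thread, not part of the current cryptodev API,
and the exact attach signature is still to be defined.

    struct rte_cryptodev_info info;

    rte_cryptodev_info_get(dev_id, &info);

    if (info.sym.max_nb_sessions_per_qp > 0) {
            /* limited: bind the session to the queue pair it will use */
            if (rte_cryptodev_queue_pair_attach_sym_session(qp_id, sess) < 0)
                    rte_exit(EXIT_FAILURE, "cannot attach session to qp\n");
    } else {
            /* 0 = unlimited: no attach needed, any qp can process the session */
    }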

> 
> >
> > Also, if this new API is added, there would be corresponding change in
> > the ipsec-secgw application as well.
> > This API should be optional and underlying implementation may or may not
> > implement this API.
> >
> > Regards,
> > Akhil
> >

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] doc: deprecation notice for ethdev ops?
@ 2017-02-13 16:02  3% Dumitrescu, Cristian
  2017-02-13 16:09  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2017-02-13 16:02 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Richardson, Bruce, Yigit, Ferruh, Wiles, Keith

Hi Thomas,

When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. a change notice one release before)?

IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?
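A minimal sketch of the reasoning (placeholder types only, not the real eth_dev_ops
definition): since PMDs and the library only reach the ops through a pointer, appending a
member leaves every existing member at its old offset.

    typedef int (*eth_op_t)(void *dev);  /* placeholder type for illustration */

    struct eth_dev_ops_example {
            eth_op_t dev_configure;  /* existing members keep their offsets ... */
            eth_op_t dev_start;
            eth_op_t dev_stop;
            eth_op_t new_op;         /* ... when a new pointer is appended at the end */
    };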

My question is in the context of this patch under review for 17.5 release: http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.

Thanks,
Cristian

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure
  2017-02-10 13:59  4% ` Trahe, Fiona
@ 2017-02-13 16:07  7%   ` Zhang, Roy Fan
  2017-02-13 17:34  4%     ` Trahe, Fiona
  2017-02-14  0:21  4%   ` Hemant Agrawal
  1 sibling, 1 reply; 200+ results
From: Zhang, Roy Fan @ 2017-02-13 16:07 UTC (permalink / raw)
  To: Trahe, Fiona, dev; +Cc: De Lara Guarch, Pablo

Hi Fiona,

Sorry for my bad English, I will try to explain better here.

"cryptodev_configure_t" is a function prototype with only "rte_cryptodev *dev"
as sole parameter. Structure ``rte_cryptodev_ops`` holds one function pointer
"dev_configure" of it. 

The patch announces the addition of a new parameter, a pointer to
"struct rte_cryptodev_config", so the function prototype would look like:

typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev, struct rte_cryptodev_config *config);

Without this parameter, a specific crypto PMD may not have enough information to
configure itself. This may not be a big problem for other cryptodevs, since all
configuration is done in rte_cryptodev_configure(), but it is important for the
scheduler PMD, which needs this parameter to configure all its slaves. Currently
the user has to configure every slave one by one.
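As a hedged sketch of what the scheduler PMD could then do in its dev_configure
callback (the scheduler context structure and its fields below are placeholders
invented for illustration, not the real PMD internals):

    struct scheduler_slave_ex { uint8_t dev_id; };
    struct scheduler_ctx_ex {
            uint32_t nb_slaves;
            struct scheduler_slave_ex slaves[8];
    };

    static int
    scheduler_pmd_config_sketch(struct rte_cryptodev *dev,
                    struct rte_cryptodev_config *config)
    {
            struct scheduler_ctx_ex *sched_ctx = dev->data->dev_private;
            uint32_t i;

            /* forward the same configuration to every slave device */
            for (i = 0; i < sched_ctx->nb_slaves; i++) {
                    int ret = rte_cryptodev_configure(
                                    sched_ctx->slaves[i].dev_id, config);
                    if (ret < 0)
                            return ret;
            }
            return 0;
    }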

The problem is that, although I only want to change the API of the function
prototype "cryptodev_configure_t", in order to do that I have to break the ABI
of the structure "rte_cryptodev_ops". Any help on the grammar for stating this
more clearly would be appreciated.

Best regards,
Fan




> -----Original Message-----
> From: Trahe, Fiona
> Sent: Friday, February 10, 2017 2:00 PM
> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>
> Subject: RE: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> structure
> 
> Hi Fan,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> > Sent: Friday, February 10, 2017 11:39 AM
> > To: dev@dpdk.org
> > Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> > Subject: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> > structure
> >
> > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index 755dc65..564d93a 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -62,3 +62,7 @@ Deprecation Notices
> >    PMDs that implement the latter.
> >    Target release for removal of the legacy API will be defined once most
> >    PMDs have switched to rte_flow.
> > +
> > +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
> > +  The field ``cryptodev_configure_t`` function prototype will be
> > +added a
> > +  parameter of a struct rte_cryptodev_config type pointer.
> > --
> > 2.7.4
> 
> Can you fix the grammar here please. I'm not sure what the change is?

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:02  3% [dpdk-dev] doc: deprecation notice for ethdev ops? Dumitrescu, Cristian
@ 2017-02-13 16:09  0% ` Thomas Monjalon
  2017-02-13 16:46  4%   ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 16:09 UTC (permalink / raw)
  To: Dumitrescu, Cristian; +Cc: dev, Richardson, Bruce, Yigit, Ferruh, Wiles, Keith

2017-02-13 16:02, Dumitrescu, Cristian:
> Hi Thomas,
> 
> When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. a change notice one release before)?
> 
> IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?

You are right, it is an internal struct.
So no need of a deprecation notice.

We must clearly separate API and internal code in ethdev.

> My question is in the context of this patch under review for 17.5 release: http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.

I did not look at it yet. Will do after the release.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:09  0% ` Thomas Monjalon
@ 2017-02-13 16:46  4%   ` Ferruh Yigit
  2017-02-13 17:21  0%     ` Dumitrescu, Cristian
  2017-02-13 17:38  3%     ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2017-02-13 16:46 UTC (permalink / raw)
  To: Thomas Monjalon, Dumitrescu, Cristian
  Cc: dev, Richardson, Bruce, Wiles, Keith

On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
> 2017-02-13 16:02, Dumitrescu, Cristian:
>> Hi Thomas,
>>
>> When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. a change notice one release before)?
>>
>> IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?
> 
> You are right, it is an internal struct.
> So no need of a deprecation notice.

When DPDK is compiled as a dynamic library, the application will load PMDs
dynamically as plugins.
Does this use case cause an ABI compatibility issue?

I think the drivers <--> libraries interface can cause ABI breakages in the
dynamic library case, although I am not sure how common this use case is.


> 
> We must clearly separate API and internal code in ethdev.
> 
>> My question is in the context of this patch under review for 17.5 release: http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.
> 
> I did not look at it yet. Will do after the release.
> 
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:46  4%   ` Ferruh Yigit
@ 2017-02-13 17:21  0%     ` Dumitrescu, Cristian
  2017-02-13 17:36  0%       ` Ferruh Yigit
  2017-02-13 17:38  3%     ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2017-02-13 17:21 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon; +Cc: dev, Richardson, Bruce, Wiles, Keith



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Monday, February 13, 2017 4:46 PM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>; Wiles,
> Keith <keith.wiles@intel.com>
> Subject: Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
> 
> On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
> > 2017-02-13 16:02, Dumitrescu, Cristian:
> >> Hi Thomas,
> >>
> >> When a new member (function pointer) is added to struct eth_dev_ops
> (as the last member), does it need to go through the ABI change process (e.g.
> a change notice one release before)?
> >>
> >> IMO the answer is no: struct eth_dev_ops is marked as internal and its
> instances are only accessed through pointers, so the rte_eth_devices array
> should not be impacted by the ops structure expanding at its end. Unless
> there is something that I am missing?
> >
> > You are right, it is an internal struct.
> > So no need of a deprecation notice.
> 
> When dpdk compiled as dynamic library, application will load PMDs
> dynamically as plugin.
> Is this use case cause ABI compatibility issue?
> 
> I think drivers <--> libraries interface can cause ABI breakages for
> dynamic library case, although not sure how common use case this is.
> 

Do you have a specific example that might cause an issue when adding a new function at the end of the ethdev ops structure? I cannot think of any, given that the ops structure is marked as internal and it is only accessed through pointers.

> 
> >
> > We must clearly separate API and internal code in ethdev.
> >
> >> My question is in the context of this patch under review for 17.5 release:
> http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.
> >
> > I did not look at it yet. Will do after the release.
> >
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure
  2017-02-13 16:07  7%   ` Zhang, Roy Fan
@ 2017-02-13 17:34  4%     ` Trahe, Fiona
  0 siblings, 0 replies; 200+ results
From: Trahe, Fiona @ 2017-02-13 17:34 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo

Thanks Fan, now it makes sense.

> -----Original Message-----
> From: Zhang, Roy Fan
> Sent: Monday, February 13, 2017 4:07 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: RE: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> structure
> 
> Hi Fiona,
> 
> Sorry for my bad English, I will try to explain better here.
> 
> "cryptodev_configure_t" is a function prototype with only "rte_cryptodev
> *dev"
> as sole parameter. Structure ``rte_cryptodev_ops`` holds one function pointer
> "dev_configure" of it.
> 
> The patch involves in the announcement of adding a parameter of
> "struct rte_cryptodev_config" pointer so the function prototype could look
> like:
> 
> typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev, struct
> rte_cryptodev_config *config);
> 
> Without this parameter, a specific crypto PMD may not have enough
> information to
> configure itself. Which may not be big problem as other Cryptodevs as all
> configures
> are done in rte_cryptodev_configure(), but it is important for the scheduler
> PMD as it
> needs this parameter to configure all its slaves. Currently the user have to
> configure
> every slave one by one.
> 
> The problem is, although I want to change an API of the function prototype
> "cryptodev_configure_t",
> but in order to do that I have to break the ABI of structure
> "rte_cryptodev_ops". Any help on the grammar
> for stating this nicer would be appreciated.
> 
> Best regards,
> Fan
> 
> 
> 
> 
> > -----Original Message-----
> > From: Trahe, Fiona
> > Sent: Friday, February 10, 2017 2:00 PM
> > To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> > Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> > <fiona.trahe@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> > structure
> >
> > Hi Fan,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> > > Sent: Friday, February 10, 2017 11:39 AM
> > > To: dev@dpdk.org
> > > Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> > > Subject: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> > > structure
> > >
> > > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > > ---
> > >  doc/guides/rel_notes/deprecation.rst | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/doc/guides/rel_notes/deprecation.rst
> > > b/doc/guides/rel_notes/deprecation.rst
> > > index 755dc65..564d93a 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -62,3 +62,7 @@ Deprecation Notices
> > >    PMDs that implement the latter.
> > >    Target release for removal of the legacy API will be defined once most
> > >    PMDs have switched to rte_flow.
> > > +
> > > +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops``
> structure.
> > > +  The field ``cryptodev_configure_t`` function prototype will be
> > > +added a
> > > +  parameter of a struct rte_cryptodev_config type pointer.
> > > --
> > > 2.7.4
> >
> > Can you fix the grammar here please. I'm not sure what the change is?

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 17:21  0%     ` Dumitrescu, Cristian
@ 2017-02-13 17:36  0%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-02-13 17:36 UTC (permalink / raw)
  To: Dumitrescu, Cristian, Thomas Monjalon
  Cc: dev, Richardson, Bruce, Wiles, Keith

On 2/13/2017 5:21 PM, Dumitrescu, Cristian wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Monday, February 13, 2017 4:46 PM
>> To: Thomas Monjalon <thomas.monjalon@6wind.com>; Dumitrescu, Cristian
>> <cristian.dumitrescu@intel.com>
>> Cc: dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>; Wiles,
>> Keith <keith.wiles@intel.com>
>> Subject: Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
>>
>> On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
>>> 2017-02-13 16:02, Dumitrescu, Cristian:
>>>> Hi Thomas,
>>>>
>>>> When a new member (function pointer) is added to struct eth_dev_ops
>> (as the last member), does it need to go through the ABI change process (e.g.
>> a change notice one release before)?
>>>>
>>>> IMO the answer is no: struct eth_dev_ops is marked as internal and its
>> instances are only accessed through pointers, so the rte_eth_devices array
>> should not be impacted by the ops structure expanding at its end. Unless
>> there is something that I am missing?
>>>
>>> You are right, it is an internal struct.
>>> So no need of a deprecation notice.
>>
>> When dpdk compiled as dynamic library, application will load PMDs
>> dynamically as plugin.
>> Is this use case cause ABI compatibility issue?
>>
>> I think drivers <--> libraries interface can cause ABI breakages for
>> dynamic library case, although not sure how common use case this is.
>>
> 
> Do you have a specific example that might cause an issue when adding a new function at the end of the ethdev ops structure? I cannot think of any, given that the ops structure is marked as internal and it is only accessed through pointers.

Adding at the end of the struct is probably safe.

> 
>>
>>>
>>> We must clearly separate API and internal code in ethdev.
>>>
>>>> My question is in the context of this patch under review for 17.5 release:
>> http://www.dpdk.org/ml/archives/dev/2017-February/057367.html.
>>>
>>> I did not look at it yet. Will do after the release.
>>>
>>>
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] doc: deprecation notice for ethdev ops?
  2017-02-13 16:46  4%   ` Ferruh Yigit
  2017-02-13 17:21  0%     ` Dumitrescu, Cristian
@ 2017-02-13 17:38  3%     ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-13 17:38 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Dumitrescu, Cristian, dev, Richardson, Bruce, Wiles, Keith

2017-02-13 16:46, Ferruh Yigit:
> On 2/13/2017 4:09 PM, Thomas Monjalon wrote:
> > 2017-02-13 16:02, Dumitrescu, Cristian:
> >> Hi Thomas,
> >>
> >> When a new member (function pointer) is added to struct eth_dev_ops (as the last member), does it need to go through the ABI change process (e.g. a change notice one release before)?
> >>
> >> IMO the answer is no: struct eth_dev_ops is marked as internal and its instances are only accessed through pointers, so the rte_eth_devices array should not be impacted by the ops structure expanding at its end. Unless there is something that I am missing?
> > 
> > You are right, it is an internal struct.
> > So no need of a deprecation notice.
> 
> When dpdk compiled as dynamic library, application will load PMDs
> dynamically as plugin.
> Is this use case cause ABI compatibility issue?
> 
> I think drivers <--> libraries interface can cause ABI breakages for
> dynamic library case, although not sure how common use case this is.

Yes, it is a problem for the drivers/library interface.
It is not an ABI issue, as the ABI is an application/library interface.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
@ 2017-02-13 17:38  9% Bruce Richardson
  2017-02-14  0:32  4% ` Mcnamara, John
                   ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Bruce Richardson @ 2017-02-13 17:38 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Document proposed changes for the rings code in the next release.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..e715fc7 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,25 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* ring: Changes are planned to rte_ring APIs in release 17.05. Proposed
+  changes include:
+    - Removing build time options for the ring:
+      CONFIG_RTE_RING_SPLIT_PROD_CONS
+      CONFIG_RTE_RING_PAUSE_REP_COUNT
+    - Adding an additional parameter to enqueue functions to return the
+      amount of free space in the ring
+    - Adding an additional parameter to dequeue functions to return the
+      number of remaining elements in the ring
+    - Removing direct support for watermarks in the rings, since the
+      additional return value from the enqueue function makes it
+      unneeded
+    - Adjusting the return values of the bulk() enq/deq functions to
+      make them consistent with the burst() equivalents. [Note, parameter
+      to these functions are changing too, per points above, so compiler
+      will flag them as needing update in legacy code]
+    - Updates to some library functions e.g. rte_ring_get_memsize() to
+      allow for variably-sized ring elements.
+
 * igb_uio: iomem mapping and sysfs files created for iomem and ioport in
   igb_uio will be removed, because we are able to detect these from what Linux
   has exposed, like the way we have done with uio-pci-generic. This change
-- 
2.9.3
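
For reference, a hedged sketch of the kind of enqueue/dequeue prototypes the notice
above describes (the final 17.05 names and exact signatures may differ from this):

    /* Sketch only: the extra output parameters are the proposal in the
     * deprecation notice above, not the current rte_ring API. */
    unsigned int
    rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
                    unsigned int n, unsigned int *free_space);

    unsigned int
    rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
                    unsigned int n, unsigned int *available);

A caller would compare the returned count against n, and could use *free_space or
*available to size its next call without a separate rte_ring_count() query.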

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
  @ 2017-02-13 17:57  4%   ` Thomas Monjalon
  2017-02-14  3:17  4%     ` Jerin Jacob
  2017-02-14 19:37  4%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 17:57 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev, john.mcnamara

2017-01-05 15:25, Bernard Iremonger:
> In 17.05 nine rte_eth_dev_* functions will be removed from
> librte_ether, renamed and moved to the ixgbe PMD.
> 
> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>

"ixgbe bypass" should be in the title and the description.
I'll reword to:

doc: announce move of ethdev bypass function to ixgbe API

In 17.05, nine rte_eth_dev_* functions for bypass control,
and implemented only in ixgbe, will be removed from ethdev,
renamed and moved to the ixgbe PMD-specific API.
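
For instance (function names given here purely as a hedged illustration of the
rename; the exact list of nine functions is defined in the patch itself), a call
such as:

    rte_eth_dev_bypass_init(port_id);      /* today, declared in ethdev */

would become an ixgbe-specific call along the lines of:

    rte_pmd_ixgbe_bypass_init(port_id);    /* after the move (name assumed) */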

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
  2017-01-23 13:04 12% [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost Yuanhan Liu
@ 2017-02-13 18:02  4% ` Thomas Monjalon
  2017-02-14  3:21  4%   ` Jerin Jacob
  2017-02-14 13:54  4% ` Maxime Coquelin
  2017-02-14 20:28  4% ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-13 18:02 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Maxime Coquelin, John McNamara, Ben Walker

2017-01-23 21:04, Yuanhan Liu:
> I made a vhost ABI/API refactoring at v16.04, meant to avoid such issue
> forever. Well, apparently, I lied.
> 
> People are looking for more vhost-user options now days, other than
> vhost-user net only. For example, SPDK (Storage Performance Development
> Kit) are looking for chance of vhost-user SCSI and vhost-user block.
> 
> Apparently, they also need a vhost-user backend, while DPDK already
> has a (mature enough) backend, they don't want to implement it again
> from scratch. They want to leverage the one DPDK provides.
> 
> However, the last refactoring hasn't done that right, at least it's
> not friendly for extending vhost-user to add more devices support.
> For example, different virtio devices has its own feature set, while
> APIs like rte_vhost_feature_disable(feature_mask) have no option to
> tell the device type. Thus, a more proper API should look like:
> 
>     rte_vhost_feature_disable(device_type, feature_mask);
> 
> Besides that, few public files and structures should be renamed, to
> not let it bind to virtio-net. Specifically, they are:
> 
> - virtio_net_device_ops --> vhost_device_ops
> - rte_virtio_net.h      --> rte_vhost.h
> 
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
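
A minimal sketch of the API shape proposed in the quoted notice, purely
illustrative - the device type names below are assumptions, not an agreed
definition:

    enum rte_vhost_dev_type {
            RTE_VHOST_DEV_NET,   /* names invented for illustration */
            RTE_VHOST_DEV_SCSI,
            RTE_VHOST_DEV_BLK,
    };

    int rte_vhost_feature_disable(enum rte_vhost_dev_type device_type,
                    uint64_t feature_mask);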

Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 12:00  0% ` Shreyansh Jain
  2017-02-13 14:44  0%   ` Thomas Monjalon
@ 2017-02-13 21:56  0%   ` Jan Blunck
  2017-02-14  5:18  0%     ` Shreyansh Jain
  1 sibling, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-13 21:56 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, nhorman, Thomas Monjalon

On Mon, Feb 13, 2017 at 1:00 PM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
> On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
>>
>> EAL PCI layer is planned to be restructured in 17.05 to unlink it from
>> generic structures like eth_driver, rte_cryptodev_driver, and also move
>> it into a PCI Bus.
>>
>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> b/doc/guides/rel_notes/deprecation.rst
>> index fbe2fcb..b12d435 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -13,10 +13,14 @@ Deprecation Notices
>>    has exposed, like the way we have done with uio-pci-generic. This
>> change
>>    targets release 17.05.
>>
>> -* ``eth_driver`` is planned to be removed in 17.02. This currently serves
>> as
>> -  a placeholder for PMDs to register themselves. Changes for ``rte_bus``
>> will
>> -  provide a way to handle device initialization currently being done in
>> -  ``eth_driver``.
>
>
> Just to highlight, above statement was added by me in 16.11.
> As of now I plan to work on removing rte_pci_driver from eth_driver,
> rather than removing eth_driver all together (which, probably, was
> better idea).
> If someone still wishes to work on its complete removal, we can keep
> the above. (and probably remove the below).
>

There is no benefit in keeping eth_driver and removing rte_pci_driver
from it. Technically it isn't even needed today.

>
>> +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
>> +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
>> +  bus.
>> +
>> +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in
>> 17.05.
>> +  This is to unlink the ethernet driver from PCI dependencies.
>> +  Similarly, ``rte_pci_driver`` in planned to be removed from
>> +  ``rte_cryptodev_driver`` in 17.05.
>>
>>  * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
>>    extended with new function pointer ``tx_pkt_prepare`` allowing
>> verification
>>
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure
  2017-02-10 13:59  4% ` Trahe, Fiona
  2017-02-13 16:07  7%   ` Zhang, Roy Fan
@ 2017-02-14  0:21  4%   ` Hemant Agrawal
  2017-02-14  5:11  4%     ` Hemant Agrawal
  1 sibling, 1 reply; 200+ results
From: Hemant Agrawal @ 2017-02-14  0:21 UTC (permalink / raw)
  To: Trahe, Fiona, Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo

On 2/10/2017 7:59 AM, Trahe, Fiona wrote:
> Hi Fan,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
>> Sent: Friday, February 10, 2017 11:39 AM
>> To: dev@dpdk.org
>> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>> Subject: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
>> structure
>>
>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
>> ---
>>  doc/guides/rel_notes/deprecation.rst | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst
>> b/doc/guides/rel_notes/deprecation.rst
>> index 755dc65..564d93a 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -62,3 +62,7 @@ Deprecation Notices
>>    PMDs that implement the latter.
>>    Target release for removal of the legacy API will be defined once most
>>    PMDs have switched to rte_flow.
>> +
>> +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
>> +  The field ``cryptodev_configure_t`` function prototype will be added a
>> +  parameter of a struct rte_cryptodev_config type pointer.
>> --
>> 2.7.4
>
> Can you fix the grammar here please. I'm not sure what the change is?
>
I also found it hard to understand at first. Not perfect, but I tried to
reword it.

A new parameter ``struct rte_cryptodev_config *config`` will be added to 
the ``cryptodev_configure_t`` function pointer field.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
@ 2017-02-14  0:32  4% ` Mcnamara, John
  2017-02-14  3:25  4% ` Jerin Jacob
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2017-02-14  0:32 UTC (permalink / raw)
  To: Richardson, Bruce, dev; +Cc: Richardson, Bruce



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Monday, February 13, 2017 5:39 PM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>
> Subject: [dpdk-dev] [PATCH] doc: add ABI change notification for ring
> library
> 
> Document proposed changes for the rings code in the next release.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: John McNamara <john.mcnamara@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
  2017-02-13 17:57  4%   ` Thomas Monjalon
@ 2017-02-14  3:17  4%     ` Jerin Jacob
  2017-02-14 10:33  4%       ` Iremonger, Bernard
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-14  3:17 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Bernard Iremonger, dev, john.mcnamara

On Mon, Feb 13, 2017 at 06:57:20PM +0100, Thomas Monjalon wrote:
> 2017-01-05 15:25, Bernard Iremonger:
> > In 17.05 nine rte_eth_dev_* functions will be removed from
> > librte_ether, renamed and moved to the ixgbe PMD.
> > 
> > Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> 
> "ixgbe bypass" should be in the title and the description.
> I'll reword to:
> 
> doc: announce move of ethdev bypass function to ixgbe API
> 
> In 17.05, nine rte_eth_dev_* functions for bypass control,
> and implemented only in ixgbe, will be removed from ethdev,
> renamed and moved to the ixgbe PMD-specific API.
> 
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter
  2017-01-20 14:57  4%     ` Thomas Monjalon
@ 2017-02-14  3:19  4%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2017-02-14  3:19 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Lu, Wenzhuo, Adrien Mazarguil, Liu, Yong, dev

On Fri, Jan 20, 2017 at 03:57:28PM +0100, Thomas Monjalon wrote:
> 2017-01-20 02:14, Lu, Wenzhuo:
> > Hi Adrien, Thomas, Yong,
> > 
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Adrien Mazarguil
> > > Sent: Friday, January 20, 2017 2:46 AM
> > > To: Thomas Monjalon
> > > Cc: Liu, Yong; dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter
> > > 
> > > On Thu, Jan 19, 2017 at 10:06:34AM +0100, Thomas Monjalon wrote:
> > > > 2017-01-19 13:34, Yong Liu:
> > > > > +* ABI changes are planned for 17.05: structure
> > > > > +``rte_eth_tunnel_filter_conf``
> > > > > +  will be extended with a new member ``vf_id`` in order to enable
> > > > > +cloud filter
> > > > > +  on VF device.
> > > >
> > > > I think we should stop rely on this API, and migrate to rte_flow instead.
> > > > Adrien any thought?
> > > 
> > > I'm all for using rte_flow in any case. I've already documented an approach to
> > > convert TUNNEL filter rules to rte_flow rules [1], although it may be
> > > incomplete due to my limited experience with this filter type. We already
> > > know several tunnel item types must be added (currently only VXLAN is
> > > defined).
> > > 
> > > I understand ixgbe/i40e currently map rte_flow on top of the legacy
> > > framework, therefore extending this structure might still be needed in the
> > > meantime. Not sure we should prevent this change as long as such rules can be
> > > configured through rte_flow as well.
> > > 
> > > [1] http://dpdk.org/doc/guides/prog_guide/rte_flow.html#tunnel-to-eth-ipv4-
> > > ipv6-vxlan-or-other-queue
> > The problem is we haven't finished transferring all the functions from the regular filters to the generic filters. 
> > For example, igb, fm10k and enic haven't support generic filters yet. Ixgbe and i40e have supported the basic functions, but some advance features are not transferred to generic filters yet.
> > Seems it's not the time to remove the regular filters. Yong, I suggest to support both generic filter and regular filter in parallel.
> > So, we need to announce ABI change for the regular filter, until someday we remove the regular filter API. 
> 
> I disagree.
> There is a new API framework (rte_flow) and we must focus on this transition.
> It means we must stop any work on the legacy API.

I agree with Thomas here.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
  2017-02-13 18:02  4% ` Thomas Monjalon
@ 2017-02-14  3:21  4%   ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2017-02-14  3:21 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Yuanhan Liu, dev, Maxime Coquelin, John McNamara, Ben Walker

On Mon, Feb 13, 2017 at 07:02:56PM +0100, Thomas Monjalon wrote:
> 2017-01-23 21:04, Yuanhan Liu:
> > I made a vhost ABI/API refactoring at v16.04, meant to avoid such issue
> > forever. Well, apparently, I lied.
> > 
> > People are looking for more vhost-user options now days, other than
> > vhost-user net only. For example, SPDK (Storage Performance Development
> > Kit) are looking for chance of vhost-user SCSI and vhost-user block.
> > 
> > Apparently, they also need a vhost-user backend, while DPDK already
> > has a (mature enough) backend, they don't want to implement it again
> > from scratch. They want to leverage the one DPDK provides.
> > 
> > However, the last refactoring hasn't done that right, at least it's
> > not friendly for extending vhost-user to add more devices support.
> > For example, different virtio devices has its own feature set, while
> > APIs like rte_vhost_feature_disable(feature_mask) have no option to
> > tell the device type. Thus, a more proper API should look like:
> > 
> >     rte_vhost_feature_disable(device_type, feature_mask);
> > 
> > Besides that, few public files and structures should be renamed, to
> > not let it bind to virtio-net. Specifically, they are:
> > 
> > - virtio_net_device_ops --> vhost_device_ops
> > - rte_virtio_net.h      --> rte_vhost.h
> > 
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> 
> Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
  2017-02-14  0:32  4% ` Mcnamara, John
@ 2017-02-14  3:25  4% ` Jerin Jacob
  2017-02-14  8:33  4% ` Olivier Matz
  2017-02-14 18:42  4% ` [dpdk-dev] " Thomas Monjalon
  3 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2017-02-14  3:25 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Mon, Feb 13, 2017 at 05:38:30PM +0000, Bruce Richardson wrote:
> Document proposed changes for the rings code in the next release.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index b49e0a0..e715fc7 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,25 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
>  
> +* ring: Changes are planned to rte_ring APIs in release 17.05. Proposed
> +  changes include:
> +    - Removing build time options for the ring:
> +      CONFIG_RTE_RING_SPLIT_PROD_CONS
> +      CONFIG_RTE_RING_PAUSE_REP_COUNT
> +    - Adding an additional parameter to enqueue functions to return the
> +      amount of free space in the ring
> +    - Adding an additional parameter to dequeue functions to return the
> +      number of remaining elements in the ring
> +    - Removing direct support for watermarks in the rings, since the
> +      additional return value from the enqueue function makes it
> +      unneeded
> +    - Adjusting the return values of the bulk() enq/deq functions to
> +      make them consistent with the burst() equivalents. [Note, parameter
> +      to these functions are changing too, per points above, so compiler
> +      will flag them as needing update in legacy code]
> +    - Updates to some library functions e.g. rte_ring_get_memsize() to
> +      allow for variably-sized ring elements.
> +
>  * igb_uio: iomem mapping and sysfs files created for iomem and ioport in
>    igb_uio will be removed, because we are able to detect these from what Linux
>    has exposed, like the way we have done with uio-pci-generic. This change

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops structure
  2017-02-14  0:21  4%   ` Hemant Agrawal
@ 2017-02-14  5:11  4%     ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2017-02-14  5:11 UTC (permalink / raw)
  To: Hemant Agrawal, Trahe, Fiona, Zhang, Roy Fan, dev; +Cc: De Lara Guarch, Pablo


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hemant Agrawal
> Sent: Monday, February 13, 2017 6:21 PM
> To: Trahe, Fiona <fiona.trahe@intel.com>; Zhang, Roy Fan
> <roy.fan.zhang@intel.com>; dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> structure
> 
> On 2/10/2017 7:59 AM, Trahe, Fiona wrote:
> > Hi Fan,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> >> Sent: Friday, February 10, 2017 11:39 AM
> >> To: dev@dpdk.org
> >> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> >> Subject: [dpdk-dev] [PATCH] doc: annouce ABI change for cryptodev ops
> >> structure
> >>
> >> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst | 4 ++++
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/doc/guides/rel_notes/deprecation.rst
> >> b/doc/guides/rel_notes/deprecation.rst
> >> index 755dc65..564d93a 100644
> >> --- a/doc/guides/rel_notes/deprecation.rst
> >> +++ b/doc/guides/rel_notes/deprecation.rst
> >> @@ -62,3 +62,7 @@ Deprecation Notices
> >>    PMDs that implement the latter.
> >>    Target release for removal of the legacy API will be defined once most
> >>    PMDs have switched to rte_flow.
> >> +
> >> +* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops``
> structure.
> >> +  The field ``cryptodev_configure_t`` function prototype will be
> >> +added a
> >> +  parameter of a struct rte_cryptodev_config type pointer.
> >> --
> >> 2.7.4
> >
> > Can you fix the grammar here please. I'm not sure what the change is?
> >
> I also find it hard to understand it first. Not perfect, but I tried to reword it.
> 
> A new parameter ``struct rte_cryptodev_config *config`` will be added to the
> ``cryptodev_configure_t`` function pointer field.
> 

In any case,
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL
  2017-02-13 21:56  0%   ` Jan Blunck
@ 2017-02-14  5:18  0%     ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2017-02-14  5:18 UTC (permalink / raw)
  To: Jan Blunck; +Cc: dev, nhorman, Thomas Monjalon

On Tuesday 14 February 2017 03:26 AM, Jan Blunck wrote:
> On Mon, Feb 13, 2017 at 1:00 PM, Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
>> On Monday 13 February 2017 05:25 PM, Shreyansh Jain wrote:
>>>
>>> EAL PCI layer is planned to be restructured in 17.05 to unlink it from
>>> generic structures like eth_driver, rte_cryptodev_driver, and also move
>>> it into a PCI Bus.
>>>
>>> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>> ---
>>>  doc/guides/rel_notes/deprecation.rst | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/doc/guides/rel_notes/deprecation.rst
>>> b/doc/guides/rel_notes/deprecation.rst
>>> index fbe2fcb..b12d435 100644
>>> --- a/doc/guides/rel_notes/deprecation.rst
>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>> @@ -13,10 +13,14 @@ Deprecation Notices
>>>    has exposed, like the way we have done with uio-pci-generic. This
>>> change
>>>    targets release 17.05.
>>>
>>> -* ``eth_driver`` is planned to be removed in 17.02. This currently serves
>>> as
>>> -  a placeholder for PMDs to register themselves. Changes for ``rte_bus``
>>> will
>>> -  provide a way to handle device initialization currently being done in
>>> -  ``eth_driver``.
>>
>>
>> Just to highlight, above statement was added by me in 16.11.
>> As of now I plan to work on removing rte_pci_driver from eth_driver,
>> rather than removing eth_driver all together (which, probably, was
>> better idea).
>> If someone still wishes to work on its complete removal, we can keep
>> the above. (and probably remove the below).
>>
>
> There is no benefit in keeping eth_driver and removing rte_pci_driver
> from it. Technically it isn't even needed today.

I agree with you.
I stopped working on it because I realized that removing it means making
pci_probe call the eth_dev_init handlers directly, or restructuring the whole
PCI probe stack - which, because of the pending PCI bus implementation,
was slightly tentative.

Changes are already expected in the EAL PCI code for the bus movement, so this
task can probably be combined with that.

>
>>
>>> +* ABI/API changes are planned for 17.05 for PCI subsystem. This is to
>>> +  unlink EAL dependency on PCI and to move PCI devices to a PCI specific
>>> +  bus.
>>> +
>>> +* ``rte_pci_driver`` is planned to be removed from ``eth_driver`` in
>>> 17.05.
>>> +  This is to unlink the ethernet driver from PCI dependencies.
>>> +  Similarly, ``rte_pci_driver`` in planned to be removed from
>>> +  ``rte_cryptodev_driver`` in 17.05.
>>>
>>>  * In 17.02 ABI changes are planned: the ``rte_eth_dev`` structure will be
>>>    extended with new function pointer ``tx_pkt_prepare`` allowing
>>> verification
>>>
>>
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization
  2017-02-07 14:12  2% ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
@ 2017-02-14  8:32  3%   ` Olivier Matz
  2017-02-14  9:39  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-02-14  8:32 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev

Hi Bruce,

On Tue,  7 Feb 2017 14:12:38 +0000, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> This patchset make a set of, sometimes non-backward compatible,
> cleanup changes to the rte_ring code in order to improve it. The
> resulting code is shorter*, since the existing functions are
> restructured to reduce code duplication, as well as being more
> consistent in behaviour. The specific changes made are explained in
> each patch which makes that change.
> 
> Key incompatibilities:
> * The biggest, and probably most controversial change is that to the
>   enqueue and dequeue APIs. The enqueue/deq burst and bulk functions
> have their function prototypes changed so that they all return an
> additional parameter, indicating the size of next call which is
> guaranteed to succeed. In case on enq, this is the number of
> available slots on the ring, and in case of deq, it is the number of
> objects which can be pulled. As well as this, the return value from
> the bulk functions have been changed to make them compatible with the
> burst functions. In all cases, the functions to enq/deq a set of objs
> now return the number of objects processed, 0 or N, in the case of
> bulk functions, 0, N or any value in between in the case of the burst
> ones. [Due to the extra parameter, the compiler will flag all
> instances of the function to allow the user to also change the return
> value logic at the same time]
> * The parameters to the single object enq/deq functions have not been 
>   changed. Because of that, the return value is also unmodified - as
> the compiler cannot automatically flag this to the user.
> 
> Potential further cleanups:
> * To a certain extent the rte_ring structure has gone from being a
> whole ring structure, including a "ring" element itself, to just
> being a header which can be reused, along with the head/tail update
> functions to create new rings. For now, the enqueue code works by
> assuming that the ring data goes immediately after the header, but
> that can be changed to allow specialised ring implementations to put
> additional metadata of their own after the ring header. I didn't see
> this as being needed right now, but it may be worth considering for a
> V1 patchset.
> * There are 9 enqueue functions and 9 dequeue functions in
> rte_ring.h. I suspect not all of those are used, so personally I
> would consider dropping the functions to enqueue/dequeue a single
> value using single or multi semantics, i.e. drop 
>     rte_ring_sp_enqueue
>     rte_ring_mp_enqueue
>     rte_ring_sc_dequeue
>     rte_ring_mc_dequeue
>   That would still leave a single enqueue and dequeue function for
> working with a single object at a time.
> * It should be possible to merge the head update code for enqueue and
>   dequeue into a single function. The key difference between the two
> is the calculation of how far the index can be moved. I felt that the
>   functions for moving the head index are sufficiently complicated
> with many parameters to them already, that trying to merge in more
> code would impede readability. However, if so desired this change can
> be made at a later stage without affecting ABI or API.
> 
> PERFORMANCE:
> I've run performance autotests on a couple of (Intel) platforms.
> Looking particularly at the core-2-core results, which I expect are
> the main ones of interest, the performance after this patchset is a
> few cycles per packet faster in my testing. I'm hoping it should be
> at least neutral perf-wise.
> 
> REQUEST FOR FEEDBACK:
> * Are all of these changes worth making?

I've quickly browsed all the patches. I think yes, we should do it: it
brings a good cleanup, removing features we don't need, restructuring
the code, and also adding the feature you need :)


> * Should they be made in existing ring code, or do we look to provide
> a new fifo library to completely replace the ring one?

I think it's ok to have it in the existing code. Breaking the ABI
is never desirable, but I think having 2 libs would be even more
confusing.


> * How does the implementation of new ring types using this code
> compare vs that of the previous RFCs?

I prefer this version, especially compared to the first RFC.


Thanks for this big rework. I'll dive into the patches and do a more
exhaustive review soon.

Regards,
Olivier

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
  2017-02-14  0:32  4% ` Mcnamara, John
  2017-02-14  3:25  4% ` Jerin Jacob
@ 2017-02-14  8:33  4% ` Olivier Matz
  2017-02-14 11:43  4%   ` Hemant Agrawal
  2017-02-14 18:42  4% ` [dpdk-dev] " Thomas Monjalon
  3 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2017-02-14  8:33 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Mon, 13 Feb 2017 17:38:30 +0000, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> Document proposed changes for the rings code in the next release.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization
  2017-02-14  8:32  3%   ` Olivier Matz
@ 2017-02-14  9:39  0%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-14  9:39 UTC (permalink / raw)
  To: Olivier Matz
  Cc: thomas.monjalon, keith.wiles, konstantin.ananyev, stephen, dev

On Tue, Feb 14, 2017 at 09:32:20AM +0100, Olivier Matz wrote:
> Hi Bruce,
> 
> On Tue,  7 Feb 2017 14:12:38 +0000, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > This patchset make a set of, sometimes non-backward compatible,
> > cleanup changes to the rte_ring code in order to improve it. The
> > resulting code is shorter*, since the existing functions are
> > restructured to reduce code duplication, as well as being more
> > consistent in behaviour. The specific changes made are explained in
> > each patch which makes that change.
> > 
> > Key incompatibilities:
> > * The biggest, and probably most controversial change is that to the
> >   enqueue and dequeue APIs. The enqueue/deq burst and bulk functions
> > have their function prototypes changed so that they all return an
> > additional parameter, indicating the size of next call which is
> > guaranteed to succeed. In case on enq, this is the number of
> > available slots on the ring, and in case of deq, it is the number of
> > objects which can be pulled. As well as this, the return value from
> > the bulk functions have been changed to make them compatible with the
> > burst functions. In all cases, the functions to enq/deq a set of objs
> > now return the number of objects processed, 0 or N, in the case of
> > bulk functions, 0, N or any value in between in the case of the burst
> > ones. [Due to the extra parameter, the compiler will flag all
> > instances of the function to allow the user to also change the return
> > value logic at the same time]
> > * The parameters to the single object enq/deq functions have not been 
> >   changed. Because of that, the return value is also unmodified - as
> > the compiler cannot automatically flag this to the user.
> > 
> > Potential further cleanups:
> > * To a certain extent the rte_ring structure has gone from being a
> > whole ring structure, including a "ring" element itself, to just
> > being a header which can be reused, along with the head/tail update
> > functions to create new rings. For now, the enqueue code works by
> > assuming that the ring data goes immediately after the header, but
> > that can be changed to allow specialised ring implementations to put
> > additional metadata of their own after the ring header. I didn't see
> > this as being needed right now, but it may be worth considering for a
> > V1 patchset.
> > * There are 9 enqueue functions and 9 dequeue functions in
> > rte_ring.h. I suspect not all of those are used, so personally I
> > would consider dropping the functions to enqueue/dequeue a single
> > value using single or multi semantics, i.e. drop 
> >     rte_ring_sp_enqueue
> >     rte_ring_mp_enqueue
> >     rte_ring_sc_dequeue
> >     rte_ring_mc_dequeue
> >   That would still leave a single enqueue and dequeue function for
> > working with a single object at a time.
> > * It should be possible to merge the head update code for enqueue and
> >   dequeue into a single function. The key difference between the two
> > is the calculation of how far the index can be moved. I felt that the
> >   functions for moving the head index are sufficiently complicated
> > with many parameters to them already, that trying to merge in more
> > code would impede readability. However, if so desired this change can
> > be made at a later stage without affecting ABI or API.
> > 
> > PERFORMANCE:
> > I've run performance autotests on a couple of (Intel) platforms.
> > Looking particularly at the core-2-core results, which I expect are
> > the main ones of interest, the performance after this patchset is a
> > few cycles per packet faster in my testing. I'm hoping it should be
> > at least neutral perf-wise.
> > 
> > REQUEST FOR FEEDBACK:
> > * Are all of these changes worth making?
> 
> I've quickly browsed all the patches. I think yes, we should do it: it
> brings a good cleanup, removing features we don't need, restructuring
> the code, and also adding the feature you need :)
> 
> 
> > * Should they be made in existing ring code, or do we look to provide
> > a new fifo library to completely replace the ring one?
> 
> I think it's ok to have it in the existing code. Breaking the ABI
> is never suitable, but I think having 2 libs would be even more
> confusing.
> 
> 
> > * How does the implementation of new ring types using this code
> > compare vs that of the previous RFCs?
> 
> I prefer this version, especially compared to the first RFC.
> 
> 
> Thanks for this big rework. I'll dive into the patches and do a more
> exhaustive review soon.
> 
Great, thanks. I'm aware of a few things that already need to be cleaned
up for V1 e.g. comments are not always correctly updated on functions.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
  2017-02-14  3:17  4%     ` Jerin Jacob
@ 2017-02-14 10:33  4%       ` Iremonger, Bernard
  0 siblings, 0 replies; 200+ results
From: Iremonger, Bernard @ 2017-02-14 10:33 UTC (permalink / raw)
  To: Jerin Jacob, Thomas Monjalon; +Cc: dev, Mcnamara, John



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, February 14, 2017 3:17 AM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: Iremonger, Bernard <bernard.iremonger@intel.com>; dev@dpdk.org;
> Mcnamara, John <john.mcnamara@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for
> ethdev
> 
> On Mon, Feb 13, 2017 at 06:57:20PM +0100, Thomas Monjalon wrote:
> > 2017-01-05 15:25, Bernard Iremonger:
> > > In 17.05 nine rte_eth_dev_* functions will be removed from
> > > librte_ether, renamed and moved to the ixgbe PMD.
> > >
> > > Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> >
> > "ixgbe bypass" should be in the title and the description.
> > I'll reword to:
> >
> > doc: announce move of ethdev bypass function to ixgbe API
> >
> > In 17.05, nine rte_eth_dev_* functions for bypass control, and
> > implemented only in ixgbe, will be removed from ethdev, renamed and
> > moved to the ixgbe PMD-specific API.
> >
> > Acked-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>

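The nine bypass functions are not listed in this thread; as a purely
illustrative sketch of the announced rename (function names assumed here,
following the same ethdev-to-PMD pattern as the VF management move in 17.02):

    /* before: generic ethdev API, implemented only by ixgbe */
    int rte_eth_dev_bypass_init(uint8_t port_id);

    /* after the planned 17.05 move: ixgbe-specific API in rte_pmd_ixgbe.h */
    int rte_pmd_ixgbe_bypass_init(uint8_t port_id);
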
^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: announce ABI change for cryptodev ops structure
  2017-02-10 11:39  9% [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure Fan Zhang
  2017-02-10 13:59  4% ` Trahe, Fiona
@ 2017-02-14 10:41  9% ` Fan Zhang
  2017-02-14 10:48  4%   ` Doherty, Declan
  2017-02-14 20:37  4%   ` Thomas Monjalon
  1 sibling, 2 replies; 200+ results
From: Fan Zhang @ 2017-02-14 10:41 UTC (permalink / raw)
  To: dev; +Cc: declan.doherty

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
v2:
Rework the grammar

 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index b49e0a0..d64858f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -62,3 +62,7 @@ Deprecation Notices
   PMDs that implement the latter.
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
+
+* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
+  A pointer to a rte_cryptodev_config structure will be added to the
+  function prototype ``cryptodev_configure_t``, as a new parameter.
-- 
2.7.4

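For context, a minimal sketch of the prototype change the notice describes;
the 17.02 form is assumed from the cryptodev PMD header of this period.

    struct rte_cryptodev;
    struct rte_cryptodev_config;

    /* 17.02 (assumed): configuration callback held in rte_cryptodev_ops
     * typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev);
     */

    /* planned 17.05 shape, with the new config parameter: */
    typedef int (*cryptodev_configure_t)(struct rte_cryptodev *dev,
                    struct rte_cryptodev_config *config);
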
^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] crypto drivers in the API
  @ 2017-02-14 10:44  4% ` Doherty, Declan
  2017-02-14 11:04  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Doherty, Declan @ 2017-02-14 10:44 UTC (permalink / raw)
  To: Thomas Monjalon, Declan Doherty; +Cc: dev

On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
> In the crypto API, the drivers are listed.
> In my opinion, it is a wrong design and these lists should be removed.
> Do we need a deprecation notice to plan this removal in 17.05, while
> working on bus abstraction?
>
...
>

Hey Thomas,
I agree that these need to be removed, and I had planned on doing this 
for 17.05 but I have a concern on the requirements for ABI breakage in 
relation to this. This enum is unfortunately used in both the 
rte_cryptodev and rte_crypto_sym_session structures which are part of 
the library's public API. I don't think it would be feasible to maintain 
a set of 17.02 compatible APIs with the changes this would introduce, as 
it would require a large number of functions to have 2 versions? Is it 
OK to break the ABI for this case?

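For context, the driver lists in question are public build-time enumerations
that every new crypto PMD has to extend; an abridged sketch follows (entry
names assumed from the 17.02 headers).

    /* rte_cryptodev.h (abridged): this enum is embedded in rte_cryptodev
     * and rte_crypto_sym_session, which is why removing it breaks the ABI. */
    enum rte_cryptodev_type {
            RTE_CRYPTODEV_NULL_PMD = 1,
            RTE_CRYPTODEV_AESNI_GCM_PMD,
            RTE_CRYPTODEV_AESNI_MB_PMD,
            RTE_CRYPTODEV_QAT_SYM_PMD,
            /* ... one entry per driver ... */
    };
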
^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for cryptodev ops structure
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
@ 2017-02-14 10:48  4%   ` Doherty, Declan
  2017-02-14 11:03  4%     ` De Lara Guarch, Pablo
  2017-02-14 20:37  4%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Doherty, Declan @ 2017-02-14 10:48 UTC (permalink / raw)
  To: Fan Zhang, dev

On 14/02/2017 10:41 AM, Fan Zhang wrote:
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
...
>

Acked-by: Declan Doherty <declan.doherty@intel.com>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] Further fun with ABI tracking
@ 2017-02-14 10:52  8% Christian Ehrhardt
  2017-02-14 16:19  4% ` Bruce Richardson
  2017-02-14 20:31  9% ` Jan Blunck
  0 siblings, 2 replies; 200+ results
From: Christian Ehrhardt @ 2017-02-14 10:52 UTC (permalink / raw)
  To: dev; +Cc: cjcollier, ricardo.salveti, Luca Boccassi

Hi,
when moving to DPDK 16.11 Debian/Ubuntu packaging of DPDK has hit a new
twist on the (it seems reoccurring) topic of DPDK ABI tracking.

I have found, ... well I don't want to call it solution ..., let's say a
crutch to get around it for the moment. But I wanted to use the example I
had to share a few thoughts on it and to kick off a wider discussion.


*## In library cross-dependencies plus partial ABI bumps ##*

Since the day moving away from the combined shared library we had several
improvements on tracking the ABI versions. These days [1] we have LIBABIVER
per library and it gets bumped to reflect it is breaking with former
versions e.g. removing symbols.

Now in the 16.11 release the ABIs for cryptodev, eal and ethdev got bumped
by [2] and [3].

OTOH please remember that in general two versions of a shared library in
the usual sense are meant to be able to stay alongside on a system without
hurting each other. I picked a random one on my system.
Package              Library
libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160
libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160.0.0
libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95
libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95.5.0
Some link against the new, some against the old library - all fine.
Usually most programs can just be rebuilt against the new library and after
some time the old one can be dropped. That mechanism gives downstream
distributions a way to handle transitions and consumers of libraries which
might not all be ready for the same version every time.
And with the per-lib versioning (LIBABIVER) and the version maps we
are good - in fact we qualify for all common cases on [4].

Now in DPDK of those libraries that got an ABI bump eal and ethdev are part
of those which most of us consider "core libraries" and most other libs and
pmds link to them.
And here DPDK continues to be special, due to that inter-dependency with
old and new libraries installed on the same system the following happens on
openvswitch built for an older version of dpdk:
ovs-vswitchd-dpdk
    librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2
    librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
        librte_eal.so.3 => /usr/lib/x86_64-linux-gnu/librte_eal.so.3

You can see that Openvswitch itself depends on the "old" librte_eal.so.2.
But because  librte_pdump.so.1 did not get an ABI bump it got upgraded to
the newer version from DPDK 16.11.
But since the "new" pdump got built with the new DPDK 16.11 it depends on
the "new" librte_eal.so.3.
And having both in the same executable space at the same time causes
segfaults and pain.

As I said for now I have passed the issue with a crutch that I'm not proud
of and I'd like to avoid in the future. For that I'm reaching out to you
with several suggestions to discuss.


*## Thoughts ##*
None of these seems like a perfect solution to me yet, but clearly good to
start discussions on them.

Options that were in discussion so far and that we might adopt next cycle
(some of these are upstream changes, some downstream, some require both to
change - but any of them should have an ack upstream so that we are
agreeing how to proceed with those cases).

1. Downstreams to insert Major version into soname
Distributions could insert the DPDK major version (like 16.11) into the
soname and package names. A common example of this is libboost [5].
That would perfectly allow 16.07.<LIBABIVER> to coexist with
16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
Yet it would mean that anything depending on the old library will have to
be recompiled to pick up the new code, even if it depends on an ABI that is
still present in the new release.
Also - not a technical reason - but it is clearly more work to force update
all dependencies and clean out old packages for every release.


2. ABI Ranges
One could argue that due to the detailed tracking of functions DPDK is
already close to track not ABI levels but actually ABI ranges. DPDK could
track LIBABIVERMIN and LIBABIVER.
Every time functionality is added LIBABIVER would get bumped, but
LIBABIVERMIN only gets moved to the OLDEST still supported ABI when things
are dropped.
So on a given library librte_foo you could have LIBABIVER=5 and
LIBABIVERMIN=3. The make install would then install the shared lib as:
librte_foo.so.5
and additionally links for all compatible versions:
librte_foo.so.3 -> librte_foo.so.5
librte_foo.so.4 -> librte_foo.so.5
Yet, while it has some nice attributes this might make DPDK even more
special and cause ABI level proliferation over time.
Also even with this in place, changes moving LIBABIVERMIN "too fast" (too
fast is different for each downstream) could still cause an issue like the
one I initially described.


3. A lot of conflicts
In packaging one can declare a package to conflict with another package [6].
Now we could declare e.g. librte_eal3 to conflict with librte_eal2 (and the
same for all other bumps).
That would make them not coinstallable, and working on a new release would
mean that all former consumers would become not installable as well and
have to be rebuilt before they all could migrate [7] together.
That "works" in some sense, but it denies the whole purpose of versioned
library packages (to be coinstallable, to allow different library
consumers to depend on different versions)


4. ABI bump is infecting
Another way might be to also bump any dependent DPDK library.
So when core libs like eal are ABI bumped likely all libs would get a bump.
If only e.g. mempool gets a bump only those other parts using it would be
bumped as well.
To some extent this might still proliferate ABI versions more than one
would like.
Also it surely is hard to track if not automated - think of dependencies
that are existing only in certain config cases.

5. back to single ABI
For the sake of giving everybody a chance to re-open old wounds I wanted to
mention that DPDK could also decide to go back to a single ABI again.
This could (but doesn't have to!) be combined with having a single .so file
again.
Deciding on this might be a much cleaner and easier-to-track alternative to #4.

6. More
I'm sure there are more approaches to this, feel free to come up with more.

I'm sure my five suggestions alone will make the thread messy. Maybe we do
this in two rounds, sorting out the insane and identifying the preferred
ones, and then in a second run focus on discussing and maybe implementing the
details of what we like.


[1]: http://dpdk.org/browse/dpdk/tree/doc/guides/contributing/versioning.rst
[2]: http://dpdk.org/browse/dpdk/commit/?id=d7e61ad3ae36
[3]: http://dpdk.org/browse/dpdk/commit/?id=6ba1affa54108
[4]: https://wiki.debian.org/TransitionBestPractices
[5]: https://packages.debian.org/sid/libboost1.62-dev
[6]:
https://www.debian.org/doc/debian-policy/ch-relationships.html#s-conflicts
[7]: https://wiki.ubuntu.com/ProposedMigration

P.S. I beg a pardon for the wall of text

-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for cryptodev ops structure
  2017-02-14 10:48  4%   ` Doherty, Declan
@ 2017-02-14 11:03  4%     ` De Lara Guarch, Pablo
  0 siblings, 0 replies; 200+ results
From: De Lara Guarch, Pablo @ 2017-02-14 11:03 UTC (permalink / raw)
  To: Doherty, Declan, Zhang, Roy Fan, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Doherty, Declan
> Sent: Tuesday, February 14, 2017 10:48 AM
> To: Zhang, Roy Fan; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for cryptodev
> ops structure
> 
> On 14/02/2017 10:41 AM, Fan Zhang wrote:
> > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> > ---
> ...
> >
> 
> Acked-by: Declan Doherty <declan.doherty@intel.com>

Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] crypto drivers in the API
  2017-02-14 10:44  4% ` Doherty, Declan
@ 2017-02-14 11:04  0%   ` Thomas Monjalon
  2017-02-14 14:46  4%     ` Doherty, Declan
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2017-02-14 11:04 UTC (permalink / raw)
  To: Doherty, Declan, dev

2017-02-14 10:44, Doherty, Declan:
> On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
> > In the crypto API, the drivers are listed.
> > In my opinion, it is a wrong design and these lists should be removed.
> > Do we need a deprecation notice to plan this removal in 17.05, while
> > working on bus abstraction?
> >
> ...
> >
> 
> Hey Thomas,
> I agree that these need to be removed, and I had planned on doing this 
> for 17.05 but I have a concern on the requirements for ABI breakage in 
> relation to this. This enum is unfortunately used in both the 
> rte_cryptodev and rte_crypto_sym_session structures which are part of 
> the library's public API. I don't think it would be feasible to maintain 
> a set of 17.02 compatible APIs with the changes this would introduce, as 
> it would require a large number of functions to have 2 versions? Is it 
> OK to break the ABI for this case?

Yes
If you were planning to do this, you should have sent a deprecation notice
a few weeks ago.
Please send it now and we'll see if we have enough supporters shortly.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring library
  2017-02-14  8:33  4% ` Olivier Matz
@ 2017-02-14 11:43  4%   ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2017-02-14 11:43 UTC (permalink / raw)
  To: Olivier Matz, Bruce Richardson; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, February 14, 2017 2:34 AM
> To: Bruce Richardson <bruce.richardson@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: add ABI change notification for ring
> library
> 
> On Mon, 13 Feb 2017 17:38:30 +0000, Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> > Document proposed changes for the rings code in the next release.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
  2017-01-23 13:04 12% [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost Yuanhan Liu
  2017-02-13 18:02  4% ` Thomas Monjalon
@ 2017-02-14 13:54  4% ` Maxime Coquelin
  2017-02-14 20:28  4% ` Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2017-02-14 13:54 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: Thomas Monjalon, John McNamara

Hi Yuanhan,

On 01/23/2017 02:04 PM, Yuanhan Liu wrote:
> I made a vhost ABI/API refactoring at v16.04, meant to avoid such issue
> forever. Well, apparently, I lied.
>
> People are looking for more vhost-user options nowadays, other than
> vhost-user net only. For example, SPDK (Storage Performance Development
> Kit) are looking for chance of vhost-user SCSI and vhost-user block.
>
> Apparently, they also need a vhost-user backend, while DPDK already
> has a (mature enough) backend, they don't want to implement it again
> from scratch. They want to leverage the one DPDK provides.
>
> However, the last refactoring hasn't done that right, at least it's
> not friendly for extending vhost-user to add more devices support.
> For example, different virtio devices has its own feature set, while
> APIs like rte_vhost_feature_disable(feature_mask) have no option to
> tell the device type. Thus, a more proper API should look like:
>
>     rte_vhost_feature_disable(device_type, feature_mask);

I wonder if we could also change it to be per-instance, instead of
disabling features globally:
rte_vhost_feature_disable(vid, device_type, feature_mask);

It could be useful for live-migration with different backend versions on
the hosts, as it would allow running instances with different compat
modes (like running vhost's DPDK v17.08 with v17.05-only supported
features).
I made a proposal about cross-version migration, but we are far from a
conclusion on the design.

>
> Besides that, few public files and structures should be renamed, to
> not let it bind to virtio-net. Specifically, they are:
>
> - virtio_net_device_ops --> vhost_device_ops
> - rte_virtio_net.h      --> rte_vhost.h
>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Anyway, the change you propose is necessary:
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime

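A minimal sketch of the two prototype shapes being discussed; return and
parameter types are assumptions, and today's API takes only the feature mask.

    /* shape proposed in the deprecation notice: per device type */
    int rte_vhost_feature_disable(uint32_t device_type, uint64_t feature_mask);

    /* per-instance variant suggested in this reply, e.g. to run instances
     * in different compat modes during cross-version live migration:
     * int rte_vhost_feature_disable(int vid, uint32_t device_type,
     *                               uint64_t feature_mask);
     */
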
^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] crypto drivers in the API
  2017-02-14 11:04  0%   ` Thomas Monjalon
@ 2017-02-14 14:46  4%     ` Doherty, Declan
  2017-02-14 15:47  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Doherty, Declan @ 2017-02-14 14:46 UTC (permalink / raw)
  To: Thomas Monjalon, dev, Pablo DeLara Guarch

On 14/02/2017 11:04 AM, Thomas Monjalon wrote:
> 2017-02-14 10:44, Doherty, Declan:
>> On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
>>> In the crypto API, the drivers are listed.
>>> In my opinion, it is a wrong design and these lists should be removed.
>>> Do we need a deprecation notice to plan this removal in 17.05, while
>>> working on bus abstraction?
>>>
>> ...
>>>
...
>
> Yes
> If you were planning to do this, you should have sent a deprecation notice
> > a few weeks ago.
> Please send it now and we'll see if we have enough supporters shortly.
>

Thomas, there are a couple of other changes we are looking at in the 
cryptodev which would require API changes as well as break ABI including 
adding support for multi-device sessions, and changes to crypto 
operation layout and field changes for performance, but these will 
require RFCs or at least more discussion of the proposals. Given the 
time constraints for the V1 deadline for 17.05, I would prefer to work on 
the RFCs and get them out as soon as possible over the next few weeks 
and then make all the ABI breaking changes in R17.08 in a single release.

Otherwise we will end up breaking the ABI 2 releases in a row, which I would 
like to avoid if possible.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1] doc: update release notes for 17.02
@ 2017-02-14 15:32  4% John McNamara
  2017-02-14 16:26  2% ` [dpdk-dev] [PATCH v2] " John McNamara
  0 siblings, 1 reply; 200+ results
From: John McNamara @ 2017-02-14 15:32 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Fix grammar, spelling and formatting of DPDK 17.02 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---

Note: The "ABI Changes" section is currently empty.


 doc/guides/rel_notes/release_17_02.rst | 255 ++++++++++++++-------------------
 1 file changed, 111 insertions(+), 144 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 6420a87..b7c188a 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -40,46 +40,47 @@ New Features
 
 * **Added support for representing buses in EAL**
 
-  A new structure ``rte_bus`` is introduced in EAL. This allows for devices to
-  be represented by buses they are connected to. A new bus can be added to
-  DPDK by extending the ``rte_bus`` structure and implementing the scan and
-  probe functions. Once a new bus is registered using provided APIs, new
-  devices can be detected and initialized using bus scan and probe callbacks.
+  The ``rte_bus`` structure was introduced into the EAL. This allows for
+  devices to be represented by buses they are connected to. A new bus can be
+  added to DPDK by extending the ``rte_bus`` structure and implementing the
+  scan and probe functions. Once a new bus is registered using the provided
+  APIs, new devices can be detected and initialized using bus scan and probe
+  callbacks.
 
-  With this change, devices other than PCI or VDEV type can also be represented
-  in DPDK framework.
+  With this change, devices other than PCI or VDEV type can be represented
+  in the DPDK framework.
 
 * **Added generic EAL API for I/O device memory read/write operations.**
 
-  This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
-  memory read/write operations along with the relaxed versions.
+  This API introduces 8 bit, 16 bit, 32 bit and 64 bit I/O device
+  memory read/write operations along with "relaxed" versions.
 
-  The weakly-ordered machine like ARM needs additional I/O barrier for
-  device memory read/write access over PCI bus.
-  By introducing the EAL abstraction for I/O device memory read/write access,
-  The drivers can access I/O device memory in architecture-agnostic manner.
-  The relaxed version does not have additional I/O memory barrier, useful in
-  accessing the device registers of integrated controllers which
-  implicitly strongly ordered with respect to memory access.
+  Weakly-ordered architectures like ARM need an additional I/O barrier for
+  device memory read/write access over PCI bus. By introducing the EAL
+  abstraction for I/O device memory read/write access, the drivers can access
+  I/O device memory in an architecture-agnostic manner. The relaxed version
+  does not have an additional I/O memory barrier, which is useful in accessing
+  the device registers of integrated controllers which is implicitly strongly
+  ordered with respect to memory access.
 
 * **Added generic flow API (rte_flow).**
 
   This API provides a generic means to configure hardware to match specific
-  ingress or egress traffic, alter its fate and query related counters
+  ingress or egress traffic, alter its behavior and query related counters
   according to any number of user-defined rules.
 
-  It is slightly higher-level than the legacy filtering framework which it
-  encompasses and supersedes (including all functions and filter types) in
-  order to expose a single interface with an unambiguous behavior that is
-  common to all poll-mode drivers (PMDs).
+  In order to expose a single interface with an unambiguous behavior that is
+  common to all poll-mode drivers (PMDs) the ``rte_flow`` API is slightly
+  higher-level than the legacy filtering framework, which it encompasses and
+  supersedes (including all functions and filter types).
 
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
 * **Added firmware version get API.**
 
-  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
-  version by a given device.
+  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch the firmware
+  version for a given device.
 
 * **Added APIs for MACsec offload support to the ixgbe PMD.**
 
@@ -90,54 +91,58 @@ New Features
 
   Added support for I219 Intel 1GbE NICs.
 
-* **Added VF Daemon (VFD) on i40e. - EXPERIMENTAL**
-
-  This's an EXPERIMENTAL feature to enhance the capability of DPDK PF as many
-  VF management features are not supported by kernel PF driver.
-  Some new private APIs are implemented in PMD without abstrction layer.
-  They can be used directly by some users who have the need.
-
-  The new APIs to control VFs directly from PF include,
-  1) set VF MAC anti-spoofing
-  2) set VF VLAN anti-spoofing
-  3) set TX loopback
-  4) set VF unicast promiscuous mode
-  5) set VF multicast promiscuous mode
-  6) set VF MTU
-  7) get/reset VF stats
-  8) set VF MAC address
-  9) set VF VLAN stripping
-  10) VF VLAN insertion
-  12) set VF broadcast mode
-  13) set VF VLAN tag
-  14) set VF VLAN filter
-  VFD also includes VF to PF mailbox message management by APP.
-  When PF receives mailbox messages from VF, PF should call the callback
-  provided by APP to know if they're permitted to be processed.
-
-  As an EXPERIMENTAL feature, please aware it can be changed or even
+* **Added VF Daemon (VFD) for i40e. - EXPERIMENTAL**
+
+  This is an EXPERIMENTAL feature to enhance the capability of the DPDK PF as
+  many VF management features are not currently supported by the kernel PF
+  driver. Some new private APIs are implemented directly in the PMD without an
+  abstraction layer. They can be used directly by some users who have the
+  need.
+
+  The new APIs to control VFs directly from PF include:
+
+  * Set VF MAC anti-spoofing.
+  * Set VF VLAN anti-spoofing.
+  * Set TX loopback.
+  * Set VF unicast promiscuous mode.
+  * Set VF multicast promiscuous mode.
+  * Set VF MTU.
+  * Get/reset VF stats.
+  * Set VF MAC address.
+  * Set VF VLAN stripping.
+  * VF VLAN insertion.
+  * Set VF broadcast mode.
+  * Set VF VLAN tag.
+  * Set VF VLAN filter.
+
+  VFD also includes VF to PF mailbox message management from an application.
+  When the PF receives mailbox messages from the VF the PF should call the
+  callback provided by the application to know if they're permitted to be
+  processed.
+
+  As an EXPERIMENTAL feature, please be aware it can be changed or even
   removed without prior notice.
 
 * **Updated the i40e base driver.**
 
-  updated the i40e base driver, including the following changes:
+  Updated the i40e base driver, including the following changes:
 
-  * replace existing legacy memcpy() calls with i40e_memcpy() calls.
-  * use BIT() macro instead of bit fields
-  * add clear all WoL filters implementation
-  * add broadcast promiscuous control per VLAN
-  * remove unused X722_SUPPORT and I40E_NDIS_SUPPORT MARCOs
+  * Replace existing legacy ``memcpy()`` calls with ``i40e_memcpy()`` calls.
+  * Use ``BIT()`` macro instead of bit fields.
+  * Add clear all WoL filters implementation.
+  * Add broadcast promiscuous control per VLAN.
+  * Remove unused ``X722_SUPPORT`` and ``I40E_NDIS_SUPPORT`` macros.
 
 * **Updated the enic driver.**
 
-  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad.
+  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad checksums.
   * Fix set/remove of MAC addresses. Allow up to 64 addresses per device.
   * Enable TSO on outer headers.
 
 * **Added Solarflare libefx-based network PMD.**
 
-  A new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
-  of 10/40 Gbps adapters has been added.
+  Added a new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
+  of 10/40 Gbps adapters.
 
 * **Updated the mlx4 driver.**
 
@@ -145,8 +150,8 @@ New Features
 
 * **Added support for Mellanox ConnectX-5 adapters (mlx5).**
 
-  Support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps adapters
-  has been added to the existing mlx5 PMD.
+  Added support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps
+  adapters to the existing mlx5 PMD.
 
 * **Updated the mlx5 driver.**
 
@@ -161,47 +166,47 @@ New Features
 
 * **virtio-user with vhost-kernel as another exceptional path.**
 
-  Previously, we upstreamed a virtual device, virtio-user with vhost-user
-  as the backend, as a way for IPC (Inter-Process Communication) and user
+  Previously, we upstreamed a virtual device, virtio-user with vhost-user as
+  the backend as a way of enabling IPC (Inter-Process Communication) and user
   space container networking.
 
-  Virtio-user with vhost-kernel as the backend is a solution for exceptional
-  path, such as KNI, which exchanges packets with kernel networking stack.
+  Virtio-user with vhost-kernel as the backend is a solution for the exception
+  path, such as KNI, which exchanges packets with the kernel networking stack.
   This solution is very promising in:
 
-  * maintenance: vhost and vhost-net (kernel) is upstreamed and extensively
+  * Maintenance: vhost and vhost-net (kernel) is an upstreamed and extensively
     used kernel module.
-  * features: vhost-net is born to be a networking solution, which has
+  * Features: vhost-net is designed to be a networking solution, which has
     lots of networking related features, like multi-queue, TSO, multi-seg
     mbuf, etc.
-  * performance: similar to KNI, this solution would use one or more
+  * Performance: similar to KNI, this solution would use one or more
     kthreads to send/receive packets from user space DPDK applications,
     which has little impact on user space polling thread (except that
     it might enter into kernel space to wake up those kthreads if
     necessary).
 
-* **Added virtio Rx interrupt suppprt.**
+* **Added virtio Rx interrupt support.**
 
-  This feature enables Rx interrupt mode for virtio pci net devices as
-  binded to VFIO (noiommu mode) and drived by virtio PMD.
+  Added a feature to enable Rx interrupt mode for virtio pci net devices as
+  bound to VFIO (noiommu mode) and driven by virtio PMD.
 
-  With this feature, virtio PMD can switch between polling mode and
+  With this feature, the virtio PMD can switch between polling mode and
   interrupt mode, to achieve best performance, and at the same time save
-  power. It can work on both legacy and modern virtio devices. At this mode,
-  each rxq is mapped with an exluded MSIx interrupt.
+  power. It can work on both legacy and modern virtio devices. In this mode,
+  each ``rxq`` is mapped with an excluded MSIx interrupt.
 
   See the :ref:`Virtio Interrupt Mode <virtio_interrupt_mode>` documentation
   for more information.
 
 * **Added ARMv8 crypto PMD.**
 
-  A new crypto PMD has been added, which provides combined mode cryptografic
+  A new crypto PMD has been added, which provides combined mode cryptographic
   operations optimized for ARMv8 processors. The driver can be used to enhance
   performance in processing chained operations such as cipher + HMAC.
 
 * **Updated the QAT PMD.**
 
-  The QAT PMD was updated with additional support for:
+  The QAT PMD has been updated with additional support for:
 
   * DES algorithm.
   * Scatter-gather list (SGL) support.
@@ -210,100 +215,61 @@ New Features
 
   * The Intel(R) Multi Buffer Crypto for IPsec library used in
     AESNI MB PMD has been moved to a new repository, in GitHub.
-  * Support for single operations (cipher only and authentication only).
+  * Support has been added for single operations (cipher only and
+    authentication only).
 
 * **Updated the AES-NI GCM PMD.**
 
-  The AES-NI GCM PMD was migrated from MB library to ISA-L library.
-  The migration entailed the following additional support for:
+  The AES-NI GCM PMD was migrated from the Multi Buffer library to the ISA-L
+  library. The migration entailed adding additional support for:
 
   * GMAC algorithm.
   * 256-bit cipher key.
   * Session-less mode.
   * Out-of place processing
-  * Scatter-gatter support for chained mbufs (only out-of place and destination
+  * Scatter-gather support for chained mbufs (only out-of place and destination
     mbuf must be contiguous)
 
 * **Added crypto performance test application.**
 
-  A new performance test application allows measuring performance parameters
-  of PMDs available in crypto tree.
+  Added a new performance test application for measuring performance
+  parameters of PMDs available in the crypto tree.
 
 * **Added Elastic Flow Distributor library (rte_efd).**
 
-  This new library uses perfect hashing to determine a target/value for a
-  given incoming flow key.
+  Added a new library which uses perfect hashing to determine a target/value
+  for a given incoming flow key.
 
-  It does not store the key itself for lookup operations, and therefore,
-  lookup performance is not dependent on the key size. Also, the target/value
-  can be any arbitrary value (8 bits by default). Finally, the storage requirement
-  is much smaller than a hash-based flow table and therefore, it can better fit for
-  CPU cache, being able to scale to millions of flow keys.
+  The library does not store the key itself for lookup operations, and
+  therefore, lookup performance is not dependent on the key size. Also, the
+  target/value can be any arbitrary value (8 bits by default). Finally, the
+  storage requirement is much smaller than a hash-based flow table and
+  therefore, it can better fit in CPU cache and scale to millions of flow
+  keys.
 
   See the :ref:`Elastic Flow Distributor Library <Efd_Library>` documentation in
   the Programmers Guide document, for more information.
 
 
-Resolved Issues
----------------
-
-.. This section should contain bug fixes added to the relevant sections. Sample format:
-
-   * **code/section Fixed issue in the past tense with a full stop.**
-
-     Add a short 1-2 sentence description of the resolved issue in the past tense.
-     The title should contain the code/lib section like a commit message.
-     Add the entries in alphabetic order in the relevant sections below.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
-EAL
-~~~
-
 
 Drivers
 ~~~~~~~
 
 * **net/virtio: Fixed multiple process support.**
 
-  Fixed few regressions introduced in recent releases that break the virtio
+  Fixed a few regressions introduced in recent releases that break the virtio
   multiple process support.
 
 
-Libraries
-~~~~~~~~~
-
-
 Examples
 ~~~~~~~~
 
 * **examples/ethtool: Fixed crash with non-PCI devices.**
 
-  Querying a non-PCI device was dereferencing non-existent PCI data
-  resulting in a segmentation fault.
+  Fixed issue where querying a non-PCI device was dereferencing non-existent
+  PCI data resulting in a segmentation fault.
 
 
-Other
-~~~~~
-
-
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue in the present
-     tense. Add information on any known workarounds.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
 
 API Changes
 -----------
@@ -319,25 +285,26 @@ API Changes
 
 * **Moved five APIs for VF management from the ethdev to the ixgbe PMD.**
 
-  The following five APIs for VF management from the PF have been removed from the ethdev,
-  renamed and added to the ixgbe PMD::
+  The following five APIs for VF management from the PF have been removed from
+  the ethdev, renamed, and added to the ixgbe PMD::
 
-    rte_eth_dev_set_vf_rate_limit
-    rte_eth_dev_set_vf_rx
-    rte_eth_dev_set_vf_rxmode
-    rte_eth_dev_set_vf_tx
-    rte_eth_dev_set_vf_vlan_filter
+     rte_eth_dev_set_vf_rate_limit()
+     rte_eth_dev_set_vf_rx()
+     rte_eth_dev_set_vf_rxmode()
+     rte_eth_dev_set_vf_tx()
+     rte_eth_dev_set_vf_vlan_filter()
 
   The API's have been renamed to the following::
 
-    rte_pmd_ixgbe_set_vf_rate_limit
-    rte_pmd_ixgbe_set_vf_rx
-    rte_pmd_ixgbe_set_vf_rxmode
-    rte_pmd_ixgbe_set_vf_tx
-    rte_pmd_ixgbe_set_vf_vlan_filter
+     rte_pmd_ixgbe_set_vf_rate_limit()
+     rte_pmd_ixgbe_set_vf_rx()
+     rte_pmd_ixgbe_set_vf_rxmode()
+     rte_pmd_ixgbe_set_vf_tx()
+     rte_pmd_ixgbe_set_vf_vlan_filter()
 
   The declarations for the API’s can be found in ``rte_pmd_ixgbe.h``.
 
+
 ABI Changes
 -----------
 
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] crypto drivers in the API
  2017-02-14 14:46  4%     ` Doherty, Declan
@ 2017-02-14 15:47  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 15:47 UTC (permalink / raw)
  To: Doherty, Declan; +Cc: dev, Pablo DeLara Guarch

2017-02-14 14:46, Doherty, Declan:
> On 14/02/2017 11:04 AM, Thomas Monjalon wrote:
> > 2017-02-14 10:44, Doherty, Declan:
> >> On 13/02/2017 1:25 PM, Thomas Monjalon wrote:
> >>> In the crypto API, the drivers are listed.
> >>> In my opinion, it is a wrong design and these lists should be removed.
> >>> Do we need a deprecation notice to plan this removal in 17.05, while
> >>> working on bus abstraction?
> >>>
> >> ...
> >>>
> ...
> >
> > Yes
> > If you were planning to do this, you should have sent a deprecation notice
> > > a few weeks ago.
> > Please send it now and we'll see if we have enough supporters shortly.
> >
> 
> Thomas, there are a couple of other changes we are looking at in the 
> cryptodev which would require API changes as well as break ABI including 
> adding support for multi-device sessions, and changes to crypto 
> operation layout and field changes for performance, but these will 
> require RFCs or at least more discussion of the proposals. Given the 
> time constraints for the V1 deadline for 17.05, I would prefer to work on 
> the RFCs and get them out as soon as possible over the next few weeks 
> and then make all the ABI breaking changes in R17.08 in a single release.
> 
> Otherwise we will end up breaking the ABI 2 releases in a row, which I would 
> like to avoid if possible.

OK, seems good. Thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-14 10:52  8% [dpdk-dev] Further fun with ABI tracking Christian Ehrhardt
@ 2017-02-14 16:19  4% ` Bruce Richardson
  2017-02-14 20:31  9% ` Jan Blunck
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-14 16:19 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On Tue, Feb 14, 2017 at 11:52:00AM +0100, Christian Ehrhardt wrote:
> Hi,
> when moving to DPDK 16.11 Debian/Ubuntu packaging of DPDK has hit a new
> twist on the (it seems reoccurring) topic of DPDK ABI tracking.
> 
> I have found, ... well I don't want to call it solution ..., let's say a
> crutch to get around it for the moment. But I wanted to use the example I
> had to share a few thoughts on it and to kick off a wider discussion.
> 
> 
> *## In library cross-dependencies plus partial ABI bumps ##*
> 
> Since the day moving away from the combined shared library we had several
> improvements on tracking the ABI versions. These days [1] we have LIBABIVER
> per library and it gets bumped to reflect it is breaking with former
> versions e.g. removing symbols.
> 
> Now in the 16.11 release the ABIs for cryptodev, eal and ethdev got bumped
> by [2] and [3].
> 
> OTOH please remember that in general two versions of a shared library in
> the usual sense are meant to be able to stay alongside on a system without
> hurting each other. I picked a random one on my system.
> Package              Library
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160.0.0
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95.5.0
> Some link against the new, some against the old library - all fine.
> Usually most programs can just be rebuilt against the new library and after
> some time the old one can be dropped. That mechanism gives downstream
> distributions a way to handle transitions and consumers of libraries which
> might not all be ready for the same version every time.
> And with the per-lib versioning (LIBABIVER) and the version maps we
> are good - in fact we qualify for all common cases on [4].
> 
> Now in DPDK of those libraries that got an ABI bump eal and ethdev are part
> of those which most of us consider "core libraries" and most other libs and
> pmds link to them.
> And here DPDK continues to be special, due to that inter-dependency with
> old and new libraries installed on the same system the following happens on
> openvswitch built for an older version of dpdk:
> ovs-vswitchd-dpdk
>     librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2
>     librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
>         librte_eal.so.3 => /usr/lib/x86_64-linux-gnu/librte_eal.so.3
> 
> You can see that Openvswitch itself depends on the "old" librte_eal.so.2.
> But because  librte_pdump.so.1 did not get an ABI bump it got upgraded to
> the newer version from DPDK 16.11.
> But since the "new" pdump got built with the new DPDK 16.11 it depends on
> the "new" librte_eal.so.3.
> And having both in the same executable space at the same time causes
> segfaults and pain.
> 
> As I said for now I have passed the issue with a crutch that I'm not proud
> of and I'd like to avoid in the future. For that I'm reaching out to you
> with several suggestions to discuss.
> 
> 
> *## Thoughts ##*
> None of these seems like a perfect solution to me yet, but clearly good to
> start discussions on them.
> 
> Options that were in discussion so far and that we might adopt next cycle
> (some of these are upstream changes, some downstream, some require both to
> change - but any of them should have an ack upstream so that we are
> agreeing how to proceed with those cases).
> 
> 1. Downstreams to insert Major version into soname
> Distributions could insert the DPDK major version (like 16.11) into the
> soname and package names. A common example of this is libboost [5].
> That would perfectly allow 16.07.<LIBABIVER> to coexist with
> 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
> Yet it would mean that anything depending on the old library will have to
> be recompiled to pick up the new code, even if it depends on an ABI that is
> still present in the new release.
> Also - not a technical reason - but it is clearly more work to force update
> all dependencies and clean out old packages for every release.
> 
> 
> 2. ABI Ranges
> One could argue that due to the detailed tracking of functions DPDK is
> already close to track not ABI levels but actually ABI ranges. DPDK could
> track LIBABIVERMIN and LIBABIVER.
> Every time functionality is added LIBABIVER would get bumped, but
> LIBABIVERMIN only gets moved to the OLDEST still supported ABI when things
> are dropped.
> So on a given library librte_foo you could have LIBABIVER=5 and
> LIBABIVERMIN=3. The make install would then install the shared lib as:
> librte_foo.so.5
> and additionally links for all compatible versions:
> librte_foo.so.3 -> librte_foo.so.5
> librte_foo.so.4 -> librte_foo.so.5
> Yet, while it has some nice attributes this might make DPDK even more
> special and cause ABI level proliferation over time.
> Also even with this in place, changes moving LIBABIVERMIN "too fast" (too
> fast is different for each downstream) could still cause an issue like the
> one I initially described.
> 
> 
> 3. A lot of conflicts
> In packaging one can declare a package to conflict with another package [6].
> Now we could declare e.g. librte_eal3 to conflict with librte_eal2 (and the
> same for all other bumps).
> That would make them not coinstallable, and working on a new release would
> mean that all former consumers would become not installable as well and
> have to be rebuilt before they all could migrate [7] together.
> That "works" in some sense, but it denies the whole purpose of versioned
> library packages (to be coinstallable, to allow different library
> consumers to depend on different versions)
> 
> 
> 4. ABI bump is infecting
> Another way might be to also bump any dependent DPDK library.
> So when core libs like eal are ABI bumped likely all libs would get a bump.
> If only e.g. mempool gets a bump only those other parts using it would be
> bumped as well.
> To some extent this might still proliferate ABI versions more than one
> would like.
> Also it surely is hard to track if not automated - think of dependencies
> that are existing only in certain config cases.
> 
> 5. back to single ABI
> For the sake of giving everybody a chance to re-open old wounds I wanted to
> mention that DPDK could also decide to go back to a single ABI again.
> This could (but doesn't have to!) be combined with having a single .so file
> again.
> Deciding on this might be a much cleaner and easier-to-track alternative to #4.
> 
> 6. More
> I'm sure there are more approaches to this, feel free to come up with more.
> 
> I'm sure my five suggestions alone will make the thread messy. Maybe we do
> this in two rounds, sorting out the insane and identifying the preferred
> ones, and then in a second run focus on discussing and maybe implementing the
> details of what we like.
> 
> 

Of the 5 options you propose, No 4 looks most appealing to me. If it
does cause problems with different config cases, then that looks like a good
reason to cut down on the allowed configs. :-)

/Bruce

> [1]: http://dpdk.org/browse/dpdk/tree/doc/guides/contributing/versioning.rst
> [2]: http://dpdk.org/browse/dpdk/commit/?id=d7e61ad3ae36
> [3]: http://dpdk.org/browse/dpdk/commit/?id=6ba1affa54108
> [4]: https://wiki.debian.org/TransitionBestPractices
> [5]: https://packages.debian.org/sid/libboost1.62-dev
> [6]:
> https://www.debian.org/doc/debian-policy/ch-relationships.html#s-conflicts
> [7]: https://wiki.ubuntu.com/ProposedMigration
> 
> P.S. I beg a pardon for the wall of text
> 
> -- 
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: update release notes for 17.02
  2017-02-14 15:32  4% [dpdk-dev] [PATCH v1] doc: update release notes for 17.02 John McNamara
@ 2017-02-14 16:26  2% ` John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2017-02-14 16:26 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Fix grammar, spelling and formatting of DPDK 17.02 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_17_02.rst | 241 +++++++++++++++------------------
 1 file changed, 111 insertions(+), 130 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 6420a87..357965a 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -40,46 +40,47 @@ New Features
 
 * **Added support for representing buses in EAL**
 
-  A new structure ``rte_bus`` is introduced in EAL. This allows for devices to
-  be represented by buses they are connected to. A new bus can be added to
-  DPDK by extending the ``rte_bus`` structure and implementing the scan and
-  probe functions. Once a new bus is registered using provided APIs, new
-  devices can be detected and initialized using bus scan and probe callbacks.
+  The ``rte_bus`` structure was introduced into the EAL. This allows for
+  devices to be represented by buses they are connected to. A new bus can be
+  added to DPDK by extending the ``rte_bus`` structure and implementing the
+  scan and probe functions. Once a new bus is registered using the provided
+  APIs, new devices can be detected and initialized using bus scan and probe
+  callbacks.
 
-  With this change, devices other than PCI or VDEV type can also be represented
-  in DPDK framework.
+  With this change, devices other than PCI or VDEV type can be represented
+  in the DPDK framework.
 
 * **Added generic EAL API for I/O device memory read/write operations.**
 
-  This API introduces 8-bit, 16-bit, 32bit, 64bit I/O device
-  memory read/write operations along with the relaxed versions.
+  This API introduces 8 bit, 16 bit, 32 bit and 64 bit I/O device
+  memory read/write operations along with "relaxed" versions.
 
-  The weakly-ordered machine like ARM needs additional I/O barrier for
-  device memory read/write access over PCI bus.
-  By introducing the EAL abstraction for I/O device memory read/write access,
-  The drivers can access I/O device memory in architecture-agnostic manner.
-  The relaxed version does not have additional I/O memory barrier, useful in
-  accessing the device registers of integrated controllers which
-  implicitly strongly ordered with respect to memory access.
+  Weakly-ordered architectures like ARM need an additional I/O barrier for
+  device memory read/write access over PCI bus. By introducing the EAL
+  abstraction for I/O device memory read/write access, the drivers can access
+  I/O device memory in an architecture-agnostic manner. The relaxed version
+  does not have an additional I/O memory barrier, which is useful in accessing
+  the device registers of integrated controllers which is implicitly strongly
+  ordered with respect to memory access.
 
 * **Added generic flow API (rte_flow).**
 
   This API provides a generic means to configure hardware to match specific
-  ingress or egress traffic, alter its fate and query related counters
+  ingress or egress traffic, alter its behavior and query related counters
   according to any number of user-defined rules.
 
-  It is slightly higher-level than the legacy filtering framework which it
-  encompasses and supersedes (including all functions and filter types) in
-  order to expose a single interface with an unambiguous behavior that is
-  common to all poll-mode drivers (PMDs).
+  In order to expose a single interface with an unambiguous behavior that is
+  common to all poll-mode drivers (PMDs) the ``rte_flow`` API is slightly
+  higher-level than the legacy filtering framework, which it encompasses and
+  supersedes (including all functions and filter types).
 
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
 * **Added firmware version get API.**
 
-  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch firmware
-  version by a given device.
+  Added a new function ``rte_eth_dev_fw_version_get()`` to fetch the firmware
+  version for a given device.
 
 * **Added APIs for MACsec offload support to the ixgbe PMD.**
 
@@ -90,54 +91,58 @@ New Features
 
   Added support for I219 Intel 1GbE NICs.
 
-* **Added VF Daemon (VFD) on i40e. - EXPERIMENTAL**
-
-  This's an EXPERIMENTAL feature to enhance the capability of DPDK PF as many
-  VF management features are not supported by kernel PF driver.
-  Some new private APIs are implemented in PMD without abstrction layer.
-  They can be used directly by some users who have the need.
-
-  The new APIs to control VFs directly from PF include,
-  1) set VF MAC anti-spoofing
-  2) set VF VLAN anti-spoofing
-  3) set TX loopback
-  4) set VF unicast promiscuous mode
-  5) set VF multicast promiscuous mode
-  6) set VF MTU
-  7) get/reset VF stats
-  8) set VF MAC address
-  9) set VF VLAN stripping
-  10) VF VLAN insertion
-  12) set VF broadcast mode
-  13) set VF VLAN tag
-  14) set VF VLAN filter
-  VFD also includes VF to PF mailbox message management by APP.
-  When PF receives mailbox messages from VF, PF should call the callback
-  provided by APP to know if they're permitted to be processed.
-
-  As an EXPERIMENTAL feature, please aware it can be changed or even
+* **Added VF Daemon (VFD) for i40e. - EXPERIMENTAL**
+
+  This is an EXPERIMENTAL feature to enhance the capability of the DPDK PF as
+  many VF management features are not currently supported by the kernel PF
+  driver. Some new private APIs are implemented directly in the PMD without an
+  abstraction layer. They can be used directly by some users who have the
+  need.
+
+  The new APIs to control VFs directly from PF include:
+
+  * Set VF MAC anti-spoofing.
+  * Set VF VLAN anti-spoofing.
+  * Set TX loopback.
+  * Set VF unicast promiscuous mode.
+  * Set VF multicast promiscuous mode.
+  * Set VF MTU.
+  * Get/reset VF stats.
+  * Set VF MAC address.
+  * Set VF VLAN stripping.
+  * VF VLAN insertion.
+  * Set VF broadcast mode.
+  * Set VF VLAN tag.
+  * Set VF VLAN filter.
+
+  VFD also includes VF to PF mailbox message management from an application.
+  When the PF receives mailbox messages from the VF the PF should call the
+  callback provided by the application to know if they're permitted to be
+  processed.
+
+  As an EXPERIMENTAL feature, please be aware it can be changed or even
   removed without prior notice.
 
 * **Updated the i40e base driver.**
 
-  updated the i40e base driver, including the following changes:
+  Updated the i40e base driver, including the following changes:
 
-  * replace existing legacy memcpy() calls with i40e_memcpy() calls.
-  * use BIT() macro instead of bit fields
-  * add clear all WoL filters implementation
-  * add broadcast promiscuous control per VLAN
-  * remove unused X722_SUPPORT and I40E_NDIS_SUPPORT MARCOs
+  * Replace existing legacy ``memcpy()`` calls with ``i40e_memcpy()`` calls.
+  * Use ``BIT()`` macro instead of bit fields.
+  * Add clear all WoL filters implementation.
+  * Add broadcast promiscuous control per VLAN.
+  * Remove unused ``X722_SUPPORT`` and ``I40E_NDIS_SUPPORT`` macros.
 
 * **Updated the enic driver.**
 
-  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad.
+  * Set new Rx checksum flags in mbufs to indicate unknown, good or bad checksums.
   * Fix set/remove of MAC addresses. Allow up to 64 addresses per device.
   * Enable TSO on outer headers.
 
 * **Added Solarflare libefx-based network PMD.**
 
-  A new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
-  of 10/40 Gbps adapters has been added.
+  Added a new network PMD which supports Solarflare SFN7xxx and SFN8xxx family
+  of 10/40 Gbps adapters.
 
 * **Updated the mlx4 driver.**
 
@@ -145,8 +150,8 @@ New Features
 
 * **Added support for Mellanox ConnectX-5 adapters (mlx5).**
 
-  Support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps adapters
-  has been added to the existing mlx5 PMD.
+  Added support for Mellanox ConnectX-5 family of 10/25/40/50/100 Gbps
+  adapters to the existing mlx5 PMD.
 
 * **Updated the mlx5 driver.**
 
@@ -161,47 +166,47 @@ New Features
 
 * **virtio-user with vhost-kernel as another exceptional path.**
 
-  Previously, we upstreamed a virtual device, virtio-user with vhost-user
-  as the backend, as a way for IPC (Inter-Process Communication) and user
+  Previously, we upstreamed a virtual device, virtio-user with vhost-user as
+  the backend as a way of enabling IPC (Inter-Process Communication) and user
   space container networking.
 
-  Virtio-user with vhost-kernel as the backend is a solution for exceptional
-  path, such as KNI, which exchanges packets with kernel networking stack.
+  Virtio-user with vhost-kernel as the backend is a solution for the exception
+  path, such as KNI, which exchanges packets with the kernel networking stack.
   This solution is very promising in:
 
-  * maintenance: vhost and vhost-net (kernel) is upstreamed and extensively
+  * Maintenance: vhost and vhost-net (kernel) is an upstreamed and extensively
     used kernel module.
-  * features: vhost-net is born to be a networking solution, which has
+  * Features: vhost-net is designed to be a networking solution, which has
     lots of networking related features, like multi-queue, TSO, multi-seg
     mbuf, etc.
-  * performance: similar to KNI, this solution would use one or more
+  * Performance: similar to KNI, this solution would use one or more
     kthreads to send/receive packets from user space DPDK applications,
     which has little impact on user space polling thread (except that
     it might enter into kernel space to wake up those kthreads if
     necessary).
 
-* **Added virtio Rx interrupt suppprt.**
+* **Added virtio Rx interrupt support.**
 
-  This feature enables Rx interrupt mode for virtio pci net devices as
-  binded to VFIO (noiommu mode) and drived by virtio PMD.
+  Added a feature to enable Rx interrupt mode for virtio pci net devices as
+  bound to VFIO (noiommu mode) and driven by virtio PMD.
 
-  With this feature, virtio PMD can switch between polling mode and
+  With this feature, the virtio PMD can switch between polling mode and
   interrupt mode, to achieve best performance, and at the same time save
-  power. It can work on both legacy and modern virtio devices. At this mode,
-  each rxq is mapped with an exluded MSIx interrupt.
+  power. It can work on both legacy and modern virtio devices. In this mode,
+  each ``rxq`` is mapped with an excluded MSIx interrupt.
 
   See the :ref:`Virtio Interrupt Mode <virtio_interrupt_mode>` documentation
   for more information.
 
 * **Added ARMv8 crypto PMD.**
 
-  A new crypto PMD has been added, which provides combined mode cryptografic
+  A new crypto PMD has been added, which provides combined mode cryptographic
   operations optimized for ARMv8 processors. The driver can be used to enhance
   performance in processing chained operations such as cipher + HMAC.
 
 * **Updated the QAT PMD.**
 
-  The QAT PMD was updated with additional support for:
+  The QAT PMD has been updated with additional support for:
 
   * DES algorithm.
   * Scatter-gather list (SGL) support.
@@ -210,35 +215,37 @@ New Features
 
   * The Intel(R) Multi Buffer Crypto for IPsec library used in
     AESNI MB PMD has been moved to a new repository, in GitHub.
-  * Support for single operations (cipher only and authentication only).
+  * Support has been added for single operations (cipher only and
+    authentication only).
 
 * **Updated the AES-NI GCM PMD.**
 
-  The AES-NI GCM PMD was migrated from MB library to ISA-L library.
-  The migration entailed the following additional support for:
+  The AES-NI GCM PMD was migrated from the Multi Buffer library to the ISA-L
+  library. The migration entailed adding additional support for:
 
   * GMAC algorithm.
   * 256-bit cipher key.
   * Session-less mode.
   * Out-of place processing
-  * Scatter-gatter support for chained mbufs (only out-of place and destination
+  * Scatter-gather support for chained mbufs (only out-of-place and destination
     mbuf must be contiguous)
 
 * **Added crypto performance test application.**
 
-  A new performance test application allows measuring performance parameters
-  of PMDs available in crypto tree.
+  Added a new performance test application for measuring performance
+  parameters of PMDs available in the crypto tree.
 
 * **Added Elastic Flow Distributor library (rte_efd).**
 
-  This new library uses perfect hashing to determine a target/value for a
-  given incoming flow key.
+  Added a new library which uses perfect hashing to determine a target/value
+  for a given incoming flow key.
 
-  It does not store the key itself for lookup operations, and therefore,
-  lookup performance is not dependent on the key size. Also, the target/value
-  can be any arbitrary value (8 bits by default). Finally, the storage requirement
-  is much smaller than a hash-based flow table and therefore, it can better fit for
-  CPU cache, being able to scale to millions of flow keys.
+  The library does not store the key itself for lookup operations, and
+  therefore, lookup performance is not dependent on the key size. Also, the
+  target/value can be any arbitrary value (8 bits by default). Finally, the
+  storage requirement is much smaller than a hash-based flow table and
+  therefore, it can better fit in CPU cache and scale to millions of flow
+  keys.
 
   See the :ref:`Elastic Flow Distributor Library <Efd_Library>` documentation in
   the Programmers Guide document, for more information.
@@ -259,51 +266,24 @@ Resolved Issues
    Also, make sure to start the actual text at the margin.
    =========================================================
 
-
-EAL
-~~~
-
-
 Drivers
 ~~~~~~~
 
 * **net/virtio: Fixed multiple process support.**
 
-  Fixed few regressions introduced in recent releases that break the virtio
+  Fixed a few regressions introduced in recent releases that break the virtio
   multiple process support.
 
 
-Libraries
-~~~~~~~~~
-
-
 Examples
 ~~~~~~~~
 
 * **examples/ethtool: Fixed crash with non-PCI devices.**
 
-  Querying a non-PCI device was dereferencing non-existent PCI data
-  resulting in a segmentation fault.
+  Fixed issue where querying a non-PCI device was dereferencing non-existent
+  PCI data resulting in a segmentation fault.
 
 
-Other
-~~~~~
-
-
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue in the present
-     tense. Add information on any known workarounds.
-
-   This section is a comment. do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
 
 API Changes
 -----------
@@ -319,25 +299,26 @@ API Changes
 
 * **Moved five APIs for VF management from the ethdev to the ixgbe PMD.**
 
-  The following five APIs for VF management from the PF have been removed from the ethdev,
-  renamed and added to the ixgbe PMD::
+  The following five APIs for VF management from the PF have been removed from
+  the ethdev, renamed, and added to the ixgbe PMD::
 
-    rte_eth_dev_set_vf_rate_limit
-    rte_eth_dev_set_vf_rx
-    rte_eth_dev_set_vf_rxmode
-    rte_eth_dev_set_vf_tx
-    rte_eth_dev_set_vf_vlan_filter
+     rte_eth_dev_set_vf_rate_limit()
+     rte_eth_dev_set_vf_rx()
+     rte_eth_dev_set_vf_rxmode()
+     rte_eth_dev_set_vf_tx()
+     rte_eth_dev_set_vf_vlan_filter()
 
   The API's have been renamed to the following::
 
-    rte_pmd_ixgbe_set_vf_rate_limit
-    rte_pmd_ixgbe_set_vf_rx
-    rte_pmd_ixgbe_set_vf_rxmode
-    rte_pmd_ixgbe_set_vf_tx
-    rte_pmd_ixgbe_set_vf_vlan_filter
+     rte_pmd_ixgbe_set_vf_rate_limit()
+     rte_pmd_ixgbe_set_vf_rx()
+     rte_pmd_ixgbe_set_vf_rxmode()
+     rte_pmd_ixgbe_set_vf_tx()
+     rte_pmd_ixgbe_set_vf_vlan_filter()
 
   The declarations for the API’s can be found in ``rte_pmd_ixgbe.h``.
 
+
 ABI Changes
 -----------
 
-- 
2.7.4

^ permalink raw reply	[relevance 2%]
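
As a rough illustration of the Rx interrupt mode mentioned in the release
note above, the sketch below follows the polling/interrupt switch pattern
used by power-aware DPDK applications. The idle threshold is an arbitrary
assumption, and it presumes the queue interrupt was registered with
rte_eth_dev_rx_intr_ctl_q() during setup; none of this is part of the patch
itself.

    #include <rte_ethdev.h>
    #include <rte_interrupts.h>   /* rte_epoll_wait(), RTE_EPOLL_PER_THREAD */

    /* Poll a queue; after enough empty polls, arm the Rx interrupt and
     * sleep until the device signals traffic, then resume polling. */
    static void
    rx_loop(uint8_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *pkts[32];
        struct rte_epoll_event ev;
        unsigned int idle = 0;

        for (;;) {
            uint16_t n = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

            if (n > 0) {
                idle = 0;
                /* ... process and free the n mbufs ... */
                continue;
            }

            if (++idle < 300)          /* arbitrary idle threshold */
                continue;

            rte_eth_dev_rx_intr_enable(port_id, queue_id);
            /* Real code re-polls once here to close the race between the
             * last empty poll and arming the interrupt. */
            rte_epoll_wait(RTE_EPOLL_PER_THREAD, &ev, 1, -1);
            rte_eth_dev_rx_intr_disable(port_id, queue_id);
            idle = 0;
        }
    }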

* Re: [dpdk-dev] doc: add ABI change notification for ring library
  2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
                   ` (2 preceding siblings ...)
  2017-02-14  8:33  4% ` Olivier Matz
@ 2017-02-14 18:42  4% ` Thomas Monjalon
  3 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 18:42 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce API and ABI change for ethdev
    2017-02-13 17:57  4%   ` Thomas Monjalon
@ 2017-02-14 19:37  4%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 19:37 UTC (permalink / raw)
  To: Bernard Iremonger; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost
  2017-01-23 13:04 12% [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost Yuanhan Liu
  2017-02-13 18:02  4% ` Thomas Monjalon
  2017-02-14 13:54  4% ` Maxime Coquelin
@ 2017-02-14 20:28  4% ` Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 20:28 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-14 10:52  8% [dpdk-dev] Further fun with ABI tracking Christian Ehrhardt
  2017-02-14 16:19  4% ` Bruce Richardson
@ 2017-02-14 20:31  9% ` Jan Blunck
  2017-02-22 13:12  7%   ` Christian Ehrhardt
  1 sibling, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-14 20:31 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On Tue, Feb 14, 2017 at 11:52 AM, Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
> Hi,
> when moving to DPDK 16.11 Debian/Ubuntu packaging of DPDK has hit a new
> twist on the (it seems reoccurring) topic of DPDK ABI tracking.
>
> I have found, ... well I don't want to call it solution ..., let's say a
> crutch to get around it for the moment. But I wanted to use the example I
> had to share a few thoughts on it and to kick off a wider discussion.
>
>
> *## In library cross-dependencies plus partial ABI bumps ##*
>
> Since the day moving away from the combined shared library we had several
> improvements on tracking the ABI versions. These days [1] we have LIBABIVER
> per library and it gets bumped to reflect it is breaking with former
> versions e.g. removing symbols.
>
> Now in the 16.11 release the ABIs for cryptodev, eal and ethdev got bumped
> by [2] and [3].
>
> OTOH please remember that in general two versions of a shared library in
> the usual sense are meant to be able to stay alongside on a system without
> hurting each other. I picked a random one on my system.
> Package              Library
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160
> libisc-export160: /lib/x86_64-linux-gnu/libisc-export.so.160.0.0
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95
> libisc-export95: /lib/x86_64-linux-gnu/libisc-export.so.95.5.0
> Some link against the new, some against the old library - all fine.
> Usually most programs can just be rebuilt against the new library and after
> some time the old one can be dropped. That mechanism gives downstream
> distributions a way to handle transitions and consumers of libraries which
> might not all be ready for the same version every time.
> And since the per lib versioning with LIBABIVER and the version maps we
> are good - in fact we qualify for all common cases on [4].
>
> Now in DPDK of those libraries that got an ABI bump eal and ethdev are part
> of those which most of us consider "core libraries" and most other libs and
> pmds link to them.
> And here DPDK continues to be special, due to that inter-dependency with
> old and new libraries installed on the same system the following happens on
> openvswitch built for an older version of dpdk:
> ovs-vswitchd-dpdk
>     librte_eal.so.2 => /usr/lib/x86_64-linux-gnu/librte_eal.so.2
>     librte_pdump.so.1 => /usr/lib/x86_64-linux-gnu/librte_pdump.so.1
>         librte_eal.so.3 => /usr/lib/x86_64-linux-gnu/librte_eal.so.3
>
> You can see that Openvswitch itself depends on the "old" librte_eal.so.2.
> But because  librte_pdump.so.1 did not get an ABI bump it got upgraded to
> the newer version from DPDK 16.11.
> But since the "new" pdump got built with the new DPDK 16.11 it depends on
> the "new" librte_eal.so.3.
> And having both in the same executable space at the same time causes
> segfaults and pain.
>
> As I said for now I have passed the issue with a crutch that I'm not proud
> of and I'd like to avoid in the future. For that I'm reaching out to you
> with several suggestions to discuss.
>
>
> *## Thoughts ##*
> None of these seems like a perfect solution to me yet, but clearly good to
> start discussions on them.
>
> Options that were in discussion so far and that we might adopt next cycle
> (some of these are upstream changes, some downstream, some require both to
> change - but any of them should have an ack upstream so that we are
> agreeing how to proceed with those cases).
>
> 1. Downstreams to insert Major version into soname
> Distributions could insert the DPDK major version (like 16.11) into the
> soname and package names. A common example of this is libboost [5].
> That would perfectly allow 16.07.<LIBABIVER> to coexist with
> 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
> Yet it would mean that anything depending on the old library will have to
> be recompiled to pick up the new code, even if it depends on an ABI that is
> still present in the new release.
> Also - not a technical reason - but it is clearly more work to force update
> all dependencies and clean out old packages for every release.

Actually this isn't exactly what I proposed during the summit. Just
keep it simple and fix the ABI version of all libraries at 16.11.0.
This is a proven approach and has been used for years with different
libraries. You could easily do this independently of us upstream
fixing the ABI problems.


> 2. ABI Ranges

ABI is either backwards compatible (same major) or not. A range
doesn't solve the problem.

>
> 3. A lot of conflicts
>

This doesn't allow us to have multiple version of the library
available at runtime. So in the end it doesn't solve the problem for
the distro either.


>
> 4. ABI bump is infecting
>
> 5. back to single ABI
>

This is very similar to approach 1. It just uses up a lot more ABI versions.


> 6. More
> I'm sure there are more approaches to this, feel free to come up with more.
>

The problem is that we do not detect and fix the ABI changes that
"shine-through" the dependencies of our libraries. We need to work on
them and fix them one by one. Long-term we need to invest into keeping
the API/ABI stable and adding backward compatible symbols as well as
making structures opaque.



> I'm sure my five suggestions alone will make the thread messy, Maybe we do
> this in two rounds, sorting out the insane and identifying the preferred
> ones to then in a second run focus on discussing and maybe implementing the
> details of what we like.
>
>
> [1]: http://dpdk.org/browse/dpdk/tree/doc/guides/contributing/versioning.rst
> [2]: http://dpdk.org/browse/dpdk/commit/?id=d7e61ad3ae36
> [3]: http://dpdk.org/browse/dpdk/commit/?id=6ba1affa54108
> [4]: https://wiki.debian.org/TransitionBestPractices
> [5]: https://packages.debian.org/sid/libboost1.62-dev
> [6]:
> https://www.debian.org/doc/debian-policy/ch-relationships.html#s-conflicts
> [7]: https://wiki.ubuntu.com/ProposedMigration
>
> P.S. I beg a pardon for the wall of text
>
> --
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd

^ permalink raw reply	[relevance 9%]
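
To make the point about backward compatible symbols concrete: DPDK already
ships the machinery for this in rte_compat.h plus the per-library .map
files. A minimal sketch, with the function, struct and version numbers
invented purely for illustration:

    #include <stdint.h>
    #include <rte_compat.h>

    struct rte_foo;                                          /* hypothetical */
    int rte_foo_get_v3(struct rte_foo *f, uint32_t *value);  /* new code */

    /* Old 8-bit entry point kept as a thin wrapper, still exported under
     * the old version node (the .map file lists both versions). */
    int
    rte_foo_get_v2(struct rte_foo *f, uint8_t *value)
    {
        uint32_t wide;
        int rc = rte_foo_get_v3(f, &wide);

        *value = (uint8_t)wide;
        return rc;
    }
    VERSION_SYMBOL(rte_foo_get, _v2, 2.1);

    /* Newly compiled applications bind to the wide variant by default. */
    BIND_DEFAULT_SYMBOL(rte_foo_get, _v3, 16.11);

Keeping the old symbols exported means the major version does not have to
be bumped in the first place, which avoids the mixed librte_eal.so.2 /
librte_eal.so.3 situation described above.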

* Re: [dpdk-dev] [PATCH v2] doc: annouce ABI change for cryptodev ops structure
  2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
  2017-02-14 10:48  4%   ` Doherty, Declan
@ 2017-02-14 20:37  4%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-14 20:37 UTC (permalink / raw)
  To: Fan Zhang; +Cc: dev

Applied

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1] doc: add template release notes for 17.05
@ 2017-02-15 12:38  6% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2017-02-15 12:38 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Add template release notes for DPDK 17.05 with inline
comments and explanations of the various sections.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_17_05.rst | 195 +++++++++++++++++++++++++++++++++
 2 files changed, 196 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_17_05.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index cf8f167..c4d243c 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -36,6 +36,7 @@ Release Notes
     :numbered:
 
     rel_description
+    release_17_05
     release_17_02
     release_16_11
     release_16_07
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
new file mode 100644
index 0000000..e5a0a9e
--- /dev/null
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -0,0 +1,195 @@
+DPDK Release 17.05
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      firefox build/doc/html/guides/rel_notes/release_17_05.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release. Sample
+   format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense. The description
+     should be enough to allow someone scanning the release notes to
+     understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list like
+     this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     This section is a comment. do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =========================================================
+
+
+Resolved Issues
+---------------
+
+.. This section should contain bug fixes added to the relevant
+   sections. Sample format:
+
+   * **code/section Fixed issue in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description of the resolved issue in the past
+     tense.
+
+     The title should contain the code/lib section like a commit message.
+
+     Add the entries in alphabetic order in the relevant sections below.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+EAL
+~~~
+
+
+Drivers
+~~~~~~~
+
+
+Libraries
+~~~~~~~~~
+
+
+Examples
+~~~~~~~~
+
+
+Other
+~~~~~
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue in the present
+     tense. Add information on any known workarounds.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change. Use fixed width
+     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
+     tense.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change that was announced
+     in the previous releases and made in this release. Use fixed width quotes
+     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release and prepend with a ``+``
+   sign, like this:
+
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+   This section is a comment. do not overwrite or remove it.
+   =========================================================
+
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     librte_acl.so.2
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_cryptodev.so.2
+     librte_distributor.so.1
+     librte_eal.so.3
+     librte_ethdev.so.6
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.2
+     librte_mempool.so.2
+     librte_meter.so.1
+     librte_net.so.1
+     librte_pdump.so.1
+     librte_pipeline.so.3
+     librte_pmd_bond.so.1
+     librte_pmd_ring.so.2
+     librte_port.so.3
+     librte_power.so.1
+     librte_reorder.so.1
+     librte_ring.so.1
+     librte_sched.so.1
+     librte_table.so.2
+     librte_timer.so.1
+     librte_vhost.so.3
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested with this
+   release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH] kni: remove KNI vhost support
@ 2017-02-15 13:15  1% Ferruh Yigit
  2017-02-20 14:30  5% ` [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-15 13:15 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 config/common_base                             |   3 -
 devtools/test-build.sh                         |   1 -
 doc/guides/prog_guide/index.rst                |   4 -
 doc/guides/prog_guide/kernel_nic_interface.rst | 113 ----
 doc/guides/rel_notes/deprecation.rst           |   6 -
 lib/librte_eal/linuxapp/kni/Makefile           |   1 -
 lib/librte_eal/linuxapp/kni/kni_dev.h          |  33 -
 lib/librte_eal/linuxapp/kni/kni_fifo.h         |  14 -
 lib/librte_eal/linuxapp/kni/kni_misc.c         |  22 -
 lib/librte_eal/linuxapp/kni/kni_net.c          |  13 -
 lib/librte_eal/linuxapp/kni/kni_vhost.c        | 842 -------------------------
 11 files changed, 1052 deletions(-)
 delete mode 100644 lib/librte_eal/linuxapp/kni/kni_vhost.c

diff --git a/config/common_base b/config/common_base
index 71a4fcb..aeee13e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -584,9 +584,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_KNI_KMOD=n
 CONFIG_RTE_KNI_KMOD_ETHTOOL=n
 CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
-CONFIG_RTE_KNI_VHOST=n
-CONFIG_RTE_KNI_VHOST_MAX_CACHE_SIZE=1024
-CONFIG_RTE_KNI_VHOST_VNET_HDR_EN=n
 
 #
 # Compile the pdump library
diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 0f131fc..84d3165 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -194,7 +194,6 @@ config () # <directory> <target> <options>
 		sed -ri        's,(PMD_OPENSSL=)n,\1y,' $1/.config
 		test "$DPDK_DEP_SSL" != y || \
 		sed -ri            's,(PMD_QAT=)n,\1y,' $1/.config
-		sed -ri        's,(KNI_VHOST.*=)n,\1y,' $1/.config
 		sed -ri           's,(SCHED_.*=)n,\1y,' $1/.config
 		build_config_hook $1 $2 $3
 
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 7f825cb..77f427e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -127,10 +127,6 @@ Programmer's Guide
 
 :numref:`figure_pkt_flow_kni` :ref:`figure_pkt_flow_kni`
 
-:numref:`figure_vhost_net_arch2` :ref:`figure_vhost_net_arch2`
-
-:numref:`figure_kni_traffic_flow` :ref:`figure_kni_traffic_flow`
-
 
 :numref:`figure_pkt_proc_pipeline_qos` :ref:`figure_pkt_proc_pipeline_qos`
 
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 4f25595..6f7fd28 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -168,116 +168,3 @@ The application handlers can be registered upon interface creation or explicitly
 This provides flexibility in multiprocess scenarios
 (where the KNI is created in the primary process but the callbacks are handled in the secondary one).
 The constraint is that a single process can register and handle the requests.
-
-.. _kni_vhost_backend-label:
-
-KNI Working as a Kernel vHost Backend
--------------------------------------
-
-vHost is a kernel module usually working as the backend of virtio (a para- virtualization driver framework)
-to accelerate the traffic from the guest to the host.
-The DPDK Kernel NIC interface provides the ability to hookup vHost traffic into userspace DPDK application.
-Together with the DPDK PMD virtio, it significantly improves the throughput between guest and host.
-In the scenario where DPDK is running as fast path in the host, kni-vhost is an efficient path for the traffic.
-
-Overview
-~~~~~~~~
-
-vHost-net has three kinds of real backend implementations. They are: 1) tap, 2) macvtap and 3) RAW socket.
-The main idea behind kni-vhost is making the KNI work as a RAW socket, attaching it as the backend instance of vHost-net.
-It is using the existing interface with vHost-net, so it does not require any kernel hacking,
-and is fully-compatible with the kernel vhost module.
-As vHost is still taking responsibility for communicating with the front-end virtio,
-it naturally supports both legacy virtio -net and the DPDK PMD virtio.
-There is a little penalty that comes from the non-polling mode of vhost.
-However, it scales throughput well when using KNI in multi-thread mode.
-
-.. _figure_vhost_net_arch2:
-
-.. figure:: img/vhost_net_arch.*
-
-   vHost-net Architecture Overview
-
-
-Packet Flow
-~~~~~~~~~~~
-
-There is only a minor difference from the original KNI traffic flows.
-On transmit side, vhost kthread calls the RAW socket's ops sendmsg and it puts the packets into the KNI transmit FIFO.
-On the receive side, the kni kthread gets packets from the KNI receive FIFO, puts them into the queue of the raw socket,
-and wakes up the task in vhost kthread to begin receiving.
-All the packet copying, irrespective of whether it is on the transmit or receive side,
-happens in the context of vhost kthread.
-Every vhost-net device is exposed to a front end virtio device in the guest.
-
-.. _figure_kni_traffic_flow:
-
-.. figure:: img/kni_traffic_flow.*
-
-   KNI Traffic Flow
-
-
-Sample Usage
-~~~~~~~~~~~~
-
-Before starting to use KNI as the backend of vhost, the CONFIG_RTE_KNI_VHOST configuration option must be turned on.
-Otherwise, by default, KNI will not enable its backend support capability.
-
-Of course, as a prerequisite, the vhost/vhost-net kernel CONFIG should be chosen before compiling the kernel.
-
-#.  Compile the DPDK and insert uio_pci_generic/igb_uio kernel modules as normal.
-
-#.  Insert the KNI kernel module:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko
-
-    If using KNI in multi-thread mode, use the following command line:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko kthread_mode=multiple
-
-#.  Running the KNI sample application:
-
-    .. code-block:: console
-
-        examples/kni/build/app/kni -c -0xf0 -n 4 -- -p 0x3 -P --config="(0,4,6),(1,5,7)"
-
-    This command runs the kni sample application with two physical ports.
-    Each port pins two forwarding cores (ingress/egress) in user space.
-
-#.  Assign a raw socket to vhost-net during qemu-kvm startup.
-    The DPDK does not provide a script to do this since it is easy for the user to customize.
-    The following shows the key steps to launch qemu-kvm with kni-vhost:
-
-    .. code-block:: bash
-
-        #!/bin/bash
-        echo 1 > /sys/class/net/vEth0/sock_en
-        fd=`cat /sys/class/net/vEth0/sock_fd`
-        qemu-kvm \
-        -name vm1 -cpu host -m 2048 -smp 1 -hda /opt/vm-fc16.img \
-        -netdev tap,fd=$fd,id=hostnet1,vhost=on \
-        -device virti-net-pci,netdev=hostnet1,id=net1,bus=pci.0,addr=0x4
-
-It is simple to enable raw socket using sysfs sock_en and get raw socket fd using sock_fd under the KNI device node.
-
-Then, using the qemu-kvm command with the -netdev option to assign such raw socket fd as vhost's backend.
-
-.. note::
-
-    The key word tap must exist as qemu-kvm now only supports vhost with a tap backend, so here we cheat qemu-kvm by an existing fd.
-
-Compatibility Configure Option
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-There is a CONFIG_RTE_KNI_VHOST_VNET_HDR_EN configuration option in DPDK configuration file.
-By default, it set to n, which means do not turn on the virtio net header,
-which is used to support additional features (such as, csum offload, vlan offload, generic-segmentation and so on),
-since the kni-vhost does not yet support those features.
-
-Even if the option is turned on, kni-vhost will ignore the information that the header contains.
-When working with legacy virtio on the guest, it is better to turn off unsupported offload features using ethtool -K.
-Otherwise, there may be problems such as an incorrect L4 checksum error.
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..66ca596 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -113,12 +113,6 @@ Deprecation Notices
   has different feature set, meaning functions like ``rte_vhost_feature_disable``
   need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
 
-* kni: Remove :ref:`kni_vhost_backend-label` feature (KNI_VHOST) in 17.05 release.
-  :doc:`Vhost Library </prog_guide/vhost_lib>` is currently preferred method for
-  guest - host communication. Just for clarification, this is not to remove KNI
-  or VHOST feature, but KNI_VHOST which is a KNI feature enabled via a compile
-  time option, and disabled by default.
-
 * ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
   A pointer to a rte_cryptodev_config structure will be added to the
   function prototype ``cryptodev_configure_t``, as a new parameter.
diff --git a/lib/librte_eal/linuxapp/kni/Makefile b/lib/librte_eal/linuxapp/kni/Makefile
index 3c22b63..7864a2a 100644
--- a/lib/librte_eal/linuxapp/kni/Makefile
+++ b/lib/librte_eal/linuxapp/kni/Makefile
@@ -61,7 +61,6 @@ DEPDIRS-y += lib/librte_eal/linuxapp/eal
 #
 SRCS-y := kni_misc.c
 SRCS-y += kni_net.c
-SRCS-$(CONFIG_RTE_KNI_VHOST) += kni_vhost.c
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += kni_ethtool.c
 
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_main.c
diff --git a/lib/librte_eal/linuxapp/kni/kni_dev.h b/lib/librte_eal/linuxapp/kni/kni_dev.h
index 58cbadd..002e5fa 100644
--- a/lib/librte_eal/linuxapp/kni/kni_dev.h
+++ b/lib/librte_eal/linuxapp/kni/kni_dev.h
@@ -37,10 +37,6 @@
 #include <linux/spinlock.h>
 #include <linux/list.h>
 
-#ifdef RTE_KNI_VHOST
-#include <net/sock.h>
-#endif
-
 #include <exec-env/rte_kni_common.h>
 #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
 
@@ -102,15 +98,6 @@ struct kni_dev {
 	/* synchro for request processing */
 	unsigned long synchro;
 
-#ifdef RTE_KNI_VHOST
-	struct kni_vhost_queue *vhost_queue;
-
-	volatile enum {
-		BE_STOP = 0x1,
-		BE_START = 0x2,
-		BE_FINISH = 0x4,
-	} vq_status;
-#endif
 	/* buffers */
 	void *pa[MBUF_BURST_SZ];
 	void *va[MBUF_BURST_SZ];
@@ -118,26 +105,6 @@ struct kni_dev {
 	void *alloc_va[MBUF_BURST_SZ];
 };
 
-#ifdef RTE_KNI_VHOST
-uint32_t
-kni_poll(struct file *file, struct socket *sock, poll_table * wait);
-int kni_chk_vhost_rx(struct kni_dev *kni);
-int kni_vhost_init(struct kni_dev *kni);
-int kni_vhost_backend_release(struct kni_dev *kni);
-
-struct kni_vhost_queue {
-	struct sock sk;
-	struct socket *sock;
-	int vnet_hdr_sz;
-	struct kni_dev *kni;
-	int sockfd;
-	uint32_t flags;
-	struct sk_buff *cache;
-	struct rte_kni_fifo *fifo;
-};
-
-#endif
-
 void kni_net_rx(struct kni_dev *kni);
 void kni_net_init(struct net_device *dev);
 void kni_net_config_lo_mode(char *lo_str);
diff --git a/lib/librte_eal/linuxapp/kni/kni_fifo.h b/lib/librte_eal/linuxapp/kni/kni_fifo.h
index 025ec1c..14f4141 100644
--- a/lib/librte_eal/linuxapp/kni/kni_fifo.h
+++ b/lib/librte_eal/linuxapp/kni/kni_fifo.h
@@ -91,18 +91,4 @@ kni_fifo_free_count(struct rte_kni_fifo *fifo)
 	return (fifo->read - fifo->write - 1) & (fifo->len - 1);
 }
 
-#ifdef RTE_KNI_VHOST
-/**
- * Initializes the kni fifo structure
- */
-static inline void
-kni_fifo_init(struct rte_kni_fifo *fifo, uint32_t size)
-{
-	fifo->write = 0;
-	fifo->read = 0;
-	fifo->len = size;
-	fifo->elem_size = sizeof(void *);
-}
-#endif
-
 #endif /* _KNI_FIFO_H_ */
diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c b/lib/librte_eal/linuxapp/kni/kni_misc.c
index 33b61f2..f1f6bea 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -140,11 +140,7 @@ kni_thread_single(void *data)
 		down_read(&knet->kni_list_lock);
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
 			list_for_each_entry(dev, &knet->kni_list_head, list) {
-#ifdef RTE_KNI_VHOST
-				kni_chk_vhost_rx(dev);
-#else
 				kni_net_rx(dev);
-#endif
 				kni_net_poll_resp(dev);
 			}
 		}
@@ -167,11 +163,7 @@ kni_thread_multiple(void *param)
 
 	while (!kthread_should_stop()) {
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
-#ifdef RTE_KNI_VHOST
-			kni_chk_vhost_rx(dev);
-#else
 			kni_net_rx(dev);
-#endif
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
@@ -248,9 +240,6 @@ kni_release(struct inode *inode, struct file *file)
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 	}
@@ -397,10 +386,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	kni->sync_va = dev_info.sync_va;
 	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
 
-#ifdef RTE_KNI_VHOST
-	kni->vhost_queue = NULL;
-	kni->vq_status = BE_STOP;
-#endif
 	kni->mbuf_size = dev_info.mbuf_size;
 
 	pr_debug("tx_phys:      0x%016llx, tx_q addr:      0x%p\n",
@@ -490,10 +475,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 		return -ENODEV;
 	}
 
-#ifdef RTE_KNI_VHOST
-	kni_vhost_init(kni);
-#endif
-
 	ret = kni_run_thread(knet, kni, dev_info.force_bind);
 	if (ret != 0)
 		return ret;
@@ -537,9 +518,6 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num,
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 		ret = 0;
diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c b/lib/librte_eal/linuxapp/kni/kni_net.c
index 4ac99cf..db9f489 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -198,18 +198,6 @@ kni_net_config(struct net_device *dev, struct ifmap *map)
 /*
  * Transmit a packet (called by the kernel)
  */
-#ifdef RTE_KNI_VHOST
-static int
-kni_net_tx(struct sk_buff *skb, struct net_device *dev)
-{
-	struct kni_dev *kni = netdev_priv(dev);
-
-	dev_kfree_skb(skb);
-	kni->stats.tx_dropped++;
-
-	return NETDEV_TX_OK;
-}
-#else
 static int
 kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 {
@@ -289,7 +277,6 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 
 	return NETDEV_TX_OK;
 }
-#endif
 
 /*
  * RX: normal working mode
diff --git a/lib/librte_eal/linuxapp/kni/kni_vhost.c b/lib/librte_eal/linuxapp/kni/kni_vhost.c
deleted file mode 100644
index f54c34b..0000000
--- a/lib/librte_eal/linuxapp/kni/kni_vhost.c
+++ /dev/null
@@ -1,842 +0,0 @@
-/*-
- * GPL LICENSE SUMMARY
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *
- *   This program is free software; you can redistribute it and/or modify
- *   it under the terms of version 2 of the GNU General Public License as
- *   published by the Free Software Foundation.
- *
- *   This program is distributed in the hope that it will be useful, but
- *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *   General Public License for more details.
- *
- *   You should have received a copy of the GNU General Public License
- *   along with this program; if not, write to the Free Software
- *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
- *   The full GNU General Public License is included in this distribution
- *   in the file called LICENSE.GPL.
- *
- *   Contact Information:
- *   Intel Corporation
- */
-
-#include <linux/module.h>
-#include <linux/net.h>
-#include <net/sock.h>
-#include <linux/virtio_net.h>
-#include <linux/wait.h>
-#include <linux/mm.h>
-#include <linux/nsproxy.h>
-#include <linux/sched.h>
-#include <linux/if_tun.h>
-#include <linux/version.h>
-#include <linux/file.h>
-
-#include "compat.h"
-#include "kni_dev.h"
-#include "kni_fifo.h"
-
-#define RX_BURST_SZ 4
-
-#ifdef HAVE_STATIC_SOCK_MAP_FD
-static int kni_sock_map_fd(struct socket *sock)
-{
-	struct file *file;
-	int fd = get_unused_fd_flags(0);
-
-	if (fd < 0)
-		return fd;
-
-	file = sock_alloc_file(sock, 0, NULL);
-	if (IS_ERR(file)) {
-		put_unused_fd(fd);
-		return PTR_ERR(file);
-	}
-	fd_install(fd, file);
-	return fd;
-}
-#endif
-
-static struct proto kni_raw_proto = {
-	.name = "kni_vhost",
-	.owner = THIS_MODULE,
-	.obj_size = sizeof(struct kni_vhost_queue),
-};
-
-static inline int
-kni_vhost_net_tx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	struct rte_kni_mbuf *pkt_kva = NULL;
-	struct rte_kni_mbuf *pkt_va = NULL;
-	int ret;
-
-	pr_debug("tx offset=%d, len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, (int)m->msg_iov->iov_len);
-#endif
-
-	/**
-	 * Check if it has at least one free entry in tx_q and
-	 * one entry in alloc_q.
-	 */
-	if (kni_fifo_free_count(kni->tx_q) == 0 ||
-	    kni_fifo_count(kni->alloc_q) == 0) {
-		/**
-		 * If no free entry in tx_q or no entry in alloc_q,
-		 * drops skb and goes out.
-		 */
-		goto drop;
-	}
-
-	/* dequeue a mbuf from alloc_q */
-	ret = kni_fifo_get(kni->alloc_q, (void **)&pkt_va, 1);
-	if (likely(ret == 1)) {
-		void *data_kva;
-
-		pkt_kva = (void *)pkt_va - kni->mbuf_va + kni->mbuf_kva;
-		data_kva = pkt_kva->buf_addr + pkt_kva->data_off
-			- kni->mbuf_va + kni->mbuf_kva;
-
-#ifdef HAVE_IOV_ITER_MSGHDR
-		copy_from_iter(data_kva, len, &m->msg_iter);
-#else
-		memcpy_fromiovecend(data_kva, m->msg_iov, offset, len);
-#endif
-
-		if (unlikely(len < ETH_ZLEN)) {
-			memset(data_kva + len, 0, ETH_ZLEN - len);
-			len = ETH_ZLEN;
-		}
-		pkt_kva->pkt_len = len;
-		pkt_kva->data_len = len;
-
-		/* enqueue mbuf into tx_q */
-		ret = kni_fifo_put(kni->tx_q, (void **)&pkt_va, 1);
-		if (unlikely(ret != 1)) {
-			/* Failing should not happen */
-			pr_err("Fail to enqueue mbuf into tx_q\n");
-			goto drop;
-		}
-	} else {
-		/* Failing should not happen */
-		pr_err("Fail to dequeue mbuf from alloc_q\n");
-		goto drop;
-	}
-
-	/* update statistics */
-	kni->stats.tx_bytes += len;
-	kni->stats.tx_packets++;
-
-	return 0;
-
-drop:
-	/* update statistics */
-	kni->stats.tx_dropped++;
-
-	return 0;
-}
-
-static inline int
-kni_vhost_net_rx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	uint32_t pkt_len;
-	struct rte_kni_mbuf *kva;
-	struct rte_kni_mbuf *va;
-	void *data_kva;
-	struct sk_buff *skb;
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	/* ensure at least one entry in free_q */
-	if (unlikely(kni_fifo_free_count(kni->free_q) == 0))
-		return 0;
-
-	skb = skb_dequeue(&q->sk.sk_receive_queue);
-	if (unlikely(skb == NULL))
-		return 0;
-
-	kva = (struct rte_kni_mbuf *)skb->data;
-
-	/* free skb to cache */
-	skb->data = NULL;
-	if (unlikely(kni_fifo_put(q->fifo, (void **)&skb, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into rx cache fifo\n");
-
-	pkt_len = kva->data_len;
-	if (unlikely(pkt_len > len))
-		goto drop;
-
-	pr_debug("rx offset=%d, len=%d, pkt_len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, pkt_len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, pkt_len, (int)m->msg_iov->iov_len);
-#endif
-
-	data_kva = kva->buf_addr + kva->data_off - kni->mbuf_va + kni->mbuf_kva;
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter(data_kva, pkt_len, &m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov, data_kva, offset, pkt_len)))
-#endif
-		goto drop;
-
-	/* Update statistics */
-	kni->stats.rx_bytes += pkt_len;
-	kni->stats.rx_packets++;
-
-	/* enqueue mbufs into free_q */
-	va = (void *)kva - kni->mbuf_kva + kni->mbuf_va;
-	if (unlikely(kni_fifo_put(kni->free_q, (void **)&va, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into free_q\n");
-
-	pr_debug("receive done %d\n", pkt_len);
-
-	return pkt_len;
-
-drop:
-	/* Update drop statistics */
-	kni->stats.rx_dropped++;
-
-	return 0;
-}
-
-static uint32_t
-kni_sock_poll(struct file *file, struct socket *sock, poll_table *wait)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t mask = 0;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return POLLERR;
-
-	kni = q->kni;
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("start kni_poll on group %d, wq 0x%16llx\n",
-		  kni->group_id, (uint64_t)sock->wq);
-	poll_wait(file, &sock->wq->wait, wait);
-#else
-	pr_debug("start kni_poll on group %d, wait at 0x%16llx\n",
-		  kni->group_id, (uint64_t)&sock->wait);
-	poll_wait(file, &sock->wait, wait);
-#endif
-
-	if (kni_fifo_count(kni->rx_q) > 0)
-		mask |= POLLIN | POLLRDNORM;
-
-	if (sock_writeable(&q->sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-		(!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#else
-		(!test_and_set_bit(SOCK_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#endif
-		mask |= POLLOUT | POLLWRNORM;
-
-	return mask;
-}
-
-static inline void
-kni_vhost_enqueue(struct kni_dev *kni, struct kni_vhost_queue *q,
-		  struct sk_buff *skb, struct rte_kni_mbuf *va)
-{
-	struct rte_kni_mbuf *kva;
-
-	kva = (void *)(va) - kni->mbuf_va + kni->mbuf_kva;
-	(skb)->data = (unsigned char *)kva;
-	(skb)->len = kva->data_len;
-	skb_queue_tail(&q->sk.sk_receive_queue, skb);
-}
-
-static inline void
-kni_vhost_enqueue_burst(struct kni_dev *kni, struct kni_vhost_queue *q,
-	  struct sk_buff **skb, struct rte_kni_mbuf **va)
-{
-	int i;
-
-	for (i = 0; i < RX_BURST_SZ; skb++, va++, i++)
-		kni_vhost_enqueue(kni, q, *skb, *va);
-}
-
-int
-kni_chk_vhost_rx(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-	uint32_t nb_in, nb_mbuf, nb_skb;
-	const uint32_t BURST_MASK = RX_BURST_SZ - 1;
-	uint32_t nb_burst, nb_backlog, i;
-	struct sk_buff *skb[RX_BURST_SZ];
-	struct rte_kni_mbuf *va[RX_BURST_SZ];
-
-	if (unlikely(BE_STOP & kni->vq_status)) {
-		kni->vq_status |= BE_FINISH;
-		return 0;
-	}
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	nb_skb = kni_fifo_count(q->fifo);
-	nb_mbuf = kni_fifo_count(kni->rx_q);
-
-	nb_in = min(nb_mbuf, nb_skb);
-	nb_in = min_t(uint32_t, nb_in, RX_BURST_SZ);
-	nb_burst   = (nb_in & ~BURST_MASK);
-	nb_backlog = (nb_in & BURST_MASK);
-
-	/* enqueue skb_queue per BURST_SIZE bulk */
-	if (nb_burst != 0) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		kni_vhost_enqueue_burst(kni, q, skb, va);
-	}
-
-	/* all leftover, do one by one */
-	for (i = 0; i < nb_backlog; ++i) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, 1) != 1))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, 1) != 1))
-			goto except;
-
-		kni_vhost_enqueue(kni, q, *skb, *va);
-	}
-
-	/* Ondemand wake up */
-	if ((nb_in == RX_BURST_SZ) || (nb_skb == 0) ||
-	    ((nb_mbuf < RX_BURST_SZ) && (nb_mbuf != 0))) {
-		wake_up_interruptible_poll(sk_sleep(&q->sk),
-				   POLLIN | POLLRDNORM | POLLRDBAND);
-		pr_debug("RX CHK KICK nb_mbuf %d, nb_skb %d, nb_in %d\n",
-			   nb_mbuf, nb_skb, nb_in);
-	}
-
-	return 0;
-
-except:
-	/* Failing should not happen */
-	pr_err("Fail to enqueue fifo, it shouldn't happen\n");
-	BUG_ON(1);
-
-	return 0;
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_sndmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#else
-kni_sock_sndmsg(struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	int vnet_hdr_len = 0;
-	unsigned long len = total_len;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-	pr_debug("kni_sndmsg len %ld, flags 0x%08x, nb_iov %d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   len, q->flags, (int)m->msg_iter.iov->iov_len);
-#else
-		   len, q->flags, (int)m->msg_iovlen);
-#endif
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		if (unlikely(len < vnet_hdr_len))
-			return -EINVAL;
-		len -= vnet_hdr_len;
-	}
-#endif
-
-	if (unlikely(len < ETH_HLEN + q->vnet_hdr_sz))
-		return -EINVAL;
-
-	return kni_vhost_net_tx(q->kni, m, vnet_hdr_len, len);
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_rcvmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#else
-kni_sock_rcvmsg(struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	int vnet_hdr_len = 0;
-	int pkt_len = 0;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	static struct virtio_net_hdr
-		__attribute__ ((unused)) vnet_hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
-	};
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		len -= vnet_hdr_len;
-		if (len < 0)
-			return -EINVAL;
-	}
-#endif
-
-	pkt_len = kni_vhost_net_rx(q->kni, m, vnet_hdr_len, len);
-	if (unlikely(pkt_len == 0))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	/* no need to copy hdr when no pkt received */
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter((void *)&vnet_hdr, vnet_hdr_len,
-		&m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov,
-		(void *)&vnet_hdr, 0, vnet_hdr_len)))
-#endif /* HAVE_IOV_ITER_MSGHDR */
-		return -EFAULT;
-#endif /* RTE_KNI_VHOST_VNET_HDR_EN */
-	pr_debug("kni_rcvmsg expect_len %ld, flags 0x%08x, pkt_len %d\n",
-		   (unsigned long)len, q->flags, pkt_len);
-
-	return pkt_len + vnet_hdr_len;
-}
-
-/* dummy tap like ioctl */
-static int
-kni_sock_ioctl(struct socket *sock, uint32_t cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	struct ifreq __user *ifr = argp;
-	uint32_t __user *up = argp;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t u;
-	int __user *sp = argp;
-	int s;
-	int ret;
-
-	pr_debug("tap ioctl cmd 0x%08x\n", cmd);
-
-	switch (cmd) {
-	case TUNSETIFF:
-		pr_debug("TUNSETIFF\n");
-		/* ignore the name, just look at flags */
-		if (get_user(u, &ifr->ifr_flags))
-			return -EFAULT;
-
-		ret = 0;
-		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
-			ret = -EINVAL;
-		else
-			q->flags = u;
-
-		return ret;
-
-	case TUNGETIFF:
-		pr_debug("TUNGETIFF\n");
-		rcu_read_lock_bh();
-		kni = rcu_dereference_bh(q->kni);
-		if (kni)
-			dev_hold(kni->net_dev);
-		rcu_read_unlock_bh();
-
-		if (!kni)
-			return -ENOLINK;
-
-		ret = 0;
-		if (copy_to_user(&ifr->ifr_name, kni->net_dev->name, IFNAMSIZ)
-				|| put_user(q->flags, &ifr->ifr_flags))
-			ret = -EFAULT;
-		dev_put(kni->net_dev);
-		return ret;
-
-	case TUNGETFEATURES:
-		pr_debug("TUNGETFEATURES\n");
-		u = IFF_TAP | IFF_NO_PI;
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		u |= IFF_VNET_HDR;
-#endif
-		if (put_user(u, up))
-			return -EFAULT;
-		return 0;
-
-	case TUNSETSNDBUF:
-		pr_debug("TUNSETSNDBUF\n");
-		if (get_user(u, up))
-			return -EFAULT;
-
-		q->sk.sk_sndbuf = u;
-		return 0;
-
-	case TUNGETVNETHDRSZ:
-		s = q->vnet_hdr_sz;
-		if (put_user(s, sp))
-			return -EFAULT;
-		pr_debug("TUNGETVNETHDRSZ %d\n", s);
-		return 0;
-
-	case TUNSETVNETHDRSZ:
-		if (get_user(s, sp))
-			return -EFAULT;
-		if (s < (int)sizeof(struct virtio_net_hdr))
-			return -EINVAL;
-
-		pr_debug("TUNSETVNETHDRSZ %d\n", s);
-		q->vnet_hdr_sz = s;
-		return 0;
-
-	case TUNSETOFFLOAD:
-		pr_debug("TUNSETOFFLOAD %lx\n", arg);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		/* not support any offload yet */
-		if (!(q->flags & IFF_VNET_HDR))
-			return  -EINVAL;
-
-		return 0;
-#else
-		return -EINVAL;
-#endif
-
-	default:
-		pr_debug("NOT SUPPORT\n");
-		return -EINVAL;
-	}
-}
-
-static int
-kni_sock_compat_ioctl(struct socket *sock, uint32_t cmd,
-		     unsigned long arg)
-{
-	/* 32 bits app on 64 bits OS to be supported later */
-	pr_debug("Not implemented.\n");
-
-	return -EINVAL;
-}
-
-#define KNI_VHOST_WAIT_WQ_SAFE()                        \
-do {							\
-	while ((BE_FINISH | BE_STOP) == kni->vq_status) \
-		msleep(1);				\
-} while (0)						\
-
-
-static int
-kni_sock_release(struct socket *sock)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-
-	if (q == NULL)
-		return 0;
-
-	kni = q->kni;
-	if (kni != NULL) {
-		kni->vq_status = BE_STOP;
-		KNI_VHOST_WAIT_WQ_SAFE();
-		kni->vhost_queue = NULL;
-		q->kni = NULL;
-	}
-
-	if (q->sockfd != -1)
-		q->sockfd = -1;
-
-	sk_set_socket(&q->sk, NULL);
-	sock->sk = NULL;
-
-	sock_put(&q->sk);
-
-	pr_debug("dummy sock release done\n");
-
-	return 0;
-}
-
-int
-kni_sock_getname(struct socket *sock, struct sockaddr *addr,
-		int *sockaddr_len, int peer)
-{
-	pr_debug("dummy sock getname\n");
-	((struct sockaddr_ll *)addr)->sll_family = AF_PACKET;
-	return 0;
-}
-
-static const struct proto_ops kni_socket_ops = {
-	.getname = kni_sock_getname,
-	.sendmsg = kni_sock_sndmsg,
-	.recvmsg = kni_sock_rcvmsg,
-	.release = kni_sock_release,
-	.poll    = kni_sock_poll,
-	.ioctl   = kni_sock_ioctl,
-	.compat_ioctl = kni_sock_compat_ioctl,
-};
-
-static void
-kni_sk_write_space(struct sock *sk)
-{
-	wait_queue_head_t *wqueue;
-
-	if (!sock_writeable(sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-	    !test_and_clear_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#else
-	    !test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#endif
-		return;
-	wqueue = sk_sleep(sk);
-	if (wqueue && waitqueue_active(wqueue))
-		wake_up_interruptible_poll(
-			wqueue, POLLOUT | POLLWRNORM | POLLWRBAND);
-}
-
-static void
-kni_sk_destruct(struct sock *sk)
-{
-	struct kni_vhost_queue *q =
-		container_of(sk, struct kni_vhost_queue, sk);
-
-	if (!q)
-		return;
-
-	/* make sure there's no packet in buffer */
-	while (skb_dequeue(&sk->sk_receive_queue) != NULL)
-		;
-
-	mb();
-
-	if (q->fifo != NULL) {
-		kfree(q->fifo);
-		q->fifo = NULL;
-	}
-
-	if (q->cache != NULL) {
-		kfree(q->cache);
-		q->cache = NULL;
-	}
-}
-
-static int
-kni_vhost_backend_init(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q;
-	struct net *net = current->nsproxy->net_ns;
-	int err, i, sockfd;
-	struct rte_kni_fifo *fifo;
-	struct sk_buff *elem;
-
-	if (kni->vhost_queue != NULL)
-		return -1;
-
-#ifdef HAVE_SK_ALLOC_KERN_PARAM
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto, 0);
-#else
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto);
-#endif
-	if (!q)
-		return -ENOMEM;
-
-	err = sock_create_lite(AF_UNSPEC, SOCK_RAW, IPPROTO_RAW, &q->sock);
-	if (err)
-		goto free_sk;
-
-	sockfd = kni_sock_map_fd(q->sock);
-	if (sockfd < 0) {
-		err = sockfd;
-		goto free_sock;
-	}
-
-	/* cache init */
-	q->cache = kzalloc(
-		RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(struct sk_buff),
-		GFP_KERNEL);
-	if (!q->cache)
-		goto free_fd;
-
-	fifo = kzalloc(RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(void *)
-			+ sizeof(struct rte_kni_fifo), GFP_KERNEL);
-	if (!fifo)
-		goto free_cache;
-
-	kni_fifo_init(fifo, RTE_KNI_VHOST_MAX_CACHE_SIZE);
-
-	for (i = 0; i < RTE_KNI_VHOST_MAX_CACHE_SIZE; i++) {
-		elem = &q->cache[i];
-		kni_fifo_put(fifo, (void **)&elem, 1);
-	}
-	q->fifo = fifo;
-
-	/* store sockfd in vhost_queue */
-	q->sockfd = sockfd;
-
-	/* init socket */
-	q->sock->type = SOCK_RAW;
-	q->sock->state = SS_CONNECTED;
-	q->sock->ops = &kni_socket_ops;
-	sock_init_data(q->sock, &q->sk);
-
-	/* init sock data */
-	q->sk.sk_write_space = kni_sk_write_space;
-	q->sk.sk_destruct = kni_sk_destruct;
-	q->flags = IFF_NO_PI | IFF_TAP;
-	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	q->flags |= IFF_VNET_HDR;
-#endif
-
-	/* bind kni_dev with vhost_queue */
-	q->kni = kni;
-	kni->vhost_queue = q;
-
-	wmb();
-
-	kni->vq_status = BE_START;
-
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("backend init sockfd=%d, sock->wq=0x%16llx,sk->sk_wq=0x%16llx",
-		  q->sockfd, (uint64_t)q->sock->wq,
-		  (uint64_t)q->sk.sk_wq);
-#else
-	pr_debug("backend init sockfd=%d, sock->wait at 0x%16llx,sk->sk_sleep=0x%16llx",
-		  q->sockfd, (uint64_t)&q->sock->wait,
-		  (uint64_t)q->sk.sk_sleep);
-#endif
-
-	return 0;
-
-free_cache:
-	kfree(q->cache);
-	q->cache = NULL;
-
-free_fd:
-	put_unused_fd(sockfd);
-
-free_sock:
-	q->kni = NULL;
-	kni->vhost_queue = NULL;
-	kni->vq_status |= BE_FINISH;
-	sock_release(q->sock);
-	q->sock->ops = NULL;
-	q->sock = NULL;
-
-free_sk:
-	sk_free((struct sock *)q);
-
-	return err;
-}
-
-/* kni vhost sock sysfs */
-static ssize_t
-show_sock_fd(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	int sockfd = -1;
-
-	if (kni->vhost_queue != NULL)
-		sockfd = kni->vhost_queue->sockfd;
-	return snprintf(buf, 10, "%d\n", sockfd);
-}
-
-static ssize_t
-show_sock_en(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-
-	return snprintf(buf, 10, "%u\n", (kni->vhost_queue == NULL ? 0 : 1));
-}
-
-static ssize_t
-set_sock_en(struct device *dev, struct device_attribute *attr,
-	      const char *buf, size_t count)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	unsigned long en;
-	int err = 0;
-
-	if (kstrtoul(buf, 0, &en) != 0)
-		return -EINVAL;
-
-	if (en)
-		err = kni_vhost_backend_init(kni);
-
-	return err ? err : count;
-}
-
-static DEVICE_ATTR(sock_fd, S_IRUGO | S_IRUSR, show_sock_fd, NULL);
-static DEVICE_ATTR(sock_en, S_IRUGO | S_IWUSR, show_sock_en, set_sock_en);
-static struct attribute *dev_attrs[] = {
-	&dev_attr_sock_fd.attr,
-	&dev_attr_sock_en.attr,
-	NULL,
-};
-
-static const struct attribute_group dev_attr_grp = {
-	.attrs = dev_attrs,
-};
-
-int
-kni_vhost_backend_release(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (q == NULL)
-		return 0;
-
-	/* dettach from kni */
-	q->kni = NULL;
-
-	pr_debug("release backend done\n");
-
-	return 0;
-}
-
-int
-kni_vhost_init(struct kni_dev *kni)
-{
-	struct net_device *dev = kni->net_dev;
-
-	if (sysfs_create_group(&dev->dev.kobj, &dev_attr_grp))
-		sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-
-	kni->vq_status = BE_STOP;
-
-	pr_debug("kni_vhost_init done\n");
-
-	return 0;
-}
-- 
2.9.3

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH 3/3] doc: remove deprecation notice
  @ 2017-02-17 12:01  5% ` Fan Zhang
  0 siblings, 0 replies; 200+ results
From: Fan Zhang @ 2017-02-17 12:01 UTC (permalink / raw)
  To: dev; +Cc: pablo.de.lara.guarch

Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..3e17b20 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -119,10 +119,6 @@ Deprecation Notices
   or VHOST feature, but KNI_VHOST which is a KNI feature enabled via a compile
   time option, and disabled by default.
 
-* ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
-  A pointer to a rte_cryptodev_config structure will be added to the
-  function prototype ``cryptodev_configure_t``, as a new parameter.
-
 * cryptodev: A new parameter ``max_nb_sessions_per_qp`` will be added to
   ``rte_cryptodev_info.sym``. Some drivers may support limited number of
   sessions per queue_pair. With this new parameter application will know
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH] lpm: extend IPv6 next hop field
@ 2017-02-19 17:14  4% Vladyslav Buslov
  2017-02-21 14:46  4% ` [dpdk-dev] [PATCH v2] " Vladyslav Buslov
  0 siblings, 1 reply; 200+ results
From: Vladyslav Buslov @ 2017-02-19 17:14 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev

This patch extends the next_hop field from 8 bits to 21 bits in the LPM
library for IPv6.

Added versioning symbols to functions and updated the library and
applications that have a dependency on the LPM library.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
---
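The caller-visible change is just the width of the next-hop out-parameter;
a minimal sketch of a lookup after this change, assuming ``lpm`` was
already populated with rte_lpm6_add():

    #include <stdint.h>
    #include <rte_lpm6.h>

    static uint32_t
    lookup_next_hop(struct rte_lpm6 *lpm, uint8_t ip[16])
    {
        uint32_t next_hop = 0;   /* was uint8_t before this change */

        if (rte_lpm6_lookup(lpm, ip, &next_hop) == 0)
            return next_hop;     /* up to 21 significant bits */

        return UINT32_MAX;       /* arbitrary "no route" marker */
    }
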
 app/test/test_lpm6.c                            | 114 ++++++++++++++------
 app/test/test_lpm6_perf.c                       |   4 +-
 doc/guides/prog_guide/lpm6_lib.rst              |   2 +-
 doc/guides/rel_notes/release_17_05.rst          |   5 +
 examples/ip_fragmentation/main.c                |  16 +--
 examples/ip_reassembly/main.c                   |  16 +--
 examples/ipsec-secgw/ipsec-secgw.c              |   2 +-
 examples/l3fwd/l3fwd_lpm_sse.h                  |  20 ++--
 examples/performance-thread/l3fwd-thread/main.c |   9 +-
 lib/librte_lpm/rte_lpm6.c                       | 133 +++++++++++++++++++++---
 lib/librte_lpm/rte_lpm6.h                       |  29 +++++-
 lib/librte_lpm/rte_lpm_version.map              |  10 ++
 lib/librte_table/rte_table_lpm_ipv6.c           |   9 +-
 13 files changed, 282 insertions(+), 87 deletions(-)

diff --git a/app/test/test_lpm6.c b/app/test/test_lpm6.c
index 61134f7..2950aae 100644
--- a/app/test/test_lpm6.c
+++ b/app/test/test_lpm6.c
@@ -79,6 +79,7 @@ static int32_t test24(void);
 static int32_t test25(void);
 static int32_t test26(void);
 static int32_t test27(void);
+static int32_t test28(void);
 
 rte_lpm6_test tests6[] = {
 /* Test Cases */
@@ -110,6 +111,7 @@ rte_lpm6_test tests6[] = {
 	test25,
 	test26,
 	test27,
+	test28,
 };
 
 #define NUM_LPM6_TESTS                (sizeof(tests6)/sizeof(tests6[0]))
@@ -354,7 +356,7 @@ test6(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -392,7 +394,7 @@ test7(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[10][16];
-	int16_t next_hop_return[10];
+	int32_t next_hop_return[10];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -469,7 +471,8 @@ test9(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 16, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 	uint8_t i;
 
@@ -513,7 +516,8 @@ test10(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -557,7 +561,8 @@ test11(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -617,7 +622,8 @@ test12(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -655,7 +661,8 @@ test13(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = 2;
@@ -702,7 +709,8 @@ test14(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 25, next_hop_add = 100;
+	uint8_t depth = 25;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -748,7 +756,8 @@ test15(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 24, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 24;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -784,7 +793,8 @@ test16(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {12,12,1,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 128, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 128;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -828,7 +838,8 @@ test17(void)
 	uint8_t ip1[] = {127,255,255,255,255,255,255,255,255,
 			255,255,255,255,255,255,255};
 	uint8_t ip2[] = {128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -857,7 +868,7 @@ test17(void)
 
 	/* Loop with rte_lpm6_delete. */
 	for (depth = 16; depth >= 1; depth--) {
-		next_hop_add = (uint8_t) (depth - 1);
+		next_hop_add = (depth - 1);
 
 		status = rte_lpm6_delete(lpm, ip2, depth);
 		TEST_LPM_ASSERT(status == 0);
@@ -893,8 +904,9 @@ test18(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16], ip_1[16], ip_2[16];
-	uint8_t depth, depth_1, depth_2, next_hop_add, next_hop_add_1,
-		next_hop_add_2, next_hop_return;
+	uint8_t depth, depth_1, depth_2;
+	uint32_t next_hop_add, next_hop_add_1,
+			next_hop_add_2, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1055,7 +1067,8 @@ test19(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1253,7 +1266,8 @@ test20(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1320,8 +1334,9 @@ test21(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[4][16];
-	uint8_t depth, next_hop_add;
-	int16_t next_hop_return[4];
+	uint8_t depth;
+	uint32_t next_hop_add;
+	int32_t next_hop_return[4];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1378,8 +1393,9 @@ test22(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[5][16];
-	uint8_t depth[5], next_hop_add;
-	int16_t next_hop_return[5];
+	uint8_t depth[5];
+	uint32_t next_hop_add;
+	int32_t next_hop_return[5];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1495,7 +1511,8 @@ test23(void)
 	struct rte_lpm6_config config;
 	uint32_t i;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1579,7 +1596,8 @@ test25(void)
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
 	uint32_t i;
-	uint8_t depth, next_hop_add, next_hop_return, next_hop_expected;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return, next_hop_expected;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1632,10 +1650,10 @@ test26(void)
 	uint8_t d_ip_10_32 = 32;
 	uint8_t	d_ip_10_24 = 24;
 	uint8_t	d_ip_20_25 = 25;
-	uint8_t next_hop_ip_10_32 = 100;
-	uint8_t	next_hop_ip_10_24 = 105;
-	uint8_t	next_hop_ip_20_25 = 111;
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_ip_10_32 = 100;
+	uint32_t next_hop_ip_10_24 = 105;
+	uint32_t next_hop_ip_20_25 = 111;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1650,7 +1668,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_32, &next_hop_return);
-	uint8_t test_hop_10_32 = next_hop_return;
+	uint32_t test_hop_10_32 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_32);
 
@@ -1659,7 +1677,7 @@ test26(void)
 			return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_24, &next_hop_return);
-	uint8_t test_hop_10_24 = next_hop_return;
+	uint32_t test_hop_10_24 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_24);
 
@@ -1668,7 +1686,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_20_25, &next_hop_return);
-	uint8_t test_hop_20_25 = next_hop_return;
+	uint32_t test_hop_20_25 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_20_25);
 
@@ -1707,7 +1725,8 @@ test27(void)
 		struct rte_lpm6 *lpm = NULL;
 		struct rte_lpm6_config config;
 		uint8_t ip[] = {128,128,128,128,128,128,128,128,128,128,128,128,128,128,0,0};
-		uint8_t depth = 128, next_hop_add = 100, next_hop_return;
+		uint8_t depth = 128;
+		uint32_t next_hop_add = 100, next_hop_return;
 		int32_t status = 0;
 		int i, j;
 
@@ -1746,6 +1765,41 @@ test27(void)
 }
 
 /*
+ * Call add, lookup and delete for a single rule with the maximum 21-bit next_hop size.
+ * Check that the next_hop returned from lookup is equal to the provisioned value.
+ * Delete the rule and check that the same lookup returns a miss.
+ */
+int32_t
+test28(void)
+{
+	struct rte_lpm6 *lpm = NULL;
+	struct rte_lpm6_config config;
+	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 0x001FFFFF, next_hop_return = 0;
+	int32_t status = 0;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	status = rte_lpm6_add(lpm, ip, depth, next_hop_add);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm6_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));
+
+	status = rte_lpm6_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	rte_lpm6_free(lpm);
+
+	return PASS;
+}
+
+/*
  * Do all unit tests.
  */
 static int
diff --git a/app/test/test_lpm6_perf.c b/app/test/test_lpm6_perf.c
index 0723081..30be430 100644
--- a/app/test/test_lpm6_perf.c
+++ b/app/test/test_lpm6_perf.c
@@ -86,7 +86,7 @@ test_lpm6_perf(void)
 	struct rte_lpm6_config config;
 	uint64_t begin, total_time;
 	unsigned i, j;
-	uint8_t next_hop_add = 0xAA, next_hop_return = 0;
+	uint32_t next_hop_add = 0xAA, next_hop_return = 0;
 	int status = 0;
 	int64_t count = 0;
 
@@ -148,7 +148,7 @@ test_lpm6_perf(void)
 	count = 0;
 
 	uint8_t ip_batch[NUM_IPS_ENTRIES][16];
-	int16_t next_hops[NUM_IPS_ENTRIES];
+	int32_t next_hops[NUM_IPS_ENTRIES];
 
 	for (i = 0; i < NUM_IPS_ENTRIES; i++)
 		memcpy(ip_batch[i], large_ips_table[i].ip, 16);
diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index 0aea5c5..f791507 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary depending on the case.
 An LPM prefix is represented by a pair of parameters (128-bit key, depth), with depth in the range of 1 to 128.
 An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
 The prefix serves as the unique identifier for the LPM rule.
-In this implementation, the user data is 1-byte long and is called "next hop",
+In this implementation, the user data is 21 bits long and is called "next hop",
 which corresponds to its main use of storing the ID of the next hop in a routing table entry.
 
 The main methods exported for the LPM component are:
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 48fb5bd..723e085 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Increased number of next hops for LPM IPv6 to 2^21.**
+
+  The next_hop field is extended from 8 bits to 21 bits for IPv6.
 
 Resolved Issues
 ---------------
@@ -110,6 +113,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
+  while keeping ABI compatibility.
 
 ABI Changes
 -----------
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index e1e32c6..51035f5 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		uint8_t queueid, uint8_t port_in)
 {
 	struct rx_queue *rxq;
-	uint32_t i, len, next_hop_ipv4;
-	uint8_t next_hop_ipv6, port_out, ipv6;
+	uint32_t i, len, next_hop;
+	uint8_t port_out, ipv6;
 	int32_t len2;
 
 	ipv6 = 0;
@@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			port_out = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
@@ -326,9 +326,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *);
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			port_out = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 50fe422..50730a2 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -346,8 +346,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	struct rte_ip_frag_death_row *dr;
 	struct rx_queue *rxq;
 	void *d_addr_bytes;
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6, dst_port;
+	uint32_t next_hop;
+	uint8_t dst_port;
 
 	rxq = &qconf->rx_queue_list[queue];
 
@@ -390,9 +390,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			dst_port = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
@@ -427,9 +427,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			dst_port = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv6);
diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index 5a4c9b7..5744c46 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -618,7 +618,7 @@ route4_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 static inline void
 route6_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 {
-	int16_t hop[MAX_PKT_BURST * 2];
+	int32_t hop[MAX_PKT_BURST * 2];
 	uint8_t dst_ip[MAX_PKT_BURST * 2][16];
 	uint8_t *ip6_dst;
 	uint16_t i, offset;
diff --git a/examples/l3fwd/l3fwd_lpm_sse.h b/examples/l3fwd/l3fwd_lpm_sse.h
index 538fe3d..1ef70d3 100644
--- a/examples/l3fwd/l3fwd_lpm_sse.h
+++ b/examples/l3fwd/l3fwd_lpm_sse.h
@@ -40,8 +40,7 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ipv4_hdr *ipv4_hdr;
 	struct ether_hdr *eth_hdr;
@@ -52,8 +51,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct,
-				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop_ipv4) == 0) ?
-						next_hop_ipv4 : portid);
+				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop) == 0) ?
+						next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -61,8 +60,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
@@ -78,14 +77,13 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
-			&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+			&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -93,8 +91,8 @@ lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
diff --git a/examples/performance-thread/l3fwd-thread/main.c b/examples/performance-thread/l3fwd-thread/main.c
index 53083df..510d6e8 100644
--- a/examples/performance-thread/l3fwd-thread/main.c
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -909,7 +909,7 @@ static inline uint8_t
 get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid,
 		lookup6_struct_t *ipv6_l3fwd_lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 
 	return (uint8_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct,
 			((struct ipv6_hdr *)ipv6_hdr)->dst_addr, &next_hop) == 0) ?
@@ -1396,15 +1396,14 @@ rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 static inline __attribute__((always_inline)) uint16_t
 get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv4_lookup_struct, dst_ipv4,
-				&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+				&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -1413,7 +1412,7 @@ get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 
 		return (uint16_t) ((rte_lpm6_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0) ? next_hop_ipv6 :
+				ipv6_hdr->dst_addr, &next_hop) == 0) ? next_hop :
 						portid);
 
 	}
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 32fdba0..8915fff 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -97,7 +97,7 @@ struct rte_lpm6_tbl_entry {
 /** Rules tbl entry structure. */
 struct rte_lpm6_rule {
 	uint8_t ip[RTE_LPM6_IPV6_ADDR_SIZE]; /**< Rule IP address. */
-	uint8_t next_hop; /**< Rule next hop. */
+	uint32_t next_hop; /**< Rule next hop. */
 	uint8_t depth; /**< Rule depth. */
 };
 
@@ -297,7 +297,7 @@ rte_lpm6_free(struct rte_lpm6 *lpm)
  * the nexthop if so. Otherwise it adds a new rule if enough space is available.
  */
 static inline int32_t
-rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
+rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint32_t next_hop, uint8_t depth)
 {
 	uint32_t rule_index;
 
@@ -340,7 +340,7 @@ rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
  */
 static void
 expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
-		uint8_t next_hop)
+		uint32_t next_hop)
 {
 	uint32_t tbl8_group_end, tbl8_gindex_next, j;
 
@@ -377,7 +377,7 @@ expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
 static inline int
 add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
 		struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip, uint8_t bytes,
-		uint8_t first_byte, uint8_t depth, uint8_t next_hop)
+		uint8_t first_byte, uint8_t depth, uint32_t next_hop)
 {
 	uint32_t tbl_index, tbl_range, tbl8_group_start, tbl8_group_end, i;
 	int32_t tbl8_gindex;
@@ -507,9 +507,17 @@ add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
  * Add a route
  */
 int
-rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop)
 {
+	return rte_lpm6_add_v1705(lpm, ip, depth, next_hop);
+}
+VERSION_SYMBOL(rte_lpm6_add, _v20, 2.0);
+
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop)
+{
 	struct rte_lpm6_tbl_entry *tbl;
 	struct rte_lpm6_tbl_entry *tbl_next;
 	int32_t rule_index;
@@ -560,6 +568,9 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_add, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+				uint32_t next_hop), rte_lpm6_add_v1705);
 
 /*
  * Takes a pointer to a table entry and inspect one level.
@@ -569,7 +580,7 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 static inline int
 lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		const struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip,
-		uint8_t first_byte, uint8_t *next_hop)
+		uint8_t first_byte, uint32_t *next_hop)
 {
 	uint32_t tbl8_index, tbl_entry;
 
@@ -589,7 +600,7 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		return 1;
 	} else {
 		/* If not extended then we can have a match. */
-		*next_hop = (uint8_t)tbl_entry;
+		*next_hop = ((uint32_t)tbl_entry & RTE_LPM6_TBL8_BITMASK);
 		return (tbl_entry & RTE_LPM6_LOOKUP_SUCCESS) ? 0 : -ENOENT;
 	}
 }
@@ -598,7 +609,26 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
  * Looks up an IP
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL) {
+		return -EINVAL;
+	}
+
+	status = rte_lpm6_lookup_v1705(lpm, ip, &next_hop32);
+	if (status == 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+}
+VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
+
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop)
 {
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
@@ -625,20 +655,23 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
+				uint32_t *next_hop), rte_lpm6_lookup_v1705);
 
 /*
  * Looks up a group of IP addresses
  */
 int
-rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n)
 {
 	unsigned i;
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
-	uint32_t tbl24_index;
-	uint8_t first_byte, next_hop;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
 	int status;
 
 	/* DEBUG: Check user input arguments. */
@@ -664,11 +697,58 @@ rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		if (status < 0)
 			next_hops[i] = -1;
 		else
-			next_hops[i] = next_hop;
+			next_hops[i] = (int16_t)next_hop;
 	}
 
 	return 0;
 }
+VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
+
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t * next_hops, unsigned n)
+{
+	unsigned i;
+	const struct rte_lpm6_tbl_entry *tbl;
+	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
+	int status;
+
+	/* DEBUG: Check user input arguments. */
+	if ((lpm == NULL) || (ips == NULL) || (next_hops == NULL)) {
+		return -EINVAL;
+	}
+
+	for (i = 0; i < n; i++) {
+		first_byte = LOOKUP_FIRST_BYTE;
+		tbl24_index = (ips[i][0] << BYTES2_SIZE) |
+				(ips[i][1] << BYTE_SIZE) | ips[i][2];
+
+		/* Calculate pointer to the first entry to be inspected */
+		tbl = &lpm->tbl24[tbl24_index];
+
+		do {
+			/* Continue inspecting following levels until success or failure */
+			status = lookup_step(lpm, tbl, &tbl_next, ips[i], first_byte++,
+					&next_hop);
+			tbl = tbl_next;
+		} while (status == 1);
+
+		if (status < 0)
+			next_hops[i] = -1;
+		else
+			next_hops[i] = (int32_t)next_hop;
+	}
+
+	return 0;
+}
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup_bulk_func, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+				uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+				int32_t * next_hops, unsigned n),
+		rte_lpm6_lookup_bulk_func_v1705);
 
 /*
  * Finds a rule in rule table.
@@ -698,8 +778,29 @@ rule_find(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
  * Look for a rule in the high-level rules table
  */
 int
-rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop)
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL) {
+		return -EINVAL;
+	}
+
+	status = rte_lpm6_is_rule_present_v1705(lpm, ip, depth, &next_hop32);
+	if (status > 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+
+}
+VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
+
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop)
 {
 	uint8_t ip_masked[RTE_LPM6_IPV6_ADDR_SIZE];
 	int32_t rule_index;
@@ -724,6 +825,10 @@ uint8_t *next_hop)
 	/* If rule is not found return 0. */
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_is_rule_present, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip,
+				uint8_t depth, uint32_t *next_hop),
+		rte_lpm6_is_rule_present_v1705);
 
 /*
  * Delete a rule from the rule table.
diff --git a/lib/librte_lpm/rte_lpm6.h b/lib/librte_lpm/rte_lpm6.h
index 13d027f..0ab54d4 100644
--- a/lib/librte_lpm/rte_lpm6.h
+++ b/lib/librte_lpm/rte_lpm6.h
@@ -39,6 +39,7 @@
  */
 
 #include <stdint.h>
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -123,7 +124,13 @@ rte_lpm6_free(struct rte_lpm6 *lpm);
  */
 int
 rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
+int
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop);
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
 
 /**
  * Check if a rule is present in the LPM table,
@@ -142,7 +149,13 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
  */
 int
 rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop);
+		uint32_t *next_hop);
+int
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop);
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop);
 
 /**
  * Delete a rule from the LPM table.
@@ -199,7 +212,11 @@ rte_lpm6_delete_all(struct rte_lpm6 *lpm);
  *   -EINVAL for incorrect arguments, -ENOENT on lookup miss, 0 on lookup hit
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
+int
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
 
 /**
  * Lookup multiple IP addresses in an LPM table.
@@ -220,7 +237,15 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
 int
 rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t * next_hops, unsigned n);
+int
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n);
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t * next_hops, unsigned n);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 239b371..90beac8 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -34,3 +34,13 @@ DPDK_16.04 {
 	rte_lpm_delete_all;
 
 } DPDK_2.0;
+
+DPDK_17.05 {
+	global:
+
+	rte_lpm6_add;
+	rte_lpm6_is_rule_present;
+	rte_lpm6_lookup;
+	rte_lpm6_lookup_bulk_func;
+
+} DPDK_16.04;
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index 836f4cf..1e1a173 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -211,9 +211,8 @@ rte_table_lpm_ipv6_entry_add(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint32_t nht_pos, nht_pos0_valid;
+	uint32_t nht_pos, nht_pos0, nht_pos0_valid;
 	int status;
-	uint8_t nht_pos0;
 
 	/* Check input parameters */
 	if (lpm == NULL) {
@@ -256,7 +255,7 @@ rte_table_lpm_ipv6_entry_add(
 
 	/* Add rule to low level LPM table */
 	if (rte_lpm6_add(lpm->lpm, ip_prefix->ip, ip_prefix->depth,
-		(uint8_t) nht_pos) < 0) {
+		nht_pos) < 0) {
 		RTE_LOG(ERR, TABLE, "%s: LPM IPv6 rule add failed\n", __func__);
 		return -1;
 	}
@@ -280,7 +279,7 @@ rte_table_lpm_ipv6_entry_delete(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint8_t nht_pos;
+	uint32_t nht_pos;
 	int status;
 
 	/* Check input parameters */
@@ -356,7 +355,7 @@ rte_table_lpm_ipv6_lookup(
 			uint8_t *ip = RTE_MBUF_METADATA_UINT8_PTR(pkt,
 				lpm->offset);
 			int status;
-			uint8_t nht_pos;
+			uint32_t nht_pos;
 
 			status = rte_lpm6_lookup(lpm->lpm, ip, &nht_pos);
 			if (status == 0) {
-- 
2.1.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes
  2017-02-15 13:15  1% [dpdk-dev] [PATCH] kni: remove KNI vhost support Ferruh Yigit
@ 2017-02-20 14:30  5% ` Ferruh Yigit
  2017-02-20 14:30  1%   ` [dpdk-dev] [PATCH v2 2/2] kni: remove KNI vhost support Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-20 14:30 UTC (permalink / raw)
  To: Thomas Monjalon, John McNamara; +Cc: dev, Bruce Richardson, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/guides/rel_notes/release_17_05.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 48fb5bd..59929b0 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -125,6 +125,18 @@ ABI Changes
    =========================================================
 
 
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item in the past
+     tense.
+
+   This section is a comment. do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
 
 Shared Library Versions
 -----------------------
-- 
2.9.3

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v2 2/2] kni: remove KNI vhost support
  2017-02-20 14:30  5% ` [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes Ferruh Yigit
@ 2017-02-20 14:30  1%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-02-20 14:30 UTC (permalink / raw)
  To: Thomas Monjalon, John McNamara; +Cc: dev, Bruce Richardson, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                             |   3 -
 devtools/test-build.sh                         |   1 -
 doc/guides/prog_guide/index.rst                |   4 -
 doc/guides/prog_guide/kernel_nic_interface.rst | 113 ----
 doc/guides/rel_notes/deprecation.rst           |   6 -
 doc/guides/rel_notes/release_17_05.rst         |   2 +
 lib/librte_eal/linuxapp/kni/Makefile           |   1 -
 lib/librte_eal/linuxapp/kni/kni_dev.h          |  33 -
 lib/librte_eal/linuxapp/kni/kni_fifo.h         |  14 -
 lib/librte_eal/linuxapp/kni/kni_misc.c         |  22 -
 lib/librte_eal/linuxapp/kni/kni_net.c          |  13 -
 lib/librte_eal/linuxapp/kni/kni_vhost.c        | 842 -------------------------
 12 files changed, 2 insertions(+), 1052 deletions(-)
 delete mode 100644 lib/librte_eal/linuxapp/kni/kni_vhost.c

diff --git a/config/common_base b/config/common_base
index 71a4fcb..aeee13e 100644
--- a/config/common_base
+++ b/config/common_base
@@ -584,9 +584,6 @@ CONFIG_RTE_LIBRTE_KNI=n
 CONFIG_RTE_KNI_KMOD=n
 CONFIG_RTE_KNI_KMOD_ETHTOOL=n
 CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
-CONFIG_RTE_KNI_VHOST=n
-CONFIG_RTE_KNI_VHOST_MAX_CACHE_SIZE=1024
-CONFIG_RTE_KNI_VHOST_VNET_HDR_EN=n
 
 #
 # Compile the pdump library
diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 0f131fc..84d3165 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -194,7 +194,6 @@ config () # <directory> <target> <options>
 		sed -ri        's,(PMD_OPENSSL=)n,\1y,' $1/.config
 		test "$DPDK_DEP_SSL" != y || \
 		sed -ri            's,(PMD_QAT=)n,\1y,' $1/.config
-		sed -ri        's,(KNI_VHOST.*=)n,\1y,' $1/.config
 		sed -ri           's,(SCHED_.*=)n,\1y,' $1/.config
 		build_config_hook $1 $2 $3
 
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 7f825cb..77f427e 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -127,10 +127,6 @@ Programmer's Guide
 
 :numref:`figure_pkt_flow_kni` :ref:`figure_pkt_flow_kni`
 
-:numref:`figure_vhost_net_arch2` :ref:`figure_vhost_net_arch2`
-
-:numref:`figure_kni_traffic_flow` :ref:`figure_kni_traffic_flow`
-
 
 :numref:`figure_pkt_proc_pipeline_qos` :ref:`figure_pkt_proc_pipeline_qos`
 
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 4f25595..6f7fd28 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -168,116 +168,3 @@ The application handlers can be registered upon interface creation or explicitly
 This provides flexibility in multiprocess scenarios
 (where the KNI is created in the primary process but the callbacks are handled in the secondary one).
 The constraint is that a single process can register and handle the requests.
-
-.. _kni_vhost_backend-label:
-
-KNI Working as a Kernel vHost Backend
--------------------------------------
-
-vHost is a kernel module usually working as the backend of virtio (a para- virtualization driver framework)
-to accelerate the traffic from the guest to the host.
-The DPDK Kernel NIC interface provides the ability to hookup vHost traffic into userspace DPDK application.
-Together with the DPDK PMD virtio, it significantly improves the throughput between guest and host.
-In the scenario where DPDK is running as fast path in the host, kni-vhost is an efficient path for the traffic.
-
-Overview
-~~~~~~~~
-
-vHost-net has three kinds of real backend implementations. They are: 1) tap, 2) macvtap and 3) RAW socket.
-The main idea behind kni-vhost is making the KNI work as a RAW socket, attaching it as the backend instance of vHost-net.
-It is using the existing interface with vHost-net, so it does not require any kernel hacking,
-and is fully-compatible with the kernel vhost module.
-As vHost is still taking responsibility for communicating with the front-end virtio,
-it naturally supports both legacy virtio -net and the DPDK PMD virtio.
-There is a little penalty that comes from the non-polling mode of vhost.
-However, it scales throughput well when using KNI in multi-thread mode.
-
-.. _figure_vhost_net_arch2:
-
-.. figure:: img/vhost_net_arch.*
-
-   vHost-net Architecture Overview
-
-
-Packet Flow
-~~~~~~~~~~~
-
-There is only a minor difference from the original KNI traffic flows.
-On transmit side, vhost kthread calls the RAW socket's ops sendmsg and it puts the packets into the KNI transmit FIFO.
-On the receive side, the kni kthread gets packets from the KNI receive FIFO, puts them into the queue of the raw socket,
-and wakes up the task in vhost kthread to begin receiving.
-All the packet copying, irrespective of whether it is on the transmit or receive side,
-happens in the context of vhost kthread.
-Every vhost-net device is exposed to a front end virtio device in the guest.
-
-.. _figure_kni_traffic_flow:
-
-.. figure:: img/kni_traffic_flow.*
-
-   KNI Traffic Flow
-
-
-Sample Usage
-~~~~~~~~~~~~
-
-Before starting to use KNI as the backend of vhost, the CONFIG_RTE_KNI_VHOST configuration option must be turned on.
-Otherwise, by default, KNI will not enable its backend support capability.
-
-Of course, as a prerequisite, the vhost/vhost-net kernel CONFIG should be chosen before compiling the kernel.
-
-#.  Compile the DPDK and insert uio_pci_generic/igb_uio kernel modules as normal.
-
-#.  Insert the KNI kernel module:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko
-
-    If using KNI in multi-thread mode, use the following command line:
-
-    .. code-block:: console
-
-        insmod ./rte_kni.ko kthread_mode=multiple
-
-#.  Running the KNI sample application:
-
-    .. code-block:: console
-
-        examples/kni/build/app/kni -c -0xf0 -n 4 -- -p 0x3 -P --config="(0,4,6),(1,5,7)"
-
-    This command runs the kni sample application with two physical ports.
-    Each port pins two forwarding cores (ingress/egress) in user space.
-
-#.  Assign a raw socket to vhost-net during qemu-kvm startup.
-    The DPDK does not provide a script to do this since it is easy for the user to customize.
-    The following shows the key steps to launch qemu-kvm with kni-vhost:
-
-    .. code-block:: bash
-
-        #!/bin/bash
-        echo 1 > /sys/class/net/vEth0/sock_en
-        fd=`cat /sys/class/net/vEth0/sock_fd`
-        qemu-kvm \
-        -name vm1 -cpu host -m 2048 -smp 1 -hda /opt/vm-fc16.img \
-        -netdev tap,fd=$fd,id=hostnet1,vhost=on \
-        -device virti-net-pci,netdev=hostnet1,id=net1,bus=pci.0,addr=0x4
-
-It is simple to enable raw socket using sysfs sock_en and get raw socket fd using sock_fd under the KNI device node.
-
-Then, using the qemu-kvm command with the -netdev option to assign such raw socket fd as vhost's backend.
-
-.. note::
-
-    The key word tap must exist as qemu-kvm now only supports vhost with a tap backend, so here we cheat qemu-kvm by an existing fd.
-
-Compatibility Configure Option
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-There is a CONFIG_RTE_KNI_VHOST_VNET_HDR_EN configuration option in DPDK configuration file.
-By default, it set to n, which means do not turn on the virtio net header,
-which is used to support additional features (such as, csum offload, vlan offload, generic-segmentation and so on),
-since the kni-vhost does not yet support those features.
-
-Even if the option is turned on, kni-vhost will ignore the information that the header contains.
-When working with legacy virtio on the guest, it is better to turn off unsupported offload features using ethtool -K.
-Otherwise, there may be problems such as an incorrect L4 checksum error.
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..66ca596 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -113,12 +113,6 @@ Deprecation Notices
   has different feature set, meaning functions like ``rte_vhost_feature_disable``
   need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
 
-* kni: Remove :ref:`kni_vhost_backend-label` feature (KNI_VHOST) in 17.05 release.
-  :doc:`Vhost Library </prog_guide/vhost_lib>` is currently preferred method for
-  guest - host communication. Just for clarification, this is not to remove KNI
-  or VHOST feature, but KNI_VHOST which is a KNI feature enabled via a compile
-  time option, and disabled by default.
-
 * ABI changes are planned for 17.05 in the ``rte_cryptodev_ops`` structure.
   A pointer to a rte_cryptodev_config structure will be added to the
   function prototype ``cryptodev_configure_t``, as a new parameter.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 59929b0..e25ea9f 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -137,6 +137,8 @@ Removed Items
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* KNI vhost support removed.
+
 
 Shared Library Versions
 -----------------------
diff --git a/lib/librte_eal/linuxapp/kni/Makefile b/lib/librte_eal/linuxapp/kni/Makefile
index 3c22b63..7864a2a 100644
--- a/lib/librte_eal/linuxapp/kni/Makefile
+++ b/lib/librte_eal/linuxapp/kni/Makefile
@@ -61,7 +61,6 @@ DEPDIRS-y += lib/librte_eal/linuxapp/eal
 #
 SRCS-y := kni_misc.c
 SRCS-y += kni_net.c
-SRCS-$(CONFIG_RTE_KNI_VHOST) += kni_vhost.c
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += kni_ethtool.c
 
 SRCS-$(CONFIG_RTE_KNI_KMOD_ETHTOOL) += ethtool/ixgbe/ixgbe_main.c
diff --git a/lib/librte_eal/linuxapp/kni/kni_dev.h b/lib/librte_eal/linuxapp/kni/kni_dev.h
index 58cbadd..002e5fa 100644
--- a/lib/librte_eal/linuxapp/kni/kni_dev.h
+++ b/lib/librte_eal/linuxapp/kni/kni_dev.h
@@ -37,10 +37,6 @@
 #include <linux/spinlock.h>
 #include <linux/list.h>
 
-#ifdef RTE_KNI_VHOST
-#include <net/sock.h>
-#endif
-
 #include <exec-env/rte_kni_common.h>
 #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
 
@@ -102,15 +98,6 @@ struct kni_dev {
 	/* synchro for request processing */
 	unsigned long synchro;
 
-#ifdef RTE_KNI_VHOST
-	struct kni_vhost_queue *vhost_queue;
-
-	volatile enum {
-		BE_STOP = 0x1,
-		BE_START = 0x2,
-		BE_FINISH = 0x4,
-	} vq_status;
-#endif
 	/* buffers */
 	void *pa[MBUF_BURST_SZ];
 	void *va[MBUF_BURST_SZ];
@@ -118,26 +105,6 @@ struct kni_dev {
 	void *alloc_va[MBUF_BURST_SZ];
 };
 
-#ifdef RTE_KNI_VHOST
-uint32_t
-kni_poll(struct file *file, struct socket *sock, poll_table * wait);
-int kni_chk_vhost_rx(struct kni_dev *kni);
-int kni_vhost_init(struct kni_dev *kni);
-int kni_vhost_backend_release(struct kni_dev *kni);
-
-struct kni_vhost_queue {
-	struct sock sk;
-	struct socket *sock;
-	int vnet_hdr_sz;
-	struct kni_dev *kni;
-	int sockfd;
-	uint32_t flags;
-	struct sk_buff *cache;
-	struct rte_kni_fifo *fifo;
-};
-
-#endif
-
 void kni_net_rx(struct kni_dev *kni);
 void kni_net_init(struct net_device *dev);
 void kni_net_config_lo_mode(char *lo_str);
diff --git a/lib/librte_eal/linuxapp/kni/kni_fifo.h b/lib/librte_eal/linuxapp/kni/kni_fifo.h
index 025ec1c..14f4141 100644
--- a/lib/librte_eal/linuxapp/kni/kni_fifo.h
+++ b/lib/librte_eal/linuxapp/kni/kni_fifo.h
@@ -91,18 +91,4 @@ kni_fifo_free_count(struct rte_kni_fifo *fifo)
 	return (fifo->read - fifo->write - 1) & (fifo->len - 1);
 }
 
-#ifdef RTE_KNI_VHOST
-/**
- * Initializes the kni fifo structure
- */
-static inline void
-kni_fifo_init(struct rte_kni_fifo *fifo, uint32_t size)
-{
-	fifo->write = 0;
-	fifo->read = 0;
-	fifo->len = size;
-	fifo->elem_size = sizeof(void *);
-}
-#endif
-
 #endif /* _KNI_FIFO_H_ */
diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c b/lib/librte_eal/linuxapp/kni/kni_misc.c
index 33b61f2..f1f6bea 100644
--- a/lib/librte_eal/linuxapp/kni/kni_misc.c
+++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
@@ -140,11 +140,7 @@ kni_thread_single(void *data)
 		down_read(&knet->kni_list_lock);
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
 			list_for_each_entry(dev, &knet->kni_list_head, list) {
-#ifdef RTE_KNI_VHOST
-				kni_chk_vhost_rx(dev);
-#else
 				kni_net_rx(dev);
-#endif
 				kni_net_poll_resp(dev);
 			}
 		}
@@ -167,11 +163,7 @@ kni_thread_multiple(void *param)
 
 	while (!kthread_should_stop()) {
 		for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
-#ifdef RTE_KNI_VHOST
-			kni_chk_vhost_rx(dev);
-#else
 			kni_net_rx(dev);
-#endif
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
@@ -248,9 +240,6 @@ kni_release(struct inode *inode, struct file *file)
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 	}
@@ -397,10 +386,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	kni->sync_va = dev_info.sync_va;
 	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
 
-#ifdef RTE_KNI_VHOST
-	kni->vhost_queue = NULL;
-	kni->vq_status = BE_STOP;
-#endif
 	kni->mbuf_size = dev_info.mbuf_size;
 
 	pr_debug("tx_phys:      0x%016llx, tx_q addr:      0x%p\n",
@@ -490,10 +475,6 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 		return -ENODEV;
 	}
 
-#ifdef RTE_KNI_VHOST
-	kni_vhost_init(kni);
-#endif
-
 	ret = kni_run_thread(knet, kni, dev_info.force_bind);
 	if (ret != 0)
 		return ret;
@@ -537,9 +518,6 @@ kni_ioctl_release(struct net *net, uint32_t ioctl_num,
 			dev->pthread = NULL;
 		}
 
-#ifdef RTE_KNI_VHOST
-		kni_vhost_backend_release(dev);
-#endif
 		kni_dev_remove(dev);
 		list_del(&dev->list);
 		ret = 0;
diff --git a/lib/librte_eal/linuxapp/kni/kni_net.c b/lib/librte_eal/linuxapp/kni/kni_net.c
index 4ac99cf..db9f489 100644
--- a/lib/librte_eal/linuxapp/kni/kni_net.c
+++ b/lib/librte_eal/linuxapp/kni/kni_net.c
@@ -198,18 +198,6 @@ kni_net_config(struct net_device *dev, struct ifmap *map)
 /*
  * Transmit a packet (called by the kernel)
  */
-#ifdef RTE_KNI_VHOST
-static int
-kni_net_tx(struct sk_buff *skb, struct net_device *dev)
-{
-	struct kni_dev *kni = netdev_priv(dev);
-
-	dev_kfree_skb(skb);
-	kni->stats.tx_dropped++;
-
-	return NETDEV_TX_OK;
-}
-#else
 static int
 kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 {
@@ -289,7 +277,6 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 
 	return NETDEV_TX_OK;
 }
-#endif
 
 /*
  * RX: normal working mode
diff --git a/lib/librte_eal/linuxapp/kni/kni_vhost.c b/lib/librte_eal/linuxapp/kni/kni_vhost.c
deleted file mode 100644
index f54c34b..0000000
--- a/lib/librte_eal/linuxapp/kni/kni_vhost.c
+++ /dev/null
@@ -1,842 +0,0 @@
-/*-
- * GPL LICENSE SUMMARY
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *
- *   This program is free software; you can redistribute it and/or modify
- *   it under the terms of version 2 of the GNU General Public License as
- *   published by the Free Software Foundation.
- *
- *   This program is distributed in the hope that it will be useful, but
- *   WITHOUT ANY WARRANTY; without even the implied warranty of
- *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- *   General Public License for more details.
- *
- *   You should have received a copy of the GNU General Public License
- *   along with this program; if not, write to the Free Software
- *   Foundation, Inc., 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
- *   The full GNU General Public License is included in this distribution
- *   in the file called LICENSE.GPL.
- *
- *   Contact Information:
- *   Intel Corporation
- */
-
-#include <linux/module.h>
-#include <linux/net.h>
-#include <net/sock.h>
-#include <linux/virtio_net.h>
-#include <linux/wait.h>
-#include <linux/mm.h>
-#include <linux/nsproxy.h>
-#include <linux/sched.h>
-#include <linux/if_tun.h>
-#include <linux/version.h>
-#include <linux/file.h>
-
-#include "compat.h"
-#include "kni_dev.h"
-#include "kni_fifo.h"
-
-#define RX_BURST_SZ 4
-
-#ifdef HAVE_STATIC_SOCK_MAP_FD
-static int kni_sock_map_fd(struct socket *sock)
-{
-	struct file *file;
-	int fd = get_unused_fd_flags(0);
-
-	if (fd < 0)
-		return fd;
-
-	file = sock_alloc_file(sock, 0, NULL);
-	if (IS_ERR(file)) {
-		put_unused_fd(fd);
-		return PTR_ERR(file);
-	}
-	fd_install(fd, file);
-	return fd;
-}
-#endif
-
-static struct proto kni_raw_proto = {
-	.name = "kni_vhost",
-	.owner = THIS_MODULE,
-	.obj_size = sizeof(struct kni_vhost_queue),
-};
-
-static inline int
-kni_vhost_net_tx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	struct rte_kni_mbuf *pkt_kva = NULL;
-	struct rte_kni_mbuf *pkt_va = NULL;
-	int ret;
-
-	pr_debug("tx offset=%d, len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, (int)m->msg_iov->iov_len);
-#endif
-
-	/**
-	 * Check if it has at least one free entry in tx_q and
-	 * one entry in alloc_q.
-	 */
-	if (kni_fifo_free_count(kni->tx_q) == 0 ||
-	    kni_fifo_count(kni->alloc_q) == 0) {
-		/**
-		 * If no free entry in tx_q or no entry in alloc_q,
-		 * drops skb and goes out.
-		 */
-		goto drop;
-	}
-
-	/* dequeue a mbuf from alloc_q */
-	ret = kni_fifo_get(kni->alloc_q, (void **)&pkt_va, 1);
-	if (likely(ret == 1)) {
-		void *data_kva;
-
-		pkt_kva = (void *)pkt_va - kni->mbuf_va + kni->mbuf_kva;
-		data_kva = pkt_kva->buf_addr + pkt_kva->data_off
-			- kni->mbuf_va + kni->mbuf_kva;
-
-#ifdef HAVE_IOV_ITER_MSGHDR
-		copy_from_iter(data_kva, len, &m->msg_iter);
-#else
-		memcpy_fromiovecend(data_kva, m->msg_iov, offset, len);
-#endif
-
-		if (unlikely(len < ETH_ZLEN)) {
-			memset(data_kva + len, 0, ETH_ZLEN - len);
-			len = ETH_ZLEN;
-		}
-		pkt_kva->pkt_len = len;
-		pkt_kva->data_len = len;
-
-		/* enqueue mbuf into tx_q */
-		ret = kni_fifo_put(kni->tx_q, (void **)&pkt_va, 1);
-		if (unlikely(ret != 1)) {
-			/* Failing should not happen */
-			pr_err("Fail to enqueue mbuf into tx_q\n");
-			goto drop;
-		}
-	} else {
-		/* Failing should not happen */
-		pr_err("Fail to dequeue mbuf from alloc_q\n");
-		goto drop;
-	}
-
-	/* update statistics */
-	kni->stats.tx_bytes += len;
-	kni->stats.tx_packets++;
-
-	return 0;
-
-drop:
-	/* update statistics */
-	kni->stats.tx_dropped++;
-
-	return 0;
-}
-
-static inline int
-kni_vhost_net_rx(struct kni_dev *kni, struct msghdr *m,
-		 uint32_t offset, uint32_t len)
-{
-	uint32_t pkt_len;
-	struct rte_kni_mbuf *kva;
-	struct rte_kni_mbuf *va;
-	void *data_kva;
-	struct sk_buff *skb;
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	/* ensure at least one entry in free_q */
-	if (unlikely(kni_fifo_free_count(kni->free_q) == 0))
-		return 0;
-
-	skb = skb_dequeue(&q->sk.sk_receive_queue);
-	if (unlikely(skb == NULL))
-		return 0;
-
-	kva = (struct rte_kni_mbuf *)skb->data;
-
-	/* free skb to cache */
-	skb->data = NULL;
-	if (unlikely(kni_fifo_put(q->fifo, (void **)&skb, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into rx cache fifo\n");
-
-	pkt_len = kva->data_len;
-	if (unlikely(pkt_len > len))
-		goto drop;
-
-	pr_debug("rx offset=%d, len=%d, pkt_len=%d, iovlen=%d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   offset, len, pkt_len, (int)m->msg_iter.iov->iov_len);
-#else
-		   offset, len, pkt_len, (int)m->msg_iov->iov_len);
-#endif
-
-	data_kva = kva->buf_addr + kva->data_off - kni->mbuf_va + kni->mbuf_kva;
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter(data_kva, pkt_len, &m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov, data_kva, offset, pkt_len)))
-#endif
-		goto drop;
-
-	/* Update statistics */
-	kni->stats.rx_bytes += pkt_len;
-	kni->stats.rx_packets++;
-
-	/* enqueue mbufs into free_q */
-	va = (void *)kva - kni->mbuf_kva + kni->mbuf_va;
-	if (unlikely(kni_fifo_put(kni->free_q, (void **)&va, 1) != 1))
-		/* Failing should not happen */
-		pr_err("Fail to enqueue entries into free_q\n");
-
-	pr_debug("receive done %d\n", pkt_len);
-
-	return pkt_len;
-
-drop:
-	/* Update drop statistics */
-	kni->stats.rx_dropped++;
-
-	return 0;
-}
-
-static uint32_t
-kni_sock_poll(struct file *file, struct socket *sock, poll_table *wait)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t mask = 0;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return POLLERR;
-
-	kni = q->kni;
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("start kni_poll on group %d, wq 0x%16llx\n",
-		  kni->group_id, (uint64_t)sock->wq);
-	poll_wait(file, &sock->wq->wait, wait);
-#else
-	pr_debug("start kni_poll on group %d, wait at 0x%16llx\n",
-		  kni->group_id, (uint64_t)&sock->wait);
-	poll_wait(file, &sock->wait, wait);
-#endif
-
-	if (kni_fifo_count(kni->rx_q) > 0)
-		mask |= POLLIN | POLLRDNORM;
-
-	if (sock_writeable(&q->sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-		(!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#else
-		(!test_and_set_bit(SOCK_ASYNC_NOSPACE, &q->sock->flags) &&
-			sock_writeable(&q->sk)))
-#endif
-		mask |= POLLOUT | POLLWRNORM;
-
-	return mask;
-}
-
-static inline void
-kni_vhost_enqueue(struct kni_dev *kni, struct kni_vhost_queue *q,
-		  struct sk_buff *skb, struct rte_kni_mbuf *va)
-{
-	struct rte_kni_mbuf *kva;
-
-	kva = (void *)(va) - kni->mbuf_va + kni->mbuf_kva;
-	(skb)->data = (unsigned char *)kva;
-	(skb)->len = kva->data_len;
-	skb_queue_tail(&q->sk.sk_receive_queue, skb);
-}
-
-static inline void
-kni_vhost_enqueue_burst(struct kni_dev *kni, struct kni_vhost_queue *q,
-	  struct sk_buff **skb, struct rte_kni_mbuf **va)
-{
-	int i;
-
-	for (i = 0; i < RX_BURST_SZ; skb++, va++, i++)
-		kni_vhost_enqueue(kni, q, *skb, *va);
-}
-
-int
-kni_chk_vhost_rx(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-	uint32_t nb_in, nb_mbuf, nb_skb;
-	const uint32_t BURST_MASK = RX_BURST_SZ - 1;
-	uint32_t nb_burst, nb_backlog, i;
-	struct sk_buff *skb[RX_BURST_SZ];
-	struct rte_kni_mbuf *va[RX_BURST_SZ];
-
-	if (unlikely(BE_STOP & kni->vq_status)) {
-		kni->vq_status |= BE_FINISH;
-		return 0;
-	}
-
-	if (unlikely(q == NULL))
-		return 0;
-
-	nb_skb = kni_fifo_count(q->fifo);
-	nb_mbuf = kni_fifo_count(kni->rx_q);
-
-	nb_in = min(nb_mbuf, nb_skb);
-	nb_in = min_t(uint32_t, nb_in, RX_BURST_SZ);
-	nb_burst   = (nb_in & ~BURST_MASK);
-	nb_backlog = (nb_in & BURST_MASK);
-
-	/* enqueue skb_queue per BURST_SIZE bulk */
-	if (nb_burst != 0) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, RX_BURST_SZ)
-				!= RX_BURST_SZ))
-			goto except;
-
-		kni_vhost_enqueue_burst(kni, q, skb, va);
-	}
-
-	/* all leftover, do one by one */
-	for (i = 0; i < nb_backlog; ++i) {
-		if (unlikely(kni_fifo_get(kni->rx_q, (void **)&va, 1) != 1))
-			goto except;
-
-		if (unlikely(kni_fifo_get(q->fifo, (void **)&skb, 1) != 1))
-			goto except;
-
-		kni_vhost_enqueue(kni, q, *skb, *va);
-	}
-
-	/* Ondemand wake up */
-	if ((nb_in == RX_BURST_SZ) || (nb_skb == 0) ||
-	    ((nb_mbuf < RX_BURST_SZ) && (nb_mbuf != 0))) {
-		wake_up_interruptible_poll(sk_sleep(&q->sk),
-				   POLLIN | POLLRDNORM | POLLRDBAND);
-		pr_debug("RX CHK KICK nb_mbuf %d, nb_skb %d, nb_in %d\n",
-			   nb_mbuf, nb_skb, nb_in);
-	}
-
-	return 0;
-
-except:
-	/* Failing should not happen */
-	pr_err("Fail to enqueue fifo, it shouldn't happen\n");
-	BUG_ON(1);
-
-	return 0;
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_sndmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#else
-kni_sock_sndmsg(struct socket *sock,
-	   struct msghdr *m, size_t total_len)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	int vnet_hdr_len = 0;
-	unsigned long len = total_len;
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-	pr_debug("kni_sndmsg len %ld, flags 0x%08x, nb_iov %d\n",
-#ifdef HAVE_IOV_ITER_MSGHDR
-		   len, q->flags, (int)m->msg_iter.iov->iov_len);
-#else
-		   len, q->flags, (int)m->msg_iovlen);
-#endif
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		if (unlikely(len < vnet_hdr_len))
-			return -EINVAL;
-		len -= vnet_hdr_len;
-	}
-#endif
-
-	if (unlikely(len < ETH_HLEN + q->vnet_hdr_sz))
-		return -EINVAL;
-
-	return kni_vhost_net_tx(q->kni, m, vnet_hdr_len, len);
-}
-
-static int
-#ifdef HAVE_KIOCB_MSG_PARAM
-kni_sock_rcvmsg(struct kiocb *iocb, struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#else
-kni_sock_rcvmsg(struct socket *sock,
-	   struct msghdr *m, size_t len, int flags)
-#endif /* HAVE_KIOCB_MSG_PARAM */
-{
-	int vnet_hdr_len = 0;
-	int pkt_len = 0;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	static struct virtio_net_hdr
-		__attribute__ ((unused)) vnet_hdr = {
-		.flags = 0,
-		.gso_type = VIRTIO_NET_HDR_GSO_NONE
-	};
-
-	if (unlikely(q == NULL || q->kni == NULL))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	if (likely(q->flags & IFF_VNET_HDR)) {
-		vnet_hdr_len = q->vnet_hdr_sz;
-		len -= vnet_hdr_len;
-		if (len < 0)
-			return -EINVAL;
-	}
-#endif
-
-	pkt_len = kni_vhost_net_rx(q->kni, m, vnet_hdr_len, len);
-	if (unlikely(pkt_len == 0))
-		return 0;
-
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	/* no need to copy hdr when no pkt received */
-#ifdef HAVE_IOV_ITER_MSGHDR
-	if (unlikely(copy_to_iter((void *)&vnet_hdr, vnet_hdr_len,
-		&m->msg_iter)))
-#else
-	if (unlikely(memcpy_toiovecend(m->msg_iov,
-		(void *)&vnet_hdr, 0, vnet_hdr_len)))
-#endif /* HAVE_IOV_ITER_MSGHDR */
-		return -EFAULT;
-#endif /* RTE_KNI_VHOST_VNET_HDR_EN */
-	pr_debug("kni_rcvmsg expect_len %ld, flags 0x%08x, pkt_len %d\n",
-		   (unsigned long)len, q->flags, pkt_len);
-
-	return pkt_len + vnet_hdr_len;
-}
-
-/* dummy tap like ioctl */
-static int
-kni_sock_ioctl(struct socket *sock, uint32_t cmd, unsigned long arg)
-{
-	void __user *argp = (void __user *)arg;
-	struct ifreq __user *ifr = argp;
-	uint32_t __user *up = argp;
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-	uint32_t u;
-	int __user *sp = argp;
-	int s;
-	int ret;
-
-	pr_debug("tap ioctl cmd 0x%08x\n", cmd);
-
-	switch (cmd) {
-	case TUNSETIFF:
-		pr_debug("TUNSETIFF\n");
-		/* ignore the name, just look at flags */
-		if (get_user(u, &ifr->ifr_flags))
-			return -EFAULT;
-
-		ret = 0;
-		if ((u & ~IFF_VNET_HDR) != (IFF_NO_PI | IFF_TAP))
-			ret = -EINVAL;
-		else
-			q->flags = u;
-
-		return ret;
-
-	case TUNGETIFF:
-		pr_debug("TUNGETIFF\n");
-		rcu_read_lock_bh();
-		kni = rcu_dereference_bh(q->kni);
-		if (kni)
-			dev_hold(kni->net_dev);
-		rcu_read_unlock_bh();
-
-		if (!kni)
-			return -ENOLINK;
-
-		ret = 0;
-		if (copy_to_user(&ifr->ifr_name, kni->net_dev->name, IFNAMSIZ)
-				|| put_user(q->flags, &ifr->ifr_flags))
-			ret = -EFAULT;
-		dev_put(kni->net_dev);
-		return ret;
-
-	case TUNGETFEATURES:
-		pr_debug("TUNGETFEATURES\n");
-		u = IFF_TAP | IFF_NO_PI;
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		u |= IFF_VNET_HDR;
-#endif
-		if (put_user(u, up))
-			return -EFAULT;
-		return 0;
-
-	case TUNSETSNDBUF:
-		pr_debug("TUNSETSNDBUF\n");
-		if (get_user(u, up))
-			return -EFAULT;
-
-		q->sk.sk_sndbuf = u;
-		return 0;
-
-	case TUNGETVNETHDRSZ:
-		s = q->vnet_hdr_sz;
-		if (put_user(s, sp))
-			return -EFAULT;
-		pr_debug("TUNGETVNETHDRSZ %d\n", s);
-		return 0;
-
-	case TUNSETVNETHDRSZ:
-		if (get_user(s, sp))
-			return -EFAULT;
-		if (s < (int)sizeof(struct virtio_net_hdr))
-			return -EINVAL;
-
-		pr_debug("TUNSETVNETHDRSZ %d\n", s);
-		q->vnet_hdr_sz = s;
-		return 0;
-
-	case TUNSETOFFLOAD:
-		pr_debug("TUNSETOFFLOAD %lx\n", arg);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-		/* not support any offload yet */
-		if (!(q->flags & IFF_VNET_HDR))
-			return  -EINVAL;
-
-		return 0;
-#else
-		return -EINVAL;
-#endif
-
-	default:
-		pr_debug("NOT SUPPORT\n");
-		return -EINVAL;
-	}
-}
-
-static int
-kni_sock_compat_ioctl(struct socket *sock, uint32_t cmd,
-		     unsigned long arg)
-{
-	/* 32 bits app on 64 bits OS to be supported later */
-	pr_debug("Not implemented.\n");
-
-	return -EINVAL;
-}
-
-#define KNI_VHOST_WAIT_WQ_SAFE()                        \
-do {							\
-	while ((BE_FINISH | BE_STOP) == kni->vq_status) \
-		msleep(1);				\
-} while (0)						\
-
-
-static int
-kni_sock_release(struct socket *sock)
-{
-	struct kni_vhost_queue *q =
-		container_of(sock->sk, struct kni_vhost_queue, sk);
-	struct kni_dev *kni;
-
-	if (q == NULL)
-		return 0;
-
-	kni = q->kni;
-	if (kni != NULL) {
-		kni->vq_status = BE_STOP;
-		KNI_VHOST_WAIT_WQ_SAFE();
-		kni->vhost_queue = NULL;
-		q->kni = NULL;
-	}
-
-	if (q->sockfd != -1)
-		q->sockfd = -1;
-
-	sk_set_socket(&q->sk, NULL);
-	sock->sk = NULL;
-
-	sock_put(&q->sk);
-
-	pr_debug("dummy sock release done\n");
-
-	return 0;
-}
-
-int
-kni_sock_getname(struct socket *sock, struct sockaddr *addr,
-		int *sockaddr_len, int peer)
-{
-	pr_debug("dummy sock getname\n");
-	((struct sockaddr_ll *)addr)->sll_family = AF_PACKET;
-	return 0;
-}
-
-static const struct proto_ops kni_socket_ops = {
-	.getname = kni_sock_getname,
-	.sendmsg = kni_sock_sndmsg,
-	.recvmsg = kni_sock_rcvmsg,
-	.release = kni_sock_release,
-	.poll    = kni_sock_poll,
-	.ioctl   = kni_sock_ioctl,
-	.compat_ioctl = kni_sock_compat_ioctl,
-};
-
-static void
-kni_sk_write_space(struct sock *sk)
-{
-	wait_queue_head_t *wqueue;
-
-	if (!sock_writeable(sk) ||
-#ifdef SOCKWQ_ASYNC_NOSPACE
-	    !test_and_clear_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#else
-	    !test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags))
-#endif
-		return;
-	wqueue = sk_sleep(sk);
-	if (wqueue && waitqueue_active(wqueue))
-		wake_up_interruptible_poll(
-			wqueue, POLLOUT | POLLWRNORM | POLLWRBAND);
-}
-
-static void
-kni_sk_destruct(struct sock *sk)
-{
-	struct kni_vhost_queue *q =
-		container_of(sk, struct kni_vhost_queue, sk);
-
-	if (!q)
-		return;
-
-	/* make sure there's no packet in buffer */
-	while (skb_dequeue(&sk->sk_receive_queue) != NULL)
-		;
-
-	mb();
-
-	if (q->fifo != NULL) {
-		kfree(q->fifo);
-		q->fifo = NULL;
-	}
-
-	if (q->cache != NULL) {
-		kfree(q->cache);
-		q->cache = NULL;
-	}
-}
-
-static int
-kni_vhost_backend_init(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q;
-	struct net *net = current->nsproxy->net_ns;
-	int err, i, sockfd;
-	struct rte_kni_fifo *fifo;
-	struct sk_buff *elem;
-
-	if (kni->vhost_queue != NULL)
-		return -1;
-
-#ifdef HAVE_SK_ALLOC_KERN_PARAM
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto, 0);
-#else
-	q = (struct kni_vhost_queue *)sk_alloc(net, AF_UNSPEC, GFP_KERNEL,
-			&kni_raw_proto);
-#endif
-	if (!q)
-		return -ENOMEM;
-
-	err = sock_create_lite(AF_UNSPEC, SOCK_RAW, IPPROTO_RAW, &q->sock);
-	if (err)
-		goto free_sk;
-
-	sockfd = kni_sock_map_fd(q->sock);
-	if (sockfd < 0) {
-		err = sockfd;
-		goto free_sock;
-	}
-
-	/* cache init */
-	q->cache = kzalloc(
-		RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(struct sk_buff),
-		GFP_KERNEL);
-	if (!q->cache)
-		goto free_fd;
-
-	fifo = kzalloc(RTE_KNI_VHOST_MAX_CACHE_SIZE * sizeof(void *)
-			+ sizeof(struct rte_kni_fifo), GFP_KERNEL);
-	if (!fifo)
-		goto free_cache;
-
-	kni_fifo_init(fifo, RTE_KNI_VHOST_MAX_CACHE_SIZE);
-
-	for (i = 0; i < RTE_KNI_VHOST_MAX_CACHE_SIZE; i++) {
-		elem = &q->cache[i];
-		kni_fifo_put(fifo, (void **)&elem, 1);
-	}
-	q->fifo = fifo;
-
-	/* store sockfd in vhost_queue */
-	q->sockfd = sockfd;
-
-	/* init socket */
-	q->sock->type = SOCK_RAW;
-	q->sock->state = SS_CONNECTED;
-	q->sock->ops = &kni_socket_ops;
-	sock_init_data(q->sock, &q->sk);
-
-	/* init sock data */
-	q->sk.sk_write_space = kni_sk_write_space;
-	q->sk.sk_destruct = kni_sk_destruct;
-	q->flags = IFF_NO_PI | IFF_TAP;
-	q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
-#ifdef RTE_KNI_VHOST_VNET_HDR_EN
-	q->flags |= IFF_VNET_HDR;
-#endif
-
-	/* bind kni_dev with vhost_queue */
-	q->kni = kni;
-	kni->vhost_queue = q;
-
-	wmb();
-
-	kni->vq_status = BE_START;
-
-#ifdef HAVE_SOCKET_WQ
-	pr_debug("backend init sockfd=%d, sock->wq=0x%16llx,sk->sk_wq=0x%16llx",
-		  q->sockfd, (uint64_t)q->sock->wq,
-		  (uint64_t)q->sk.sk_wq);
-#else
-	pr_debug("backend init sockfd=%d, sock->wait at 0x%16llx,sk->sk_sleep=0x%16llx",
-		  q->sockfd, (uint64_t)&q->sock->wait,
-		  (uint64_t)q->sk.sk_sleep);
-#endif
-
-	return 0;
-
-free_cache:
-	kfree(q->cache);
-	q->cache = NULL;
-
-free_fd:
-	put_unused_fd(sockfd);
-
-free_sock:
-	q->kni = NULL;
-	kni->vhost_queue = NULL;
-	kni->vq_status |= BE_FINISH;
-	sock_release(q->sock);
-	q->sock->ops = NULL;
-	q->sock = NULL;
-
-free_sk:
-	sk_free((struct sock *)q);
-
-	return err;
-}
-
-/* kni vhost sock sysfs */
-static ssize_t
-show_sock_fd(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	int sockfd = -1;
-
-	if (kni->vhost_queue != NULL)
-		sockfd = kni->vhost_queue->sockfd;
-	return snprintf(buf, 10, "%d\n", sockfd);
-}
-
-static ssize_t
-show_sock_en(struct device *dev, struct device_attribute *attr,
-	     char *buf)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-
-	return snprintf(buf, 10, "%u\n", (kni->vhost_queue == NULL ? 0 : 1));
-}
-
-static ssize_t
-set_sock_en(struct device *dev, struct device_attribute *attr,
-	      const char *buf, size_t count)
-{
-	struct net_device *net_dev = container_of(dev, struct net_device, dev);
-	struct kni_dev *kni = netdev_priv(net_dev);
-	unsigned long en;
-	int err = 0;
-
-	if (kstrtoul(buf, 0, &en) != 0)
-		return -EINVAL;
-
-	if (en)
-		err = kni_vhost_backend_init(kni);
-
-	return err ? err : count;
-}
-
-static DEVICE_ATTR(sock_fd, S_IRUGO | S_IRUSR, show_sock_fd, NULL);
-static DEVICE_ATTR(sock_en, S_IRUGO | S_IWUSR, show_sock_en, set_sock_en);
-static struct attribute *dev_attrs[] = {
-	&dev_attr_sock_fd.attr,
-	&dev_attr_sock_en.attr,
-	NULL,
-};
-
-static const struct attribute_group dev_attr_grp = {
-	.attrs = dev_attrs,
-};
-
-int
-kni_vhost_backend_release(struct kni_dev *kni)
-{
-	struct kni_vhost_queue *q = kni->vhost_queue;
-
-	if (q == NULL)
-		return 0;
-
-	/* dettach from kni */
-	q->kni = NULL;
-
-	pr_debug("release backend done\n");
-
-	return 0;
-}
-
-int
-kni_vhost_init(struct kni_dev *kni)
-{
-	struct net_device *dev = kni->net_dev;
-
-	if (sysfs_create_group(&dev->dev.kobj, &dev_attr_grp))
-		sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
-
-	kni->vq_status = BE_STOP;
-
-	pr_debug("kni_vhost_init done\n");
-
-	return 0;
-}
-- 
2.9.3

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH] maintainers: claim responsability for xen
  @ 2017-02-20 17:36  3%               ` Joao Martins
  0 siblings, 0 replies; 200+ results
From: Joao Martins @ 2017-02-20 17:36 UTC (permalink / raw)
  To: Jan Blunck, Konrad Rzeszutek Wilk
  Cc: Vincent JARDIN, Thomas Monjalon, Tan, Jianfeng,
	Konrad Rzeszutek Wilk, dev, Bruce Richardson, Yuanhan Liu,
	Xen-devel

On 02/20/2017 09:56 AM, Jan Blunck wrote:
> On Fri, Feb 17, 2017 at 5:07 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Thu, Feb 16, 2017 at 10:51:44PM +0100, Vincent JARDIN wrote:
>>> On 16/02/2017 at 14:36, Konrad Rzeszutek Wilk wrote:
>>>>> Is it time now to officially remove Dom0 support?
>>>> So we do have a prototype implementation of netback but it is waiting
>>>> for xen-devel review of the spec.
>>>>
>>>> And I believe the implementation does utilize some of the dom0
>>>> parts of code in DPDK.
>>>
>>> Please, do you have URLs/pointers about it? It would be interesting to share
>>> it with DPDK community too.
>>
>> Joao, would it be possible to include a tarball of the patches? I know
>> they are not in the right state with the review of the staging
>> grants API - they are incompatible, but it may help folks to get
>> a feel for what DPDK APIs you used?
>>
>> Staging grants API:
>> https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg01878.html
> 
> The topic of the grants API is unrelated to the dom0 memory pool. The
> memory pool which uses xen_create_contiguous_region() is used in cases
> we know that there are no hugepages available.
Correct, I think what Konrad was trying to say was that xen-netback normally
lives in a PV domain, which doesn't have superpages, so such a driver would
need that memory pool part in order to work. The mentioned spec adds to the
xen netif ABI a way for the backend to safely map a fixed set of grant
references (recycled over time, provided by the frontend), with the purpose
of avoiding grant ops - DPDK would be one of the users.

> Joao and I met in Dublin and I whined about not being able to call
> into the grants API from userspace and instead need to kick a kernel
> driver to do the work for every burst. It would be great if that could
> change in the future.
Hm, I recall that discussion. AFAIK you can both allocate and revoke grants
on pages through the xengntshr_share_pages(...) and xengntshr_unshare(...)
APIs provided by libxengnttab[0] starting with 4.7, or with libxc on older
versions via xc_gntshr_share_pages/xc_gntshr_munmap[2]. For the notifications
(or kicks) you can allocate the event channel in the guest with
libxenevtchn[1] starting with 4.7, using xenevtchn_bind_unbound_port(...), or
with libxc on older versions using xc_evtchn_bind_unbound_port(...)[2], and
then kick the guest with xenevtchn_notify() or xc_evtchn_notify() [the latter
on older versions]. In short, these APIs are ioctls to /dev/gntdev and
/dev/evtchn. xenstore operations can also be done in userspace with
libxenstore[3].
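
Put together, a rough userspace sketch of that flow from the guest side
(error handling and teardown are trimmed, the open-call arguments and the
helper name share_and_kick() are assumptions; the exact prototypes are in
the xengnttab.h/xenevtchn.h headers listed in [0] and [1]):

#include <stdint.h>
#include <xengnttab.h>
#include <xenevtchn.h>

static int share_and_kick(uint32_t backend_domid)
{
	xengntshr_handle *gs = xengntshr_open(NULL, 0);
	xenevtchn_handle *eh = xenevtchn_open(NULL, 0);
	uint32_t ref;
	void *page;
	xenevtchn_port_or_error_t port;

	if (gs == NULL || eh == NULL)
		return -1;

	/* allocate one shared page and obtain its grant reference */
	page = xengntshr_share_pages(gs, backend_domid, 1, &ref, 1 /* writable */);
	if (page == NULL)
		return -1;

	/* unbound event channel the backend will later bind to for kicks */
	port = xenevtchn_bind_unbound_port(eh, backend_domid);
	if (port < 0)
		return -1;

	/* ref and port would then be advertised to the backend via xenstore [3] */

	/* ... produce requests in the shared page, then kick ... */
	xenevtchn_notify(eh, port);

	xengntshr_unshare(gs, page, 1);
	return 0;
}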

To get behaviour similar to VRING_AVAIL_F_NO_INTERRUPT (i.e. avoiding the
kicks), you "just" don't set rsp_event in the ring (e.g. make no calls to
RING_FINAL_CHECK_FOR_RESPONSES) and keep polling for unconsumed Rx/Tx
responses. For guest request notification (waking up the backend for new
Tx/Rx requests), you depend on whether the backend requests it, since it is
the one setting the req_event index. If it does set it, then you have to use
the evtchn notify described in the previous paragraph.
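
In ring.h terms, a minimal frontend-side sketch of that polling mode (the
header path and the netif_tx_* type names depend on the Xen/netif header
version in use and are assumed here, as is the handle_tx_response() hook;
only the ring macros come from Xen's public io/ring.h):

#include <xen/io/netif.h>	/* assumed install path; pulls in io/ring.h */

void handle_tx_response(struct netif_tx_response *rsp);	/* app hook, assumed */

/* Drain Tx completions without re-arming rsp_event: by never calling
 * RING_FINAL_CHECK_FOR_RESPONSES() the backend is not asked to raise an
 * event for new responses -- the analogue of VRING_AVAIL_F_NO_INTERRUPT. */
static void drain_tx_responses(struct netif_tx_front_ring *ring)
{
	while (RING_HAS_UNCONSUMED_RESPONSES(ring)) {
		struct netif_tx_response *rsp =
			RING_GET_RESPONSE(ring, ring->rsp_cons);

		handle_tx_response(rsp);
		ring->rsp_cons++;
	}
}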

Hope that helps!

Joao

[0]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libs/gnttab/include/xengnttab.h;hb=HEAD
[1]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libs/evtchn/include/xenevtchn.h;hb=HEAD
[2]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/libxc/include/xenctrl_compat.h;hb=HEAD
[3]
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/xenstore/include/xenstore.h;hb=HEAD

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements
  @ 2017-02-21  3:17  2% ` David Hunt
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-24 14:01  0%   ` [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
  0 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch set aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
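
As a rough illustration of that layout (the names and field sizes below are
assumptions, not the structure actually added later in this series): eight
64-bit slots fill one cache line, and because mbufs are at least 8-byte
aligned the low bits of each pointer are free to carry the handshake flags.

#include <stdint.h>
#include <rte_memory.h>		/* __rte_cache_aligned */

#define DIST_BURST_SIZE 8	/* assumed burst size, per the text above */

/* Illustrative per-worker exchange buffer: 8 x 8 bytes = 64 bytes, i.e. one
 * cache line on typical x86 parts, so a whole burst moves between the
 * distributor core and a worker without touching extra cache lines. */
struct dist_burst_buf {
	volatile uintptr_t slots[DIST_BURST_SIZE];	/* mbuf ptr | flag bits */
} __rte_cache_aligned;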

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate version at run time,
depending on the presence of the SSE2 CPU flag. On non-x86 platforms,
the scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
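
A minimal sketch of that run-time selection (the find_match_* names are
placeholders; only rte_cpu_get_flag_enabled(), RTE_CPUFLAG_SSE2 and
RTE_ARCH_X86 are existing DPDK/EAL symbols):

#include <stdint.h>
#include <rte_cpuflags.h>

struct rte_distributor;		/* opaque here */

typedef void (*dist_match_fn_t)(struct rte_distributor *d,
		uint32_t *flow_tags, uint16_t *matches);

/* placeholder prototypes for the scalar and SSE2 implementations */
void find_match_scalar(struct rte_distributor *d,
		uint32_t *flow_tags, uint16_t *matches);
void find_match_vec(struct rte_distributor *d,
		uint32_t *flow_tags, uint16_t *matches);

static dist_match_fn_t dist_match_fn = find_match_scalar;

static void
dist_select_match_fn(void)
{
#if defined(RTE_ARCH_X86)
	/* pick the SIMD version only when SSE2 is reported at run time */
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		dist_match_fn = find_match_vec;
#endif
}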

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v4 changes:
   * fixed issue building shared libraries

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v7 changes:
   * Reorganised the patch set so there's a more natural progression in the
     changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker
   at a time.
   For performance in matching, flow IDs are 15 bits.
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode (see the usage sketch below).
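
A hedged sketch of opting into that legacy single-packet mode: only the
RTE_DISTRIBUTOR_SINGLE flag name is taken from this cover letter, and the
exact position and type of the new rte_distributor_create() argument is an
assumption here (the real prototype is defined in the patches listed below).

#include <rte_distributor.h>
#include <rte_lcore.h>

static struct rte_distributor *
create_single_mode_dist(void)
{
	/* keep the pre-series one-packet-at-a-time behaviour; 32-bit flow
	 * IDs remain usable in this mode */
	return rte_distributor_create("pkt_dist", rte_socket_id(),
			rte_lcore_count() - 1 /* workers */,
			RTE_DISTRIBUTOR_SINGLE);
}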

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[01/17] lib: rename legacy distributor lib files
[02/17] lib: symbol versioning of functions in distributor
[03/17] lib: create rte_distributor_private.h
[04/17] lib: add new burst oriented distributor structs
[05/17] lib: add new distributor code
[06/17] lib: add SIMD flow matching to distributor
[07/17] lib: apply symbol versioning to distributor lib
[08/17] test: change params to distributor autotest
[09/17] test: switch distributor test over to burst API
[10/17] test: test single and burst distributor API
[11/17] test: add perf test for distributor burst mode
[12/17] example: add extra stats to distributor sample
[13/17] sample: distributor: wait for ports to come up
[14/17] sample: switch to new distributor API
[15/17] lib: make v20 header file private
[16/17] doc: distributor library changes for new burst api
[17/17] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17  2% ` [dpdk-dev] [PATCH v7 0/17] distributor library " David Hunt
@ 2017-02-21  3:17  1%   ` David Hunt
  2017-02-21 10:27  0%     ` Hunt, David
                       ` (2 more replies)
  2017-02-24 14:01  0%   ` [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
  1 sibling, 3 replies; 200+ results
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.
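
The versioning mechanism this rename prepares for can be sketched as follows
(illustrative only: the macros come from DPDK's rte_compat.h and work together
with DPDK_2.0/DPDK_17.05 sections in the library's version map, but the
_v20/_v1705 suffixes and parameter lists here are assumptions, not the
contents of patch 07/17):

#include <rte_compat.h>

/* legacy implementation, kept for binaries linked against DPDK_2.0 */
struct rte_distributor *
rte_distributor_create_v20(const char *name, unsigned int socket_id,
		unsigned int num_workers);
VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);

/* new implementation becomes the default binding for newly built apps */
struct rte_distributor *
rte_distributor_create_v1705(const char *name, unsigned int socket_id,
		unsigned int num_workers);
BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);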

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c                  |   2 +-
 app/test/test_distributor_perf.c             |   2 +-
 examples/distributor/main.c                  |   2 +-
 lib/librte_distributor/Makefile              |   4 +-
 lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
 lib/librte_distributor/rte_distributor.h     | 247 --------------
 lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++
 8 files changed, 739 insertions(+), 739 deletions(-)
 delete mode 100644 lib/librte_distributor/rte_distributor.c
 delete mode 100644 lib/librte_distributor/rte_distributor.h
 create mode 100644 lib/librte_distributor/rte_distributor_v20.c
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..ba402e2 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -39,7 +39,7 @@
 #include <rte_errno.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..fe0c97d 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -39,7 +39,7 @@
 #include <rte_cycles.h>
 #include <rte_common.h>
 #include <rte_mbuf.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..fba5446 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -43,7 +43,7 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define RX_RING_SIZE 256
 #define TX_RING_SIZE 512
diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..60837ed 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,10 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
deleted file mode 100644
index f3f778c..0000000
--- a/lib/librte_distributor/rte_distributor.c
+++ /dev/null
@@ -1,487 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdio.h>
-#include <sys/queue.h>
-#include <string.h>
-#include <rte_mbuf.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_errno.h>
-#include <rte_string_fns.h>
-#include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
-
-static struct rte_tailq_elem rte_distributor_tailq = {
-	.name = "RTE_DISTRIBUTOR",
-};
-EAL_REGISTER_TAILQ(rte_distributor_tailq)
-
-/**** APIs called by workers ****/
-
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
-			| RTE_DISTRIB_GET_BUF;
-	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
-		rte_pause();
-	buf->bufptr64 = req;
-}
-
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
-		return NULL;
-
-	/* since bufptr64 is signed, this should be an arithmetic shift */
-	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
-	return (struct rte_mbuf *)((uintptr_t)ret);
-}
-
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
-		rte_pause();
-	return ret;
-}
-
-int
-rte_distributor_return_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
-			| RTE_DISTRIB_RETURN_BUF;
-	buf->bufptr64 = req;
-	return 0;
-}
-
-/**** APIs called on distributor core ***/
-
-/* as name suggests, adds a packet to the backlog for a particular worker */
-static int
-add_to_backlog(struct rte_distributor_backlog *bl, int64_t item)
-{
-	if (bl->count == RTE_DISTRIB_BACKLOG_SIZE)
-		return -1;
-
-	bl->pkts[(bl->start + bl->count++) & (RTE_DISTRIB_BACKLOG_MASK)]
-			= item;
-	return 0;
-}
-
-/* takes the next packet for a worker off the backlog */
-static int64_t
-backlog_pop(struct rte_distributor_backlog *bl)
-{
-	bl->count--;
-	return bl->pkts[bl->start++ & RTE_DISTRIB_BACKLOG_MASK];
-}
-
-/* stores a packet returned from a worker inside the returns array */
-static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
-		unsigned *ret_start, unsigned *ret_count)
-{
-	/* store returns in a circular buffer - code is branch-free */
-	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
-			= (void *)oldbuf;
-	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
-	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
-}
-
-static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
-{
-	d->in_flight_tags[wkr] = 0;
-	d->in_flight_bitmask &= ~(1UL << wkr);
-	d->bufs[wkr].bufptr64 = 0;
-	if (unlikely(d->backlog[wkr].count != 0)) {
-		/* On return of a packet, we need to move the
-		 * queued packets for this core elsewhere.
-		 * Easiest solution is to set things up for
-		 * a recursive call. That will cause those
-		 * packets to be queued up for the next free
-		 * core, i.e. it will return as soon as a
-		 * core becomes free to accept the first
-		 * packet, as subsequent ones will be added to
-		 * the backlog for that core.
-		 */
-		struct rte_mbuf *pkts[RTE_DISTRIB_BACKLOG_SIZE];
-		unsigned i;
-		struct rte_distributor_backlog *bl = &d->backlog[wkr];
-
-		for (i = 0; i < bl->count; i++) {
-			unsigned idx = (bl->start + i) &
-					RTE_DISTRIB_BACKLOG_MASK;
-			pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >>
-					RTE_DISTRIB_FLAG_BITS));
-		}
-		/* recursive call.
-		 * Note that the tags were set before first level call
-		 * to rte_distributor_process.
-		 */
-		rte_distributor_process(d, pkts, i);
-		bl->count = bl->start = 0;
-	}
-}
-
-/* this function is called when process() fn is called without any new
- * packets. It goes through all the workers and clears any returned packets
- * to do a partial flush.
- */
-static int
-process_returns(struct rte_distributor *d)
-{
-	unsigned wkr;
-	unsigned flushed = 0;
-	unsigned ret_start = d->returns.start,
-			ret_count = d->returns.count;
-
-	for (wkr = 0; wkr < d->num_workers; wkr++) {
-
-		const int64_t data = d->bufs[wkr].bufptr64;
-		uintptr_t oldbuf = 0;
-
-		if (data & RTE_DISTRIB_GET_BUF) {
-			flushed++;
-			if (d->backlog[wkr].count)
-				d->bufs[wkr].bufptr64 =
-						backlog_pop(&d->backlog[wkr]);
-			else {
-				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
-				d->in_flight_tags[wkr] = 0;
-				d->in_flight_bitmask &= ~(1UL << wkr);
-			}
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		} else if (data & RTE_DISTRIB_RETURN_BUF) {
-			handle_worker_shutdown(d, wkr);
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		}
-
-		store_return(oldbuf, d, &ret_start, &ret_count);
-	}
-
-	d->returns.start = ret_start;
-	d->returns.count = ret_count;
-
-	return flushed;
-}
-
-/* process a set of packets to distribute them to workers */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs)
-{
-	unsigned next_idx = 0;
-	unsigned wkr = 0;
-	struct rte_mbuf *next_mb = NULL;
-	int64_t next_value = 0;
-	uint32_t new_tag = 0;
-	unsigned ret_start = d->returns.start,
-			ret_count = d->returns.count;
-
-	if (unlikely(num_mbufs == 0))
-		return process_returns(d);
-
-	while (next_idx < num_mbufs || next_mb != NULL) {
-
-		int64_t data = d->bufs[wkr].bufptr64;
-		uintptr_t oldbuf = 0;
-
-		if (!next_mb) {
-			next_mb = mbufs[next_idx++];
-			next_value = (((int64_t)(uintptr_t)next_mb)
-					<< RTE_DISTRIB_FLAG_BITS);
-			/*
-			 * User is advocated to set tag vaue for each
-			 * mbuf before calling rte_distributor_process.
-			 * User defined tags are used to identify flows,
-			 * or sessions.
-			 */
-			new_tag = next_mb->hash.usr;
-
-			/*
-			 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64
-			 * then the size of match has to be expanded.
-			 */
-			uint64_t match = 0;
-			unsigned i;
-			/*
-			 * to scan for a match use "xor" and "not" to get a 0/1
-			 * value, then use shifting to merge to single "match"
-			 * variable, where a one-bit indicates a match for the
-			 * worker given by the bit-position
-			 */
-			for (i = 0; i < d->num_workers; i++)
-				match |= (!(d->in_flight_tags[i] ^ new_tag)
-					<< i);
-
-			/* Only turned-on bits are considered as match */
-			match &= d->in_flight_bitmask;
-
-			if (match) {
-				next_mb = NULL;
-				unsigned worker = __builtin_ctzl(match);
-				if (add_to_backlog(&d->backlog[worker],
-						next_value) < 0)
-					next_idx--;
-			}
-		}
-
-		if ((data & RTE_DISTRIB_GET_BUF) &&
-				(d->backlog[wkr].count || next_mb)) {
-
-			if (d->backlog[wkr].count)
-				d->bufs[wkr].bufptr64 =
-						backlog_pop(&d->backlog[wkr]);
-
-			else {
-				d->bufs[wkr].bufptr64 = next_value;
-				d->in_flight_tags[wkr] = new_tag;
-				d->in_flight_bitmask |= (1UL << wkr);
-				next_mb = NULL;
-			}
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		} else if (data & RTE_DISTRIB_RETURN_BUF) {
-			handle_worker_shutdown(d, wkr);
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		}
-
-		/* store returns in a circular buffer */
-		store_return(oldbuf, d, &ret_start, &ret_count);
-
-		if (++wkr == d->num_workers)
-			wkr = 0;
-	}
-	/* to finish, check all workers for backlog and schedule work for them
-	 * if they are ready */
-	for (wkr = 0; wkr < d->num_workers; wkr++)
-		if (d->backlog[wkr].count &&
-				(d->bufs[wkr].bufptr64 & RTE_DISTRIB_GET_BUF)) {
-
-			int64_t oldbuf = d->bufs[wkr].bufptr64 >>
-					RTE_DISTRIB_FLAG_BITS;
-			store_return(oldbuf, d, &ret_start, &ret_count);
-
-			d->bufs[wkr].bufptr64 = backlog_pop(&d->backlog[wkr]);
-		}
-
-	d->returns.start = ret_start;
-	d->returns.count = ret_count;
-	return num_mbufs;
-}
-
-/* return to the caller, packets returned from workers */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs)
-{
-	struct rte_distributor_returned_pkts *returns = &d->returns;
-	unsigned retval = (max_mbufs < returns->count) ?
-			max_mbufs : returns->count;
-	unsigned i;
-
-	for (i = 0; i < retval; i++) {
-		unsigned idx = (returns->start + i) & RTE_DISTRIB_RETURNS_MASK;
-		mbufs[i] = returns->mbufs[idx];
-	}
-	returns->start += i;
-	returns->count -= i;
-
-	return retval;
-}
-
-/* return the number of packets in-flight in a distributor, i.e. packets
- * being workered on or queued up in a backlog. */
-static inline unsigned
-total_outstanding(const struct rte_distributor *d)
-{
-	unsigned wkr, total_outstanding;
-
-	total_outstanding = __builtin_popcountl(d->in_flight_bitmask);
-
-	for (wkr = 0; wkr < d->num_workers; wkr++)
-		total_outstanding += d->backlog[wkr].count;
-
-	return total_outstanding;
-}
-
-/* flush the distributor, so that there are no outstanding packets in flight or
- * queued up. */
-int
-rte_distributor_flush(struct rte_distributor *d)
-{
-	const unsigned flushed = total_outstanding(d);
-
-	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
-
-	return flushed;
-}
-
-/* clears the internal returns array in the distributor */
-void
-rte_distributor_clear_returns(struct rte_distributor *d)
-{
-	d->returns.start = d->returns.count = 0;
-#ifndef __OPTIMIZE__
-	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
-#endif
-}
-
-/* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
-		unsigned socket_id,
-		unsigned num_workers)
-{
-	struct rte_distributor *d;
-	struct rte_distributor_list *distributor_list;
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-
-	/* compilation-time checks */
-	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
-	RTE_BUILD_BUG_ON(RTE_DISTRIB_MAX_WORKERS >
-				sizeof(d->in_flight_bitmask) * CHAR_BIT);
-
-	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
-	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
-	if (mz == NULL) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	d = mz->addr;
-	snprintf(d->name, sizeof(d->name), "%s", name);
-	d->num_workers = num_workers;
-
-	distributor_list = RTE_TAILQ_CAST(rte_distributor_tailq.head,
-					  rte_distributor_list);
-
-	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
-	TAILQ_INSERT_TAIL(distributor_list, d, next);
-	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-
-	return d;
-}
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
deleted file mode 100644
index 7d36bc8..0000000
--- a/lib/librte_distributor/rte_distributor.h
+++ /dev/null
@@ -1,247 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
-
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
new file mode 100644
index 0000000..b890947
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -0,0 +1,487 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_v20.h"
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/* we will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits. */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, becaus it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS	64
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+struct rte_distributor_backlog {
+	unsigned start;
+	unsigned count;
+	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
+};
+
+struct rte_distributor_returned_pkts {
+	unsigned start;
+	unsigned count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned num_workers;                 /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+TAILQ_HEAD(rte_distributor_list, rte_distributor);
+
+static struct rte_tailq_elem rte_distributor_tailq = {
+	.name = "RTE_DISTRIBUTOR",
+};
+EAL_REGISTER_TAILQ(rte_distributor_tailq)
+
+/**** APIs called by workers ****/
+
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
+			| RTE_DISTRIB_GET_BUF;
+	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
+		rte_pause();
+	buf->bufptr64 = req;
+}
+
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned worker_id)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
+		return NULL;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
+	return (struct rte_mbuf *)((uintptr_t)ret);
+}
+
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	struct rte_mbuf *ret;
+	rte_distributor_request_pkt(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+		rte_pause();
+	return ret;
+}
+
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
+			| RTE_DISTRIB_RETURN_BUF;
+	buf->bufptr64 = req;
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* as name suggests, adds a packet to the backlog for a particular worker */
+static int
+add_to_backlog(struct rte_distributor_backlog *bl, int64_t item)
+{
+	if (bl->count == RTE_DISTRIB_BACKLOG_SIZE)
+		return -1;
+
+	bl->pkts[(bl->start + bl->count++) & (RTE_DISTRIB_BACKLOG_MASK)]
+			= item;
+	return 0;
+}
+
+/* takes the next packet for a worker off the backlog */
+static int64_t
+backlog_pop(struct rte_distributor_backlog *bl)
+{
+	bl->count--;
+	return bl->pkts[bl->start++ & RTE_DISTRIB_BACKLOG_MASK];
+}
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
+		unsigned *ret_start, unsigned *ret_count)
+{
+	/* store returns in a circular buffer - code is branch-free */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
+}
+
+static inline void
+handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+{
+	d->in_flight_tags[wkr] = 0;
+	d->in_flight_bitmask &= ~(1UL << wkr);
+	d->bufs[wkr].bufptr64 = 0;
+	if (unlikely(d->backlog[wkr].count != 0)) {
+		/* On return of a packet, we need to move the
+		 * queued packets for this core elsewhere.
+		 * Easiest solution is to set things up for
+		 * a recursive call. That will cause those
+		 * packets to be queued up for the next free
+		 * core, i.e. it will return as soon as a
+		 * core becomes free to accept the first
+		 * packet, as subsequent ones will be added to
+		 * the backlog for that core.
+		 */
+		struct rte_mbuf *pkts[RTE_DISTRIB_BACKLOG_SIZE];
+		unsigned i;
+		struct rte_distributor_backlog *bl = &d->backlog[wkr];
+
+		for (i = 0; i < bl->count; i++) {
+			unsigned idx = (bl->start + i) &
+					RTE_DISTRIB_BACKLOG_MASK;
+			pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >>
+					RTE_DISTRIB_FLAG_BITS));
+		}
+		/* recursive call.
+		 * Note that the tags were set before first level call
+		 * to rte_distributor_process.
+		 */
+		rte_distributor_process(d, pkts, i);
+		bl->count = bl->start = 0;
+	}
+}
+
+/* this function is called when process() fn is called without any new
+ * packets. It goes through all the workers and clears any returned packets
+ * to do a partial flush.
+ */
+static int
+process_returns(struct rte_distributor *d)
+{
+	unsigned wkr;
+	unsigned flushed = 0;
+	unsigned ret_start = d->returns.start,
+			ret_count = d->returns.count;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++) {
+
+		const int64_t data = d->bufs[wkr].bufptr64;
+		uintptr_t oldbuf = 0;
+
+		if (data & RTE_DISTRIB_GET_BUF) {
+			flushed++;
+			if (d->backlog[wkr].count)
+				d->bufs[wkr].bufptr64 =
+						backlog_pop(&d->backlog[wkr]);
+			else {
+				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
+				d->in_flight_tags[wkr] = 0;
+				d->in_flight_bitmask &= ~(1UL << wkr);
+			}
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		} else if (data & RTE_DISTRIB_RETURN_BUF) {
+			handle_worker_shutdown(d, wkr);
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		}
+
+		store_return(oldbuf, d, &ret_start, &ret_count);
+	}
+
+	d->returns.start = ret_start;
+	d->returns.count = ret_count;
+
+	return flushed;
+}
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned num_mbufs)
+{
+	unsigned next_idx = 0;
+	unsigned wkr = 0;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint32_t new_tag = 0;
+	unsigned ret_start = d->returns.start,
+			ret_count = d->returns.count;
+
+	if (unlikely(num_mbufs == 0))
+		return process_returns(d);
+
+	while (next_idx < num_mbufs || next_mb != NULL) {
+
+		int64_t data = d->bufs[wkr].bufptr64;
+		uintptr_t oldbuf = 0;
+
+		if (!next_mb) {
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb)
+					<< RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process().
+			 * User-defined tags are used to identify flows
+			 * or sessions.
+			 */
+			new_tag = next_mb->hash.usr;
+
+			/*
+			 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64
+			 * then the size of match has to be expanded.
+			 */
+			uint64_t match = 0;
+			unsigned i;
+			/*
+			 * to scan for a match use "xor" and "not" to get a 0/1
+			 * value, then use shifting to merge into a single "match"
+			 * variable, where a set bit indicates a match for the
+			 * worker given by the bit position
+			 */
+			for (i = 0; i < d->num_workers; i++)
+				match |= (!(d->in_flight_tags[i] ^ new_tag)
+					<< i);
+
+			/* Only bits of in-flight workers are considered a match */
+			match &= d->in_flight_bitmask;
+
+			if (match) {
+				next_mb = NULL;
+				unsigned worker = __builtin_ctzl(match);
+				if (add_to_backlog(&d->backlog[worker],
+						next_value) < 0)
+					next_idx--;
+			}
+		}
+
+		if ((data & RTE_DISTRIB_GET_BUF) &&
+				(d->backlog[wkr].count || next_mb)) {
+
+			if (d->backlog[wkr].count)
+				d->bufs[wkr].bufptr64 =
+						backlog_pop(&d->backlog[wkr]);
+
+			else {
+				d->bufs[wkr].bufptr64 = next_value;
+				d->in_flight_tags[wkr] = new_tag;
+				d->in_flight_bitmask |= (1UL << wkr);
+				next_mb = NULL;
+			}
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		} else if (data & RTE_DISTRIB_RETURN_BUF) {
+			handle_worker_shutdown(d, wkr);
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		}
+
+		/* store returns in a circular buffer */
+		store_return(oldbuf, d, &ret_start, &ret_count);
+
+		if (++wkr == d->num_workers)
+			wkr = 0;
+	}
+	/* to finish, check all workers for backlog and schedule work for them
+	 * if they are ready */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		if (d->backlog[wkr].count &&
+				(d->bufs[wkr].bufptr64 & RTE_DISTRIB_GET_BUF)) {
+
+			int64_t oldbuf = d->bufs[wkr].bufptr64 >>
+					RTE_DISTRIB_FLAG_BITS;
+			store_return(oldbuf, d, &ret_start, &ret_count);
+
+			d->bufs[wkr].bufptr64 = backlog_pop(&d->backlog[wkr]);
+		}
+
+	d->returns.start = ret_start;
+	d->returns.count = ret_count;
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned idx = (returns->start + i) & RTE_DISTRIB_RETURNS_MASK;
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/* return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on by a worker or queued up in a backlog. */
+static inline unsigned
+total_outstanding(const struct rte_distributor *d)
+{
+	unsigned wkr, total_outstanding;
+
+	total_outstanding = __builtin_popcountl(d->in_flight_bitmask);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/* flush the distributor, so that there are no outstanding packets in flight or
+ * queued up. */
+int
+rte_distributor_flush(struct rte_distributor *d)
+{
+	const unsigned flushed = total_outstanding(d);
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process(d, NULL, 0);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns(struct rte_distributor *d)
+{
+	d->returns.start = d->returns.count = 0;
+#ifndef __OPTIMIZE__
+	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
+#endif
+}
+
+/* creates a distributor instance */
+struct rte_distributor *
+rte_distributor_create(const char *name,
+		unsigned socket_id,
+		unsigned num_workers)
+{
+	struct rte_distributor *d;
+	struct rte_distributor_list *distributor_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+	RTE_BUILD_BUG_ON(RTE_DISTRIB_MAX_WORKERS >
+				sizeof(d->in_flight_bitmask) * CHAR_BIT);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	distributor_list = RTE_TAILQ_CAST(rte_distributor_tailq.head,
+					  rte_distributor_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(distributor_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..7d36bc8
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTE_H_
+#define _RTE_DISTRIBUTE_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned socket_id,
+		unsigned num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this function.
+ * If the user does not set the tag, its value may vary depending on the driver
+ * implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
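
(Not part of the patch above - a minimal usage sketch of the API declared in
this header, assuming the distributor handle comes from
rte_distributor_create(). handle_packet(), the burst size, the RX port/queue
numbers and the derivation of the worker id from rte_lcore_id() are
hypothetical application details.)

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_distributor.h>	/* rte_distributor_v20.h after this rename */

#define BURST_SIZE 32

static void
handle_packet(struct rte_mbuf *pkt)
{
	(void)pkt;	/* application-specific processing goes here */
}

/* distributor lcore: feed packets in, drain the ones handed back */
static int
lcore_distributor(void *arg)
{
	struct rte_distributor *d = arg;
	struct rte_mbuf *bufs[BURST_SIZE];
	struct rte_mbuf *done[BURST_SIZE];

	for (;;) {
		/* tags are read from mbuf->hash.usr; with RSS enabled the
		 * hash value already sits in that union */
		const uint16_t nb_rx = rte_eth_rx_burst(0, 0, bufs, BURST_SIZE);

		rte_distributor_process(d, bufs, nb_rx);

		const int nb_done =
			rte_distributor_returned_pkts(d, done, BURST_SIZE);
		(void)nb_done;	/* transmit or free the first nb_done mbufs */
	}
	return 0;
}

/* worker lcore: each get_pkt() call returns the previous packet to the
 * distributor and blocks until a new one is assigned */
static int
lcore_worker(void *arg)
{
	struct rte_distributor *d = arg;
	const unsigned worker_id = rte_lcore_id() - 1; /* must be < num_workers */
	struct rte_mbuf *pkt = rte_distributor_get_pkt(d, worker_id, NULL);

	for (;;) {
		handle_packet(pkt);
		pkt = rte_distributor_get_pkt(d, worker_id, pkt);
	}
	return 0;
}

Both functions would typically be launched with rte_eal_remote_launch(); on
shutdown the distributor lcore can call rte_distributor_flush() and
rte_distributor_clear_returns(), and a terminating worker can hand back its
last packet with rte_distributor_return_pkt().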
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH] maintainers: fix script paths
@ 2017-02-21 10:22 16% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-02-21 10:22 UTC (permalink / raw)
  To: dev

The directory scripts does not exist anymore.
The files have been moved but some paths were not updated
in the maintainers list.

Fixes: 9a98f50e890b ("scripts: move to devtools")

Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8305237..24e0eff 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -70,7 +70,7 @@ ABI versioning
 M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
-F: scripts/validate-abi.sh
+F: devtools/validate-abi.sh
 
 Driver information
 F: buildtools/pmdinfogen/
@@ -241,7 +241,7 @@ F: app/test/test_mbuf.c
 Ethernet API
 M: Thomas Monjalon <thomas.monjalon@6wind.com>
 F: lib/librte_ether/
-F: scripts/test-null.sh
+F: devtools/test-null.sh
 
 Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
-- 
2.7.0

^ permalink raw reply	[relevance 16%]

* Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
@ 2017-02-21 10:27  0%     ` Hunt, David
  2017-02-24 14:03  0%     ` Bruce Richardson
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-02-21 10:27 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson


On 21/2/2017 3:17 AM, David Hunt wrote:
> Move files out of the way so that we can replace with new
> versions of the distributor library. Files are named in
> such a way as to match the symbol versioning that we will
> apply for backward ABI compatibility.
>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>
---snip--

Apologies, this patch should have been sent with '--find-renames', thus
reducing the size of this patch significantly, and eliminating checkpatch
warnings/errors.

Dave.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API
  2017-02-10 14:05  1% ` [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
@ 2017-02-21 10:35  0%   ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2017-02-21 10:35 UTC (permalink / raw)
  To: Cristian Dumitrescu, dev; +Cc: thomas.monjalon, jerin.jacob

On 2/10/2017 7:35 PM, Cristian Dumitrescu wrote:
> This patch introduces the generic ethdev API for the hierarchical scheduler
> capability.
>
> Main features:
> - Exposed as ethdev plugin capability (similar to rte_flow approach)
> - Capability query API per port and per hierarchy node
> - Scheduling algorithms: strict priority (SP), Weighted Fair Queuing (WFQ),
>   Weighted Round Robin (WRR)
> - Traffic shaping: single/dual rate, private (per node) and shared (by multiple
>   nodes) shapers
> - Congestion management for hierarchy leaf nodes: algorithms of tail drop,
>   head drop, WRED; private (per node) and shared (by multiple nodes) WRED
>   contexts
> - Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
>   TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
>
> Changes since RFC [1]:
> - Implemented as ethdev plugin (similar to rte_flow) as opposed to more
>   monolithic additions to ethdev itself
> - Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
>   suggested items with only one exception, see the long list below, hopefully
>   nothing was forgotten.
>     - The item not done (hopefully for a good reason): driver-generated object
>       IDs. IMO the choice to have application-generated object IDs adds marginal
>       complexity to the driver (search ID function required), but it provides
>       huge simplification for the application. The app does not need to worry
>       about building & managing tree-like structure for storing driver-generated
>       object IDs, the app can use its own convention for node IDs depending on
>       the specific hierarchy that it needs. Trivial example: identify all
>       level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
>       on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
>       310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
>       112, 113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log for
>       the other related simplification that was implemented: leaf nodes now have
>       predefined IDs that are the same with their Ethernet TX queue ID (
>       therefore no translation is required for leaf nodes).
> - Capability API. Done per port and per node as well.
> - Dual rate shapers
> - Added configuration of private shaper (per node) directly from the shaper
>   profile as part of node API (no shaper ID needed for private shapers), while
>   the shared shapers are configured outside of the node API using shaper profile
>   and communicated to the node using shared shaper ID. So there is no
>   configuration overhead for shared shapers if the app does not use any of them.
> - Leaf nodes now have predefined IDs that are the same with their Ethernet TX
>   queue ID (therefore no translation is required for leaf nodes). This is also
>   used to differentiate between a leaf node and a non-leaf node.
> - Domain-specific errors to give a precise indication of the error cause (same
>   as done by rte_flow)
> - Packet marking API
> - Packet length optional adjustment for shapers, positive (e.g. for adding
>   Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
>   based on IP packet bytes)
>
> Next steps:
> - SW fallback based on librte_sched library (to be later introduced by
>   standalone patch set)
>
> [1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
> [2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
> [3] Hemants’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
>
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> ---
>  MAINTAINERS                            |    4 +
>  lib/librte_ether/Makefile              |    5 +-
>  lib/librte_ether/rte_ether_version.map |   30 +
>  lib/librte_ether/rte_scheddev.c        |  790 ++++++++++++++++++++
>  lib/librte_ether/rte_scheddev.h        | 1273 ++++++++++++++++++++++++++++++++
>  lib/librte_ether/rte_scheddev_driver.h |  374 ++++++++++
>  6 files changed, 2475 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_ether/rte_scheddev.c
>  create mode 100644 lib/librte_ether/rte_scheddev.h
>  create mode 100644 lib/librte_ether/rte_scheddev_driver.h
>

...<snip>

> +
> +#ifndef __INCLUDE_RTE_SCHEDDEV_H__
> +#define __INCLUDE_RTE_SCHEDDEV_H__
> +
> +/**
> + * @file
> + * RTE Generic Hierarchical Scheduler API
> + *
> + * This interface provides the ability to configure the hierarchical scheduler
> + * feature in a generic way.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_red.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** Ethernet framing overhead
> +  *
> +  * Overhead fields per Ethernet frame:
> +  * 1. Preamble:                                            7 bytes;
> +  * 2. Start of Frame Delimiter (SFD):                      1 byte;
> +  * 3. Inter-Frame Gap (IFG):                              12 bytes.
> +  */
> +#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD                  20
> +
> +/**
> +  * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
> +  * is generated and added at the end of the Ethernet frame on TX side without
> +  * any SW intervention.
> +  */
> +#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS              24
> +
> +/**< Invalid WRED profile ID */
> +#define RTE_SCHEDDEV_WRED_PROFILE_ID_NONE                  UINT32_MAX
> +
> +/**< Invalid shaper profile ID */
> +#define RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE                UINT32_MAX
> +
> +/**< Scheduler hierarchy root node ID */
> +#define RTE_SCHEDDEV_ROOT_NODE_ID                          UINT32_MAX
> +
> +
> +/**
> +  * Scheduler node capabilities
> +  */
> +struct rte_scheddev_node_capabilities {
> +	/**< Private shaper support. */
> +	int shaper_private_supported;
> +
> +	/**< Dual rate shaping support for private shaper. Valid only when
> +	 * private shaper is supported.
> +	 */
> +	int shaper_private_dual_rate_supported;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for private
> +	 * shaper. Valid only when private shaper is supported.
> +	 */
> +	uint64_t shaper_private_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for private
> +	 * shaper. Valid only when private shaper is supported.
> +	 */
> +	uint64_t shaper_private_rate_max;
> +
> +	/**< Maximum number of supported shared shapers. The value of zero
> +	 * indicates that shared shapers are not supported.
> +	 */
> +	uint32_t shaper_shared_n_max;
> +
> +	/**< Items valid only for non-leaf nodes. */
> +	struct {
> +		/**< Maximum number of children nodes. */
> +		uint32_t n_children_max;
> +
> +		/**< Lowest priority supported. The value of 1 indicates that
> +		 * only priority 0 is supported, which essentially means that
> +		 * Strict Priority (SP) algorithm is not supported.
> +		 */
> +		uint32_t sp_priority_min;
> +
This could simply be sp_priority_level, with 0 indicating no support,
1 indicating priorities '0' and '1', and 7 indicating priorities '0' to '7',
i.e. 8 priorities in total.

> +		/**< Maximum number of sibling nodes that can have the same
> +		 * priority at any given time. When equal to *n_children_max*,
> +		 * it indicates that WFQ/WRR algorithms are not supported.
> +		 */
> +		uint32_t sp_n_children_max;
Not clear to me.
OK, more than one child can have the same priority, and then you apply
WRR/WFQ among them.

However, there can be different sets, e.g. priorities '0' and '1' each have
only one child, while priority '2' has 6 children among which WRR/WFQ is
applied.

> +
> +		/**< WFQ algorithm support. */
> +		int scheduling_wfq_supported;
> +
> +		/**< WRR algorithm support. */
> +		int scheduling_wrr_supported;
> +
> +		/**< Maximum WFQ/WRR weight. */
> +		uint32_t scheduling_wfq_wrr_weight_max;
> +	} nonleaf;
> +
> +	/**< Items valid only for leaf nodes. */
> +	struct {
> +		/**< Head drop algorithm support. */
> +		int cman_head_drop_supported;
> +
> +		/**< Private WRED context support. */
> +		int cman_wred_context_private_supported;
> +

The context part is not clear to me.

> +		/**< Maximum number of shared WRED contexts supported. The value
> +		 * of zero indicates that shared WRED contexts are not
> +		 * supported.
> +		 */
> +		uint32_t cman_wred_context_shared_n_max;
> +	} leaf;

Non-leaf nodes may have different capabilities.

Your leaf node is like a QoS queue; are you supporting a shaper on the leaf
node as well?


I would still prefer that you separate the QoS queue from a standard sched
node; the capabilities are different and it will be cleaner, at the cost of
more structures and more APIs.

> +};
> +
> +/**
> +  * Scheduler capabilities
> +  */
> +struct rte_scheddev_capabilities {
> +	/**< Maximum number of nodes. */
> +	uint32_t n_nodes_max;
> +
> +	/**< Maximum number of levels (i.e. number of nodes connecting the root
> +	 * node with any leaf node, including the root and the leaf).
> +	 */
> +	uint32_t n_levels_max;
> +
> +	/**< Maximum number of shapers, either private or shared. In case the
> +	 * implementation does not share any resource between private and
> +	 * shared shapers, it is typically equal to the sum between
> +	 * *shaper_private_n_max* and *shaper_shared_n_max*.
> +	 */
> +	uint32_t shaper_n_max;
> +
> +	/**< Maximum number of private shapers. Indicates the maximum number of
> +	 * nodes that can concurrently have the private shaper enabled.
> +	 */
> +	uint32_t shaper_private_n_max;
> +
> +	/**< Maximum number of shared shapers. The value of zero indicates that
> +	  * shared shapers are not supported.
> +	  */
> +	uint32_t shaper_shared_n_max;
> +
> +	/**< Maximum number of nodes that can share the same shared shaper. Only
> +	  * valid when shared shapers are supported.
> +	  */
> +	uint32_t shaper_shared_n_nodes_max;
> +
> +	/**< Maximum number of shared shapers that can be configured with dual
> +	  * rate shaping. The value of zero indicates that dual rate shaping
> +	  * support is not available for shared shapers.
> +	  */
> +	uint32_t shaper_shared_dual_rate_n_max;
> +
> +	/**< Minimum committed/peak rate (bytes per second) for shared
> +	  * shapers. Only valid when shared shapers are supported.
> +	  */
> +	uint64_t shaper_shared_rate_min;
> +
> +	/**< Maximum committed/peak rate (bytes per second) for shared
> +	  * shaper. Only valid when shared shapers are supported.
> +	  */
> +	uint64_t shaper_shared_rate_max;
> +
> +	/**< Minimum value allowed for packet length adjustment for
> +	  * private/shared shapers.
> +	  */
> +	int shaper_pkt_length_adjust_min;
> +
> +	/**< Maximum value allowed for packet length adjustment for
> +	  * private/shared shapers.
> +	  */
> +	int shaper_pkt_length_adjust_max;
> +
> +	/**< Maximum number of WRED contexts. */
> +	uint32_t cman_wred_context_n_max;
> +
> +	/**< Maximum number of private WRED contexts. Indicates the maximum
> +	  * number of leaf nodes that can concurrently have the private WRED
> +	  * context enabled.
> +	  */
> +	uint32_t cman_wred_context_private_n_max;
> +
> +	/**< Maximum number of shared WRED contexts. The value of zero indicates
> +	  * that shared WRED contexts are not supported.
> +	  */
> +	uint32_t cman_wred_context_shared_n_max;
> +
> +	/**< Maximum number of leaf nodes that can share the same WRED context.
> +	  * Only valid when shared WRED contexts are supported.
> +	  */
> +	uint32_t cman_wred_context_shared_n_nodes_max;
> +
> +	/**< Support for VLAN DEI packet marking. */
> +	int mark_vlan_dei_supported;
> +
> +	/**< Support for IPv4/IPv6 ECN marking of TCP packets. */
> +	int mark_ip_ecn_tcp_supported;
> +
> +	/**< Support for IPv4/IPv6 ECN marking of SCTP packets. */
> +	int mark_ip_ecn_sctp_supported;
> +
> +	/**< Support for IPv4/IPv6 DSCP packet marking. */
> +	int mark_ip_dscp_supported;
> +
> +	/**< Summary of node-level capabilities across all nodes. */
> +	struct rte_scheddev_node_capabilities node;

This should be an array, with one entry per level supported in the system.
A non-leaf node at level 2 can have different capabilities than a level-3 node.

> +};
> +
> +/**
> +  * Congestion management (CMAN) mode
> +  *
> +  * This is used for controlling the admission of packets into a packet queue or
> +  * group of packet queues on congestion. On request of writing a new packet
> +  * into the current queue while the queue is full, the *tail drop* algorithm
> +  * drops the new packet while leaving the queue unmodified, as opposed to *head
> +  * drop* algorithm, which drops the packet at the head of the queue (the oldest
> +  * packet waiting in the queue) and admits the new packet at the tail of the
> +  * queue.
> +  *
> +  * The *Random Early Detection (RED)* algorithm works by proactively dropping
> +  * more and more input packets as the queue occupancy builds up. When the queue
> +  * is full or almost full, RED effectively works as *tail drop*. The *Weighted
> +  * RED* algorithm uses a separate set of RED thresholds for each packet color.
> +  */
> +enum rte_scheddev_cman_mode {
> +	RTE_SCHEDDEV_CMAN_TAIL_DROP = 0, /**< Tail drop */
> +	RTE_SCHEDDEV_CMAN_HEAD_DROP, /**< Head drop */
> +	RTE_SCHEDDEV_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
> +};
> +
> +/**
> +  * Color
> +  */
> +enum rte_scheddev_color {
> +	e_RTE_SCHEDDEV_GREEN = 0, /**< Green */
> +	e_RTE_SCHEDDEV_YELLOW,    /**< Yellow */
> +	e_RTE_SCHEDDEV_RED,       /**< Red */
> +	e_RTE_SCHEDDEV_COLORS     /**< Number of colors */
> +};
> +
> +/**
> +  * WRED profile
> +  */
> +struct rte_scheddev_wred_params {
> +	/**< One set of RED parameters per packet color */
> +	struct rte_red_params red_params[e_RTE_SCHEDDEV_COLORS];
> +};
> +
> +/**
> +  * Token bucket
> +  */
> +struct rte_scheddev_token_bucket {
> +	/**< Token bucket rate (bytes per second) */
> +	uint64_t rate;
> +
> +	/**< Token bucket size (bytes), a.k.a. max burst size */
> +	uint64_t size;
> +};
> +
> +/**
> +  * Shaper (rate limiter) profile
> +  *
> +  * Multiple shaper instances can share the same shaper profile. Each node has
> +  * zero or one private shaper (only one node using it) and/or zero, one or
> +  * several shared shapers (multiple nodes use the same shaper instance).
> +  *
> +  * Single rate shapers use a single token bucket. A single rate shaper can be
> +  * configured by setting the rate of the committed bucket to zero, which
> +  * effectively disables this bucket. The peak bucket is used to limit the rate
> +  * and the burst size for the current shaper.
> +  *
> +  * Dual rate shapers use both the committed and the peak token buckets. The
> +  * rate of the committed bucket has to be less than or equal to the rate of the
> +  * peak bucket.
> +  */
> +struct rte_scheddev_shaper_params {
> +	/**< Committed token bucket */
> +	struct rte_scheddev_token_bucket committed;
> +
> +	/**< Peak token bucket */
> +	struct rte_scheddev_token_bucket peak;
> +
> +	/**< Signed value to be added to the length of each packet for the
> +	 * purpose of shaping. Can be used to correct the packet length with
> +	 * the framing overhead bytes that are also consumed on the wire (e.g.
> +	 * RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS).
> +	 */
> +	int32_t pkt_length_adjust;
> +};
> +
> +/**
> +  * Node parameters
> +  *
> +  * Each scheduler hierarchy node has multiple inputs (children nodes of the
> +  * current parent node) and a single output (which is input to its parent
> +  * node). The current node arbitrates its inputs using Strict Priority (SP),
> +  * Weighted Fair Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to
> +  * schedule input packets on its output while observing its shaping (rate
> +  * limiting) constraints.
> +  *
> +  * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc are considered
> +  * approximations of the ideal of WFQ and are assimilated to WFQ, although
> +  * an associated implementation-dependent trade-off on accuracy, performance
> +  * and resource usage might exist.
> +  *
> +  * Children nodes with different priorities are scheduled using the SP
> +  * algorithm, based on their priority, with zero (0) as the highest priority.
> +  * Children with same priority are scheduled using the WFQ or WRR algorithm,
> +  * based on their weight, which is relative to the sum of the weights of all
> +  * siblings with same priority, with one (1) as the lowest weight.
> +  *
> +  * Each leaf node sits on top of a TX queue of the current Ethernet port.
> +  * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
> +  * where N is the number of TX queues configured for the current Ethernet port.
> +  * The non-leaf nodes have their IDs generated by the application.
> +  */


OK, that means 0 to N-1 is reserved for leaf nodes. Will the application
choose any value for non-leaf nodes?
What will be the parent node ID for the root node?
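
As a rough illustration of that convention (not part of the patch): this
follows the commit message's example numbering, with the additional
assumption -- part of the question above -- that the predefined
RTE_SCHEDDEV_ROOT_NODE_ID can be used directly as the parent of top-level
nodes:

#include <rte_scheddev.h>

static int
build_hierarchy(uint8_t port_id)
{
	struct rte_scheddev_error err;
	struct rte_scheddev_node_params nonleaf = {
		.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
		.nonleaf = { .scheduling_mode_per_priority = NULL,
			     .n_priorities = 1 },
	};
	struct rte_scheddev_node_params leaf = {
		.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
		.leaf = { .cman = RTE_SCHEDDEV_CMAN_TAIL_DROP },
	};

	/* level-2 node, application-chosen ID 100, attached to the root */
	rte_scheddev_node_add(port_id, 100, RTE_SCHEDDEV_ROOT_NODE_ID,
			0 /* priority */, 1 /* weight */, &nonleaf, &err);
	/* level-3 node 110 under node 100 */
	rte_scheddev_node_add(port_id, 110, 100, 0, 1, &nonleaf, &err);
	/* leaf node: predefined ID equal to TX queue 0, attached under 110 */
	rte_scheddev_node_add(port_id, 0, 110, 0, 1, &leaf, &err);

	/* freeze the start-up hierarchy before starting the port */
	return rte_scheddev_hierarchy_set(port_id, 1 /* clear_on_fail */, &err);
}

(Error handling on the individual node_add calls is omitted for brevity.)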

> +struct rte_scheddev_node_params {
> +	/**< Shaper profile for the private shaper. The absence of the private
> +	 * shaper for the current node is indicated by setting this parameter
> +	 * to RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE.
> +	 */
> +	uint32_t shaper_profile_id;
> +
> +	/**< User allocated array of valid shared shaper IDs. */
> +	uint32_t *shared_shaper_id;
> +
> +	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
> +	uint32_t n_shared_shapers;
> +
> +	union {
> +		/**< Parameters only valid for non-leaf nodes. */
> +		struct {
> +			/**< For each priority, indicates whether the children
> +			 * nodes sharing the same priority are to be scheduled
> +			 * by WFQ or by WRR. When NULL, it indicates that WFQ
> +			 * is to be used for all priorities. When non-NULL, it
> +			 * points to a pre-allocated array of *n_priority*
> +			 * elements, with a non-zero value element indicating
> +			 * WFQ and a zero value element for WRR.
> +			 */
> +			int *scheduling_mode_per_priority;

What is the structure of the element this pointer refers to? Just a bool array?

> +
> +			/**< Number of priorities. */
> +			uint32_t n_priorities;
> +		} nonleaf;
> +
> +		/**< Parameters only valid for leaf nodes. */
> +		struct {
> +			/**< Congestion management mode */
> +			enum rte_scheddev_cman_mode cman;
> +
> +			/**< WRED parameters (valid when *cman* is WRED). */
> +			struct {
> +				/**< WRED profile for private WRED context. */
> +				uint32_t wred_profile_id;
> +
> +				/**< User allocated array of shared WRED context
> +				 * IDs. The absence of a private WRED context
> +				 * for current leaf node is indicated by value
> +				 * RTE_SCHEDDEV_WRED_PROFILE_ID_NONE.
> +				 */
> +				uint32_t *shared_wred_context_id;
> +
> +				/**< Number of shared WRED context IDs in the
> +				 * *shared_wred_context_id* array.
> +				 */
> +				uint32_t n_shared_wred_contexts;
> +			} wred;
> +		} leaf;

A bool is_leaf is needed here to differentiate between leaf and non-leaf nodes.

> +	};
> +};
> +
> +/**
> +  * Node statistics counter type
> +  */
> +enum rte_scheddev_stats_counter {
> +	/**< Number of packets scheduled from current node. */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS = 1 << 0,
> +
> +	/**< Number of bytes scheduled from current node. */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES = 1 << 1,
> +
> +	/**< Number of packets dropped by current node.  */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_DROPPED = 1 << 2,
> +
> +	/**< Number of bytes dropped by current node.  */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_DROPPED = 1 << 3,
> +
> +	/**< Number of packets currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_PKTS_QUEUED = 1 << 4,
> +
> +	/**< Number of bytes currently waiting in the packet queue of current
> +	 * leaf node.
> +	 */
> +	RTE_SCHEDDEV_STATS_COUNTER_N_BYTES_QUEUED = 1 << 5,
> +};
> +
> +/**
> +  * Node statistics counters
> +  */
> +struct rte_scheddev_node_stats {
> +	/**< Number of packets scheduled from current node. */
> +	uint64_t n_pkts;
> +
> +	/**< Number of bytes scheduled from current node. */
> +	uint64_t n_bytes;
> +
> +	/**< Statistics counters for leaf nodes only. */
> +	struct {
> +		/**< Number of packets dropped by current leaf node. */
> +		uint64_t n_pkts_dropped;
> +
> +		/**< Number of bytes dropped by current leaf node. */
> +		uint64_t n_bytes_dropped;
> +
> +		/**< Number of packets currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_pkts_queued;
> +
> +		/**< Number of bytes currently waiting in the packet queue of
> +		 * current leaf node.
> +		 */
> +		uint64_t n_bytes_queued;
> +	} leaf;
> +};
> +
> +/**
> + * Verbose error types.
> + *
> + * Most of them provide the type of the object referenced by struct
> + * rte_scheddev_error::cause.
> + */
> +enum rte_scheddev_error_type {
> +	RTE_SCHEDDEV_ERROR_TYPE_NONE, /**< No error. */
> +	RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_GREEN,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_YELLOW,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_RED,
> +	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE,
> +	RTE_SCHEDDEV_ERROR_TYPE_SHARED_SHAPER_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PARENT_NODE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_PRIORITY,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_WEIGHT,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_CMAN,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_WRED_PROFILE_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_LEAF_SHARED_WRED_CONTEXT_ID,
> +	RTE_SCHEDDEV_ERROR_TYPE_NODE_ID,
> +};
> +
> +/**
> + * Verbose error structure definition.
> + *
> + * This object is normally allocated by applications and set by PMDs, the
> + * message points to a constant string which does not need to be freed by
> + * the application, however its pointer can be considered valid only as long
> + * as its associated DPDK port remains configured. Closing the underlying
> + * device or unloading the PMD invalidates it.
> + *
> + * Both cause and message may be NULL regardless of the error type.
> + */
> +struct rte_scheddev_error {
> +	enum rte_scheddev_error_type type; /**< Cause field and error type. */
> +	const void *cause; /**< Object responsible for the error. */
> +	const char *message; /**< Human-readable error message. */
> +};
> +
> +/**
> + * Scheduler capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param cap
> + *   Scheduler capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_capabilities_get(uint8_t port_id,
> +	struct rte_scheddev_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node capabilities get
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param cap
> + *   Scheduler node capabilities. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_capabilities_get(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +

Node capabilities are already part of scheddev_capabilities?

What do you expect to be different here? Unless you support different
capabilities for each level, this may not be useful.
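
For reference, a sketch of how the two query levels would be used as the API
stands (not part of the patch; dump_sched_caps(), port_id and node_id are
illustrative):

#include <stdio.h>
#include <rte_scheddev.h>

static void
dump_sched_caps(uint8_t port_id, uint32_t node_id)
{
	struct rte_scheddev_capabilities cap;
	struct rte_scheddev_node_capabilities node_cap;
	struct rte_scheddev_error err;

	if (rte_scheddev_capabilities_get(port_id, &cap, &err) == 0)
		printf("port %u: max nodes %u, max levels %u\n",
				(unsigned)port_id, cap.n_nodes_max,
				cap.n_levels_max);

	/* per-node view: it only adds value over the cap.node summary if
	 * capabilities can actually differ per node (or per level) */
	if (rte_scheddev_node_capabilities_get(port_id, node_id,
			&node_cap, &err) == 0)
		printf("node %u: max children %u\n",
				node_id, node_cap.nonleaf.n_children_max);
}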

> +/**
> + * Scheduler WRED profile add
> + *
> + * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
> + * is used to create one or several WRED contexts.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + *   WRED profile ID for the new profile. Needs to be unused.
> + * @param profile
> + *   WRED profile parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_wred_profile_add(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_wred_params *profile,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler WRED profile delete
> + *
> + * Delete an existing WRED profile. This operation fails when there is currently
> + * at least one user (i.e. WRED context) of this WRED profile.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_wred_profile_delete(uint8_t port_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared WRED context add or update
> + *
> + * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
> + * created by using the WRED profile identified by *wred_profile_id*.
> + *
> + * When *shared_wred_context_id* is valid, this WRED context is no longer using
> + * the profile previously assigned to it and is updated to use the profile
> + * identified by *wred_profile_id*.
> + *
> + * A valid shared WRED context can be assigned to several scheduler hierarchy
> + * leaf nodes configured to use WRED as the congestion management mode.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID
> + * @param wred_profile_id
> + *   WRED profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared WRED context delete
> + *
> + * Delete an existing shared WRED context. This operation fails when there is
> + * currently at least one user (i.e. scheduler hierarchy leaf node) of this
> + * shared WRED context.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
> +	uint32_t shared_wred_context_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shaper profile add
> + *
> + * Create a new shaper profile with ID set to *shaper_profile_id*. The new
> + * shaper profile is used to create one or several shapers.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + *   Shaper profile ID for the new profile. Needs to be unused.
> + * @param profile
> + *   Shaper profile parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shaper_profile_add(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_shaper_params *profile,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shaper profile delete
> + *
> + * Delete an existing shaper profile. This operation fails when there is
> + * currently at least one user (i.e. shaper) of this shaper profile.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shaper_profile_delete(uint8_t port_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared shaper add or update
> + *
> + * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
> + * with this ID is created using the shaper profile identified by
> + * *shaper_profile_id*.
> + *
> + * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is no
> + * longer using the shaper profile previously assigned to it and is updated to
> + * use the shaper profile identified by *shaper_profile_id*.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_shaper_id
> + *   Shared shaper ID
> + * @param shaper_profile_id
> + *   Shaper profile ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler shared shaper delete
> + *
> + * Delete an existing shared shaper. This operation fails when there is
> + * currently at least one user (i.e. scheduler hierarchy node) of this shared
> + * shaper.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_shared_shaper_delete(uint8_t port_id,
> +	uint32_t shared_shaper_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node add
> + *
> + * When *node_id* is not a valid node ID, a new node with this ID is created and
> + * connected as child to the existing node identified by *parent_node_id*.
> + *
> + * When *node_id* is a valid node ID, this node is disconnected from its current
> + * parent and connected as child to another existing node identified by
> + * *parent_node_id *.
> + *
> + * This function can be called during port initialization phase (before the
> + * Ethernet port is started) for building the scheduler start-up hierarchy.
> + * Subject to the specific Ethernet port supporting on-the-fly scheduler
> + * hierarchy updates, this function can also be called during run-time (after
> + * the Ethernet port is started).

This should be a capability, indicating whether dynamic hierarchy updates
are supported or not.

> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID
> + * @param parent_node_id
> + *   Parent node ID. Needs to be valid.

What will be the parent node ID for the root node? How is the root node
created on the Ethernet port?

> + * @param priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param params
> + *   Node parameters. Needs to be pre-allocated and valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_add(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_node_params *params,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node delete
> + *
> + * Delete an existing node. This operation fails when this node currently has at
> + * least one user (i.e. child node).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_delete(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node suspend
> + *
> + * Suspend an existing node.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_suspend(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node resume
> + *
> + * Resume an existing node that was previously suspended.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_resume(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler hierarchy set
> + *
> + * This function is called during the port initialization phase (before the
> + * Ethernet port is started) to freeze the scheduler start-up hierarchy.
> + *
> + * This function fails when the currently configured scheduler hierarchy is not
> + * supported by the Ethernet port, in which case the user can abort or try out
> + * another hierarchy configuration (e.g. a hierarchy with less leaf nodes),
> + * which can be build from scratch (when *clear_on_fail* is enabled) or by
> + * modifying the existing hierarchy configuration (when *clear_on_fail* is
> + * disabled).
> + *
> + * Note that, even when the configured scheduler hierarchy is supported (so this
> + * function is successful), the Ethernet port start might still fail due to e.g.
> + * not enough memory being available in the system, etc.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param clear_on_fail
> + *   On function call failure, hierarchy is cleared when this parameter is
> + *   non-zero and preserved when this parameter is equal to zero.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_hierarchy_set(uint8_t port_id,
> +	int clear_on_fail,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node parent update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param parent_node_id
> + *   Node ID for the new parent. Needs to be valid.
> + * @param priority
> + *   Node priority. The highest node priority is zero. Used by the SP algorithm
> + *   running on the parent of the current node for scheduling this child node.
> + * @param weight
> + *   Node weight. The node weight is relative to the weight sum of all siblings
> + *   that have the same priority. The lowest weight is zero. Used by the WFQ/WRR
> + *   algorithm running on the parent of the current node for scheduling this
> + *   child node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_parent_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_error *error);
> +

The usage is not clear. How is it different from the node_add API?
Is the intention to update a specific node, or to change the connection of a
specific node to an existing or new parent?


> +/**
> + * Scheduler node private shaper update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param shaper_profile_id
> + *   Shaper profile ID for the private shaper of the current node. Needs to be
> + *   either valid shaper profile ID or RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE, with
> + *   the latter disabling the private shaper of the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node shared shapers update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param shared_shaper_id
> + *   Shared shaper ID. Needs to be valid.
> + * @param add
> + *   Set to non-zero value to add this shared shaper to current node or to zero
> + *   to delete this shared shaper from current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int add,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node scheduling mode update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param scheduling_mode_per_priority
> + *   For each priority, indicates whether the children nodes sharing the same
> + *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates that
> + *   WFQ is to be used for all priorities. When non-NULL, it points to a
> + *   pre-allocated array of *n_priority* elements, with a non-zero value element
> + *   indicating WFQ and a zero value element for WRR.
> + * @param n_priorities
> + *   Number of priorities.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
> +	uint32_t node_id,
> +	int *scheduling_mode_per_priority,
> +	uint32_t n_priorities,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node congestion management mode update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param cman
> + *   Congestion management mode.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_cman_update(uint8_t port_id,
> +	uint32_t node_id,
> +	enum rte_scheddev_cman_mode cman,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node private WRED context update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param wred_profile_id
> + *   WRED profile ID for the private WRED context of the current node. Needs to
> + *   be either valid WRED profile ID or RTE_SCHEDDEV_WRED_PROFILE_ID_NONE, with
> + *   the latter disabling the private WRED context of the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node shared WRED context update
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid leaf node ID.
> + * @param shared_wred_context_id
> + *   Shared WRED context ID. Needs to be valid.
> + * @param add
> + *   Set to non-zero value to add this shared WRED context to current node or to
> + *   zero to delete this shared WRED context from current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler packet marking - VLAN DEI (IEEE 802.1Q)
> + *
> + * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
> + * field (3 bits), while IEEE 802.1q maps the drop priority to the VLAN Drop
> + * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
> + * Format Indicator (CFI).
> + *
> + * All VLAN frames of a given color get their DEI bit set if marking is enabled
> + * for this color; otherwise, their DEI bit is left as is (either set or not).
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_mark_vlan_dei(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
> + *
> + * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
> + * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
> + * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion Notification
> + * (ECN) field (2 bits). The DSCP field is typically used to encode the traffic
> + * class and/or drop priority (RFC 2597), while the ECN field is used by RFC
> + * 3168 to implement a congestion notification mechanism to be leveraged by
> + * transport layer protocols such as TCP and SCTP that have congestion control
> + * mechanisms.
> + *
> + * When congestion is experienced, as an alternative to dropping the packet,
> + * routers can change the ECN field of input packets from 2'b01 or 2'b10 (values
> + * indicating that source endpoint is ECN-capable) to 2'b11 (meaning that
> + * congestion is experienced). The destination endpoint can use the ECN-Echo
> + * (ECE) TCP flag to relay the congestion indication back to the source
> + * endpoint, which acknowledges it back to the destination endpoint with the
> + * Congestion Window Reduced (CWR) TCP flag.
> + *
> + * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
> + * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
> + * enabled for the current color, otherwise the ECN field is left as is.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_mark_ip_ecn(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
> + *
> + * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
> + * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
> + * values proposed by this RFC:
> + *
> + *                       Class 1    Class 2    Class 3    Class 4
> + *                     +----------+----------+----------+----------+
> + *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
> + *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
> + *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
> + *                     +----------+----------+----------+----------+
> + *
> + * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2, as
> + * well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
> + *
> + * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
> + * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
> + * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
> + * for each color; when not enabled for a given color, the DSCP field of all
> + * packets with that color is left as is.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param mark_green
> + *   Set to non-zero value to enable marking of green packets and to zero to
> + *   disable it.
> + * @param mark_yellow
> + *   Set to non-zero value to enable marking of yellow packets and to zero to
> + *   disable it.
> + * @param mark_red
> + *   Set to non-zero value to enable marking of red packets and to zero to
> + *   disable it.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_mark_ip_dscp(uint8_t port_id,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler get statistics counter types enabled for all nodes
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param nonleaf_node_capability_stats_mask
> + *   Statistics counter types available per node for all non-leaf nodes. Needs
> + *   to be pre-allocated.
> + * @param nonleaf_node_enabled_stats_mask
> + *   Statistics counter types currently enabled per node for each non-leaf node.
> + *   This is a subset of *nonleaf_node_capability_stats_mask*. Needs to be
> + *   pre-allocated.
> + * @param leaf_node_capability_stats_mask
> + *   Statistics counter types available per node for all leaf nodes. Needs to
> + *   be pre-allocated.
> + * @param leaf_node_enabled_stats_mask
> + *   Statistics counter types currently enabled for each leaf node. This is
> + *   a subset of *leaf_node_capability_stats_mask*. Needs to be pre-allocated.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_stats_get_enabled(uint8_t port_id,
> +	uint64_t *nonleaf_node_capability_stats_mask,
> +	uint64_t *nonleaf_node_enabled_stats_mask,
> +	uint64_t *leaf_node_capability_stats_mask,
> +	uint64_t *leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler enable selected statistics counters for all nodes
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param nonleaf_node_enabled_stats_mask
> + *   Statistics counter types to be enabled per node for each non-leaf node.
> + *   This needs to be a subset of the statistics counter types available per
> + *   node for all non-leaf nodes. Any statistics counter type not included in
> + *   this set is to be disabled for all non-leaf nodes.
> + * @param leaf_node_enabled_stats_mask
> + *   Statistics counter types to be enabled per node for each leaf node. This
> + *   needs to be a subset of the statistics counter types available per node for
> + *   all leaf nodes. Any statistics counter type not included in this set is to
> + *   be disabled for all leaf nodes.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_stats_enable(uint8_t port_id,
> +	uint64_t nonleaf_node_enabled_stats_mask,
> +	uint64_t leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler get statistics counter types enabled for current node
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param capability_stats_mask
> + *   Statistics counter types available for the current node. Needs to be
> + *   pre-allocated.
> + * @param enabled_stats_mask
> + *   Statistics counter types currently enabled for the current node. This is
> + *   a subset of *capability_stats_mask*. Needs to be pre-allocated.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_stats_get_enabled(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t *capability_stats_mask,
> +	uint64_t *enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler enable selected statistics counters for current node
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param enabled_stats_mask
> + *   Statistics counter types to be enabled for the current node. This needs to
> + *   be a subset of the statistics counter types available for the current node.
> + *   Any statistics counter type not included in this set is to be disabled for
> + *   the current node.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_stats_enable(uint8_t port_id,
> +	uint32_t node_id,
> +	uint64_t enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +
> +/**
> + * Scheduler node statistics counters read
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param node_id
> + *   Node ID. Needs to be valid.
> + * @param stats
> + *   When non-NULL, it contains the current value for the statistics counters
> + *   enabled for the current node.
> + * @param clear
> + *   When this parameter has a non-zero value, the statistics counters are
> + *   cleared (i.e. set to zero) immediately after they have been read, otherwise
> + *   the statistics counters are left untouched.
> + * @param error
> + *   Error details. Filled in only on error, when not NULL.
> + * @return
> + *   0 on success, non-zero error code otherwise.
> + */
> +int rte_scheddev_node_stats_read(uint8_t port_id,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_stats *stats,
> +	int clear,
> +	struct rte_scheddev_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_SCHEDDEV_H__ */
> diff --git a/lib/librte_ether/rte_scheddev_driver.h b/lib/librte_ether/rte_scheddev_driver.h
> new file mode 100644
> index 0000000..c0a0321
> --- /dev/null
> +++ b/lib/librte_ether/rte_scheddev_driver.h
> @@ -0,0 +1,374 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
> +#define __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
> +
> +/**
> + * @file
> + * RTE Generic Hierarchical Scheduler API (Driver Side)
> + *
> + * This file provides implementation helpers for internal use by PMDs. They
> + * are not intended to be exposed to applications and are not subject to ABI
> + * versioning.
> + */
> +
> +#include <stdint.h>
> +
> +#include <rte_errno.h>
> +#include "rte_ethdev.h"
> +#include "rte_scheddev.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +typedef int (*rte_scheddev_capabilities_get_t)(struct rte_eth_dev *dev,
> +	struct rte_scheddev_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler capabilities get */
> +
> +typedef int (*rte_scheddev_node_capabilities_get_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_capabilities *cap,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node capabilities get */
> +
> +typedef int (*rte_scheddev_wred_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_wred_params *profile,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler WRED profile add */
> +
> +typedef int (*rte_scheddev_wred_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler WRED profile delete */
> +
> +typedef int (*rte_scheddev_shared_wred_context_add_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared WRED context add */
> +
> +typedef int (*rte_scheddev_shared_wred_context_delete_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t shared_wred_context_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared WRED context delete */
> +
> +typedef int (*rte_scheddev_shaper_profile_add_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_shaper_params *profile,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shaper profile add */
> +
> +typedef int (*rte_scheddev_shaper_profile_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shaper profile delete */
> +
> +typedef int (*rte_scheddev_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared shaper add/update */
> +
> +typedef int (*rte_scheddev_shared_shaper_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t shared_shaper_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler shared shaper delete */
> +
> +typedef int (*rte_scheddev_node_add_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_node_params *params,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node add */
> +
> +typedef int (*rte_scheddev_node_delete_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node delete */
> +
> +typedef int (*rte_scheddev_node_suspend_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node suspend */
> +
> +typedef int (*rte_scheddev_node_resume_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node resume */
> +
> +typedef int (*rte_scheddev_hierarchy_set_t)(struct rte_eth_dev *dev,
> +	int clear_on_fail,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler hierarchy set */
> +
> +typedef int (*rte_scheddev_node_parent_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t parent_node_id,
> +	uint32_t priority,
> +	uint32_t weight,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node parent update */
> +
> +typedef int (*rte_scheddev_node_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shaper_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node shaper update */
> +
> +typedef int (*rte_scheddev_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_shaper_id,
> +	int32_t add,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node shared shaper update */
> +
> +typedef int (*rte_scheddev_node_scheduling_mode_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	int *scheduling_mode_per_priority,
> +	uint32_t n_priorities,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node scheduling mode update */
> +
> +typedef int (*rte_scheddev_node_cman_update_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	enum rte_scheddev_cman_mode cman,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node congestion management mode update */
> +
> +typedef int (*rte_scheddev_node_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t wred_profile_id,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node WRED context update */
> +
> +typedef int (*rte_scheddev_node_shared_wred_context_update_t)(
> +	struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint32_t shared_wred_context_id,
> +	int add,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler node shared WRED context update */
> +
> +typedef int (*rte_scheddev_mark_vlan_dei_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler packet marking - VLAN DEI */
> +
> +typedef int (*rte_scheddev_mark_ip_ecn_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler packet marking - IPv4/IPv6 ECN */
> +
> +typedef int (*rte_scheddev_mark_ip_dscp_t)(struct rte_eth_dev *dev,
> +	int mark_green,
> +	int mark_yellow,
> +	int mark_red,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler packet marking - IPv4/IPv6 DSCP */
> +
> +typedef int (*rte_scheddev_stats_get_enabled_t)(struct rte_eth_dev *dev,
> +	uint64_t *nonleaf_node_capability_stats_mask,
> +	uint64_t *nonleaf_node_enabled_stats_mask,
> +	uint64_t *leaf_node_capability_stats_mask,
> +	uint64_t *leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler get set of stats counters enabled for all nodes */
> +
> +typedef int (*rte_scheddev_stats_enable_t)(struct rte_eth_dev *dev,
> +	uint64_t nonleaf_node_enabled_stats_mask,
> +	uint64_t leaf_node_enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler enable selected stats counters for all nodes */
> +
> +typedef int (*rte_scheddev_node_stats_get_enabled_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint64_t *capability_stats_mask,
> +	uint64_t *enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler get set of stats counters enabled for specific node */
> +
> +typedef int (*rte_scheddev_node_stats_enable_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	uint64_t enabled_stats_mask,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler enable selected stats counters for specific node */
> +
> +typedef int (*rte_scheddev_node_stats_read_t)(struct rte_eth_dev *dev,
> +	uint32_t node_id,
> +	struct rte_scheddev_node_stats *stats,
> +	int clear,
> +	struct rte_scheddev_error *error);
> +/**< @internal Scheduler read stats counters for specific node */
> +
> +struct rte_scheddev_ops {
> +	/** Scheduler capabilities_get */
> +	rte_scheddev_capabilities_get_t capabilities_get;
> +	/** Scheduler node capabilities get */
> +	rte_scheddev_node_capabilities_get_t node_capabilities_get;
> +
> +	/** Scheduler WRED profile add */
> +	rte_scheddev_wred_profile_add_t wred_profile_add;
> +	/** Scheduler WRED profile delete */
> +	rte_scheddev_wred_profile_delete_t wred_profile_delete;
> +	/** Scheduler shared WRED context add/update */
> +	rte_scheddev_shared_wred_context_add_update_t
> +		shared_wred_context_add_update;
> +	/** Scheduler shared WRED context delete */
> +	rte_scheddev_shared_wred_context_delete_t
> +		shared_wred_context_delete;
> +	/** Scheduler shaper profile add */
> +	rte_scheddev_shaper_profile_add_t shaper_profile_add;
> +	/** Scheduler shaper profile delete */
> +	rte_scheddev_shaper_profile_delete_t shaper_profile_delete;
> +	/** Scheduler shared shaper add/update */
> +	rte_scheddev_shared_shaper_add_update_t shared_shaper_add_update;
> +	/** Scheduler shared shaper delete */
> +	rte_scheddev_shared_shaper_delete_t shared_shaper_delete;
> +
> +	/** Scheduler node add */
> +	rte_scheddev_node_add_t node_add;
> +	/** Scheduler node delete */
> +	rte_scheddev_node_delete_t node_delete;
> +	/** Scheduler node suspend */
> +	rte_scheddev_node_suspend_t node_suspend;
> +	/** Scheduler node resume */
> +	rte_scheddev_node_resume_t node_resume;
> +	/** Scheduler hierarchy set */
> +	rte_scheddev_hierarchy_set_t hierarchy_set;
> +
> +	/** Scheduler node parent update */
> +	rte_scheddev_node_parent_update_t node_parent_update;
> +	/** Scheduler node shaper update */
> +	rte_scheddev_node_shaper_update_t node_shaper_update;
> +	/** Scheduler node shared shaper update */
> +	rte_scheddev_node_shared_shaper_update_t node_shared_shaper_update;
> +	/** Scheduler node scheduling mode update */
> +	rte_scheddev_node_scheduling_mode_update_t node_scheduling_mode_update;
> +	/** Scheduler node congestion management mode update */
> +	rte_scheddev_node_cman_update_t node_cman_update;
> +	/** Scheduler node WRED context update */
> +	rte_scheddev_node_wred_context_update_t node_wred_context_update;
> +	/** Scheduler node shared WRED context update */
> +	rte_scheddev_node_shared_wred_context_update_t
> +		node_shared_wred_context_update;
> +
> +	/** Scheduler packet marking - VLAN DEI */
> +	rte_scheddev_mark_vlan_dei_t mark_vlan_dei;
> +	/** Scheduler packet marking - IPv4/IPv6 ECN */
> +	rte_scheddev_mark_ip_ecn_t mark_ip_ecn;
> +	/** Scheduler packet marking - IPv4/IPv6 DSCP */
> +	rte_scheddev_mark_ip_dscp_t mark_ip_dscp;
> +
> +	/** Scheduler get statistics counter type enabled for all nodes */
> +	rte_scheddev_stats_get_enabled_t stats_get_enabled;
> +	/** Scheduler enable selected statistics counters for all nodes */
> +	rte_scheddev_stats_enable_t stats_enable;
> +	/** Scheduler get statistics counter type enabled for current node */
> +	rte_scheddev_node_stats_get_enabled_t node_stats_get_enabled;
> +	/** Scheduler enable selected statistics counters for current node */
> +	rte_scheddev_node_stats_enable_t node_stats_enable;
> +	/** Scheduler read statistics counters for current node */
> +	rte_scheddev_node_stats_read_t node_stats_read;
> +};
> +
> +/**
> + * Initialize generic error structure.
> + *
> + * This function also sets rte_errno to a given value.
> + *
> + * @param error
> + *   Pointer to error structure (may be NULL).
> + * @param code
> + *   Related error code (rte_errno).
> + * @param type
> + *   Cause field and error type.
> + * @param cause
> + *   Object responsible for the error.
> + * @param message
> + *   Human-readable error message.
> + *
> + * @return
> + *   Error code.
> + */
> +static inline int
> +rte_scheddev_error_set(struct rte_scheddev_error *error,
> +		   int code,
> +		   enum rte_scheddev_error_type type,
> +		   const void *cause,
> +		   const char *message)
> +{
> +	if (error) {
> +		*error = (struct rte_scheddev_error){
> +			.type = type,
> +			.cause = cause,
> +			.message = message,
> +		};
> +	}
> +	rte_errno = code;
> +	return code;
> +}
> +
> +/**
> + * Get generic hierarchical scheduler operations structure from a port
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param error
> + *   Error details
> + *
> + * @return
> + *   The hierarchical scheduler operations structure associated with port_id on
> + *   success, NULL otherwise.
> + */
> +const struct rte_scheddev_ops *
> +rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* __INCLUDE_RTE_SCHEDDEV_DRIVER_H__ */
>
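
To illustrate how an application might drive this API, here is a minimal
sketch that enables ECN marking and reads per-node statistics. It is only an
illustration: it assumes the declarations quoted above compile as posted, and
the port ID, node ID and statistics masks are made-up values.

/* Illustrative only: relies solely on the rte_scheddev.h declarations above. */
#include <rte_scheddev.h>

static int
scheddev_example(uint8_t port_id, uint32_t leaf_node_id)
{
	struct rte_scheddev_error error;
	struct rte_scheddev_node_stats stats;
	uint64_t cap_mask = 0, enabled_mask = 0;
	int ret;

	/* Enable ECN marking for all three colors (green, yellow, red). */
	ret = rte_scheddev_mark_ip_ecn(port_id, 1, 1, 1, &error);
	if (ret)
		return ret;

	/* Enable every statistics counter this node supports ... */
	ret = rte_scheddev_node_stats_get_enabled(port_id, leaf_node_id,
			&cap_mask, &enabled_mask, &error);
	if (ret)
		return ret;

	ret = rte_scheddev_node_stats_enable(port_id, leaf_node_id,
			cap_mask, &error);
	if (ret)
		return ret;

	/* ... and read the counters back, clearing them after the read. */
	return rte_scheddev_node_stats_read(port_id, leaf_node_id,
			&stats, 1, &error);
}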

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/8] eal: use different constructor priorities for initcalls
  @ 2017-02-21 12:30  3%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-02-21 12:30 UTC (permalink / raw)
  To: Jan Blunck, dev; +Cc: david.marchand, shreyansh.jain

On 2/20/2017 2:17 PM, Jan Blunck wrote:
> This introduces different initcall macros to allow for late registration of
> the virtual device bus.
> 
> Signed-off-by: Jan Blunck <jblunck@infradead.org>
> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

<...>

>  
> -#define RTE_INIT(func) \
> -static void __attribute__((constructor, used)) func(void)
> +#define RTE_EAL_INIT(func) \
> +static void __attribute__((constructor(101), used)) func(void)
> +
> +#define RTE_POST_EAL_INIT(func) \
> +static void __attribute__((constructor(102), used)) func(void)
> +
> +#define RTE_DEV_INIT(func) \
> +static void __attribute__((constructor(103), used)) func(void)
> +
> +#define RTE_INIT(func) RTE_DEV_INIT(func)

Does it make sense to leave some gaps between the priorities,
101, 102, 103 --> 100, 200, 300?

When new priorities are added (not sure if that will ever happen), does
changing the previous priorities cause an ABI breakage?
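
Just for illustration (not part of this patch), such a gapped variant could
look like the sketch below. One caveat: GCC reserves constructor priorities
0..100 for the implementation, so the lowest application-visible value has to
stay above 100.

/* Hypothetical gapped priorities -- the numeric values are examples only. */
#define RTE_EAL_INIT(func) \
static void __attribute__((constructor(200), used)) func(void)

#define RTE_POST_EAL_INIT(func) \
static void __attribute__((constructor(300), used)) func(void)

#define RTE_DEV_INIT(func) \
static void __attribute__((constructor(400), used)) func(void)

#define RTE_INIT(func) RTE_DEV_INIT(func)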

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2] lpm: extend IPv6 next hop field
  2017-02-19 17:14  4% [dpdk-dev] [PATCH] lpm: extend IPv6 next hop field Vladyslav Buslov
@ 2017-02-21 14:46  4% ` Vladyslav Buslov
  0 siblings, 0 replies; 200+ results
From: Vladyslav Buslov @ 2017-02-21 14:46 UTC (permalink / raw)
  To: bruce.richardson; +Cc: dev

This patch extends the next_hop field from 8 bits to 21 bits in the LPM
library for IPv6.

Added versioning symbols to the functions and updated the library and
applications that have a dependency on the LPM library.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
---
 app/test/test_lpm6.c                            | 115 ++++++++++++++------
 app/test/test_lpm6_perf.c                       |   4 +-
 doc/guides/prog_guide/lpm6_lib.rst              |   2 +-
 doc/guides/rel_notes/release_17_05.rst          |   5 +
 examples/ip_fragmentation/main.c                |  17 +--
 examples/ip_reassembly/main.c                   |  17 +--
 examples/ipsec-secgw/ipsec-secgw.c              |   2 +-
 examples/l3fwd/l3fwd_lpm_sse.h                  |  24 ++---
 examples/performance-thread/l3fwd-thread/main.c |  11 +-
 lib/librte_lpm/rte_lpm6.c                       | 134 +++++++++++++++++++++---
 lib/librte_lpm/rte_lpm6.h                       |  32 +++++-
 lib/librte_lpm/rte_lpm_version.map              |  10 ++
 lib/librte_table/rte_table_lpm_ipv6.c           |   9 +-
 13 files changed, 292 insertions(+), 90 deletions(-)

diff --git a/app/test/test_lpm6.c b/app/test/test_lpm6.c
index 61134f7..e0e7bf0 100644
--- a/app/test/test_lpm6.c
+++ b/app/test/test_lpm6.c
@@ -79,6 +79,7 @@ static int32_t test24(void);
 static int32_t test25(void);
 static int32_t test26(void);
 static int32_t test27(void);
+static int32_t test28(void);
 
 rte_lpm6_test tests6[] = {
 /* Test Cases */
@@ -110,6 +111,7 @@ rte_lpm6_test tests6[] = {
 	test25,
 	test26,
 	test27,
+	test28,
 };
 
 #define NUM_LPM6_TESTS                (sizeof(tests6)/sizeof(tests6[0]))
@@ -354,7 +356,7 @@ test6(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -392,7 +394,7 @@ test7(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[10][16];
-	int16_t next_hop_return[10];
+	int32_t next_hop_return[10];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -469,7 +471,8 @@ test9(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 16, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 	uint8_t i;
 
@@ -513,7 +516,8 @@ test10(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -557,7 +561,8 @@ test11(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -617,7 +622,8 @@ test12(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -655,7 +661,8 @@ test13(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = 2;
@@ -702,7 +709,8 @@ test14(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 25, next_hop_add = 100;
+	uint8_t depth = 25;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -748,7 +756,8 @@ test15(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 24, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 24;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -784,7 +793,8 @@ test16(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {12,12,1,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 128, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 128;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -828,7 +838,8 @@ test17(void)
 	uint8_t ip1[] = {127,255,255,255,255,255,255,255,255,
 			255,255,255,255,255,255,255};
 	uint8_t ip2[] = {128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -857,7 +868,7 @@ test17(void)
 
 	/* Loop with rte_lpm6_delete. */
 	for (depth = 16; depth >= 1; depth--) {
-		next_hop_add = (uint8_t) (depth - 1);
+		next_hop_add = (depth - 1);
 
 		status = rte_lpm6_delete(lpm, ip2, depth);
 		TEST_LPM_ASSERT(status == 0);
@@ -893,8 +904,9 @@ test18(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16], ip_1[16], ip_2[16];
-	uint8_t depth, depth_1, depth_2, next_hop_add, next_hop_add_1,
-		next_hop_add_2, next_hop_return;
+	uint8_t depth, depth_1, depth_2;
+	uint32_t next_hop_add, next_hop_add_1,
+			next_hop_add_2, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1055,7 +1067,8 @@ test19(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1253,7 +1266,8 @@ test20(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1320,8 +1334,9 @@ test21(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[4][16];
-	uint8_t depth, next_hop_add;
-	int16_t next_hop_return[4];
+	uint8_t depth;
+	uint32_t next_hop_add;
+	int32_t next_hop_return[4];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1378,8 +1393,9 @@ test22(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[5][16];
-	uint8_t depth[5], next_hop_add;
-	int16_t next_hop_return[5];
+	uint8_t depth[5];
+	uint32_t next_hop_add;
+	int32_t next_hop_return[5];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1495,7 +1511,8 @@ test23(void)
 	struct rte_lpm6_config config;
 	uint32_t i;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1579,7 +1596,8 @@ test25(void)
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
 	uint32_t i;
-	uint8_t depth, next_hop_add, next_hop_return, next_hop_expected;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return, next_hop_expected;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1632,10 +1650,10 @@ test26(void)
 	uint8_t d_ip_10_32 = 32;
 	uint8_t	d_ip_10_24 = 24;
 	uint8_t	d_ip_20_25 = 25;
-	uint8_t next_hop_ip_10_32 = 100;
-	uint8_t	next_hop_ip_10_24 = 105;
-	uint8_t	next_hop_ip_20_25 = 111;
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_ip_10_32 = 100;
+	uint32_t next_hop_ip_10_24 = 105;
+	uint32_t next_hop_ip_20_25 = 111;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1650,7 +1668,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_32, &next_hop_return);
-	uint8_t test_hop_10_32 = next_hop_return;
+	uint32_t test_hop_10_32 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_32);
 
@@ -1659,7 +1677,7 @@ test26(void)
 			return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_24, &next_hop_return);
-	uint8_t test_hop_10_24 = next_hop_return;
+	uint32_t test_hop_10_24 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_24);
 
@@ -1668,7 +1686,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_20_25, &next_hop_return);
-	uint8_t test_hop_20_25 = next_hop_return;
+	uint32_t test_hop_20_25 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_20_25);
 
@@ -1707,7 +1725,8 @@ test27(void)
 		struct rte_lpm6 *lpm = NULL;
 		struct rte_lpm6_config config;
 		uint8_t ip[] = {128,128,128,128,128,128,128,128,128,128,128,128,128,128,0,0};
-		uint8_t depth = 128, next_hop_add = 100, next_hop_return;
+		uint8_t depth = 128;
+		uint32_t next_hop_add = 100, next_hop_return;
 		int32_t status = 0;
 		int i, j;
 
@@ -1746,6 +1765,42 @@ test27(void)
 }
 
 /*
> + * Call add, lookup and delete for a single rule with the maximum 21-bit
> + * next_hop size.
> + * Check that the next_hop returned from lookup equals the provisioned value.
> + * Delete the rule and check that the same lookup returns a miss.
+ */
+int32_t
+test28(void)
+{
+	struct rte_lpm6 *lpm = NULL;
+	struct rte_lpm6_config config;
+	uint8_t ip[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 0x001FFFFF, next_hop_return = 0;
+	int32_t status = 0;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	status = rte_lpm6_add(lpm, ip, depth, next_hop_add);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm6_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));
+
+	status = rte_lpm6_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	rte_lpm6_free(lpm);
+
+	return PASS;
+}
+
+/*
  * Do all unit tests.
  */
 static int
diff --git a/app/test/test_lpm6_perf.c b/app/test/test_lpm6_perf.c
index 0723081..30be430 100644
--- a/app/test/test_lpm6_perf.c
+++ b/app/test/test_lpm6_perf.c
@@ -86,7 +86,7 @@ test_lpm6_perf(void)
 	struct rte_lpm6_config config;
 	uint64_t begin, total_time;
 	unsigned i, j;
-	uint8_t next_hop_add = 0xAA, next_hop_return = 0;
+	uint32_t next_hop_add = 0xAA, next_hop_return = 0;
 	int status = 0;
 	int64_t count = 0;
 
@@ -148,7 +148,7 @@ test_lpm6_perf(void)
 	count = 0;
 
 	uint8_t ip_batch[NUM_IPS_ENTRIES][16];
-	int16_t next_hops[NUM_IPS_ENTRIES];
+	int32_t next_hops[NUM_IPS_ENTRIES];
 
 	for (i = 0; i < NUM_IPS_ENTRIES; i++)
 		memcpy(ip_batch[i], large_ips_table[i].ip, 16);
diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index 0aea5c5..f791507 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary depending on the case.
 An LPM prefix is represented by a pair of parameters (128-bit key, depth), with depth in the range of 1 to 128.
 An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
 The prefix serves as the unique identifier for the LPM rule.
-In this implementation, the user data is 1-byte long and is called "next hop",
+In this implementation, the user data is 21 bits long and is called "next hop",
 which corresponds to its main use of storing the ID of the next hop in a routing table entry.
 
 The main methods exported for the LPM component are:
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 48fb5bd..723e085 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Increased number of next hops for LPM IPv6 to 2^21.**
+
+  The next_hop field is extended from 8 bits to 21 bits for IPv6.
 
 Resolved Issues
 ---------------
@@ -110,6 +113,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
+  while keeping ABI compatibility.
 
 ABI Changes
 -----------
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index e1e32c6..89d08c8 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		uint8_t queueid, uint8_t port_in)
 {
 	struct rx_queue *rxq;
-	uint32_t i, len, next_hop_ipv4;
-	uint8_t next_hop_ipv6, port_out, ipv6;
+	uint32_t i, len, next_hop;
+	uint8_t port_out, ipv6;
 	int32_t len2;
 
 	ipv6 = 0;
@@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			port_out = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
@@ -326,9 +326,10 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *);
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			port_out = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 50fe422..661b64f 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -346,8 +346,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	struct rte_ip_frag_death_row *dr;
 	struct rx_queue *rxq;
 	void *d_addr_bytes;
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6, dst_port;
+	uint32_t next_hop;
+	uint8_t dst_port;
 
 	rxq = &qconf->rx_queue_list[queue];
 
@@ -390,9 +390,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			dst_port = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
@@ -427,9 +427,10 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			dst_port = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv6);
diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index 5a4c9b7..5744c46 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -618,7 +618,7 @@ route4_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 static inline void
 route6_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 {
-	int16_t hop[MAX_PKT_BURST * 2];
+	int32_t hop[MAX_PKT_BURST * 2];
 	uint8_t dst_ip[MAX_PKT_BURST * 2][16];
 	uint8_t *ip6_dst;
 	uint16_t i, offset;
diff --git a/examples/l3fwd/l3fwd_lpm_sse.h b/examples/l3fwd/l3fwd_lpm_sse.h
index 538fe3d..aa06b6d 100644
--- a/examples/l3fwd/l3fwd_lpm_sse.h
+++ b/examples/l3fwd/l3fwd_lpm_sse.h
@@ -40,8 +40,7 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ipv4_hdr *ipv4_hdr;
 	struct ether_hdr *eth_hdr;
@@ -51,9 +50,11 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
 
-		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct,
-				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop_ipv4) == 0) ?
-						next_hop_ipv4 : portid);
+		return (uint16_t) (
+			(rte_lpm_lookup(qconf->ipv4_lookup_struct,
+					rte_be_to_cpu_32(ipv4_hdr->dst_addr),
+					&next_hop) == 0) ?
+						next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -61,8 +62,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
@@ -78,14 +79,13 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
-			&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+			&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -93,8 +93,8 @@ lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
diff --git a/examples/performance-thread/l3fwd-thread/main.c b/examples/performance-thread/l3fwd-thread/main.c
index 53083df..fa99daf 100644
--- a/examples/performance-thread/l3fwd-thread/main.c
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -909,7 +909,7 @@ static inline uint8_t
 get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid,
 		lookup6_struct_t *ipv6_l3fwd_lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 
 	return (uint8_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct,
 			((struct ipv6_hdr *)ipv6_hdr)->dst_addr, &next_hop) == 0) ?
@@ -1396,15 +1396,14 @@ rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 static inline __attribute__((always_inline)) uint16_t
 get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv4_lookup_struct, dst_ipv4,
-				&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+				&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -1413,8 +1412,8 @@ get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 
 		return (uint16_t) ((rte_lpm6_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0) ? next_hop_ipv6 :
-						portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0) ?
+				next_hop : portid);
 
 	}
 
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 32fdba0..9cc7be7 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -97,7 +97,7 @@ struct rte_lpm6_tbl_entry {
 /** Rules tbl entry structure. */
 struct rte_lpm6_rule {
 	uint8_t ip[RTE_LPM6_IPV6_ADDR_SIZE]; /**< Rule IP address. */
-	uint8_t next_hop; /**< Rule next hop. */
+	uint32_t next_hop; /**< Rule next hop. */
 	uint8_t depth; /**< Rule depth. */
 };
 
@@ -297,7 +297,7 @@ rte_lpm6_free(struct rte_lpm6 *lpm)
  * the nexthop if so. Otherwise it adds a new rule if enough space is available.
  */
 static inline int32_t
-rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
+rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint32_t next_hop, uint8_t depth)
 {
 	uint32_t rule_index;
 
@@ -340,7 +340,7 @@ rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
  */
 static void
 expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
-		uint8_t next_hop)
+		uint32_t next_hop)
 {
 	uint32_t tbl8_group_end, tbl8_gindex_next, j;
 
@@ -377,7 +377,7 @@ expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
 static inline int
 add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
 		struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip, uint8_t bytes,
-		uint8_t first_byte, uint8_t depth, uint8_t next_hop)
+		uint8_t first_byte, uint8_t depth, uint32_t next_hop)
 {
 	uint32_t tbl_index, tbl_range, tbl8_group_start, tbl8_group_end, i;
 	int32_t tbl8_gindex;
@@ -507,9 +507,17 @@ add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
  * Add a route
  */
 int
-rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop)
 {
+	return rte_lpm6_add_v1705(lpm, ip, depth, next_hop);
+}
+VERSION_SYMBOL(rte_lpm6_add, _v20, 2.0);
+
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop)
+{
 	struct rte_lpm6_tbl_entry *tbl;
 	struct rte_lpm6_tbl_entry *tbl_next;
 	int32_t rule_index;
@@ -560,6 +568,10 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_add, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip,
+				uint8_t depth, uint32_t next_hop),
+		rte_lpm6_add_v1705);
 
 /*
  * Takes a pointer to a table entry and inspect one level.
@@ -569,7 +581,7 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 static inline int
 lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		const struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip,
-		uint8_t first_byte, uint8_t *next_hop)
+		uint8_t first_byte, uint32_t *next_hop)
 {
 	uint32_t tbl8_index, tbl_entry;
 
@@ -589,7 +601,7 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		return 1;
 	} else {
 		/* If not extended then we can have a match. */
-		*next_hop = (uint8_t)tbl_entry;
+		*next_hop = ((uint32_t)tbl_entry & RTE_LPM6_TBL8_BITMASK);
 		return (tbl_entry & RTE_LPM6_LOOKUP_SUCCESS) ? 0 : -ENOENT;
 	}
 }
@@ -598,7 +610,26 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
  * Looks up an IP
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_lookup_v1705(lpm, ip, &next_hop32);
+	if (status == 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+}
+VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
+
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop)
 {
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
@@ -625,20 +656,23 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
+				uint32_t *next_hop), rte_lpm6_lookup_v1705);
 
 /*
  * Looks up a group of IP addresses
  */
 int
-rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n)
 {
 	unsigned i;
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
-	uint32_t tbl24_index;
-	uint8_t first_byte, next_hop;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
 	int status;
 
 	/* DEBUG: Check user input arguments. */
@@ -664,11 +698,59 @@ rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		if (status < 0)
 			next_hops[i] = -1;
 		else
-			next_hops[i] = next_hop;
+			next_hops[i] = (int16_t)next_hop;
+	}
+
+	return 0;
+}
+VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
+
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n)
+{
+	unsigned int i;
+	const struct rte_lpm6_tbl_entry *tbl;
+	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
+	int status;
+
+	/* DEBUG: Check user input arguments. */
+	if ((lpm == NULL) || (ips == NULL) || (next_hops == NULL))
+		return -EINVAL;
+
+	for (i = 0; i < n; i++) {
+		first_byte = LOOKUP_FIRST_BYTE;
+		tbl24_index = (ips[i][0] << BYTES2_SIZE) |
+				(ips[i][1] << BYTE_SIZE) | ips[i][2];
+
+		/* Calculate pointer to the first entry to be inspected */
+		tbl = &lpm->tbl24[tbl24_index];
+
+		do {
+			/* Continue inspecting following levels
+			 * until success or failure
+			 */
+			status = lookup_step(lpm, tbl, &tbl_next, ips[i],
+					first_byte++, &next_hop);
+			tbl = tbl_next;
+		} while (status == 1);
+
+		if (status < 0)
+			next_hops[i] = -1;
+		else
+			next_hops[i] = (int32_t)next_hop;
 	}
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup_bulk_func, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+				uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+				int32_t *next_hops, unsigned int n),
+		rte_lpm6_lookup_bulk_func_v1705);
 
 /*
  * Finds a rule in rule table.
@@ -698,8 +780,28 @@ rule_find(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
  * Look for a rule in the high-level rules table
  */
 int
-rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop)
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_is_rule_present_v1705(lpm, ip, depth, &next_hop32);
+	if (status > 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+
+}
+VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
+
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop)
 {
 	uint8_t ip_masked[RTE_LPM6_IPV6_ADDR_SIZE];
 	int32_t rule_index;
@@ -724,6 +826,10 @@ uint8_t *next_hop)
 	/* If rule is not found return 0. */
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_is_rule_present, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_is_rule_present(struct rte_lpm6 *lpm,
+				uint8_t *ip, uint8_t depth, uint32_t *next_hop),
+		rte_lpm6_is_rule_present_v1705);
 
 /*
  * Delete a rule from the rule table.
diff --git a/lib/librte_lpm/rte_lpm6.h b/lib/librte_lpm/rte_lpm6.h
index 13d027f..3a3342d 100644
--- a/lib/librte_lpm/rte_lpm6.h
+++ b/lib/librte_lpm/rte_lpm6.h
@@ -39,6 +39,7 @@
  */
 
 #include <stdint.h>
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -123,7 +124,13 @@ rte_lpm6_free(struct rte_lpm6 *lpm);
  */
 int
 rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
+int
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop);
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
 
 /**
  * Check if a rule is present in the LPM table,
@@ -142,7 +149,13 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
  */
 int
 rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop);
+		uint32_t *next_hop);
+int
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop);
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop);
 
 /**
  * Delete a rule from the LPM table.
@@ -199,7 +212,12 @@ rte_lpm6_delete_all(struct rte_lpm6 *lpm);
  *   -EINVAL for incorrect arguments, -ENOENT on lookup miss, 0 on lookup hit
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
+int
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop);
 
 /**
  * Lookup multiple IP addresses in an LPM table.
@@ -220,7 +238,15 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
 int
 rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
-		int16_t * next_hops, unsigned n);
+		int32_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int16_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 239b371..90beac8 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -34,3 +34,13 @@ DPDK_16.04 {
 	rte_lpm_delete_all;
 
 } DPDK_2.0;
+
+DPDK_17.05 {
+	global:
+
+	rte_lpm6_add;
+	rte_lpm6_is_rule_present;
+	rte_lpm6_lookup;
+	rte_lpm6_lookup_bulk_func;
+
+} DPDK_16.04;
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index 836f4cf..1e1a173 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -211,9 +211,8 @@ rte_table_lpm_ipv6_entry_add(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint32_t nht_pos, nht_pos0_valid;
+	uint32_t nht_pos, nht_pos0, nht_pos0_valid;
 	int status;
-	uint8_t nht_pos0;
 
 	/* Check input parameters */
 	if (lpm == NULL) {
@@ -256,7 +255,7 @@ rte_table_lpm_ipv6_entry_add(
 
 	/* Add rule to low level LPM table */
 	if (rte_lpm6_add(lpm->lpm, ip_prefix->ip, ip_prefix->depth,
-		(uint8_t) nht_pos) < 0) {
+		nht_pos) < 0) {
 		RTE_LOG(ERR, TABLE, "%s: LPM IPv6 rule add failed\n", __func__);
 		return -1;
 	}
@@ -280,7 +279,7 @@ rte_table_lpm_ipv6_entry_delete(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint8_t nht_pos;
+	uint32_t nht_pos;
 	int status;
 
 	/* Check input parameters */
@@ -356,7 +355,7 @@ rte_table_lpm_ipv6_lookup(
 			uint8_t *ip = RTE_MBUF_METADATA_UINT8_PTR(pkt,
 				lpm->offset);
 			int status;
-			uint8_t nht_pos;
+			uint32_t nht_pos;
 
 			status = rte_lpm6_lookup(lpm->lpm, ip, &nht_pos);
 			if (status == 0) {
-- 
2.1.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-14 20:31  9% ` Jan Blunck
@ 2017-02-22 13:12  7%   ` Christian Ehrhardt
  2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
  2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
  0 siblings, 2 replies; 200+ results
From: Christian Ehrhardt @ 2017-02-22 13:12 UTC (permalink / raw)
  To: Jan Blunck; +Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On Tue, Feb 14, 2017 at 9:31 PM, Jan Blunck <jblunck@infradead.org> wrote:

> > 1. Downstreams to insert Major version into soname
> > Distributions could insert the DPDK major version (like 16.11) into the
> > soname and package names. A common example of this is libboost [5].
> > That would perfectly allow 16.07.<LIBABIVER> to coexist with
> > 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
> > Yet it would mean that anything depending on the old library will have to
> > be recompiled to pick up the new code, even if it depends on an ABI that
> is
> > still present in the new release.
> > Also - not a technical reason - but it is clearly more work to force
> update
> > all dependencies and clean out old packages for every release.
>
> Actually this isn't exactly what I proposed during the summit. Just
> keep it simple and fix the ABI version of all libraries at 16.11.0.
> This is a proven approach and has been used for years with different
> libraries.


Since there was no other response, I'll try to wrap up.

Yes, #1 is also my preferred solution at the moment.
We tried individually following the upstream LIBABIVER tracking, but as
outlined before we hit too many issues.
I discussed it in the deb_dpdk group, which also acked using this as the
general approach.
The other options have obvious flaws, as I listed in my initial report,
and - thanks btw - you added a few more.

@Bruce - sorry I don't think dropping config options is the solution. Yet
my suggestion does not prevent you from doing so.



> You could easily do this independently of us upstream
> fixing the ABI problems.



I agree, but I'd like to suggest the mechanism I want to implement.
An ack from upstream for the feature to set such a major ABI would be great.
Actually, since it is optional and can help more people integrating DPDK,
getting it accepted upstream would be even better.

I'll send a patch in reply to this thread later today that implements what
I have in mind.


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version
  2017-02-22 13:12  7%   ` Christian Ehrhardt
@ 2017-02-22 13:24 20%     ` Christian Ehrhardt
  2017-02-28  8:34  4%       ` Jan Blunck
  2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Christian Ehrhardt @ 2017-02-22 13:24 UTC (permalink / raw)
  To: dev
  Cc: Christian Ehrhardt, cjcollier @ linuxfoundation . org,
	ricardo.salveti, Luca Boccassi

Downstreams might want to provide different DPDK releases at the same
time to support multiple consumers of DPDK linked against older and newer
sonames.

Also, due to the interdependencies that DPDK libraries can have,
applications might end up with an executable space in which multiple
versions of a library are mapped by ld.so.

Think of LibA that got an ABI bump and LibB that did not get an ABI bump
but is depending on LibA.

    Application
    \-> LibA.old
    \-> LibB.new -> LibA.new

That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
If set, CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.

We now need to cut arbitrarily long strings after the .so, and this would
work for any ABI version in LIBABIVER:
  $(Q)ln -s -f $< $(patsubst %.$(LIBABIVER),%,$@)
But using the following instead additionally allows simplifying the
Makefile for the CONFIG_RTE_NEXT_ABI case.
  $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 config/common_base                     |  5 +++++
 doc/guides/contributing/versioning.rst | 25 +++++++++++++++++++++++++
 mk/rte.lib.mk                          | 12 +++++++-----
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..37aa1e1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -75,6 +75,11 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
 CONFIG_RTE_NEXT_ABI=y
 
 #
+# Major ABI to overwrite library specific LIBABIVER
+#
+CONFIG_RTE_MAJOR_ABI=
+
+#
 # Machine's cache line size
 #
 CONFIG_RTE_CACHE_LINE_SIZE=64
diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
index fbc44a7..8aaf370 100644
--- a/doc/guides/contributing/versioning.rst
+++ b/doc/guides/contributing/versioning.rst
@@ -133,6 +133,31 @@ The macros exported are:
   fully qualified function ``p``, so that if a symbol becomes versioned, it
   can still be mapped back to the public symbol name.
 
+Setting a Major ABI version
+---------------------------
+
+Downstreams might want to provide different DPDK releases at the same time to
+support multiple consumers of DPDK linked against older and newer sonames.
+
+Also due to the interdependencies that DPDK libraries can have applications
+might end up with an executable space in which multiple versions of a library
+are mapped by ld.so.
+
+Think of LibA that got an ABI bump and LibB that did not get an ABI bump but is
+depending on LibA.
+
+.. note::
+
+    Application
+    \-> LibA.old
+    \-> LibB.new -> LibA.new
+
+That is a conflict which can be avoided by setting ``CONFIG_RTE_MAJOR_ABI``.
+If set, the value of ``CONFIG_RTE_MAJOR_ABI`` overwrites all - otherwise per
+library - versions defined in the libraries ``LIBABIVER``.
+An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all libraries
+``librte<?>.so.16.11`` instead of ``librte<?>.so.<LIBABIVER>``.
+
 Examples of ABI Macro use
 -------------------------
 
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 33a5f5a..06046c2 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -40,6 +40,12 @@ EXTLIB_BUILD ?= n
 # VPATH contains at least SRCDIR
 VPATH += $(SRCDIR)
 
+ifneq ($(CONFIG_RTE_MAJOR_ABI),)
+ifneq ($(LIBABIVER),)
+LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
+endif
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
 ifeq ($(EXTLIB_BUILD),n)
@@ -156,11 +162,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
 	@[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
 	$(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
-ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
-	$(Q)ln -s -f $< $(basename $(basename $@))
-else
-	$(Q)ln -s -f $< $(basename $@)
-endif
+	$(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
 endif
 
 #
-- 
2.7.4

^ permalink raw reply	[relevance 20%]

* [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  @ 2017-02-23 17:23  4% ` Bruce Richardson
  2017-02-28 11:35  0%   ` Jerin Jacob
  2017-02-23 17:23  3% ` [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Users compiling DPDK should not need to know or care about the arrangement
of cachelines in the rte_ring structure. Therefore, just remove the build
option and make the structures always split. For improved performance,
use 128B rather than 64B alignment, since it stops the producer and
consumer data from being on adjacent cachelines.
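
Not part of the patch itself, but as an illustration of what the new
__rte_aligned(RTE_CACHE_LINE_SIZE * 2) markers guarantee, here is a
minimal sketch of a build-time layout check an application or unit test
could add (the prod/cons member names are taken from the header below):

  #include <stddef.h>
  #include <rte_common.h>
  #include <rte_memory.h>
  #include <rte_ring.h>

  static inline void
  ring_layout_check(void)
  {
      /* Producer and consumer metadata must be at least two cache
       * lines apart, so adjacent-line prefetching on one side cannot
       * cause false sharing with the other side.
       */
      RTE_BUILD_BUG_ON(offsetof(struct rte_ring, cons) -
                       offsetof(struct rte_ring, prod) <
                       2 * RTE_CACHE_LINE_SIZE);
  }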

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                     | 1 -
 doc/guides/rel_notes/release_17_05.rst | 6 ++++++
 lib/librte_ring/rte_ring.c             | 2 --
 lib/librte_ring/rte_ring.h             | 8 ++------
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..099ffda 100644
--- a/config/common_base
+++ b/config/common_base
@@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 #
 CONFIG_RTE_LIBRTE_RING=y
 CONFIG_RTE_LIBRTE_RING_DEBUG=n
-CONFIG_RTE_RING_SPLIT_PROD_CONS=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..ea45e0c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -110,6 +110,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Reworked rte_ring library**
+
+  The rte_ring library has been reworked and updated. The following changes
+  have been made to it:
+
+  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index ca0a108..4bc6da1 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	/* compilation-time checks */
 	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_RING_SPLIT_PROD_CONS
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_LIBRTE_RING_DEBUG
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 72ccca5..04fe667 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -168,7 +168,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Producer head. */
 		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
+	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
 
 	/** Ring consumer status. */
 	struct cons {
@@ -177,11 +177,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Consumer head. */
 		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
+	} cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
 
 #ifdef RTE_LIBRTE_RING_DEBUG
 	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields
    2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
@ 2017-02-23 17:23  3% ` Bruce Richardson
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting Bruce Richardson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of those structures and into the
top-level structure so they are not duplicated.
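
For illustration only (not part of the patch): a hypothetical application
helper showing the source-level effect of the move for code that peeked at
the duplicated copies directly.

  #include <rte_ring.h>

  /* Before this patch such code would have read r->prod.size or
   * r->prod.mask; afterwards there is a single shared copy at the top of
   * the structure (or the rte_ring_get_size() accessor).
   */
  static inline uint32_t
  ring_usable_slots(const struct rte_ring *r)
  {
      return r->mask;   /* size - 1 entries are usable */
  }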

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_ring.c       |  6 +++---
 lib/librte_ring/rte_ring.c | 20 ++++++++++----------
 lib/librte_ring/rte_ring.h | 32 ++++++++++++++++----------------
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index ebcb896..5f09097 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -148,7 +148,7 @@ check_live_watermark_change(__attribute__((unused)) void *dummy)
 		}
 
 		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->prod.watermark;
+		watermark = r->watermark;
 		if (watermark != watermark_old &&
 		    (watermark_old != 16 || watermark != 32)) {
 			printf("Bad watermark change %u -> %u\n", watermark_old,
@@ -213,7 +213,7 @@ test_set_watermark( void ){
 		printf( " ring lookup failed\n" );
 		goto error;
 	}
-	count = r->prod.size*2;
+	count = r->size * 2;
 	setwm = rte_ring_set_water_mark(r, count);
 	if (setwm != -EINVAL){
 		printf("Test failed to detect invalid watermark count value\n");
@@ -222,7 +222,7 @@ test_set_watermark( void ){
 
 	count = 0;
 	rte_ring_set_water_mark(r, count);
-	if (r->prod.watermark != r->prod.size) {
+	if (r->watermark != r->size) {
 		printf("Test failed to detect invalid watermark count value\n");
 		goto error;
 	}
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4bc6da1..80fc356 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -144,11 +144,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.watermark = count;
+	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-	r->prod.size = r->cons.size = count;
-	r->prod.mask = r->cons.mask = count-1;
+	r->size = count;
+	r->mask = count - 1;
 	r->prod.head = r->cons.head = 0;
 	r->prod.tail = r->cons.tail = 0;
 
@@ -269,14 +269,14 @@ rte_ring_free(struct rte_ring *r)
 int
 rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
-	if (count >= r->prod.size)
+	if (count >= r->size)
 		return -EINVAL;
 
 	/* if count is 0, disable the watermarking */
 	if (count == 0)
-		count = r->prod.size;
+		count = r->size;
 
-	r->prod.watermark = count;
+	r->watermark = count;
 	return 0;
 }
 
@@ -291,17 +291,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->prod.watermark == r->prod.size)
+	if (r->watermark == r->size)
 		fprintf(f, "  watermark=0\n");
 	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->prod.watermark);
+		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_RING_DEBUG
@@ -318,7 +318,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
 		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
 	}
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
 	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
 	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 0c8defd..6e75c15 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -143,13 +143,10 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 struct rte_ring_ht_ptr {
 	volatile uint32_t head;  /**< Prod/consumer head. */
 	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
 	union {
 		uint32_t sp_enqueue; /**< True, if single producer. */
 		uint32_t sc_dequeue; /**< True, if single consumer. */
 	};
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 };
 
 /**
@@ -169,9 +166,12 @@ struct rte_ring {
 	 * next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
+	int flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_ht_ptr prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
@@ -350,7 +350,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * Placed here since identical code needed in both
  * single and multi producer enqueue functions */
 #define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
+	const uint32_t size = r->size; \
 	uint32_t idx = prod_head & mask; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
@@ -377,7 +377,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * single and multi consumer dequeue functions */
 #define DEQUEUE_PTRS() do { \
 	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
+	const uint32_t size = r->size; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
 			obj_table[i] = r->ring[idx]; \
@@ -432,7 +432,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -480,7 +480,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -539,7 +539,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	prod_head = r->prod.head;
@@ -575,7 +575,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -625,7 +625,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -722,7 +722,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
 	prod_tail = r->prod.tail;
@@ -1051,7 +1051,7 @@ rte_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
 }
 
 /**
@@ -1084,7 +1084,7 @@ rte_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
+	return (prod_tail - cons_tail) & r->mask;
 }
 
 /**
@@ -1100,7 +1100,7 @@ rte_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
+	return (cons_tail - prod_tail - 1) & r->mask;
 }
 
 /**
@@ -1114,7 +1114,7 @@ rte_ring_free_count(const struct rte_ring *r)
 static inline unsigned int
 rte_ring_get_size(const struct rte_ring *r)
 {
-	return r->prod.size;
+	return r->size;
 }
 
 /**
-- 
2.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting
    2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
  2017-02-23 17:23  3% ` [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
@ 2017-02-23 17:23  2% ` Bruce Richardson
  2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

The debug option only provided statistics to the user, most of which
can be tracked by the application itself. Remove it, both as a
compile-time option and as a feature, simplifying the code.
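
A minimal sketch (not from this patch) of how an application could keep
equivalent per-lcore counters itself once the library no longer does; the
struct and helper names are made up for the example, and the -EDQUOT case
assumes the watermark feature that a later patch in this series removes.

  #include <errno.h>
  #include <stdint.h>
  #include <rte_lcore.h>
  #include <rte_memory.h>
  #include <rte_ring.h>

  struct app_ring_stats {
      uint64_t enq_ok_bulk;    /* successful bulk enqueues */
      uint64_t enq_ok_objs;    /* objects enqueued */
      uint64_t enq_fail_bulk;  /* bulk enqueues that failed */
  } __rte_cache_aligned;

  static struct app_ring_stats app_stats[RTE_MAX_LCORE];

  static inline int
  app_enqueue_bulk(struct rte_ring *r, void * const *objs, unsigned int n)
  {
      unsigned int lcore = rte_lcore_id();
      int ret = rte_ring_mp_enqueue_bulk(r, objs, n);

      if (lcore < RTE_MAX_LCORE) {    /* mirror the removed macro's check */
          if (ret == 0 || ret == -EDQUOT) {   /* objects were enqueued */
              app_stats[lcore].enq_ok_bulk++;
              app_stats[lcore].enq_ok_objs += n;
          } else {                            /* -ENOBUFS */
              app_stats[lcore].enq_fail_bulk++;
          }
      }
      return ret;
  }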

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/test_ring.c                   | 410 ---------------------------------
 config/common_base                     |   1 -
 doc/guides/prog_guide/ring_lib.rst     |   7 -
 doc/guides/rel_notes/release_17_05.rst |   1 +
 lib/librte_ring/rte_ring.c             |  41 ----
 lib/librte_ring/rte_ring.h             |  97 +-------
 6 files changed, 13 insertions(+), 544 deletions(-)

diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 5f09097..3891f5d 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -763,412 +763,6 @@ test_ring_burst_basic(void)
 	return -1;
 }
 
-static int
-test_ring_stats(void)
-{
-
-#ifndef RTE_LIBRTE_RING_DEBUG
-	printf("Enable RTE_LIBRTE_RING_DEBUG to test ring stats.\n");
-	return 0;
-#else
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i;
-	unsigned num_items            = 0;
-	unsigned failed_enqueue_ops   = 0;
-	unsigned failed_enqueue_items = 0;
-	unsigned failed_dequeue_ops   = 0;
-	unsigned failed_dequeue_items = 0;
-	unsigned last_enqueue_ops     = 0;
-	unsigned last_enqueue_items   = 0;
-	unsigned last_quota_ops       = 0;
-	unsigned last_quota_items     = 0;
-	unsigned lcore_id = rte_lcore_id();
-	struct rte_ring_debug_stats *ring_stats = &r->stats[lcore_id];
-
-	printf("Test the ring stats.\n");
-
-	/* Reset the watermark in case it was set in another test. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Allocate some dummy object pointers. */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-
-	/* Allocate some memory for copied objects. */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-
-	/* Set the head and tail pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	/* Do Enqueue tests. */
-	printf("Test the dequeue stats.\n");
-
-	/* Fill the ring up to RING_SIZE -1. */
-	printf("Fill the ring.\n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK); i++) {
-		rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK);
-		cur_src += MAX_BULK;
-	}
-
-	/* Adjust for final enqueue = MAX_BULK -1. */
-	cur_src--;
-
-	printf("Verify that the ring is full.\n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-
-	printf("Verify the enqueue success stats.\n");
-	/* Stats should match above enqueue operations to fill the ring. */
-	if (ring_stats->enq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Current max objects is RING_SIZE -1. */
-	if (ring_stats->enq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any failures yet. */
-	if (ring_stats->enq_fail_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_fail_objs != 0)
-		goto fail;
-
-
-	printf("Test stats for SP burst enqueue to a full ring.\n");
-	num_items = 2;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for SP bulk enqueue to a full ring.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP burst enqueue to a full ring.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP bulk enqueue to a full ring.\n");
-	num_items = 16;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	/* Do Dequeue tests. */
-	printf("Test the dequeue stats.\n");
-
-	printf("Empty the ring.\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* There was only RING_SIZE -1 objects to dequeue. */
-	cur_dst++;
-
-	printf("Verify ring is empty.\n");
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	printf("Verify the dequeue success stats.\n");
-	/* Stats should match above dequeue operations. */
-	if (ring_stats->deq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Objects dequeued is RING_SIZE -1. */
-	if (ring_stats->deq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any dequeue failure stats yet. */
-	if (ring_stats->deq_fail_bulk != 0)
-		goto fail;
-
-	printf("Test stats for SC burst dequeue with an empty ring.\n");
-	num_items = 2;
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for SC bulk dequeue with an empty ring.\n");
-	num_items = 4;
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC burst dequeue with an empty ring.\n");
-	num_items = 8;
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC bulk dequeue with an empty ring.\n");
-	num_items = 16;
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test total enqueue/dequeue stats.\n");
-	/* At this point the enqueue and dequeue stats should be the same. */
-	if (ring_stats->enq_success_bulk != ring_stats->deq_success_bulk)
-		goto fail;
-	if (ring_stats->enq_success_objs != ring_stats->deq_success_objs)
-		goto fail;
-	if (ring_stats->enq_fail_bulk    != ring_stats->deq_fail_bulk)
-		goto fail;
-	if (ring_stats->enq_fail_objs    != ring_stats->deq_fail_objs)
-		goto fail;
-
-
-	/* Watermark Tests. */
-	printf("Test the watermark/quota stats.\n");
-
-	printf("Verify the initial watermark stats.\n");
-	/* Watermark stats should be 0 since there is no watermark. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Set a watermark. */
-	rte_ring_set_water_mark(r, 16);
-
-	/* Reset pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue below watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should still be 0. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Success stats should have increased. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops + 1)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items + num_items)
-		goto fail;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue at watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != 1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP burst enqueue above watermark.\n");
-	num_items = 1;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP burst enqueue above watermark.\n");
-	num_items = 2;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP bulk enqueue above watermark.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP bulk enqueue above watermark.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	printf("Test watermark success stats.\n");
-	/* Success stats should be same as last non-watermarked enqueue. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items)
-		goto fail;
-
-
-	/* Cleanup. */
-
-	/* Empty the ring. */
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* Reset the watermark. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
-	return 0;
-
-fail:
-	free(src);
-	free(dst);
-	return -1;
-#endif
-}
-
 /*
  * it will always fail to create ring with a wrong ring size number in this function
  */
@@ -1335,10 +929,6 @@ test_ring(void)
 	if (test_ring_basic() < 0)
 		return -1;
 
-	/* ring stats */
-	if (test_ring_stats() < 0)
-		return -1;
-
 	/* basic operations */
 	if (test_live_watermark_change() < 0)
 		return -1;
diff --git a/config/common_base b/config/common_base
index 099ffda..b3d8272 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_LIBRTE_RING_DEBUG=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 9f69753..d4ab502 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -110,13 +110,6 @@ Once an enqueue operation reaches the high water mark, the producer is notified,
 
 This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
 
-Debug
-~~~~~
-
-When debug is enabled (CONFIG_RTE_LIBRTE_RING_DEBUG is set),
-the library stores some per-ring statistic counters about the number of enqueues/dequeues.
-These statistics are per-core to avoid concurrent accesses or atomic operations.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index ea45e0c..e0ebd71 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -116,6 +116,7 @@ API Changes
   have been made to it:
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
+  * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 80fc356..90ee63f 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -131,12 +131,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 			  RTE_CACHE_LINE_MASK) != 0);
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_LIBRTE_RING_DEBUG
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring_debug_stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
@@ -284,11 +278,6 @@ rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
 {
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats sum;
-	unsigned lcore_id;
-#endif
-
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
@@ -302,36 +291,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		fprintf(f, "  watermark=0\n");
 	else
 		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
-
-	/* sum and dump statistics */
-#ifdef RTE_LIBRTE_RING_DEBUG
-	memset(&sum, 0, sizeof(sum));
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		sum.enq_success_bulk += r->stats[lcore_id].enq_success_bulk;
-		sum.enq_success_objs += r->stats[lcore_id].enq_success_objs;
-		sum.enq_quota_bulk += r->stats[lcore_id].enq_quota_bulk;
-		sum.enq_quota_objs += r->stats[lcore_id].enq_quota_objs;
-		sum.enq_fail_bulk += r->stats[lcore_id].enq_fail_bulk;
-		sum.enq_fail_objs += r->stats[lcore_id].enq_fail_objs;
-		sum.deq_success_bulk += r->stats[lcore_id].deq_success_bulk;
-		sum.deq_success_objs += r->stats[lcore_id].deq_success_objs;
-		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
-		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
-	}
-	fprintf(f, "  size=%"PRIu32"\n", r->size);
-	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
-	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
-	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
-	fprintf(f, "  enq_quota_objs=%"PRIu64"\n", sum.enq_quota_objs);
-	fprintf(f, "  enq_fail_bulk=%"PRIu64"\n", sum.enq_fail_bulk);
-	fprintf(f, "  enq_fail_objs=%"PRIu64"\n", sum.enq_fail_objs);
-	fprintf(f, "  deq_success_bulk=%"PRIu64"\n", sum.deq_success_bulk);
-	fprintf(f, "  deq_success_objs=%"PRIu64"\n", sum.deq_success_objs);
-	fprintf(f, "  deq_fail_bulk=%"PRIu64"\n", sum.deq_fail_bulk);
-	fprintf(f, "  deq_fail_objs=%"PRIu64"\n", sum.deq_fail_objs);
-#else
-	fprintf(f, "  no statistics available\n");
-#endif
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 6e75c15..814f593 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -109,24 +109,6 @@ enum rte_ring_queue_behavior {
 	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
 };
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-/**
- * A structure that stores the ring statistics (per-lcore).
- */
-struct rte_ring_debug_stats {
-	uint64_t enq_success_bulk; /**< Successful enqueues number. */
-	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
-	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
-	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
-	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
-	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
-	uint64_t deq_success_bulk; /**< Successful dequeues number. */
-	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
-	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
-	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
-#endif
-
 #define RTE_RING_MZ_PREFIX "RG_"
 /**< The maximum length of a ring name. */
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
@@ -179,10 +161,6 @@ struct rte_ring {
 	/** Ring consumer status. */
 	struct rte_ring_ht_ptr cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
-
 	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
 	                                     * not volatile so need to be careful
 	                                     * about compiler re-ordering */
@@ -194,27 +172,6 @@ struct rte_ring {
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
- * @internal When debug is enabled, store ring statistics.
- * @param r
- *   A pointer to the ring.
- * @param name
- *   The name of the statistics field to increment in the ring.
- * @param n
- *   The number to add to the object-oriented statistics.
- */
-#ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {                        \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			r->stats[__lcore_id].name##_objs += n;  \
-			r->stats[__lcore_id].name##_bulk += 1;  \
-		}                                               \
-	} while(0)
-#else
-#define __RING_STAT_ADD(r, name, n) do {} while(0)
-#endif
-
-/**
  * Calculate the memory size needed for a ring
  *
  * This function returns the number of bytes needed for a ring, given
@@ -455,17 +412,12 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOBUFS;
-			}
 			else {
 				/* No free entry available */
-				if (unlikely(free_entries == 0)) {
-					__RING_STAT_ADD(r, enq_fail, n);
+				if (unlikely(free_entries == 0))
 					return 0;
-				}
-
 				n = free_entries;
 			}
 		}
@@ -480,15 +432,11 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	/*
 	 * If there are other enqueues in progress that preceded us,
@@ -552,17 +500,12 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, enq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOBUFS;
-		}
 		else {
 			/* No free entry available */
-			if (unlikely(free_entries == 0)) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (unlikely(free_entries == 0))
 				return 0;
-			}
-
 			n = free_entries;
 		}
 	}
@@ -575,15 +518,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	r->prod.tail = prod_next;
 	return ret;
@@ -647,16 +586,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOENT;
-			}
 			else {
-				if (unlikely(entries == 0)){
-					__RING_STAT_ADD(r, deq_fail, n);
+				if (unlikely(entries == 0))
 					return 0;
-				}
-
 				n = entries;
 			}
 		}
@@ -686,7 +620,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 			sched_yield();
 		}
 	}
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -733,16 +666,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	entries = prod_tail - cons_head;
 
 	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, deq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOENT;
-		}
 		else {
-			if (unlikely(entries == 0)){
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (unlikely(entries == 0))
 				return 0;
-			}
-
 			n = entries;
 		}
 	}
@@ -754,7 +682,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	DEQUEUE_PTRS();
 	rte_smp_rmb();
 
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
 }
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update
                     ` (2 preceding siblings ...)
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting Bruce Richardson
@ 2017-02-23 17:23  4% ` Bruce Richardson
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support Bruce Richardson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

There was a compile-time setting to make a ring yield when it entered
the loop in the mp or mc paths waiting for the tail pointer update.
Build-time settings are not recommended for enabling/disabling features,
and since this was off by default, remove it completely. If needed, a
runtime-enabled equivalent can be used.
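
As an illustration of that suggested runtime equivalent, here is a sketch
of what an application-level wait helper could look like; the threshold
variable and helper name are hypothetical, and rte_pause() is assumed to
come from the EAL cycles header in this release.

  #include <sched.h>
  #include <stdint.h>
  #include <rte_cycles.h>   /* rte_pause() */

  static unsigned int ring_yield_threshold;  /* 0 = never yield (old default) */

  static inline void
  app_wait_for_tail(volatile const uint32_t *tail, uint32_t expected)
  {
      unsigned int spins = 0;

      while (*tail != expected) {
          rte_pause();
          if (ring_yield_threshold != 0 &&
              ++spins == ring_yield_threshold) {
              spins = 0;
              sched_yield();   /* let a preempted peer finish its update */
          }
      }
  }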

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                              |  1 -
 doc/guides/prog_guide/env_abstraction_layer.rst |  5 ----
 doc/guides/rel_notes/release_17_05.rst          |  1 +
 lib/librte_ring/rte_ring.h                      | 35 +++++--------------------
 4 files changed, 7 insertions(+), 35 deletions(-)

diff --git a/config/common_base b/config/common_base
index b3d8272..d5beadd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
 # Compile librte_mempool
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 10a10a8..7c39cd2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -352,11 +352,6 @@ Known Issues
 
   3. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
 
-  ``RTE_RING_PAUSE_REP_COUNT`` is defined for rte_ring to reduce contention. It's mainly for case 2, a yield is issued after number of times pause repeat.
-
-  It adds a sched_yield() syscall if the thread spins for too long while waiting on the other thread to finish its operations on the ring.
-  This gives the preempted thread a chance to proceed and finish with the ring enqueue/dequeue operation.
-
 + rte_timer
 
   Running  ``rte_timer_manager()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e0ebd71..c69ca8f 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -117,6 +117,7 @@ API Changes
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
+  * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 814f593..0f95c84 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -114,11 +114,6 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-#ifndef RTE_RING_PAUSE_REP_COUNT
-#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-                                    *   if RTE_RING_PAUSE_REP not defined. */
-#endif
-
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
 /* structure to hold a pair of head/tail values and other metadata */
@@ -388,7 +383,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t cons_tail, free_entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -442,18 +437,9 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->prod.tail != prod_head)) {
+	while (unlikely(r->prod.tail != prod_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->prod.tail = prod_next;
 	return ret;
 }
@@ -486,7 +472,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 {
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -563,7 +549,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_next, entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -608,18 +594,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * If there are other dequeues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->cons.tail != cons_head)) {
+	while (unlikely(r->cons.tail != cons_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -654,7 +631,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support
                     ` (3 preceding siblings ...)
  2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
@ 2017-02-23 17:23  2% ` Bruce Richardson
  2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:23 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Remove the watermark support. A future commit will add support for having
enqueue functions return the amount of free space in the ring, which will
allow applications to implement their own watermark checks, while also
being more useful to the app.
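
One possible application-side replacement, sketched here with the current
API rather than the future free-space return value: check occupancy
against an app-chosen threshold around the enqueue call using
rte_ring_count(). The threshold macro and helper name are illustrative
only.

  #include <rte_ring.h>

  #define APP_RING_WATERMARK 224   /* e.g. 7/8 of a 256-entry ring */

  static inline int
  app_enqueue_check_wm(struct rte_ring *r, void *obj, int *above_wm)
  {
      int ret = rte_ring_mp_enqueue(r, obj);

      /* Approximate under concurrency: occupancy can change between the
       * enqueue and the count. */
      *above_wm = rte_ring_count(r) >= APP_RING_WATERMARK;
      return ret;
  }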

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test/commands.c                    |  52 ------------
 app/test/test_ring.c                   | 149 +--------------------------------
 doc/guides/rel_notes/release_17_05.rst |   2 +
 examples/Makefile                      |   2 +-
 lib/librte_ring/rte_ring.c             |  23 -----
 lib/librte_ring/rte_ring.h             |  58 +------------
 6 files changed, 8 insertions(+), 278 deletions(-)

diff --git a/app/test/commands.c b/app/test/commands.c
index 2df46b0..551c81d 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -228,57 +228,6 @@ cmdline_parse_inst_t cmd_dump_one = {
 
 /****************/
 
-struct cmd_set_ring_result {
-	cmdline_fixed_string_t set;
-	cmdline_fixed_string_t name;
-	uint32_t value;
-};
-
-static void cmd_set_ring_parsed(void *parsed_result, struct cmdline *cl,
-				__attribute__((unused)) void *data)
-{
-	struct cmd_set_ring_result *res = parsed_result;
-	struct rte_ring *r;
-	int ret;
-
-	r = rte_ring_lookup(res->name);
-	if (r == NULL) {
-		cmdline_printf(cl, "Cannot find ring\n");
-		return;
-	}
-
-	if (!strcmp(res->set, "set_watermark")) {
-		ret = rte_ring_set_water_mark(r, res->value);
-		if (ret != 0)
-			cmdline_printf(cl, "Cannot set water mark\n");
-	}
-}
-
-cmdline_parse_token_string_t cmd_set_ring_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, set,
-				 "set_watermark");
-
-cmdline_parse_token_string_t cmd_set_ring_name =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, name, NULL);
-
-cmdline_parse_token_num_t cmd_set_ring_value =
-	TOKEN_NUM_INITIALIZER(struct cmd_set_ring_result, value, UINT32);
-
-cmdline_parse_inst_t cmd_set_ring = {
-	.f = cmd_set_ring_parsed,  /* function to call */
-	.data = NULL,      /* 2nd arg of func */
-	.help_str = "set watermark: "
-			"set_watermark <ring_name> <value>",
-	.tokens = {        /* token list, NULL terminated */
-		(void *)&cmd_set_ring_set,
-		(void *)&cmd_set_ring_name,
-		(void *)&cmd_set_ring_value,
-		NULL,
-	},
-};
-
-/****************/
-
 struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
@@ -419,7 +368,6 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_autotest,
 	(cmdline_parse_inst_t *)&cmd_dump,
 	(cmdline_parse_inst_t *)&cmd_dump_one,
-	(cmdline_parse_inst_t *)&cmd_set_ring,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx_anchor,
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 3891f5d..666a451 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -78,21 +78,6 @@
  *      - Dequeue one object, two objects, MAX_BULK objects
  *      - Check that dequeued pointers are correct
  *
- *    - Test watermark and default bulk enqueue/dequeue:
- *
- *      - Set watermark
- *      - Set default bulk value
- *      - Enqueue objects, check that -EDQUOT is returned when
- *        watermark is exceeded
- *      - Check that dequeued pointers are correct
- *
- * #. Check live watermark change
- *
- *    - Start a loop on another lcore that will enqueue and dequeue
- *      objects in a ring. It will monitor the value of watermark.
- *    - At the same time, change the watermark on the master lcore.
- *    - The slave lcore will check that watermark changes from 16 to 32.
- *
  * #. Performance tests.
  *
  * Tests done in test_ring_perf.c
@@ -115,123 +100,6 @@ static struct rte_ring *r;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-static int
-check_live_watermark_change(__attribute__((unused)) void *dummy)
-{
-	uint64_t hz = rte_get_timer_hz();
-	void *obj_table[MAX_BULK];
-	unsigned watermark, watermark_old = 16;
-	uint64_t cur_time, end_time;
-	int64_t diff = 0;
-	int i, ret;
-	unsigned count = 4;
-
-	/* init the object table */
-	memset(obj_table, 0, sizeof(obj_table));
-	end_time = rte_get_timer_cycles() + (hz / 4);
-
-	/* check that bulk and watermark are 4 and 32 (respectively) */
-	while (diff >= 0) {
-
-		/* add in ring until we reach watermark */
-		ret = 0;
-		for (i = 0; i < 16; i ++) {
-			if (ret != 0)
-				break;
-			ret = rte_ring_enqueue_bulk(r, obj_table, count);
-		}
-
-		if (ret != -EDQUOT) {
-			printf("Cannot enqueue objects, or watermark not "
-			       "reached (ret=%d)\n", ret);
-			return -1;
-		}
-
-		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->watermark;
-		if (watermark != watermark_old &&
-		    (watermark_old != 16 || watermark != 32)) {
-			printf("Bad watermark change %u -> %u\n", watermark_old,
-			       watermark);
-			return -1;
-		}
-		watermark_old = watermark;
-
-		/* dequeue objects from ring */
-		while (i--) {
-			ret = rte_ring_dequeue_bulk(r, obj_table, count);
-			if (ret != 0) {
-				printf("Cannot dequeue (ret=%d)\n", ret);
-				return -1;
-			}
-		}
-
-		cur_time = rte_get_timer_cycles();
-		diff = end_time - cur_time;
-	}
-
-	if (watermark_old != 32 ) {
-		printf(" watermark was not updated (wm=%u)\n",
-		       watermark_old);
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-test_live_watermark_change(void)
-{
-	unsigned lcore_id = rte_lcore_id();
-	unsigned lcore_id2 = rte_get_next_lcore(lcore_id, 0, 1);
-
-	printf("Test watermark live modification\n");
-	rte_ring_set_water_mark(r, 16);
-
-	/* launch a thread that will enqueue and dequeue, checking
-	 * watermark and quota */
-	rte_eal_remote_launch(check_live_watermark_change, NULL, lcore_id2);
-
-	rte_delay_ms(100);
-	rte_ring_set_water_mark(r, 32);
-	rte_delay_ms(100);
-
-	if (rte_eal_wait_lcore(lcore_id2) < 0)
-		return -1;
-
-	return 0;
-}
-
-/* Test for catch on invalid watermark values */
-static int
-test_set_watermark( void ){
-	unsigned count;
-	int setwm;
-
-	struct rte_ring *r = rte_ring_lookup("test_ring_basic_ex");
-	if(r == NULL){
-		printf( " ring lookup failed\n" );
-		goto error;
-	}
-	count = r->size * 2;
-	setwm = rte_ring_set_water_mark(r, count);
-	if (setwm != -EINVAL){
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-
-	count = 0;
-	rte_ring_set_water_mark(r, count);
-	if (r->watermark != r->size) {
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-	return 0;
-
-error:
-	return -1;
-}
-
 /*
  * helper routine for test_ring_basic
  */
@@ -418,8 +286,7 @@ test_ring_basic(void)
 	cur_src = src;
 	cur_dst = dst;
 
-	printf("test watermark and default bulk enqueue / dequeue\n");
-	rte_ring_set_water_mark(r, 20);
+	printf("test default bulk enqueue / dequeue\n");
 	num_elems = 16;
 
 	cur_src = src;
@@ -433,8 +300,8 @@ test_ring_basic(void)
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != -EDQUOT) {
-		printf("Watermark not exceeded\n");
+	if (ret != 0) {
+		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
@@ -930,16 +797,6 @@ test_ring(void)
 		return -1;
 
 	/* basic operations */
-	if (test_live_watermark_change() < 0)
-		return -1;
-
-	if ( test_set_watermark() < 0){
-		printf ("Test failed to detect invalid parameter\n");
-		return -1;
-	}
-	else
-		printf ( "Test detected forced bad watermark values\n");
-
 	if ( test_create_count_odd() < 0){
 			printf ("Test failed to detect odd count\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index c69ca8f..4e748dc 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -118,6 +118,8 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
+  * removed the function ``rte_ring_set_water_mark`` as part of a general
+    removal of watermarks support in the library.
 
 ABI Changes
 -----------
diff --git a/examples/Makefile b/examples/Makefile
index da2bfdd..19cd5ad 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -81,7 +81,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += packet_ordering
 DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched
-DIRS-y += quota_watermark
+#DIRS-y += quota_watermark
 DIRS-$(CONFIG_RTE_ETHDEV_RXTX_CALLBACKS) += rxtx_callbacks
 DIRS-y += skeleton
 ifeq ($(CONFIG_RTE_LIBRTE_HASH),y)
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 90ee63f..18fb644 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -138,7 +138,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
 	r->size = count;
@@ -256,24 +255,6 @@ rte_ring_free(struct rte_ring *r)
 	rte_free(te);
 }
 
-/*
- * change the high water mark. If *count* is 0, water marking is
- * disabled
- */
-int
-rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
-{
-	if (count >= r->size)
-		return -EINVAL;
-
-	/* if count is 0, disable the watermarking */
-	if (count == 0)
-		count = r->size;
-
-	r->watermark = count;
-	return 0;
-}
-
 /* dump the status of the ring on the console */
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
@@ -287,10 +268,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->watermark == r->size)
-		fprintf(f, "  watermark=0\n");
-	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 0f95c84..e5fc751 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -148,7 +148,6 @@ struct rte_ring {
 			/**< Memzone, if any, containing the rte_ring */
 	uint32_t size;           /**< Size of ring. */
 	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_ht_ptr prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
@@ -163,7 +162,6 @@ struct rte_ring {
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
 #define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
@@ -269,26 +267,6 @@ struct rte_ring *rte_ring_create(const char *name, unsigned count,
 void rte_ring_free(struct rte_ring *r);
 
 /**
- * Change the high water mark.
- *
- * If *count* is 0, water marking is disabled. Otherwise, it is set to the
- * *count* value. The *count* value must be greater than 0 and less
- * than the ring size.
- *
- * This function can be called at any time (not necessarily at
- * initialization).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param count
- *   The new water mark value.
- * @return
- *   - 0: Success; water mark changed.
- *   - -EINVAL: Invalid water mark value.
- */
-int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
-
-/**
  * Dump the status of the ring to a file.
  *
  * @param f
@@ -369,8 +347,6 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -385,7 +361,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	int success;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -426,13 +401,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-				(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	/*
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
@@ -441,7 +409,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -460,8 +428,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -474,7 +440,6 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_next, free_entries;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	prod_head = r->prod.head;
 	cons_tail = r->cons.tail;
@@ -503,15 +468,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-			(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -677,8 +635,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -699,8 +655,6 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -725,8 +679,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -751,8 +703,6 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -770,8 +720,6 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -793,8 +741,6 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent
                     ` (4 preceding siblings ...)
  2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support Bruce Richardson
@ 2017-02-23 17:24  2% ` Bruce Richardson
  2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
    7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:24 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

The bulk functions for rings return 0 when all elements are enqueued and a
negative errno value when there is no space. Change them to be consistent
with the burst functions by returning the number of elements
enqueued/dequeued, i.e. 0 or N. This also allows the return value from
enqueue/dequeue to be used directly, without a branch for error checking.
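
For illustration only (the helper and counter names are invented, and the
enqueue signature is as it stands at this point in the series), a caller
can now fold the return value straight into its accounting:

#include <stdint.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

/*
 * Hand a burst of packets to a worker ring; the 0-or-n return value
 * feeds the stats directly, with a single test for the drop path.
 */
static inline void
app_tx_to_ring(struct rte_ring *tx_ring, struct rte_mbuf **pkts,
	       unsigned int n, uint64_t *enq, uint64_t *drop)
{
	unsigned int sent = rte_ring_enqueue_bulk(tx_ring, (void **)pkts, n);

	*enq += sent;
	if (sent == 0) {
		unsigned int i;

		for (i = 0; i < n; i++)
			rte_pktmbuf_free(pkts[i]);
		*drop += n;
	}
}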

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/test-pipeline/pipeline_hash.c                  |   2 +-
 app/test-pipeline/runtime.c                        |   8 +-
 app/test/test_ring.c                               |  46 +++++----
 app/test/test_ring_perf.c                          |   8 +-
 doc/guides/rel_notes/release_17_05.rst             |  11 +++
 doc/guides/sample_app_ug/server_node_efd.rst       |   2 +-
 examples/load_balancer/runtime.c                   |  16 ++-
 .../client_server_mp/mp_client/client.c            |   8 +-
 .../client_server_mp/mp_server/main.c              |   2 +-
 examples/qos_sched/app_thread.c                    |   8 +-
 examples/server_node_efd/node/node.c               |   2 +-
 examples/server_node_efd/server/main.c             |   2 +-
 lib/librte_mempool/rte_mempool_ring.c              |  12 ++-
 lib/librte_ring/rte_ring.h                         | 109 +++++++--------------
 14 files changed, 106 insertions(+), 130 deletions(-)

diff --git a/app/test-pipeline/pipeline_hash.c b/app/test-pipeline/pipeline_hash.c
index 10d2869..1ac0aa8 100644
--- a/app/test-pipeline/pipeline_hash.c
+++ b/app/test-pipeline/pipeline_hash.c
@@ -547,6 +547,6 @@ app_main_loop_rx_metadata(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 42a6142..4e20669 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -98,7 +98,7 @@ app_main_loop_rx(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -123,7 +123,7 @@ app_main_loop_worker(void) {
 			(void **) worker_mbuf->array,
 			app.burst_size_worker_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		do {
@@ -131,7 +131,7 @@ app_main_loop_worker(void) {
 				app.rings_tx[i ^ 1],
 				(void **) worker_mbuf->array,
 				app.burst_size_worker_write);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -152,7 +152,7 @@ app_main_loop_tx(void) {
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
 			app.burst_size_tx_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		n_mbufs += app.burst_size_tx_read;
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index 666a451..112433b 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -117,20 +117,18 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
 		printf("%s: iteration %u, random shift: %u;\n",
 		    __func__, i, rand);
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rand));
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rand));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand) != 0);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
 
 		/* fill the ring */
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rsz));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz) != 0);
 		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
 		TEST_RING_VERIFY(rsz == rte_ring_count(r));
 		TEST_RING_VERIFY(rte_ring_full(r));
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rsz));
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -171,37 +169,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -217,37 +215,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -264,11 +262,11 @@ test_ring_basic(void)
 	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
 		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 		cur_src += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 		cur_dst += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 	}
 
@@ -294,25 +292,25 @@ test_ring_basic(void)
 
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue2\n");
 		goto fail;
 	}
diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index 320c20c..8ccbdef 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -195,13 +195,13 @@ enqueue_bulk(void *p)
 
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_sp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_mp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mp_end = rte_rdtsc();
 
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4e748dc..2b11765 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -120,6 +120,17 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
   * removed the function ``rte_ring_set_water_mark`` as part of a general
     removal of watermarks support in the library.
+  * changed the return value of the enqueue and dequeue bulk functions to
+    match that of the burst equivalents. In all cases, ring functions which
+    operate on multiple packets now return the number of elements enqueued
+    or dequeued, as appropriate. The updated functions are:
+
+    - ``rte_ring_mp_enqueue_bulk``
+    - ``rte_ring_sp_enqueue_bulk``
+    - ``rte_ring_enqueue_bulk``
+    - ``rte_ring_mc_dequeue_bulk``
+    - ``rte_ring_sc_dequeue_bulk``
+    - ``rte_ring_dequeue_bulk``
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/server_node_efd.rst b/doc/guides/sample_app_ug/server_node_efd.rst
index 9b69cfe..e3a63c8 100644
--- a/doc/guides/sample_app_ug/server_node_efd.rst
+++ b/doc/guides/sample_app_ug/server_node_efd.rst
@@ -286,7 +286,7 @@ repeated infinitely.
 
         cl = &nodes[node];
         if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-                cl_rx_buf[node].count) != 0){
+                cl_rx_buf[node].count) != cl_rx_buf[node].count){
             for (j = 0; j < cl_rx_buf[node].count; j++)
                 rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
             cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 6944325..82b10bc 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -146,7 +146,7 @@ app_lcore_io_rx_buffer_to_send (
 		(void **) lp->rx.mbuf_out[worker].array,
 		bsz);
 
-	if (unlikely(ret == -ENOBUFS)) {
+	if (unlikely(ret == 0)) {
 		uint32_t k;
 		for (k = 0; k < bsz; k ++) {
 			struct rte_mbuf *m = lp->rx.mbuf_out[worker].array[k];
@@ -312,7 +312,7 @@ app_lcore_io_rx_flush(struct app_lcore_params_io *lp, uint32_t n_workers)
 			(void **) lp->rx.mbuf_out[worker].array,
 			lp->rx.mbuf_out[worker].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->rx.mbuf_out[worker].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->rx.mbuf_out[worker].array[k];
@@ -349,9 +349,8 @@ app_lcore_io_tx(
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
 				bsz_rd);
 
-			if (unlikely(ret == -ENOENT)) {
+			if (unlikely(ret == 0))
 				continue;
-			}
 
 			n_mbufs += bsz_rd;
 
@@ -505,9 +504,8 @@ app_lcore_worker(
 			(void **) lp->mbuf_in.array,
 			bsz_rd);
 
-		if (unlikely(ret == -ENOENT)) {
+		if (unlikely(ret == 0))
 			continue;
-		}
 
 #if APP_WORKER_DROP_ALL_PACKETS
 		for (j = 0; j < bsz_rd; j ++) {
@@ -559,7 +557,7 @@ app_lcore_worker(
 
 #if APP_STATS
 			lp->rings_out_iters[port] ++;
-			if (ret == 0) {
+			if (ret > 0) {
 				lp->rings_out_count[port] += 1;
 			}
 			if (lp->rings_out_iters[port] == APP_STATS){
@@ -572,7 +570,7 @@ app_lcore_worker(
 			}
 #endif
 
-			if (unlikely(ret == -ENOBUFS)) {
+			if (unlikely(ret == 0)) {
 				uint32_t k;
 				for (k = 0; k < bsz_wr; k ++) {
 					struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
@@ -609,7 +607,7 @@ app_lcore_worker_flush(struct app_lcore_params_worker *lp)
 			(void **) lp->mbuf_out[port].array,
 			lp->mbuf_out[port].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->mbuf_out[port].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index d4f9ca3..dca9eb9 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -276,14 +276,10 @@ main(int argc, char *argv[])
 	printf("[Press Ctrl-C to quit ...]\n");
 
 	for (;;) {
-		uint16_t i, rx_pkts = PKT_READ_SIZE;
+		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		/* try dequeuing max possible packets first, if that fails, get the
-		 * most we can. Loop body should only execute once, maximum */
-		while (rx_pkts > 0 &&
-				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts, rx_pkts) != 0))
-			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring), PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/multi_process/client_server_mp/mp_server/main.c b/examples/multi_process/client_server_mp/mp_server/main.c
index a6dc12d..19c95b2 100644
--- a/examples/multi_process/client_server_mp/mp_server/main.c
+++ b/examples/multi_process/client_server_mp/mp_server/main.c
@@ -227,7 +227,7 @@ flush_rx_queue(uint16_t client)
 
 	cl = &clients[client];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[client].buffer,
-			cl_rx_buf[client].count) != 0){
+			cl_rx_buf[client].count) == 0){
 		for (j = 0; j < cl_rx_buf[client].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[client].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[client].count;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 70fdcdb..dab4594 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -107,7 +107,7 @@ app_rx_thread(struct thread_conf **confs)
 			}
 
 			if (unlikely(rte_ring_sp_enqueue_bulk(conf->rx_ring,
-								(void **)rx_mbufs, nb_rx) != 0)) {
+					(void **)rx_mbufs, nb_rx) == 0)) {
 				for(i = 0; i < nb_rx; i++) {
 					rte_pktmbuf_free(rx_mbufs[i]);
 
@@ -180,7 +180,7 @@ app_tx_thread(struct thread_conf **confs)
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
 					burst_conf.qos_dequeue);
-		if (likely(retval == 0)) {
+		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
 			conf->counter = 0; /* reset empty read loop counter */
@@ -230,7 +230,9 @@ app_worker_thread(struct thread_conf **confs)
 		nb_pkt = rte_sched_port_dequeue(conf->sched_port, mbufs,
 					burst_conf.qos_dequeue);
 		if (likely(nb_pkt > 0))
-			while (rte_ring_sp_enqueue_bulk(conf->tx_ring, (void **)mbufs, nb_pkt) != 0);
+			while (rte_ring_sp_enqueue_bulk(conf->tx_ring,
+					(void **)mbufs, nb_pkt) == 0)
+				; /* empty body */
 
 		conf_idx++;
 		if (confs[conf_idx] == NULL)
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index a6c0c70..9ec6a05 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) != 0))
+					rx_pkts) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/examples/server_node_efd/server/main.c b/examples/server_node_efd/server/main.c
index 1a54d1b..3eb7fac 100644
--- a/examples/server_node_efd/server/main.c
+++ b/examples/server_node_efd/server/main.c
@@ -247,7 +247,7 @@ flush_rx_queue(uint16_t node)
 
 	cl = &nodes[node];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-			cl_rx_buf[node].count) != 0){
+			cl_rx_buf[node].count) != cl_rx_buf[node].count){
 		for (j = 0; j < cl_rx_buf[node].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index b9aa64d..409b860 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -42,26 +42,30 @@ static int
 common_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_mp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_sp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_mc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_sc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e5fc751..6712f1f 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -344,14 +344,10 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -383,7 +379,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOBUFS;
+				return 0;
 			else {
 				/* No free entry available */
 				if (unlikely(free_entries == 0))
@@ -409,7 +405,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -425,14 +421,10 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -452,7 +444,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOBUFS;
+			return 0;
 		else {
 			/* No free entry available */
 			if (unlikely(free_entries == 0))
@@ -469,7 +461,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -490,16 +482,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
 
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -531,7 +518,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOENT;
+				return 0;
 			else {
 				if (unlikely(entries == 0))
 					return 0;
@@ -557,7 +544,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	r->cons.tail = cons_next;
 
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -575,15 +562,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -602,7 +584,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	if (n > entries) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOENT;
+			return 0;
 		else {
 			if (unlikely(entries == 0))
 				return 0;
@@ -618,7 +600,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -634,10 +616,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -654,10 +635,9 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -678,10 +658,9 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned n)
 {
@@ -708,7 +687,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static inline int __attribute__((always_inline))
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -725,7 +704,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -746,10 +725,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue(r, obj);
-	else
-		return rte_ring_mp_enqueue(r, obj);
+	return rte_ring_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -765,11 +741,9 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -786,11 +760,9 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects to dequeue from the ring to the obj_table,
  *   must be strictly positive.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -810,11 +782,9 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	if (r->cons.sc_dequeue)
@@ -841,7 +811,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -859,7 +829,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -881,10 +851,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue(r, obj_p);
-	else
-		return rte_ring_mc_dequeue(r, obj_p);
+	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v1 09/14] ring: allow dequeue fns to return remaining entry count
                     ` (5 preceding siblings ...)
  2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
@ 2017-02-23 17:24  2% ` Bruce Richardson
    7 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-23 17:24 UTC (permalink / raw)
  To: olivier.matz; +Cc: dev, Bruce Richardson

Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the number of objects remaining in
the ring. Applications can use this information in a number of ways; for
instance, with single-consumer queues it provides a maximum dequeue size
that is guaranteed to succeed.
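
As a rough usage sketch (the function and macro names are invented for
illustration), a single consumer can use the returned count to size a
follow-up dequeue that cannot fail:

#include <rte_ring.h>
#include <rte_common.h>

#define APP_MAX_BURST 64	/* hypothetical application burst size */

/*
 * Drain a single-consumer ring into objs[APP_MAX_BURST]. Because no
 * other thread dequeues from an SC ring, the 'avail' value returned by
 * the first call is a bulk-dequeue size guaranteed to succeed.
 */
static inline unsigned int
app_sc_drain(struct rte_ring *r, void *objs[APP_MAX_BURST])
{
	unsigned int avail, n;

	n = rte_ring_sc_dequeue_burst(r, objs, APP_MAX_BURST / 2, &avail);
	if (n == 0)
		return 0;

	/* cap to the space left in objs[] before the second dequeue */
	avail = RTE_MIN(avail, (unsigned int)APP_MAX_BURST - n);
	if (avail > 0)
		n += rte_ring_sc_dequeue_bulk(r, &objs[n], avail, NULL);

	return n;
}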

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 app/pdump/main.c                                   |  2 +-
 app/test-pipeline/runtime.c                        |  6 +-
 app/test/test_link_bonding_mode4.c                 |  3 +-
 app/test/test_pmd_ring_perf.c                      |  7 +-
 app/test/test_ring.c                               | 54 ++++++-------
 app/test/test_ring_perf.c                          | 20 +++--
 app/test/test_table_acl.c                          |  2 +-
 app/test/test_table_pipeline.c                     |  2 +-
 app/test/test_table_ports.c                        |  8 +-
 app/test/virtual_pmd.c                             |  4 +-
 doc/guides/rel_notes/release_17_05.rst             |  8 ++
 drivers/crypto/null/null_crypto_pmd.c              |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |  3 +-
 drivers/net/ring/rte_eth_ring.c                    |  2 +-
 examples/distributor/main.c                        |  2 +-
 examples/load_balancer/runtime.c                   |  6 +-
 .../client_server_mp/mp_client/client.c            |  3 +-
 examples/packet_ordering/main.c                    |  6 +-
 examples/qos_sched/app_thread.c                    |  6 +-
 examples/quota_watermark/qw/main.c                 |  5 +-
 examples/server_node_efd/node/node.c               |  2 +-
 lib/librte_hash/rte_cuckoo_hash.c                  |  3 +-
 lib/librte_mempool/rte_mempool_ring.c              |  4 +-
 lib/librte_port/rte_port_frag.c                    |  3 +-
 lib/librte_port/rte_port_ring.c                    |  6 +-
 lib/librte_ring/rte_ring.h                         | 90 +++++++++++-----------
 26 files changed, 145 insertions(+), 114 deletions(-)

diff --git a/app/pdump/main.c b/app/pdump/main.c
index b88090d..3b13753 100644
--- a/app/pdump/main.c
+++ b/app/pdump/main.c
@@ -496,7 +496,7 @@ pdump_rxtx(struct rte_ring *ring, uint8_t vdev_id, struct pdump_stats *stats)
 
 	/* first dequeue packets from ring of primary process */
 	const uint16_t nb_in_deq = rte_ring_dequeue_burst(ring,
-			(void *)rxtx_bufs, BURST_SIZE);
+			(void *)rxtx_bufs, BURST_SIZE, NULL);
 	stats->dequeue_pkts += nb_in_deq;
 
 	if (nb_in_deq) {
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index c06ff54..8970e1c 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -121,7 +121,8 @@ app_main_loop_worker(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_rx[i],
 			(void **) worker_mbuf->array,
-			app.burst_size_worker_read);
+			app.burst_size_worker_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
@@ -151,7 +152,8 @@ app_main_loop_tx(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_tx[i],
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
-			app.burst_size_tx_read);
+			app.burst_size_tx_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
diff --git a/app/test/test_link_bonding_mode4.c b/app/test/test_link_bonding_mode4.c
index 8df28b4..15091b1 100644
--- a/app/test/test_link_bonding_mode4.c
+++ b/app/test/test_link_bonding_mode4.c
@@ -193,7 +193,8 @@ static uint8_t lacpdu_rx_count[RTE_MAX_ETHPORTS] = {0, };
 static int
 slave_get_pkts(struct slave_conf *slave, struct rte_mbuf **buf, uint16_t size)
 {
-	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf, size);
+	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf,
+			size, NULL);
 }
 
 /*
diff --git a/app/test/test_pmd_ring_perf.c b/app/test/test_pmd_ring_perf.c
index 045a7f2..004882a 100644
--- a/app/test/test_pmd_ring_perf.c
+++ b/app/test/test_pmd_ring_perf.c
@@ -67,7 +67,7 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t eth_start = rte_rdtsc();
@@ -99,7 +99,7 @@ test_single_enqueue_dequeue(void)
 	rte_compiler_barrier();
 	for (i = 0; i < iterations; i++) {
 		rte_ring_enqueue_bulk(r, &burst, 1, NULL);
-		rte_ring_dequeue_bulk(r, &burst, 1);
+		rte_ring_dequeue_bulk(r, &burst, 1, NULL);
 	}
 	const uint64_t sc_end = rte_rdtsc_precise();
 	rte_compiler_barrier();
@@ -133,7 +133,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, (void *)burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, (void *)burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, (void *)burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index b0ca88b..858ebc1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -119,7 +119,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		    __func__, i, rand);
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
 				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
+				NULL) == rand);
 
 		/* fill the ring */
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
@@ -129,7 +130,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
+				NULL) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -186,19 +188,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -232,19 +234,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -265,7 +267,7 @@ test_ring_basic(void)
 		cur_src += MAX_BULK;
 		if (ret == 0)
 			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if (ret == 0)
 			goto fail;
@@ -303,13 +305,13 @@ test_ring_basic(void)
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue2\n");
@@ -390,19 +392,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1) ;
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -451,19 +453,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -505,19 +507,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -539,7 +541,7 @@ test_ring_burst_basic(void)
 		cur_src += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
@@ -578,19 +580,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -613,7 +615,7 @@ test_ring_burst_basic(void)
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret != 2)
 		goto fail;
@@ -753,7 +755,7 @@ test_ring_basic_ex(void)
 		goto fail_test;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2);
+	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
 	if (ret != 2) {
 		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
 		goto fail_test;
diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index f95a8e9..ed89896 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -152,12 +152,12 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t mc_end = rte_rdtsc();
 
 	printf("SC empty dequeue: %.2F\n",
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
@@ -325,7 +325,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -333,7 +334,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
@@ -361,7 +363,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -369,7 +372,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
diff --git a/app/test/test_table_acl.c b/app/test/test_table_acl.c
index b3bfda4..4d43be7 100644
--- a/app/test/test_table_acl.c
+++ b/app/test/test_table_acl.c
@@ -713,7 +713,7 @@ test_pipeline_single_filter(int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0) {
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/app/test/test_table_pipeline.c b/app/test/test_table_pipeline.c
index 36bfeda..b58aa5d 100644
--- a/app/test/test_table_pipeline.c
+++ b/app/test/test_table_pipeline.c
@@ -494,7 +494,7 @@ test_pipeline_single_filter(int test_type, int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0)
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/app/test/test_table_ports.c b/app/test/test_table_ports.c
index 395f4f3..39592ce 100644
--- a/app/test/test_table_ports.c
+++ b/app/test/test_table_ports.c
@@ -163,7 +163,7 @@ test_port_ring_writer(void)
 	rte_port_ring_writer_ops.f_flush(port);
 	expected_pkts = 1;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -7;
@@ -178,7 +178,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -193,7 +193,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -208,7 +208,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -9;
diff --git a/app/test/virtual_pmd.c b/app/test/virtual_pmd.c
index 39e070c..b209355 100644
--- a/app/test/virtual_pmd.c
+++ b/app/test/virtual_pmd.c
@@ -342,7 +342,7 @@ virtual_ethdev_rx_burst_success(void *queue __rte_unused,
 	dev_private = vrtl_eth_dev->data->dev_private;
 
 	rx_count = rte_ring_dequeue_burst(dev_private->rx_queue, (void **) bufs,
-			nb_pkts);
+			nb_pkts, NULL);
 
 	/* increments ipackets count */
 	dev_private->eth_stats.ipackets += rx_count;
@@ -508,7 +508,7 @@ virtual_ethdev_get_mbufs_from_tx_queue(uint8_t port_id,
 
 	dev_private = vrtl_eth_dev->data->dev_private;
 	return rte_ring_dequeue_burst(dev_private->tx_queue, (void **)pkt_burst,
-		burst_length);
+		burst_length, NULL);
 }
 
 static uint8_t
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 249ad6e..563a74c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -123,6 +123,8 @@ API Changes
   * added an extra parameter to the burst/bulk enqueue functions to
     return the number of free spaces in the ring after enqueue. This can
     be used by an application to implement its own watermark functionality.
+  * added an extra parameter to the burst/bulk dequeue functions to return
+    the number of elements remaining in the ring after dequeue.
   * changed the return value of the enqueue and dequeue bulk functions to
     match that of the burst equivalents. In all cases, ring functions which
     operate on multiple packets now return the number of elements enqueued
@@ -135,6 +137,12 @@ API Changes
     - ``rte_ring_sc_dequeue_bulk``
     - ``rte_ring_dequeue_bulk``
 
+    NOTE: the above functions all have different parameters as well as
+    different return values, due to the other listed changes above. This
+    means that all instances of the functions in existing code will be
+    flagged by the compiler. The return value usage should be checked
+    while fixing the compiler error due to the extra parameter.
+
 ABI Changes
 -----------
 
diff --git a/drivers/crypto/null/null_crypto_pmd.c b/drivers/crypto/null/null_crypto_pmd.c
index ed5a9fc..f68ec8d 100644
--- a/drivers/crypto/null/null_crypto_pmd.c
+++ b/drivers/crypto/null/null_crypto_pmd.c
@@ -155,7 +155,7 @@ null_crypto_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	unsigned nb_dequeued;
 
 	nb_dequeued = rte_ring_dequeue_burst(qp->processed_pkts,
-			(void **)ops, nb_ops);
+			(void **)ops, nb_ops, NULL);
 	qp->qp_stats.dequeued_count += nb_dequeued;
 
 	return nb_dequeued;
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index f3ac9e2..96638af 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1008,7 +1008,8 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs,
 		struct port *port = &mode_8023ad_ports[slaves[i]];
 
 		slave_slow_nb_pkts[i] = rte_ring_dequeue_burst(port->tx_ring,
-				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS);
+				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS,
+				NULL);
 		slave_nb_pkts[i] = slave_slow_nb_pkts[i];
 
 		for (j = 0; j < slave_slow_nb_pkts[i]; j++)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index adbf478..77ef3a1 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -88,7 +88,7 @@ eth_ring_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	void **ptrs = (void *)&bufs[0];
 	struct ring_queue *r = q;
 	const uint16_t nb_rx = (uint16_t)rte_ring_dequeue_burst(r->rng,
-			ptrs, nb_bufs);
+			ptrs, nb_bufs, NULL);
 	if (r->rng->flags & RING_F_SC_DEQ)
 		r->rx_pkts.cnt += nb_rx;
 	else
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cfd360b..5cb6185 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -330,7 +330,7 @@ lcore_tx(struct rte_ring *in_r)
 
 			struct rte_mbuf *bufs[BURST_SIZE];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE, NULL);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 1645994..8192c08 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -349,7 +349,8 @@ app_lcore_io_tx(
 			ret = rte_ring_sc_dequeue_bulk(
 				ring,
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
-				bsz_rd);
+				bsz_rd,
+				NULL);
 
 			if (unlikely(ret == 0))
 				continue;
@@ -504,7 +505,8 @@ app_lcore_worker(
 		ret = rte_ring_sc_dequeue_bulk(
 			ring_in,
 			(void **) lp->mbuf_in.array,
-			bsz_rd);
+			bsz_rd,
+			NULL);
 
 		if (unlikely(ret == 0))
 			continue;
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index dca9eb9..01b535c 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -279,7 +279,8 @@ main(int argc, char *argv[])
 		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts,
+				PKT_READ_SIZE, NULL);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index d268350..7719dad 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -462,7 +462,7 @@ worker_thread(void *args_ptr)
 
 		/* dequeue the mbufs from rx_to_workers ring */
 		burst_size = rte_ring_dequeue_burst(ring_in,
-				(void *)burst_buffer, MAX_PKTS_BURST);
+				(void *)burst_buffer, MAX_PKTS_BURST, NULL);
 		if (unlikely(burst_size == 0))
 			continue;
 
@@ -510,7 +510,7 @@ send_thread(struct send_thread_args *args)
 
 		/* deque the mbufs from workers_to_tx ring */
 		nb_dq_mbufs = rte_ring_dequeue_burst(args->ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(nb_dq_mbufs == 0))
 			continue;
@@ -595,7 +595,7 @@ tx_thread(struct rte_ring *ring_in)
 
 		/* deque the mbufs from workers_to_tx ring */
 		dqnum = rte_ring_dequeue_burst(ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(dqnum == 0))
 			continue;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 0c81a15..15f117f 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -179,7 +179,7 @@ app_tx_thread(struct thread_conf **confs)
 
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
-					burst_conf.qos_dequeue);
+					burst_conf.qos_dequeue, NULL);
 		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
@@ -218,7 +218,7 @@ app_worker_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
@@ -254,7 +254,7 @@ app_mixed_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
diff --git a/examples/quota_watermark/qw/main.c b/examples/quota_watermark/qw/main.c
index 57df8ef..2dcddea 100644
--- a/examples/quota_watermark/qw/main.c
+++ b/examples/quota_watermark/qw/main.c
@@ -247,7 +247,8 @@ pipeline_stage(__attribute__((unused)) void *args)
 			}
 
 			/* Dequeue up to quota mbuf from rx */
-			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota);
+			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts,
+					*quota, NULL);
 			if (unlikely(nb_dq_pkts < 0))
 				continue;
 
@@ -305,7 +306,7 @@ send_stage(__attribute__((unused)) void *args)
 
 			/* Dequeue packets from tx and send them */
 			nb_dq_pkts = (uint16_t) rte_ring_dequeue_burst(tx,
-					(void *) tx_pkts, *quota);
+					(void *) tx_pkts, *quota, NULL);
 			rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
 
 			/* TODO: Check if nb_dq_pkts == nb_tx_pkts? */
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index 9ec6a05..f780b92 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) == 0))
+					rx_pkts, NULL) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 6552199..645c0cf 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -536,7 +536,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
 			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
-					cached_free_slots->objs, LCORE_CACHE_SIZE);
+					cached_free_slots->objs,
+					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0)
 				return -ENOSPC;
 
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index 9b8fd2b..5c132bf 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -58,14 +58,14 @@ static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_mc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_sc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_port/rte_port_frag.c b/lib/librte_port/rte_port_frag.c
index 0fcace9..320407e 100644
--- a/lib/librte_port/rte_port_frag.c
+++ b/lib/librte_port/rte_port_frag.c
@@ -186,7 +186,8 @@ rte_port_ring_reader_frag_rx(void *port,
 		/* If "pkts" buffer is empty, read packet burst from ring */
 		if (p->n_pkts == 0) {
 			p->n_pkts = rte_ring_sc_dequeue_burst(p->ring,
-				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX);
+				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX,
+				NULL);
 			RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(p, p->n_pkts);
 			if (p->n_pkts == 0)
 				return n_pkts_out;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 9fadac7..492b0e7 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -111,7 +111,8 @@ rte_port_ring_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
@@ -124,7 +125,8 @@ rte_port_ring_multi_reader_rx(void *port, struct rte_mbuf **pkts,
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index b5a995e..afd5367 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -483,7 +483,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -492,11 +493,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	unsigned int i;
 	uint32_t mask = r->mask;
 
-	/* Avoid the unnecessary cmpset operation below, which is also
-	 * potentially harmful when n equals 0. */
-	if (n == 0)
-		return 0;
-
 	/* move cons.head atomically */
 	do {
 		/* Restore n as it may change every loop */
@@ -511,15 +507,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		entries = (prod_tail - cons_head);
 
 		/* Set the actual entries for dequeue */
-		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED)
-				return 0;
-			else {
-				if (unlikely(entries == 0))
-					return 0;
-				n = entries;
-			}
-		}
+		if (n > entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+		if (unlikely(n == 0))
+			goto end;
 
 		cons_next = cons_head + n;
 		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
@@ -538,7 +530,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		rte_pause();
 
 	r->cons.tail = cons_next;
-
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -562,7 +556,8 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  */
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -577,15 +572,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * and size(ring)-1. */
 	entries = prod_tail - cons_head;
 
-	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED)
-			return 0;
-		else {
-			if (unlikely(entries == 0))
-				return 0;
-			n = entries;
-		}
-	}
+	if (n > entries)
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+	if (unlikely(entries == 0))
+		goto end;
 
 	cons_next = cons_head + n;
 	r->cons.head = cons_next;
@@ -595,6 +586,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -741,9 +735,11 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -760,9 +756,11 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -782,12 +780,13 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
+		unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
 }
 
 /**
@@ -808,7 +807,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -826,7 +825,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -848,7 +847,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -1038,9 +1037,11 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1058,9 +1059,11 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1080,12 +1083,13 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - Number of objects dequeued
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
 }
 
 #ifdef __cplusplus
-- 
2.9.3
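
As an illustration of the dequeue API change above, a minimal sketch of an
application call site after this patch (function and variable names are
illustrative, not part of the patch):

#include <rte_ring.h>

/*
 * Dequeue up to 'max' objects from the ring. The new last argument
 * optionally reports how many objects remain in the ring after this
 * dequeue; callers that do not care pass NULL, as done throughout the
 * patch above.
 */
static unsigned int
drain_once(struct rte_ring *r, void **objs, unsigned int max)
{
	unsigned int avail = 0;
	unsigned int n;

	n = rte_ring_dequeue_burst(r, objs, max, &avail);
	if (n > 0 && avail > 0) {
		/* more objects are still waiting in the ring */
	}

	return n;
}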

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-22 13:12  7%   ` Christian Ehrhardt
  2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
@ 2017-02-23 18:48  4%     ` Ferruh Yigit
  2017-02-24  7:32  8%       ` Christian Ehrhardt
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-02-23 18:48 UTC (permalink / raw)
  To: Christian Ehrhardt, Jan Blunck
  Cc: dev, cjcollier, ricardo.salveti, Luca Boccassi

On 2/22/2017 1:12 PM, Christian Ehrhardt wrote:
> On Tue, Feb 14, 2017 at 9:31 PM, Jan Blunck <jblunck@infradead.org> wrote:
> 
>>> 1. Downstreams to insert Major version into soname
>>> Distributions could insert the DPDK major version (like 16.11) into the
>>> soname and package names. A common example of this is libboost [5].
>>> That would perfectly allow 16.07.<LIBABIVER> to coexist with
>>> 16.11.<LIBABIVER> even if for a given library LIBABIVER did not change.
>>> Yet it would mean that anything depending on the old library will have to
>>> be recompiled to pick up the new code, even if it depends on an ABI that
>> is
>>> still present in the new release.
>>> Also - not a technical reason - but it is clearly more work to force
>> update
>>> all dependencies and clean out old packages for every release.
>>
>> Actually this isn't exactly what I proposed during the summit. Just
>> keep it simple and fix the ABI version of all libraries at 16.11.0.
>> This is a proven approach and has been used for years with different
>> libraries.
> 
> 
> Since there was no other response I'll try to wrap up.
> 
> Yes #1 also is my preferred solution at the moment.
> We tried with individual following the tracking of LIBABIVER upstream but
> as outlined before we hit too many issues.
> I discussed it in the deb_dpdk group which acked as well to use this as
> general approach.
> The other options have too obvious flaws as I listed on my initial report
> and - thanks btw - you added a few more.
> 
> @Bruce - sorry I don't think dropping config options is the solution. Yet
> my suggestion does not prevent you from doing so.

Hi Christian,

Can you please describe this option more?

Does it mean that for each DPDK release, the distro will release all libraries?

For 16.07:
acl.2, eal.2, ethdev.4, pdump.1

For 16.11:
acl.2, eal.3, ethdev.5, pdump.1

Will the dpdk package have the following packages:
acl.16.07.2, eal.16.07.2, ethdev.16.07.4, pdump.16.07.1
acl.16.11.2, eal.16.11.3, ethdev.16.11.5, pdump.16.11.1

And for the initial OVS use case, will it be:

OVS
 +---> eal.16.07.2
 +---> pdump.16.11.1
        +---> eal.16.11.3


Assuming the above understanding is correct:

- If the same version of a library will be delivered for each DPDK
release, what is the real benefit of having fine-grained libraries?

- The OVS usage above still does not look right; I don't believe this was the
intention when library-level dependency resolving was introduced.

Overall I am for a single library, but I can see the benefit of having
multiple small libraries; that is why I vote for option 4 in your
initial mail.

And I agree this can cause problems if not automated, but we already know
the library dependencies; I think a script can be developed to at least
warn, and they can be updated manually.

And isn't the purpose of increasing LIBABIVER to notify the application that
the library has been modified and can't be used with that app anymore?
For DPDK, even if a library itself is not changed, a modification to another
library that it depends on may change its behavior, so it makes sense to me
for the library to notify the user of this case by increasing its version.

Yes, this makes the effect of increasing a core library version big, but I
believe this is also true: increasing a core library version almost
means increasing the DPDK version.

> 
> 
> 
>> You could easily do this independently of us upstream
>> fixing the ABI problems.
> 
> 
> 
> I agree, but I'd like to suggest the mechanism I want to implement.
> An ack by upstream for the Feature to set such a major ABI would be great.
> Actually since it is optional and can help more people integrating DPDK
> getting it accepted upstream be even better.
> 
> I'll send a patch in reply to this thread later today that implements what
> I have in mind.
> 
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] Further fun with ABI tracking
  2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
@ 2017-02-24  7:32  8%       ` Christian Ehrhardt
  0 siblings, 0 replies; 200+ results
From: Christian Ehrhardt @ 2017-02-24  7:32 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Jan Blunck, dev, cjcollier, ricardo.salveti, Luca Boccassi

On Thu, Feb 23, 2017 at 7:48 PM, Ferruh Yigit <ferruh.yigit@intel.com>
wrote:

> Can you please describe this option more?
>

Of course, I am happy about any engagement/discussion on that.
Much better than silently staying in the queue.


> Does it mean that for each DPDK release, the distro will release all libraries?
>

First of all, it is opt-in. If nobody changes the default setting (="")
to anything, nothing happens.

A distribution _CAN_ use the feature to reliably avoid collisions between
DPDK releases and allow concurrent installations more easily.


> For 16.07:
> acl.2, eal.2, ethdev.4, pdump.1
>
> For 16.11:
> acl.2, eal.3, ethdev.5, pdump.1
>

That example is what we have done so far, trying to follow the DPDK ABI
versioning.
But that caused the issue I described with (new)pdump.1->eal.3 and the base
app->eal.2.


> Will the dpdk package have the following packages:
> acl.16.07.2, eal.16.07.2, ethdev.16.07.4, pdump.16.07.1
> acl.16.11.2, eal.16.11.3, ethdev.16.11.5, pdump.16.11.1
>

I thought about that, but Jan correctly brought up that if we do override, we
should also trivialize and override fully, ignoring the per-lib LIBABIVER.
So it will not be eal.16.11.3 but instead just eal.16.11 (no subversion, as
there is no need). If that is wanted it can easily be done; please let me know
if I should run a v2 with that.

> And for the initial OVS use case, will it be:
>
> OVS
>  +---> eal.16.07.2
>  +---> pdump.16.11.1
>         +---> eal.16.11.3
>
>
Not quite; the use case would look like this:
The current DPDK generates LIBABIVER versions: eal.3, pdump.1, ...
OVS
 +---> eal.3
 +---> pdump.1
        +---> eal.3

Note: Packages are initially carried forward from the former distribution
release, so the next release would start where the former ended:
OVS
 +---> eal.3
 +---> pdump.1
        +---> eal.3

Then the new DPDK would come in and, using this feature, would generate
everything with the major version: eal.17.02, pdump.17.02, ...
But since OVS was not recompiled yet AND there is no collision, OVS
would still look like:
OVS
 +---> eal.3
 +---> pdump.1
        +---> eal.3

Then we can recompile OVS and it will become
OVS
 +---> eal.17.02
 +---> pdump.17.02
        +---> eal.17.02

Into the future, with more apps depending on DPDK, there can be many
scenarios:
1. All are fine rebuilding: there will be no dependency left on the older
   DPDK and it will be autoremoved after all upgrades.
2. Some packages are slow to adapt, but that is fine: we can still provide
   the old dependencies at the same time if needed.
3. The time in between #2 and #1 does not wreak havoc, as the
   cross-dependency issue is no more.


> Assuming the above understanding is correct:
>
> - If the same version of a library will be delivered for each DPDK
> release, what is the real benefit of having fine-grained libraries?
>

The benefit of the fine-grained versioning is for other ways of
distributing DPDK:
recognizing an ABI bump for lib-consuming developers, bundling it directly
with your app, ...



> - The OVS usage above still does not look right; I don't believe this was the
> intention when library-level dependency resolving was introduced.
>
> Overall I am for a single library, but I can see the benefit of having
> multiple small libraries; that is why I vote for option 4 in your
> initial mail.
>

A single library would solve it as well, but as mentioned (and as you all
remember) there were people with reasons for it that I could not challenge,
being too far out of the application scenarios they had in mind.

> And I agree this can cause problems if not automated, but we already know
> the library dependencies; I think a script can be developed to at least
> warn, and they can be updated manually.
>
> And isn't the purpose of increasing LIBABIVER to notify the application that
> the library has been modified and can't be used with that app anymore?
> For DPDK, even if a library itself is not changed, a modification to another
> library that it depends on may change its behavior, so it makes sense to me
> for the library to notify the user of this case by increasing its version.
>
> Yes, this makes the effect of increasing a core library version big, but I
> believe this is also true: increasing a core library version almost
> means increasing the DPDK version.
>

Interesting - thanks for sharing your opinion here - I have been rethinking
that for a while now.

While this could work I consider it inferior to the approach I submitted
in the patch yesterday [1] for the following reasons:

- If we bump in the infecting way (looking at the recent history), we most
  likely end up bumping all libraries at least every other release. There
  isn't much difference between bumping all of them +1 and just using a
  single increasing version, except that you could miss a few bumps or
  track them wrong.

- The new feature is opt-in, allowing those who want to do that major bump,
  while at the same time allowing those who don't to keep on tracking each
  lib individually and build/deliver it that way.

- I learned (often the hard way) that being different often causes problems
  that are hard to foresee.
  The infecting ABI would be "DPDK is different" again, while the major
  override is somewhat established.

For now I'd suggest taking the opt-in feature as suggested in [1] as a means
for those who need it (like us and maybe more downstreams over time).
If DPDK evolves to become more stable and develops a feature like
the #4 "infecting-abi-bump + tracking", it can still be picked up later by us
and by anybody else who wants or needs it.
Getting back will then "just" be a matter of dropping the config option we set before.


TL;DR: I think DPDK is not yet stable enough to make option #4 worth
implementing for now (it would cause a lot of work and error potential).
But since my code [1] implementing approach #1 and a later approach #4
are not mutually exclusive, I'd ask to go for #1 now and #4 later if
someone needs and implements it.

[1]: http://dpdk.org/ml/archives/dev/2017-February/058121.html


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements
  2017-02-21  3:17  2% ` [dpdk-dev] [PATCH v7 0/17] distributor library " David Hunt
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
@ 2017-02-24 14:01  0%   ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2017-02-24 14:01 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:36AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
> 
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
> 
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
> 
> The Flow Match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 cpu flag. On non-x86 platforms,
> the scalar match function is selected, which should still give a good boost
> in performance over the non-burst API.
> 
> v2 changes:
>   * Created a common distributor_priv.h header file with common
>     definitions and structures.
>   * Added a scalar version so it can be built and used on machines without
>     sse2 instruction set
>   * Added unit autotests
>   * Added perf autotest

For future reference, I think it's better to put the list of deltas from
each version in reverse order, so that the latest changes are on top, saving
scrolling for those of us who have been tracking the set.

> 
> v3 changes:
>   * Addressed mailing list review comments
>   * Test code removal
>   * Split out SSE match into separate file to facilitate NEON addition
>   * Cleaned up conditional compilation flags for SSE2
>   * Addressed c99 style compilation errors
>   * rebased on latest head (Jan 2 2017, Happy New Year to all)
> 
> v4 changes:
>    * fixed issue building shared libraries
> 
> v5 changes:
>    * Removed some un-needed code around retries in worker API calls
>    * Cleanup due to review comments on mailing list
>    * Cleanup of non-x86 platform compilation, fallback to scalar match
> 
> v6 changes:
>    * Fixed intermittent segfault where num pkts not divisible
>      by BURST_SIZE
>    * Cleanup due to review comments on mailing list
>    * Renamed _priv.h to _private.h.
> 
> v7 changes:
>    * Reorganised patch so there's a more natural progression in the
>      changes, and divided them down into easier to review chunks.
>    * Previous versions of this patch set were effectively two APIs.
>      We now have a single API. Legacy functionality can
>      be used by by using the rte_distributor_create API call with the
>      RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
>    * Added symbol versioning for old API so that ABI is preserved.
> 
The merging to a single API is great to see, making it so much easier
for app developers. Thanks for that.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-21 10:27  0%     ` Hunt, David
@ 2017-02-24 14:03  0%     ` Bruce Richardson
  2017-03-01  9:55  0%       ` Hunt, David
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-24 14:03 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:37AM +0000, David Hunt wrote:
> Move files out of the way so that we can replace them with new
> versions of the distributor library. Files are named in
> such a way as to match the symbol versioning that we will
> apply for backward ABI compatibility.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  app/test/test_distributor.c                  |   2 +-
>  app/test/test_distributor_perf.c             |   2 +-
>  examples/distributor/main.c                  |   2 +-
>  lib/librte_distributor/Makefile              |   4 +-
>  lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
>  lib/librte_distributor/rte_distributor.h     | 247 --------------
>  lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++

Rather than changing the unit tests and example applications, I think
this patch would be better with a new rte_distributor.h file which
simply does "#include  <rte_distributor_v20.h>". Alternatively, I
recently upstreamed a patch, which went into 17.02, to allow symlinks in
the folder so you could create a symlink to the renamed file.
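
A sketch of the kind of wrapper header meant here (contents assumed, not
taken from the patch):

/* rte_distributor.h: keep the old include path working by simply
 * forwarding to the renamed legacy header. */
#include <rte_distributor_v20.h>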

/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 2/2] ethdev: add hierarchical scheduler API
  @ 2017-02-24 16:28  1% ` Cristian Dumitrescu
  0 siblings, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2017-02-24 16:28 UTC (permalink / raw)
  To: dev; +Cc: thomas.monjalon, jerin.jacob, hemant.agrawal

This patch introduces the generic ethdev API for the hierarchical scheduler
capability.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow approach)
- Capability query API per port, per hierarchy level and per hierarchy node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ),
  Weighted Round Robin (WRR)
- Traffic shaping: single/dual rate, private (per node) and shared (by multiple
  nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, its parent, its role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API functions
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception, see the long list below, hopefully
  nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated object
      IDs. IMO the choice to have application-generated object IDs adds marginal
      complexity to the driver (search ID function required), but it provides
      huge simplification for the application. The app does not need to worry
      about building & managing tree-like structure for storing driver-generated
      object IDs, the app can use its own convention for node IDs depending on
      the specific hierarchy that it needs. Trivial example: identify all
      level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
      on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
      310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
      112, 113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log for
      the other related simplification that was implemented: leaf nodes now have
      predefined IDs that are the same with their Ethernet TX queue ID (
      therefore no translation is required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the shaper
  profile as part of node API (no shaper ID needed for private shapers), while
  the shared shapers are configured outside of the node API using shaper profile
  and communicated to the node using shared shaper ID. So there is no
  configuration overhead for shared shapers if the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same as their Ethernet TX
  queue ID (therefore no translation is required for leaf nodes). This is also
  used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause (same
  as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
  based on IP packet bytes)

Next steps:
- SW fallback based on librte_sched library (to be later introduced by
  standalone patch set)

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
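
For orientation, a minimal usage sketch against the API introduced below,
using only functions whose prototypes appear in this patch (error handling
abbreviated, names otherwise illustrative):

#include <rte_scheddev.h>

/* Query the port-level scheduler capabilities, then ask whether a given
 * node ID refers to a leaf node (i.e. an Ethernet TX queue) or to a
 * non-leaf node. Returns the driver's leaf indication, or -1 on error. */
static int
query_sched(uint8_t port_id, uint32_t node_id)
{
	struct rte_scheddev_capabilities cap;
	struct rte_scheddev_error error;
	int is_leaf;

	if (rte_scheddev_capabilities_get(port_id, &cap, &error) != 0)
		return -1;

	if (rte_scheddev_node_type_get(port_id, node_id, &is_leaf, &error) != 0)
		return -1;

	return is_leaf;
}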

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_scheddev.c        |  781 ++++++++++++++++++
 lib/librte_ether/rte_scheddev.h        | 1416 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_scheddev_driver.h |  365 ++++++++
 6 files changed, 2600 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_scheddev.c
 create mode 100644 lib/librte_ether/rte_scheddev.h
 create mode 100644 lib/librte_ether/rte_scheddev_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 24e0eff..8a8719f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -247,6 +247,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+SchedDev API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_scheddev*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 1d095a9..7e0527f 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_scheddev.c
 
 #
 # Export include files
@@ -54,6 +55,8 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_scheddev.h
+SYMLINK-y-include += rte_scheddev_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 637317c..4d67eee 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -159,5 +159,35 @@ DPDK_17.05 {
 	global:
 
 	rte_eth_dev_capability_ops_get;
+	rte_scheddev_get_leaf_nodes;
+	rte_scheddev_node_type_get;
+	rte_scheddev_capabilities_get;
+	rte_scheddev_level_capabilities_get;
+	rte_scheddev_node_capabilities_get;
+	rte_scheddev_wred_profile_add;
+	rte_scheddev_wred_profile_delete;
+	rte_scheddev_shared_wred_context_add_update;
+	rte_scheddev_shared_wred_context_delete;
+	rte_scheddev_shaper_profile_add;
+	rte_scheddev_shaper_profile_delete;
+	rte_scheddev_shared_shaper_add_update;
+	rte_scheddev_shared_shaper_delete;
+	rte_scheddev_node_add;
+	rte_scheddev_node_delete;
+	rte_scheddev_node_suspend;
+	rte_scheddev_node_resume;
+	rte_scheddev_hierarchy_set;
+	rte_scheddev_node_parent_update;
+	rte_scheddev_node_shaper_update;
+	rte_scheddev_node_shared_shaper_update;
+	rte_scheddev_node_stats_update;
+	rte_scheddev_node_scheduling_mode_update;
+	rte_scheddev_node_cman_update;
+	rte_scheddev_node_wred_context_update;
+	rte_scheddev_node_shared_wred_context_update;
+	rte_scheddev_node_stats_read;
+	rte_scheddev_mark_vlan_dei;
+	rte_scheddev_mark_ip_ecn;
+	rte_scheddev_mark_ip_dscp;
 
 } DPDK_17.02;
diff --git a/lib/librte_ether/rte_scheddev.c b/lib/librte_ether/rte_scheddev.c
new file mode 100644
index 0000000..d9c7dfe
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.c
@@ -0,0 +1,781 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include <rte_branch_prediction.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev_driver.h"
+#include "rte_scheddev.h"
+
+/* Get generic scheduler operations structure from a port. */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_scheddev_error_set(error,
+			ENODEV,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->cap_ops_get == NULL) ||
+		(dev->dev_ops->cap_ops_get(dev, RTE_ETH_CAPABILITY_SCHED,
+		&ops) != 0) || (ops == NULL)) {
+		rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+/* Get number of leaf nodes */
+int
+rte_scheddev_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_scheddev_error_set(error,
+			EINVAL,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node ID type (leaf or non-leaf) */
+int
+rte_scheddev_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_type_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_type_get(dev, node_id, is_leaf, error);
+}
+
+/* Get capabilities */
+int rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->capabilities_get(dev, cap, error);
+}
+
+/* Get level capabilities */
+int rte_scheddev_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_scheddev_level_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->level_capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->level_capabilities_get(dev, level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_capabilities_get == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_capabilities_get(dev, node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_add(dev, wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->wred_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->wred_profile_delete(dev, wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_add_update(dev, shared_wred_context_id,
+		wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_wred_context_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_wred_context_delete(dev, shared_wred_context_id,
+		error);
+}
+
+/* Add shaper profile */
+int rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_add(dev, shaper_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shaper_profile_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shaper_profile_delete(dev, shaper_profile_id, error);
+}
+
+/* Add shared shaper */
+int rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_add_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_add_update(dev, shared_shaper_id,
+		shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->shared_shaper_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->shared_shaper_delete(dev, shared_shaper_id, error);
+}
+
+/* Add node to port scheduler hierarchy */
+int rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_add == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_add(dev, node_id, parent_node_id, priority, weight,
+		params, error);
+}
+
+/* Delete node from scheduler hierarchy */
+int rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_delete == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_delete(dev, node_id, error);
+}
+
+/* Suspend node */
+int rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_suspend == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_suspend(dev, node_id, error);
+}
+
+/* Resume node */
+int rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_resume == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_resume(dev, node_id, error);
+}
+
+/* Set the initial port scheduler hierarchy */
+int rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->hierarchy_set == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->hierarchy_set(dev, clear_on_fail, error);
+}
+
+/* Update node parent */
+int rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_parent_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_parent_update(dev, node_id, parent_node_id, priority,
+		weight, error);
+}
+
+/* Update node private shaper */
+int rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shaper_update(dev, node_id, shaper_profile_id,
+		error);
+}
+
+/* Update node shared shapers */
+int rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_shaper_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_shaper_update(dev, node_id, shared_shaper_id,
+		add, error);
+}
+
+/* Update node stats */
+int rte_scheddev_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_update(dev, node_id, stats_mask, error);
+}
+
+/* Update scheduling mode */
+int rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_scheduling_mode_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_scheduling_mode_update(dev, node_id,
+		scheduling_mode_per_priority, n_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_cman_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_cman_update(dev, node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_wred_context_update(dev, node_id, wred_profile_id,
+		error);
+}
+
+/* Update node shared WRED context */
+int rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_shared_wred_context_update == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_shared_wred_context_update(dev, node_id,
+		shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->node_stats_read == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->node_stats_read(dev, node_id, stats, stats_mask, clear,
+		error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_vlan_dei == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_vlan_dei(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_ecn == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_ecn(dev, mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_scheddev_ops *ops =
+		rte_scheddev_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (ops->mark_ip_dscp == NULL)
+		return -rte_scheddev_error_set(error,
+			ENOSYS,
+			RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+
+	return ops->mark_ip_dscp(dev, mark_green, mark_yellow, mark_red,
+		error);
+}
diff --git a/lib/librte_ether/rte_scheddev.h b/lib/librte_ether/rte_scheddev.h
new file mode 100644
index 0000000..1741f7a
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev.h
@@ -0,0 +1,1416 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_H__
+#define __INCLUDE_RTE_SCHEDDEV_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API
+ *
+ * This interface provides the ability to configure the hierarchical scheduler
+ * feature in a generic way.
+ */
+
+#include <stdint.h>
+
+#include <rte_red.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Ethernet framing overhead
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
+ * is generated and added at the end of the Ethernet frame on TX side without
+ * any SW intervention.
+ */
+#define RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS              24
+
+/** Invalid WRED profile ID */
+#define RTE_SCHEDDEV_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/** Invalid shaper profile ID */
+#define RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/** Node ID for the parent of the root node */
+#define RTE_SCHEDDEV_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_scheddev_color {
+	e_RTE_SCHEDDEV_GREEN = 0, /**< Green */
+	e_RTE_SCHEDDEV_YELLOW, /**< Yellow */
+	e_RTE_SCHEDDEV_RED, /**< Red */
+	e_RTE_SCHEDDEV_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_scheddev_stats_type {
+	/**< Number of packets scheduled from current node. */
+	RTE_SCHEDDEV_STATS_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_SCHEDDEV_STATS_N_BYTES = 1 << 1,
+
+	/**< Number of green packets dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/**< Number of yellow packets dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/**< Number of red packets dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/**< Number of green bytes dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/**< Number of yellow bytes dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/**< Number of red bytes dropped by current leaf node.  */
+	RTE_SCHEDDEV_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_SCHEDDEV_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_scheddev_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[e_RTE_SCHEDDEV_COLORS];
+
+		/**< Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[e_RTE_SCHEDDEV_COLORS];
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Scheduler dynamic updates
+ */
+enum rte_scheddev_dynamic_update_type {
+	/**< Dynamic parent node update. */
+	RTE_SCHEDDEV_UPDATE_NODE_PARENT = 1 << 0,
+
+	/**< Dynamic node add/delete. */
+	RTE_SCHEDDEV_UPDATE_NODE_ADD_DELETE = 1 << 1,
+
+	/**< Suspend/resume nodes. */
+	RTE_SCHEDDEV_UPDATE_NODE_SUSPEND_RESUME = 1 << 2,
+
+	/**< Dynamic switch between WFQ and WRR per node SP priority level. */
+	RTE_SCHEDDEV_UPDATE_NODE_SCHEDULING_MODE = 1 << 3,
+
+	/**< Dynamic update of the set of enabled stats counter types. */
+	RTE_SCHEDDEV_UPDATE_NODE_STATS = 1 << 4,
+
+	/**< Dynamic update of congestion management mode for leaf nodes. */
+	RTE_SCHEDDEV_UPDATE_NODE_CMAN = 1 << 5,
+};
+
+/**
+ * Scheduler node capabilities
+ */
+struct rte_scheddev_node_capabilities {
+	/**< Private shaper support. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper. Valid only when
+	 * private shaper is supported.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of supported shared shapers. The value of zero
+	 * indicates that shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Mask of supported statistics counter types. */
+	uint64_t stats_mask;
+
+	union {
+		/**< Items valid only for non-leaf nodes. */
+		struct {
+			/**< Maximum number of children nodes. */
+			uint32_t n_children_max;
+
+			/**< Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sp_n_priorities_max;
+
+			/**< Maximum number of sibling nodes that can have the
+			 * same priority at any given time. The value of zero is
+			 * invalid. The value of 1 indicates that WFQ/WRR
+			 * algorithms are not supported. The maximum value is
+			 * *n_children_max*.
+			 */
+			uint32_t sp_n_children_max;
+
+			/**< WFQ algorithm support. */
+			int wfq_supported;
+
+			/**< WRR algorithm support. */
+			int wrr_supported;
+
+			/**< Maximum WFQ/WRR weight. */
+			uint32_t wfq_wrr_weight_max;
+		} nonleaf;
+
+		/**< Items valid only for leaf nodes. */
+		struct {
+			/**< Head drop algorithm support. */
+			int cman_head_drop_supported;
+
+			/**< Private WRED context support. */
+			int cman_wred_context_private_supported;
+
+			/**< Maximum number of shared WRED contexts supported.
+			 * The value of zero indicates that shared WRED contexts
+			 * are not supported.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+};
+
+/**
+ * Scheduler level capabilities
+ */
+struct rte_scheddev_level_capabilities {
+	/**< Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that current level only supports leaf nodes.
+	 * The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/**< Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that current level only supports non-leaf nodes.
+	 * The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/**< Summary of node-level capabilities across all the non-leaf nodes
+	 * of the current hierarchy level. Valid only when *n_nodes_nonleaf_max*
+	 * is greater than 0.
+	 */
+	struct rte_scheddev_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all the leaf nodes of the
+	 * current hierarchy level. Valid only when *n_nodes_leaf_max* is
+	 * greater than 0.
+	 */
+	struct rte_scheddev_node_capabilities leaf;
+};
+
+/**
+ * Scheduler capabilities
+ */
+struct rte_scheddev_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resource between private and
+	 * shared shapers, it is typically equal to the sum between
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have the private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper. Only
+	 * valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for shared
+	 * shapers. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for shared
+	 * shaper. Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of WRED contexts. */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have the private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero indicates
+	 * that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_max;
+
+	/**< Support for VLAN DEI packet marking. */
+	int mark_vlan_dei_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets. */
+	int mark_ip_ecn_tcp_supported;
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets. */
+	int mark_ip_ecn_sctp_supported;
+
+	/**< Support for IPv4/IPv6 DSCP packet marking. */
+	int mark_ip_dscp_supported;
+
+	/**< Set of supported dynamic update operations
+	 * (see enum rte_scheddev_dynamic_update_type).
+	 */
+	uint64_t dynamic_update_mask;
+
+	/**< Summary of node-level capabilities across all non-leaf nodes. */
+	struct rte_scheddev_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all leaf nodes. */
+	struct rte_scheddev_node_capabilities leaf;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue or
+ * group of packet queues on congestion. On request of writing a new packet
+ * into the current queue while the queue is full, the *tail drop* algorithm
+ * drops the new packet while leaving the queue unmodified, as opposed to *head
+ * drop* algorithm, which drops the packet at the head of the queue (the oldest
+ * packet waiting in the queue) and admits the new packet at the tail of the
+ * queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_scheddev_cman_mode {
+	RTE_SCHEDDEV_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_SCHEDDEV_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_SCHEDDEV_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * WRED profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_scheddev_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_red_params red_params[e_RTE_SCHEDDEV_COLORS];
+};
+
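+/*
+ * Illustrative sketch (not part of the API): one way an application could
+ * fill in a WRED profile, with progressively tighter thresholds for yellow
+ * and red packets. The threshold values and the profile ID below are
+ * assumptions chosen for the example, not recommendations.
+ *
+ *	struct rte_scheddev_wred_params wp = {
+ *		.red_params = {
+ *			[e_RTE_SCHEDDEV_GREEN] = { .min_th = 48, .max_th = 64,
+ *				.maxp_inv = 10, .wq_log2 = 9 },
+ *			[e_RTE_SCHEDDEV_YELLOW] = { .min_th = 32, .max_th = 64,
+ *				.maxp_inv = 10, .wq_log2 = 9 },
+ *			[e_RTE_SCHEDDEV_RED] = { .min_th = 16, .max_th = 64,
+ *				.maxp_inv = 10, .wq_log2 = 9 },
+ *		},
+ *	};
+ *	struct rte_scheddev_error err = { 0 };
+ *
+ *	rte_scheddev_wred_profile_add(port_id, 0, &wp, &err);
+ */
+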
+/**
+ * Token bucket
+ */
+struct rte_scheddev_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the committed bucket has to be less than or equal to the rate of the
+ * peak bucket.
+ */
+struct rte_scheddev_shaper_params {
+	/**< Committed token bucket */
+	struct rte_scheddev_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_scheddev_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
+
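+/*
+ * Illustrative sketch (not part of the API): a dual rate shaper profile
+ * limiting traffic to 10 Mbps committed / 100 Mbps peak (rates are in bytes
+ * per second), with a 4 KB burst size per bucket and FCS included in the
+ * accounted packet length. The values and the profile ID are assumptions
+ * made for the example.
+ *
+ *	struct rte_scheddev_shaper_params sp = {
+ *		.committed = { .rate = 10000000 / 8, .size = 4096 },
+ *		.peak = { .rate = 100000000 / 8, .size = 4096 },
+ *		.pkt_length_adjust = RTE_SCHEDDEV_ETH_FRAMING_OVERHEAD_FCS,
+ *	};
+ *	struct rte_scheddev_error err = { 0 };
+ *
+ *	rte_scheddev_shaper_profile_add(port_id, 0, &sp, &err);
+ */
+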
+/**
+ * Node parameters
+ *
+ * Each scheduler hierarchy node has multiple inputs (children nodes of the
+ * current parent node) and a single output (which is input to its parent
+ * node). The current node arbitrates its inputs using Strict Priority (SP),
+ * Weighted Fair Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to
+ * schedule input packets on its output while observing its shaping (rate
+ * limiting) constraints.
+ *
+ * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc. are considered
+ * approximations of the ideal WFQ and are assimilated to WFQ, although an
+ * implementation-dependent trade-off on accuracy, performance and resource
+ * usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP
+ * algorithm, based on their priority, with zero (0) as the highest priority.
+ * Children with same priority are scheduled using the WFQ or WRR algorithm,
+ * based on their weight, which is relative to the sum of the weights of all
+ * siblings with same priority, with one (1) as the lowest weight.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
+ * where N is the number of TX queues configured for the current Ethernet port.
+ * The non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_scheddev_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	/**< Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 */
+	uint64_t stats_mask;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< For each priority, indicates whether the children
+			 * nodes sharing the same priority are to be scheduled
+			 * by WFQ or by WRR. When NULL, it indicates that WFQ
+			 * is to be used for all priorities. When non-NULL, it
+			 * points to a pre-allocated array of *n_priorities*
+			 * elements, with a non-zero value element indicating
+			 * WFQ and a zero value element for WRR.
+			 */
+			int *scheduling_mode_per_priority;
+
+			/**< Number of priorities. */
+			uint32_t n_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_scheddev_cman_mode cman;
+
+			/**< WRED parameters (valid when *cman* is WRED). */
+			struct {
+				/**< WRED profile for private WRED context. The
+				 * absence of a private WRED context for the
+				 * current leaf node is indicated by value
+				 * RTE_SCHEDDEV_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of valid shared WRED
+				 * context IDs.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of shared WRED context IDs in the
+				 * *shared_wred_context_id* array.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+};
+
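+/*
+ * Illustrative sketch (not part of the API): non-leaf node parameters with
+ * two SP priority levels, priority 0 scheduled by WFQ (non-zero element) and
+ * priority 1 by WRR (zero element). The array contents are assumptions made
+ * for the example.
+ *
+ *	int sched_mode[2] = { 1, 0 };
+ *	struct rte_scheddev_node_params np = {
+ *		.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
+ *		.nonleaf = {
+ *			.scheduling_mode_per_priority = sched_mode,
+ *			.n_priorities = 2,
+ *		},
+ *	};
+ */
+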
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_scheddev_error::cause.
+ */
+enum rte_scheddev_error_type {
+	RTE_SCHEDDEV_ERROR_TYPE_NONE, /**< No error. */
+	RTE_SCHEDDEV_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_SCHEDDEV_ERROR_TYPE_CAPABILITIES,
+	RTE_SCHEDDEV_ERROR_TYPE_LEVEL_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_SCHEDDEV_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_SCHEDDEV_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PRIORITY,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_WEIGHT,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_N_PRIORITIES,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_SCHEDDEV_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs. The
+ * message points to a constant string which does not need to be freed by
+ * the application; however, its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_scheddev_error {
+	enum rte_scheddev_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
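+/*
+ * Illustrative sketch (not part of the API): the error handling pattern
+ * assumed by all functions below. The application owns the error object and
+ * the PMD fills it in only on failure.
+ *
+ *	struct rte_scheddev_error err = { 0 };
+ *
+ *	if (rte_scheddev_node_delete(port_id, node_id, &err) != 0)
+ *		printf("scheddev error type %d: %s\n", err.type,
+ *			err.message ? err.message : "(none)");
+ */
+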
+/**
+ * Scheduler get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port) and
+ * their IDs are 0 .. (N-1).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param n_leaf_nodes
+ *   Number of leaf nodes for the current port.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is the
+ * number of TX queues of the current Ethernet port. The non-leaf nodes have
+ * their IDs generated by the application outside of the above range, which is
+ * reserved for leaf nodes.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID value. Needs to be valid.
+ * @param is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_scheddev_error *error);
+
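+/*
+ * Illustrative sketch (not part of the API): check whether a node ID refers
+ * to a leaf node (i.e. a TX queue) before applying a leaf-only operation
+ * such as a congestion management mode update.
+ *
+ *	int is_leaf = 0;
+ *	struct rte_scheddev_error err = { 0 };
+ *
+ *	if (rte_scheddev_node_type_get(port_id, node_id, &is_leaf, &err) == 0 &&
+ *		is_leaf)
+ *		rte_scheddev_node_cman_update(port_id, node_id,
+ *			RTE_SCHEDDEV_CMAN_HEAD_DROP, &err);
+ */
+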
+/**
+ * Scheduler capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Scheduler capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_capabilities_get(uint8_t port_id,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
+
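+/*
+ * Illustrative sketch (not part of the API): query the port scheduler
+ * capabilities before relying on an optional feature such as shared shapers.
+ *
+ *	struct rte_scheddev_capabilities cap;
+ *	struct rte_scheddev_error err = { 0 };
+ *
+ *	if (rte_scheddev_capabilities_get(port_id, &cap, &err) == 0 &&
+ *		cap.shaper_shared_n_max == 0)
+ *		printf("port %u: shared shapers not supported\n", port_id);
+ */
+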
+/**
+ * Scheduler level capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param level_id
+ *   The scheduler hierarchy level identifier. The value of 0 identifies the
+ *   level of the root node.
+ * @param cap
+ *   Scheduler level capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_scheddev_level_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param cap
+ *   Scheduler node capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is currently
+ * at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several scheduler hierarchy
+ * leaf nodes configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy leaf node) of this
+ * shared WRED context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is no
+ * longer using the shaper profile previously assigned to it and is updated to
+ * use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
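+/*
+ * Illustrative sketch (not part of the API): create shared shaper 5 from an
+ * existing shaper profile 0 and attach it to an existing non-leaf node. The
+ * IDs are assumptions made for the example.
+ *
+ *	struct rte_scheddev_error err = { 0 };
+ *
+ *	rte_scheddev_shared_shaper_add_update(port_id, 5, 0, &err);
+ *	rte_scheddev_node_shared_shaper_update(port_id, node_id, 5, 1, &err);
+ */
+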
+/**
+ * Scheduler shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. scheduler hierarchy node) of this shared
+ * shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node add
+ *
+ * Create new node and connect it as child of an existing node. The new node is
+ * further identified by *node_id*, which needs to be unused by any of the
+ * existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ/WRR *weight* to schedule its
+ * new child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that are
+ * subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_SCHEDDEV_NODE_ID_NULL and
+ * there can only be one node with this parent ID (i.e. the root node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has at
+ * least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node suspend
+ *
+ * Suspend an existing node.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node resume
+ *
+ * Resume an existing node that was previously suspended.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the scheduler start-up hierarchy.
+ *
+ * This function fails when the currently configured scheduler hierarchy is not
+ * supported by the Ethernet port, in which case the user can abort or try out
+ * another hierarchy configuration (e.g. a hierarchy with fewer leaf nodes),
+ * which can be built from scratch (when *clear_on_fail* is enabled) or obtained
+ * by modifying the existing hierarchy configuration (when *clear_on_fail* is
+ * disabled).
+ *
+ * Note that, even when the configured scheduler hierarchy is supported (so this
+ * function is successful), the Ethernet port start might still fail due to e.g.
+ * not enough memory being available in the system, etc.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
+
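+/*
+ * Illustrative sketch (not part of the API): build a minimal start-up
+ * hierarchy with one root node on top of the predefined leaf nodes
+ * 0 .. (N-1), then freeze it before starting the port. The root node ID
+ * (1000000) and the parameter values are assumptions made for the example.
+ *
+ *	struct rte_scheddev_node_params np = {
+ *		.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
+ *		.nonleaf = { .n_priorities = 1 },
+ *	};
+ *	struct rte_scheddev_node_params lp = {
+ *		.shaper_profile_id = RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE,
+ *		.leaf = { .cman = RTE_SCHEDDEV_CMAN_TAIL_DROP },
+ *	};
+ *	struct rte_scheddev_error err = { 0 };
+ *	uint32_t root_id = 1000000, n_leaf = 0, i;
+ *
+ *	rte_scheddev_get_leaf_nodes(port_id, &n_leaf, &err);
+ *	rte_scheddev_node_add(port_id, root_id, RTE_SCHEDDEV_NODE_ID_NULL,
+ *		0, 1, &np, &err);
+ *	for (i = 0; i < n_leaf; i++)
+ *		rte_scheddev_node_add(port_id, i, root_id, 0, 1, &lp, &err);
+ *	rte_scheddev_hierarchy_set(port_id, 1, &err);
+ */
+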
+/**
+ * Scheduler node parent update
+ *
+ * The parent of the root node cannot be changed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private shaper update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_SCHEDDEV_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared shapers update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node enabled statistics counters update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to be
+ *   disabled for the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node scheduling mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param scheduling_mode_per_priority
+ *   For each priority, indicates whether the children nodes sharing the same
+ *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates that
+ *   WFQ is to be used for all priorities. When non-NULL, it points to a
+ *   pre-allocated array of *n_priorities* elements, with a non-zero value element
+ *   indicating WFQ and a zero value element for WRR.
+ * @param n_priorities
+ *   Number of priorities.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node congestion management mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node private WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_SCHEDDEV_WRED_PROFILE_ID_NONE, with
+ *   the latter disabling the private WRED context of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node shared WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared WRED context to current node or to
+ *   zero to delete this shared WRED context from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters retrieved
+ *   with the *stats* structure are valid.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read, otherwise
+ *   the statistics counters are left untouched.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_scheddev_error *error);
+
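+/*
+ * Illustrative sketch (not part of the API): read and clear the counters of
+ * a leaf node, then consume only the counters reported as enabled by the
+ * returned mask.
+ *
+ *	struct rte_scheddev_node_stats st;
+ *	uint64_t mask = 0;
+ *	struct rte_scheddev_error err = { 0 };
+ *
+ *	if (rte_scheddev_node_stats_read(port_id, node_id, &st, &mask, 1,
+ *			&err) == 0 && (mask & RTE_SCHEDDEV_STATS_N_PKTS))
+ *		printf("node %u: %llu pkts\n", node_id,
+ *			(unsigned long long)st.n_pkts);
+ */
+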
+/**
+ * Scheduler packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1Q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion Notification
+ * (ECN) field (2 bits). The DSCP field is typically used to encode the traffic
+ * class and/or drop priority (RFC 2597), while the ECN field is used by RFC
+ * 3168 to implement a congestion notification mechanism to be leveraged by
+ * transport layer protocols such as TCP and SCTP that have congestion control
+ * mechanisms.
+ *
+ * When congestion is experienced, as an alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10
+ * (values indicating that the source endpoint is ECN-capable) to 2'b11
+ * (meaning that congestion is experienced). The destination endpoint can use
+ * the ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
+ * source endpoint, which acknowledges it back to the destination endpoint
+ * with the Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color; otherwise, the ECN field is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+/**
+ * Scheduler packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 0-2, as
+ * well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_scheddev_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_H__ */
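
For reference, a minimal application-side sketch of how the marking and
statistics calls declared above could be used; the port id (0) and leaf node
id (5) are placeholder values and error handling is simplified:

    #include <rte_scheddev.h>

    /* Sketch only: enable DSCP marking for all colors, then read and clear
     * the statistics counters currently enabled for one leaf node. */
    static int
    scheddev_marking_example(void)
    {
            struct rte_scheddev_error error;
            struct rte_scheddev_node_stats stats;
            uint64_t stats_mask = 0;
            int ret;

            ret = rte_scheddev_mark_ip_dscp(0, 1, 1, 1, &error);
            if (ret != 0)
                    return ret;

            ret = rte_scheddev_node_stats_read(0, 5, &stats, &stats_mask,
                    1, &error);
            if (ret != 0)
                    return ret;

            return 0;
    }
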
diff --git a/lib/librte_ether/rte_scheddev_driver.h b/lib/librte_ether/rte_scheddev_driver.h
new file mode 100644
index 0000000..d245aea
--- /dev/null
+++ b/lib/librte_ether/rte_scheddev_driver.h
@@ -0,0 +1,365 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+#define __INCLUDE_RTE_SCHEDDEV_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Hierarchical Scheduler API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs. They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_scheddev.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_scheddev_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node type get */
+
+typedef int (*rte_scheddev_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_scheddev_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler capabilities get */
+
+typedef int (*rte_scheddev_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_scheddev_level_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler level capabilities get */
+
+typedef int (*rte_scheddev_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_capabilities *cap,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node capabilities get */
+
+typedef int (*rte_scheddev_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_wred_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile add */
+
+typedef int (*rte_scheddev_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler WRED profile delete */
+
+typedef int (*rte_scheddev_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context add */
+
+typedef int (*rte_scheddev_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared WRED context delete */
+
+typedef int (*rte_scheddev_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_shaper_params *profile,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile add */
+
+typedef int (*rte_scheddev_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shaper profile delete */
+
+typedef int (*rte_scheddev_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper add/update */
+
+typedef int (*rte_scheddev_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler shared shaper delete */
+
+typedef int (*rte_scheddev_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_node_params *params,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node add */
+
+typedef int (*rte_scheddev_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node delete */
+
+typedef int (*rte_scheddev_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node suspend */
+
+typedef int (*rte_scheddev_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node resume */
+
+typedef int (*rte_scheddev_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler hierarchy set */
+
+typedef int (*rte_scheddev_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node parent update */
+
+typedef int (*rte_scheddev_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shaper update */
+
+typedef int (*rte_scheddev_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int32_t add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared shaper update */
+
+typedef int (*rte_scheddev_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node stats update */
+
+typedef int (*rte_scheddev_node_scheduling_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node scheduling mode update */
+
+typedef int (*rte_scheddev_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_scheddev_cman_mode cman,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node congestion management mode update */
+
+typedef int (*rte_scheddev_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node WRED context update */
+
+typedef int (*rte_scheddev_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler node shared WRED context update */
+
+typedef int (*rte_scheddev_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_scheddev_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler read stats counters for specific node */
+
+typedef int (*rte_scheddev_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - VLAN DEI */
+
+typedef int (*rte_scheddev_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_scheddev_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_scheddev_error *error);
+/**< @internal Scheduler packet marking - IPv4/IPv6 DSCP */
+
+struct rte_scheddev_ops {
+	/** Scheduler node type get */
+	rte_scheddev_node_type_get_t node_type_get;
+
+	/** Scheduler capabilities_get */
+	rte_scheddev_capabilities_get_t capabilities_get;
+	/** Scheduler level capabilities_get */
+	rte_scheddev_level_capabilities_get_t level_capabilities_get;
+	/** Scheduler node capabilities get */
+	rte_scheddev_node_capabilities_get_t node_capabilities_get;
+
+	/** Scheduler WRED profile add */
+	rte_scheddev_wred_profile_add_t wred_profile_add;
+	/** Scheduler WRED profile delete */
+	rte_scheddev_wred_profile_delete_t wred_profile_delete;
+	/** Scheduler shared WRED context add/update */
+	rte_scheddev_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Scheduler shared WRED context delete */
+	rte_scheddev_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Scheduler shaper profile add */
+	rte_scheddev_shaper_profile_add_t shaper_profile_add;
+	/** Scheduler shaper profile delete */
+	rte_scheddev_shaper_profile_delete_t shaper_profile_delete;
+	/** Scheduler shared shaper add/update */
+	rte_scheddev_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Scheduler shared shaper delete */
+	rte_scheddev_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Scheduler node add */
+	rte_scheddev_node_add_t node_add;
+	/** Scheduler node delete */
+	rte_scheddev_node_delete_t node_delete;
+	/** Scheduler node suspend */
+	rte_scheddev_node_suspend_t node_suspend;
+	/** Scheduler node resume */
+	rte_scheddev_node_resume_t node_resume;
+	/** Scheduler hierarchy set */
+	rte_scheddev_hierarchy_set_t hierarchy_set;
+
+	/** Scheduler node parent update */
+	rte_scheddev_node_parent_update_t node_parent_update;
+	/** Scheduler node shaper update */
+	rte_scheddev_node_shaper_update_t node_shaper_update;
+	/** Scheduler node shared shaper update */
+	rte_scheddev_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Scheduler node stats update */
+	rte_scheddev_node_stats_update_t node_stats_update;
+	/** Scheduler node scheduling mode update */
+	rte_scheddev_node_scheduling_mode_update_t node_scheduling_mode_update;
+	/** Scheduler node congestion management mode update */
+	rte_scheddev_node_cman_update_t node_cman_update;
+	/** Scheduler node WRED context update */
+	rte_scheddev_node_wred_context_update_t node_wred_context_update;
+	/** Scheduler node shared WRED context update */
+	rte_scheddev_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Scheduler read statistics counters for current node */
+	rte_scheddev_node_stats_read_t node_stats_read;
+
+	/** Scheduler packet marking - VLAN DEI */
+	rte_scheddev_mark_vlan_dei_t mark_vlan_dei;
+	/** Scheduler packet marking - IPv4/IPv6 ECN */
+	rte_scheddev_mark_ip_ecn_t mark_ip_ecn;
+	/** Scheduler packet marking - IPv4/IPv6 DSCP */
+	rte_scheddev_mark_ip_dscp_t mark_ip_dscp;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param error
+ *   Pointer to error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error type.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_scheddev_error_set(struct rte_scheddev_error *error,
+		   int code,
+		   enum rte_scheddev_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_scheddev_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic hierarchical scheduler operations structure from a port
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param error
+ *   Error details
+ *
+ * @return
+ *   The hierarchical scheduler operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_scheddev_ops *
+rte_scheddev_ops_get(uint8_t port_id, struct rte_scheddev_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_SCHEDDEV_DRIVER_H__ */
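
As a rough sketch of the driver side, a PMD could implement a subset of these
callbacks and report failures through rte_scheddev_error_set(); the driver
name, the node id check and the error type constant below are illustrative
assumptions rather than part of this patch:

    #include <errno.h>
    #include <rte_scheddev_driver.h>

    /* Illustrative callback: accept only node id 0 and report any other id
     * through the generic error helper. */
    static int
    dummy_node_type_get(struct rte_eth_dev *dev, uint32_t node_id,
            int *is_leaf, struct rte_scheddev_error *error)
    {
            (void)dev;

            if (node_id != 0)
                    return rte_scheddev_error_set(error, EINVAL,
                            RTE_SCHEDDEV_ERROR_TYPE_NODE_ID, /* assumed value */
                            NULL, "unknown node id");

            *is_leaf = 1;
            return 0;
    }

    /* Ops table exposing only the implemented callback; the other members
     * are left NULL. */
    static const struct rte_scheddev_ops dummy_scheddev_ops = {
            .node_type_get = dummy_node_type_get,
    };
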
-- 
2.5.0

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH 04/16] net/avp: add PMD version map file
  @ 2017-02-25  1:23  3% ` Allain Legacy
    1 sibling, 0 replies; 200+ results
From: Allain Legacy @ 2017-02-25  1:23 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 04/15] net/avp: add PMD version map file
  @ 2017-02-26 19:08  3%   ` Allain Legacy
    1 sibling, 0 replies; 200+ results
From: Allain Legacy @ 2017-02-26 19:08 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version
  2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
@ 2017-02-28  8:34  4%       ` Jan Blunck
  2017-03-01  9:31  4%         ` Christian Ehrhardt
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2017-02-28  8:34 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: dev, cjcollier @ linuxfoundation . org, ricardo.salveti, Luca Boccassi

On Wed, Feb 22, 2017 at 2:24 PM, Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -40,6 +40,12 @@ EXTLIB_BUILD ?= n
>  # VPATH contains at least SRCDIR
>  VPATH += $(SRCDIR)
>
> +ifneq ($(CONFIG_RTE_MAJOR_ABI),)
> +ifneq ($(LIBABIVER),)
> +LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
> +endif
> +endif
> +
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
>  ifeq ($(EXTLIB_BUILD),n)
> @@ -156,11 +162,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
>         @[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
>         $(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> -ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
> -       $(Q)ln -s -f $< $(basename $(basename $@))
> -else
> -       $(Q)ln -s -f $< $(basename $@)
> -endif
> +       $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
>  endif
>

In case CONFIG_RTE_NEXT_ABI=y is set this is actually generating
shared objects with suffix:

  .so.$(CONFIG_RTE_MAJOR_ABI).1

I don't think that this is the intention.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
@ 2017-02-28 11:35  0%   ` Jerin Jacob
  2017-02-28 11:57  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-28 11:35 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: olivier.matz, dev

On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> Users compiling DPDK should not need to know or care about the arrangement
> of cachelines in the rte_ring structure. Therefore just remove the build
> option and set the structures to be always split. For improved
> performance use 128B rather than 64B alignment since it stops the producer
> and consumer data being on adjacent cachelines.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  config/common_base                     | 1 -
>  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
>  lib/librte_ring/rte_ring.c             | 2 --
>  lib/librte_ring/rte_ring.h             | 8 ++------
>  4 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/config/common_base b/config/common_base
> index aeee13e..099ffda 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
>  #
>  CONFIG_RTE_LIBRTE_RING=y
>  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
>  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
>  
>  #
> diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> index e25ea9f..ea45e0c 100644
> --- a/doc/guides/rel_notes/release_17_05.rst
> +++ b/doc/guides/rel_notes/release_17_05.rst
> @@ -110,6 +110,12 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =========================================================
>  
> +* **Reworked rte_ring library**
> +
> +  The rte_ring library has been reworked and updated. The following changes
> +  have been made to it:
> +
> +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
>  
>  ABI Changes
>  -----------
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index ca0a108..4bc6da1 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
>  	/* compilation-time checks */
>  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
>  			  RTE_CACHE_LINE_MASK) != 0);
> -#ifdef RTE_RING_SPLIT_PROD_CONS
>  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
>  			  RTE_CACHE_LINE_MASK) != 0);
> -#endif
>  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
>  			  RTE_CACHE_LINE_MASK) != 0);
>  #ifdef RTE_LIBRTE_RING_DEBUG
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 72ccca5..04fe667 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -168,7 +168,7 @@ struct rte_ring {
>  		uint32_t mask;           /**< Mask (size-1) of ring. */
>  		volatile uint32_t head;  /**< Producer head. */
>  		volatile uint32_t tail;  /**< Producer tail. */
> -	} prod __rte_cache_aligned;
> +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);

I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line
size of 128B

> +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);


>  
>  	/** Ring consumer status. */
>  	struct cons {
> @@ -177,11 +177,7 @@ struct rte_ring {
>  		uint32_t mask;           /**< Mask (size-1) of ring. */
>  		volatile uint32_t head;  /**< Consumer head. */
>  		volatile uint32_t tail;  /**< Consumer tail. */
> -#ifdef RTE_RING_SPLIT_PROD_CONS
> -	} cons __rte_cache_aligned;
> -#else
> -	} cons;
> -#endif
> +	} cons __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
>  
>  #ifdef RTE_LIBRTE_RING_DEBUG
>  	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
> -- 
> 2.9.3
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 11:35  0%   ` Jerin Jacob
@ 2017-02-28 11:57  0%     ` Bruce Richardson
  2017-02-28 12:08  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-28 11:57 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > Users compiling DPDK should not need to know or care about the arrangement
> > of cachelines in the rte_ring structure. Therefore just remove the build
> > option and set the structures to be always split. For improved
> > performance use 128B rather than 64B alignment since it stops the producer
> > and consumer data being on adjacent cachelines.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  config/common_base                     | 1 -
> >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> >  lib/librte_ring/rte_ring.c             | 2 --
> >  lib/librte_ring/rte_ring.h             | 8 ++------
> >  4 files changed, 8 insertions(+), 9 deletions(-)
> > 
> > diff --git a/config/common_base b/config/common_base
> > index aeee13e..099ffda 100644
> > --- a/config/common_base
> > +++ b/config/common_base
> > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> >  #
> >  CONFIG_RTE_LIBRTE_RING=y
> >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> >  
> >  #
> > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > index e25ea9f..ea45e0c 100644
> > --- a/doc/guides/rel_notes/release_17_05.rst
> > +++ b/doc/guides/rel_notes/release_17_05.rst
> > @@ -110,6 +110,12 @@ API Changes
> >     Also, make sure to start the actual text at the margin.
> >     =========================================================
> >  
> > +* **Reworked rte_ring library**
> > +
> > +  The rte_ring library has been reworked and updated. The following changes
> > +  have been made to it:
> > +
> > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> >  
> >  ABI Changes
> >  -----------
> > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > index ca0a108..4bc6da1 100644
> > --- a/lib/librte_ring/rte_ring.c
> > +++ b/lib/librte_ring/rte_ring.c
> > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> >  	/* compilation-time checks */
> >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> > -#ifdef RTE_RING_SPLIT_PROD_CONS
> >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> > -#endif
> >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> >  			  RTE_CACHE_LINE_MASK) != 0);
> >  #ifdef RTE_LIBRTE_RING_DEBUG
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 72ccca5..04fe667 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -168,7 +168,7 @@ struct rte_ring {
> >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> >  		volatile uint32_t head;  /**< Producer head. */
> >  		volatile uint32_t tail;  /**< Producer tail. */
> > -	} prod __rte_cache_aligned;
> > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> 
> I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line
> size of 128B
> 
Sure.

However, can you perhaps try a performance test and check to see if
there is a performance difference between the two values before I change
it? In my tests I see improved performance by having an extra blank
cache-line between the producer and consumer data.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 11:57  0%     ` Bruce Richardson
@ 2017-02-28 12:08  0%       ` Jerin Jacob
  2017-02-28 13:52  0%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-28 12:08 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > Users compiling DPDK should not need to know or care about the arrangement
> > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > option and set the structures to be always split. For improved
> > > performance use 128B rather than 64B alignment since it stops the producer
> > > and consumer data being on adjacent cachelines.
> > > 
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > >  config/common_base                     | 1 -
> > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > >  lib/librte_ring/rte_ring.c             | 2 --
> > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/config/common_base b/config/common_base
> > > index aeee13e..099ffda 100644
> > > --- a/config/common_base
> > > +++ b/config/common_base
> > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > >  #
> > >  CONFIG_RTE_LIBRTE_RING=y
> > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > >  
> > >  #
> > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > index e25ea9f..ea45e0c 100644
> > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > @@ -110,6 +110,12 @@ API Changes
> > >     Also, make sure to start the actual text at the margin.
> > >     =========================================================
> > >  
> > > +* **Reworked rte_ring library**
> > > +
> > > +  The rte_ring library has been reworked and updated. The following changes
> > > +  have been made to it:
> > > +
> > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > >  
> > >  ABI Changes
> > >  -----------
> > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > index ca0a108..4bc6da1 100644
> > > --- a/lib/librte_ring/rte_ring.c
> > > +++ b/lib/librte_ring/rte_ring.c
> > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > >  	/* compilation-time checks */
> > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > -#endif
> > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > >  			  RTE_CACHE_LINE_MASK) != 0);
> > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 72ccca5..04fe667 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -168,7 +168,7 @@ struct rte_ring {
> > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > >  		volatile uint32_t head;  /**< Producer head. */
> > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > -	} prod __rte_cache_aligned;
> > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > 
> > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line
> > size of 128B
> > 
> Sure.
> 
> However, can you perhaps try a performance test and check to see if
> there is a performance difference between the two values before I change
> it? In my tests I see improved performance by having an extra blank
> cache-line between the producer and consumer data.

Sure. Which test are you running to measure the performance difference?
Is it app/test/test_ring_perf.c?

> 
> /Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 12:08  0%       ` Jerin Jacob
@ 2017-02-28 13:52  0%         ` Bruce Richardson
  2017-02-28 17:54  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-02-28 13:52 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > > Users compiling DPDK should not need to know or care about the arrangement
> > > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > > option and set the structures to be always split. For improved
> > > > performance use 128B rather than 64B alignment since it stops the producer
> > > > and consumer data being on adjacent cachelines.
> > > > 
> > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > ---
> > > >  config/common_base                     | 1 -
> > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > > >  lib/librte_ring/rte_ring.c             | 2 --
> > > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/config/common_base b/config/common_base
> > > > index aeee13e..099ffda 100644
> > > > --- a/config/common_base
> > > > +++ b/config/common_base
> > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > >  #
> > > >  CONFIG_RTE_LIBRTE_RING=y
> > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > >  
> > > >  #
> > > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > > index e25ea9f..ea45e0c 100644
> > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > @@ -110,6 +110,12 @@ API Changes
> > > >     Also, make sure to start the actual text at the margin.
> > > >     =========================================================
> > > >  
> > > > +* **Reworked rte_ring library**
> > > > +
> > > > +  The rte_ring library has been reworked and updated. The following changes
> > > > +  have been made to it:
> > > > +
> > > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > >  
> > > >  ABI Changes
> > > >  -----------
> > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > index ca0a108..4bc6da1 100644
> > > > --- a/lib/librte_ring/rte_ring.c
> > > > +++ b/lib/librte_ring/rte_ring.c
> > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > >  	/* compilation-time checks */
> > > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > -#endif
> > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > index 72ccca5..04fe667 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > > >  		volatile uint32_t head;  /**< Producer head. */
> > > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > > -	} prod __rte_cache_aligned;
> > > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > 
> > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line
> > > size of 128B
> > > 
> > Sure.
> > 
> > However, can you perhaps try a performance test and check to see if
> > there is a performance difference between the two values before I change
> > it? In my tests I see improved performance by having an extra blank
> > cache-line between the producer and consumer data.
> 
> Sure. Which test are you running to measure the performance difference?
> Is it app/test/test_ring_perf.c?
> 
> > 
Yep, just the basic ring perf test. I look mostly at the core-to-core
numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
would be far less common use cases IMHO.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 13:52  0%         ` Bruce Richardson
@ 2017-02-28 17:54  0%           ` Jerin Jacob
  2017-03-01  9:47  0%             ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2017-02-28 17:54 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 01:52:26PM +0000, Bruce Richardson wrote:
> On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> > On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> > > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > > > Users compiling DPDK should not need to know or care about the arrangement
> > > > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > > > option and set the structures to be always split. For improved
> > > > > performance use 128B rather than 64B alignment since it stops the producer
> > > > > and consumer data being on adjacent cachelines.
> > > > > 
> > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > > ---
> > > > >  config/common_base                     | 1 -
> > > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > > > >  lib/librte_ring/rte_ring.c             | 2 --
> > > > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > > 
> > > > > diff --git a/config/common_base b/config/common_base
> > > > > index aeee13e..099ffda 100644
> > > > > --- a/config/common_base
> > > > > +++ b/config/common_base
> > > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > > >  #
> > > > >  CONFIG_RTE_LIBRTE_RING=y
> > > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > > >  
> > > > >  #
> > > > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > > > index e25ea9f..ea45e0c 100644
> > > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > > @@ -110,6 +110,12 @@ API Changes
> > > > >     Also, make sure to start the actual text at the margin.
> > > > >     =========================================================
> > > > >  
> > > > > +* **Reworked rte_ring library**
> > > > > +
> > > > > +  The rte_ring library has been reworked and updated. The following changes
> > > > > +  have been made to it:
> > > > > +
> > > > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > > >  
> > > > >  ABI Changes
> > > > >  -----------
> > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > > index ca0a108..4bc6da1 100644
> > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > > >  	/* compilation-time checks */
> > > > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > -#endif
> > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > index 72ccca5..04fe667 100644
> > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > > > >  		volatile uint32_t head;  /**< Producer head. */
> > > > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > > > -	} prod __rte_cache_aligned;
> > > > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > > 
> > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line
> > > > size of 128B
> > > > 
> > > Sure.
> > > 
> > > However, can you perhaps try a performance test and check to see if
> > > there is a performance difference between the two values before I change
> > > it? In my tests I see improved performance by having an extra blank
> > > cache-line between the producer and consumer data.
> > 
> > Sure. Which test are you running to measure the performance difference?
> > Is it app/test/test_ring_perf.c?
> > 
> > > 
> Yep, just the basic ring perf test. I look mostly at the core-to-core
> numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
> would be far less common use cases IMHO.

Performance test results show a regression with the RTE_CACHE_LINE_MIN_SIZE
scheme in some use cases and higher performance in others (testing using
two physical cores).


# base code
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 84
MP/MC single enq/dequeue: 301
SP/SC burst enq/dequeue (size: 8): 20
MP/MC burst enq/dequeue (size: 8): 46
SP/SC burst enq/dequeue (size: 32): 12
MP/MC burst enq/dequeue (size: 32): 18

### Testing empty dequeue ###
SC empty dequeue: 7.11
MC empty dequeue: 12.15

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 19.08
MP/MC bulk enq/dequeue (size: 8): 46.28
SP/SC bulk enq/dequeue (size: 32): 11.89
MP/MC bulk enq/dequeue (size: 32): 18.84

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 37.42
MP/MC bulk enq/dequeue (size: 8): 73.32
SP/SC bulk enq/dequeue (size: 32): 18.69
MP/MC bulk enq/dequeue (size: 32): 24.59
Test OK

# with ring rework patch
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 84
MP/MC single enq/dequeue: 301
SP/SC burst enq/dequeue (size: 8): 19
MP/MC burst enq/dequeue (size: 8): 45
SP/SC burst enq/dequeue (size: 32): 11
MP/MC burst enq/dequeue (size: 32): 18

### Testing empty dequeue ###
SC empty dequeue: 7.10
MC empty dequeue: 12.15

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 18.59
MP/MC bulk enq/dequeue (size: 8): 45.49
SP/SC bulk enq/dequeue (size: 32): 11.67
MP/MC bulk enq/dequeue (size: 32): 18.65

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 37.41
MP/MC bulk enq/dequeue (size: 8): 72.98
SP/SC bulk enq/dequeue (size: 32): 18.69
MP/MC bulk enq/dequeue (size: 32): 24.59
Test OK
RTE>>

# with ring rework patch + cache-line size change to one on 128BCL target
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 90
MP/MC single enq/dequeue: 317
SP/SC burst enq/dequeue (size: 8): 20
MP/MC burst enq/dequeue (size: 8): 48
SP/SC burst enq/dequeue (size: 32): 11
MP/MC burst enq/dequeue (size: 32): 18

### Testing empty dequeue ###
SC empty dequeue: 8.10
MC empty dequeue: 11.15

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 20.24
MP/MC bulk enq/dequeue (size: 8): 48.43
SP/SC bulk enq/dequeue (size: 32): 11.01
MP/MC bulk enq/dequeue (size: 32): 18.43

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 25.92
MP/MC bulk enq/dequeue (size: 8): 69.76
SP/SC bulk enq/dequeue (size: 32): 14.27
MP/MC bulk enq/dequeue (size: 32): 22.94
Test OK
RTE>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version
  2017-02-28  8:34  4%       ` Jan Blunck
@ 2017-03-01  9:31  4%         ` Christian Ehrhardt
  2017-03-01  9:34 20%           ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2017-03-01  9:31 UTC (permalink / raw)
  To: Jan Blunck
  Cc: dev, cjcollier @ linuxfoundation . org, ricardo.salveti, Luca Boccassi

On Tue, Feb 28, 2017 at 9:34 AM, Jan Blunck <jblunck@infradead.org> wrote:

> In case CONFIG_RTE_NEXT_ABI=y is set this is actually generating
> shared objects with suffix:
>
>   .so.$(CONFIG_RTE_MAJOR_ABI).1
>
> I don't think that this is the intention.
>

You are right, thanks for the catch Jan!
The fix is a trivial extra ifeq - a V2 is on the way soon.


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-01  9:31  4%         ` Christian Ehrhardt
@ 2017-03-01  9:34 20%           ` Christian Ehrhardt
  2017-03-01 14:35  4%             ` Jan Blunck
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2017-03-01  9:34 UTC (permalink / raw)
  To: dev
  Cc: Christian Ehrhardt, cjcollier @ linuxfoundation . org,
	ricardo.salveti, Luca Boccassi

Downstreams might want to provide different DPDK releases at the same
time to support multiple consumers of DPDK linked against older and newer
sonames.

Also, due to the interdependencies that DPDK libraries can have, applications
might end up with an executable space in which multiple versions of a
library are mapped by ld.so.

Think of LibA that got an ABI bump and LibB that did not get an ABI bump
but is depending on LibA.

    Application
    \-> LibA.old
    \-> LibB.new -> LibA.new

That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
If set, CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.

We need to cut arbitrarily long strings after the .so now and this would work
for any ABI version in LIBABIVER:
  $(Q)ln -s -f $< $(patsubst %.$(LIBABIVER),%,$@)
But using the following instead additionally allows simplifying the Makefile
for the CONFIG_RTE_NEXT_ABI case.
  $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')

Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 config/common_base                     |  5 +++++
 doc/guides/contributing/versioning.rst | 25 +++++++++++++++++++++++++
 mk/rte.lib.mk                          | 14 +++++++++-----
 3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..37aa1e1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -75,6 +75,11 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
 CONFIG_RTE_NEXT_ABI=y
 
 #
+# Major ABI to overwrite library specific LIBABIVER
+#
+CONFIG_RTE_MAJOR_ABI=
+
+#
 # Machine's cache line size
 #
 CONFIG_RTE_CACHE_LINE_SIZE=64
diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
index fbc44a7..8aaf370 100644
--- a/doc/guides/contributing/versioning.rst
+++ b/doc/guides/contributing/versioning.rst
@@ -133,6 +133,31 @@ The macros exported are:
   fully qualified function ``p``, so that if a symbol becomes versioned, it
   can still be mapped back to the public symbol name.
 
+Setting a Major ABI version
+---------------------------
+
+Downstreams might want to provide different DPDK releases at the same time to
+support multiple consumers of DPDK linked against older and newer sonames.
+
+Also, due to the interdependencies that DPDK libraries can have, applications
+might end up with an executable space in which multiple versions of a library
+are mapped by ld.so.
+
+Think of LibA that got an ABI bump and LibB that did not get an ABI bump but is
+depending on LibA.
+
+.. note::
+
+    Application
+    \-> LibA.old
+    \-> LibB.new -> LibA.new
+
+That is a conflict which can be avoided by setting ``CONFIG_RTE_MAJOR_ABI``.
+If set, the value of ``CONFIG_RTE_MAJOR_ABI`` overwrites all otherwise
+per-library versions defined in each library's ``LIBABIVER``.
+An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all libraries
+``librte<?>.so.16.11`` instead of ``librte<?>.so.<LIBABIVER>``.
+
 Examples of ABI Macro use
 -------------------------
 
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index 33a5f5a..1ffbf42 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -40,12 +40,20 @@ EXTLIB_BUILD ?= n
 # VPATH contains at least SRCDIR
 VPATH += $(SRCDIR)
 
+ifneq ($(CONFIG_RTE_MAJOR_ABI),)
+ifneq ($(LIBABIVER),)
+LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
+endif
+endif
+
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
 ifeq ($(EXTLIB_BUILD),n)
+ifeq ($(CONFIG_RTE_MAJOR_ABI),)
 ifeq ($(CONFIG_RTE_NEXT_ABI),y)
 LIB := $(LIB).1
 endif
+endif
 CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
 endif
 endif
@@ -156,11 +164,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
 	@[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
 	$(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
-ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
-	$(Q)ln -s -f $< $(basename $(basename $@))
-else
-	$(Q)ln -s -f $< $(basename $@)
-endif
+	$(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
 endif
 
 #
-- 
2.7.4

^ permalink raw reply	[relevance 20%]

* Re: [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting
  2017-02-28 17:54  0%           ` Jerin Jacob
@ 2017-03-01  9:47  0%             ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-01  9:47 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: olivier.matz, dev

On Tue, Feb 28, 2017 at 11:24:25PM +0530, Jerin Jacob wrote:
> On Tue, Feb 28, 2017 at 01:52:26PM +0000, Bruce Richardson wrote:
> > On Tue, Feb 28, 2017 at 05:38:34PM +0530, Jerin Jacob wrote:
> > > On Tue, Feb 28, 2017 at 11:57:03AM +0000, Bruce Richardson wrote:
> > > > On Tue, Feb 28, 2017 at 05:05:13PM +0530, Jerin Jacob wrote:
> > > > > On Thu, Feb 23, 2017 at 05:23:54PM +0000, Bruce Richardson wrote:
> > > > > > Users compiling DPDK should not need to know or care about the arrangement
> > > > > > of cachelines in the rte_ring structure. Therefore just remove the build
> > > > > > option and set the structures to be always split. For improved
> > > > > > performance use 128B rather than 64B alignment since it stops the producer
> > > > > > and consumer data being on adjacent cachelines.
> > > > > > 
> > > > > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > > > ---
> > > > > >  config/common_base                     | 1 -
> > > > > >  doc/guides/rel_notes/release_17_05.rst | 6 ++++++
> > > > > >  lib/librte_ring/rte_ring.c             | 2 --
> > > > > >  lib/librte_ring/rte_ring.h             | 8 ++------
> > > > > >  4 files changed, 8 insertions(+), 9 deletions(-)
> > > > > > 
> > > > > > diff --git a/config/common_base b/config/common_base
> > > > > > index aeee13e..099ffda 100644
> > > > > > --- a/config/common_base
> > > > > > +++ b/config/common_base
> > > > > > @@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
> > > > > >  #
> > > > > >  CONFIG_RTE_LIBRTE_RING=y
> > > > > >  CONFIG_RTE_LIBRTE_RING_DEBUG=n
> > > > > > -CONFIG_RTE_RING_SPLIT_PROD_CONS=n
> > > > > >  CONFIG_RTE_RING_PAUSE_REP_COUNT=0
> > > > > >  
> > > > > >  #
> > > > > > diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
> > > > > > index e25ea9f..ea45e0c 100644
> > > > > > --- a/doc/guides/rel_notes/release_17_05.rst
> > > > > > +++ b/doc/guides/rel_notes/release_17_05.rst
> > > > > > @@ -110,6 +110,12 @@ API Changes
> > > > > >     Also, make sure to start the actual text at the margin.
> > > > > >     =========================================================
> > > > > >  
> > > > > > +* **Reworked rte_ring library**
> > > > > > +
> > > > > > +  The rte_ring library has been reworked and updated. The following changes
> > > > > > +  have been made to it:
> > > > > > +
> > > > > > +  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
> > > > > >  
> > > > > >  ABI Changes
> > > > > >  -----------
> > > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > > > index ca0a108..4bc6da1 100644
> > > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > > @@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> > > > > >  	/* compilation-time checks */
> > > > > >  	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
> > > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > > -#ifdef RTE_RING_SPLIT_PROD_CONS
> > > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
> > > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > > -#endif
> > > > > >  	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
> > > > > >  			  RTE_CACHE_LINE_MASK) != 0);
> > > > > >  #ifdef RTE_LIBRTE_RING_DEBUG
> > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > index 72ccca5..04fe667 100644
> > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > @@ -168,7 +168,7 @@ struct rte_ring {
> > > > > >  		uint32_t mask;           /**< Mask (size-1) of ring. */
> > > > > >  		volatile uint32_t head;  /**< Producer head. */
> > > > > >  		volatile uint32_t tail;  /**< Producer tail. */
> > > > > > -	} prod __rte_cache_aligned;
> > > > > > +	} prod __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
> > > > > 
> > > > > I think we need to use RTE_CACHE_LINE_MIN_SIZE instead of
> > > > > RTE_CACHE_LINE_SIZE for alignment here. PPC and ThunderX1 targets are cache line
> > > > > size of 128B
> > > > > 
> > > > Sure.
> > > > 
> > > > However, can you perhaps try a performance test and check to see if
> > > > there is a performance difference between the two values before I change
> > > > it? In my tests I see improved performance by having an extra blank
> > > > cache-line between the producer and consumer data.
> > > 
> > > Sure. Which test are you running to measure the performance difference?
> > > Is it app/test/test_ring_perf.c?
> > > 
> > > > 
> > Yep, just the basic ring perf test. I look mostly at the core-to-core
> > numbers, since hyperthread-to-hyperthread or NUMA socket to NUMA socket
> > would be far less common use cases IMHO.
> 
> Performance test result shows regression with RTE_CACHE_LINE_MIN_SIZE
> scheme in some use case and some use case has higher performance(Testing using
> two physical cores)
> 
> 
> # base code
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 84
> MP/MC single enq/dequeue: 301
> SP/SC burst enq/dequeue (size: 8): 20
> MP/MC burst enq/dequeue (size: 8): 46
> SP/SC burst enq/dequeue (size: 32): 12
> MP/MC burst enq/dequeue (size: 32): 18
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 7.11
> MC empty dequeue: 12.15
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 19.08
> MP/MC bulk enq/dequeue (size: 8): 46.28
> SP/SC bulk enq/dequeue (size: 32): 11.89
> MP/MC bulk enq/dequeue (size: 32): 18.84
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 37.42
> MP/MC bulk enq/dequeue (size: 8): 73.32
> SP/SC bulk enq/dequeue (size: 32): 18.69
> MP/MC bulk enq/dequeue (size: 32): 24.59
> Test OK
> 
> # with ring rework patch
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 84
> MP/MC single enq/dequeue: 301
> SP/SC burst enq/dequeue (size: 8): 19
> MP/MC burst enq/dequeue (size: 8): 45
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 18
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 7.10
> MC empty dequeue: 12.15
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 18.59
> MP/MC bulk enq/dequeue (size: 8): 45.49
> SP/SC bulk enq/dequeue (size: 32): 11.67
> MP/MC bulk enq/dequeue (size: 32): 18.65
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 37.41
> MP/MC bulk enq/dequeue (size: 8): 72.98
> SP/SC bulk enq/dequeue (size: 32): 18.69
> MP/MC bulk enq/dequeue (size: 32): 24.59
> Test OK
> RTE>>
> 
> # with ring rework patch + cache-line size change to one on 128BCL target
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 90
> MP/MC single enq/dequeue: 317
> SP/SC burst enq/dequeue (size: 8): 20
> MP/MC burst enq/dequeue (size: 8): 48
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 18
> 
> ### Testing empty dequeue ###
> SC empty dequeue: 8.10
> MC empty dequeue: 11.15
> 
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 20.24
> MP/MC bulk enq/dequeue (size: 8): 48.43
> SP/SC bulk enq/dequeue (size: 32): 11.01
> MP/MC bulk enq/dequeue (size: 32): 18.43
> 
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 25.92
> MP/MC bulk enq/dequeue (size: 8): 69.76
> SP/SC bulk enq/dequeue (size: 32): 14.27
> MP/MC bulk enq/dequeue (size: 32): 22.94
> Test OK
> RTE>>

So given that there is not much difference here, is MIN_SIZE, i.e. forced
64B, your preference rather than the actual cache-line size?

/Bruce
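
For anyone following the thread, the choice boils down to which macro pads
the producer/consumer blocks. A minimal sketch (not the real rte_ring layout;
it only illustrates how the alignment attribute from the quoted hunk leaves
the spare cache line discussed above, assuming the usual 64B value for
RTE_CACHE_LINE_MIN_SIZE):

    #include <stdint.h>
    #include <rte_memory.h> /* __rte_aligned, RTE_CACHE_LINE_MIN_SIZE */

    struct ring_sketch {
        struct {
            volatile uint32_t head; /* producer head */
            volatile uint32_t tail; /* producer tail */
            uint32_t mask;
            /* MIN_SIZE * 2 = 128B alignment leaves a free cache line
             * between producer and consumer data on a 64B cache-line CPU */
        } prod __rte_aligned(RTE_CACHE_LINE_MIN_SIZE * 2);

        struct {
            volatile uint32_t head; /* consumer head */
            volatile uint32_t tail; /* consumer tail */
            uint32_t mask;
            /* RTE_CACHE_LINE_SIZE * 2 here instead would mean 256B
             * alignment on 128B cache-line targets (POWER, ThunderX) */
        } cons __rte_aligned(RTE_CACHE_LINE_MIN_SIZE * 2);
    };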

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-24 14:03  0%     ` Bruce Richardson
@ 2017-03-01  9:55  0%       ` Hunt, David
  0 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-03-01  9:55 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 24/2/2017 2:03 PM, Bruce Richardson wrote:
> On Tue, Feb 21, 2017 at 03:17:37AM +0000, David Hunt wrote:
>> Move files out of the way so that we can replace them with new
>> versions of the distributor library. Files are named in
>> such a way as to match the symbol versioning that we will
>> apply for backward ABI compatibility.
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   app/test/test_distributor.c                  |   2 +-
>>   app/test/test_distributor_perf.c             |   2 +-
>>   examples/distributor/main.c                  |   2 +-
>>   lib/librte_distributor/Makefile              |   4 +-
>>   lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
>>   lib/librte_distributor/rte_distributor.h     | 247 --------------
>>   lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
>>   lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++
> Rather than changing the unit tests and example applications, I think
> this patch would be better with a new rte_distributor.h file which
> simply does "#include  <rte_distributor_v20.h>". Alternatively, I
> recently upstreamed a patch, which went into 17.02, to allow symlinks in
> the folder so you could create a symlink to the renamed file.
>
> /Bruce
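
The first option Bruce describes amounts to a wrapper header along these
lines (a sketch only; the include guard is the one used by the existing
rte_distributor.h, and patch 01/18 of the v8 series below ends up doing
essentially this):

    /* rte_distributor.h stays the public header name */
    #ifndef _RTE_DISTRIBUTE_H_
    #define _RTE_DISTRIBUTE_H_

    /* pull in the renamed legacy implementation unchanged */
    #include <rte_distributor_v20.h>

    #endif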

Thanks for the review, Bruce. I've just finished reworking the patchset
based on your review comments (including later emails) and will post soon.

Regards,
Dave.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 0/1] net/mlx5: add TSO support
  @ 2017-03-01 11:11  3% ` Shahaf Shuler
  0 siblings, 0 replies; 200+ results
From: Shahaf Shuler @ 2017-03-01 11:11 UTC (permalink / raw)
  To: nelio.laranjeiro, adrien.mazarguil; +Cc: dev

On v2:
* Dropped patches:
  [PATCH 1/4] ethdev: add Tx offload limitations.
  [PATCH 2/4] ethdev: add TSO disable flag.
  [PATCH 3/4] app/testpmd: add TSO disable to test options.
* The changes introduced by the above conflict with the tx_prepare API and break the ABI.
  A proposal to disable optional offloads by default, and a way to reflect HW offload
  limitations to the application, will be addressed in a separate commit.
* TSO support modifications

[PATCH v2 1/1] net/mlx5: add hardware TSO support

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-01  9:34 20%           ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
@ 2017-03-01 14:35  4%             ` Jan Blunck
  2017-03-16 17:19  4%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jan Blunck @ 2017-03-01 14:35 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: dev, cjcollier @ linuxfoundation . org, ricardo.salveti, Luca Boccassi

On Wed, Mar 1, 2017 at 10:34 AM, Christian Ehrhardt
<christian.ehrhardt@canonical.com> wrote:
> Downstreams might want to provide different DPDK releases at the same
> time to support multiple consumers of DPDK linked against older and newer
> sonames.
>
> Also due to the interdependencies that DPDK libraries can have applications
> might end up with an executable space in which multiple versions of a
> library are mapped by ld.so.
>
> Think of LibA that got an ABI bump and LibB that did not get an ABI bump
> but is depending on LibA.
>
>     Application
>     \-> LibA.old
>     \-> LibB.new -> LibA.new
>
> That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
> If set, CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
> An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
> libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.
>
> We need to cut arbitrarily long strings after the .so now, and this would work
> for any ABI version in LIBABIVER:
>   $(Q)ln -s -f $< $(patsubst %.$(LIBABIVER),%,$@)
> But using the following instead additionally allows simplifying the Makefile
> for the CONFIG_RTE_NEXT_ABI case.
>   $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
>
> Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> ---
>  config/common_base                     |  5 +++++
>  doc/guides/contributing/versioning.rst | 25 +++++++++++++++++++++++++
>  mk/rte.lib.mk                          | 14 +++++++++-----
>  3 files changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/config/common_base b/config/common_base
> index aeee13e..37aa1e1 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -75,6 +75,11 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
>  CONFIG_RTE_NEXT_ABI=y
>
>  #
> +# Major ABI to overwrite library specific LIBABIVER
> +#
> +CONFIG_RTE_MAJOR_ABI=
> +
> +#
>  # Machine's cache line size
>  #
>  CONFIG_RTE_CACHE_LINE_SIZE=64
> diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
> index fbc44a7..8aaf370 100644
> --- a/doc/guides/contributing/versioning.rst
> +++ b/doc/guides/contributing/versioning.rst
> @@ -133,6 +133,31 @@ The macros exported are:
>    fully qualified function ``p``, so that if a symbol becomes versioned, it
>    can still be mapped back to the public symbol name.
>
> +Setting a Major ABI version
> +---------------------------
> +
> +Downstreams might want to provide different DPDK releases at the same time to
> +support multiple consumers of DPDK linked against older and newer sonames.
> +
> +Also due to the interdependencies that DPDK libraries can have applications
> +might end up with an executable space in which multiple versions of a library
> +are mapped by ld.so.
> +
> +Think of LibA that got an ABI bump and LibB that did not get an ABI bump but is
> +depending on LibA.
> +
> +.. note::
> +
> +    Application
> +    \-> LibA.old
> +    \-> LibB.new -> LibA.new
> +
> +That is a conflict which can be avoided by setting ``CONFIG_RTE_MAJOR_ABI``.
> +If set, the value of ``CONFIG_RTE_MAJOR_ABI`` overwrites all - otherwise per
> +library - versions defined in the libraries ``LIBABIVER``.
> +An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all libraries
> +``librte<?>.so.16.11`` instead of ``librte<?>.so.<LIBABIVER>``.
> +
>  Examples of ABI Macro use
>  -------------------------
>
> diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> index 33a5f5a..1ffbf42 100644
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -40,12 +40,20 @@ EXTLIB_BUILD ?= n
>  # VPATH contains at least SRCDIR
>  VPATH += $(SRCDIR)
>
> +ifneq ($(CONFIG_RTE_MAJOR_ABI),)
> +ifneq ($(LIBABIVER),)
> +LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
> +endif
> +endif
> +
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
>  ifeq ($(EXTLIB_BUILD),n)
> +ifeq ($(CONFIG_RTE_MAJOR_ABI),)
>  ifeq ($(CONFIG_RTE_NEXT_ABI),y)
>  LIB := $(LIB).1
>  endif
> +endif
>  CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
>  endif
>  endif
> @@ -156,11 +164,7 @@ $(RTE_OUTPUT)/lib/$(LIB): $(LIB)
>         @[ -d $(RTE_OUTPUT)/lib ] || mkdir -p $(RTE_OUTPUT)/lib
>         $(Q)cp -f $(LIB) $(RTE_OUTPUT)/lib
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> -ifeq ($(CONFIG_RTE_NEXT_ABI)$(EXTLIB_BUILD),yn)
> -       $(Q)ln -s -f $< $(basename $(basename $@))
> -else
> -       $(Q)ln -s -f $< $(basename $@)
> -endif
> +       $(Q)ln -s -f $< $(shell echo $@ | sed 's/\.so.*/.so/')
>  endif
>
>  #
> --
> 2.7.4
>

Reviewed-by: Jan Blunck <jblunck@infradead.org>
Tested-by: Jan Blunck <jblunck@infradead.org>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
@ 2017-03-01  7:47  1%       ` David Hunt
  2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
  2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements
  2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-21 10:27  0%     ` Hunt, David
  2017-02-24 14:03  0%     ` Bruce Richardson
@ 2017-03-01  7:47  2%     ` David Hunt
  2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
  2 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
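
For context, a worker written against the existing packet-at-a-time API (the
prototypes are those of the rte_distributor_v20.h header shown in patch 01/18
above) looks roughly like the sketch below; quit_signal is a hypothetical stop
flag. The burst API introduced by this series keeps the same request/return
handshake but moves up to 8 mbufs per call:

    #include <rte_distributor.h>
    #include <rte_mbuf.h>

    static void
    worker_loop(struct rte_distributor *d, unsigned int worker_id,
            const volatile int *quit_signal)
    {
        struct rte_mbuf *pkt = NULL;

        while (!*quit_signal) {
            /* hand back the previous packet (if any) and wait for a new one */
            pkt = rte_distributor_get_pkt(d, worker_id, pkt);
            /* ... process pkt; the distributor never gives packets with
             * the same flow tag to two workers at once ... */
        }
        /* return the final packet without requesting another */
        if (pkt != NULL)
            rte_distributor_return_pkt(d, worker_id, pkt);
    }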

v8 changes:
   * Changed the patch set to have a more logical order of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split the updates to the example app into smaller pieces.
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily renames the functions in the
     version.map file.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker at a time.
   For performance in matching, flow IDs are 15 bits.
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new burst oriented distributor structs
[04/18] lib: add new distributor code
[05/18] lib: add SIMD flow matching to distributor
[06/18] test/distributor: extra params for autotests
[07/18] lib: switch distributor over to new API
[08/18] lib: make v20 header file private
[09/18] lib: add symbol versioning to distributor
[10/18] test: test single and burst distributor API
[11/18] test: add perf test for distributor burst mode
[12/18] examples/distributor: allow for extra stats
[13/18] sample: distributor: wait for ports to come up
[14/18] examples/distributor: give distributor a core
[15/18] examples/distributor: limit number of Tx rings
[16/18] examples/distributor: give Rx thread a core
[17/18] doc: distributor library changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v8 09/18] lib: add symbol versioning to distributor
  2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-01  7:47  3%       ` David Hunt
  2017-03-01 14:50  0%         ` Hunt, David
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           |  8 ++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..2c5511d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -168,6 +169,7 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
 
 int
 rte_distributor_return_pkt(struct rte_distributor *d,
@@ -197,6 +199,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, , 17.05);
 
 /**** APIs called on distributor core ***/
 
@@ -476,6 +479,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, , 17.05);
 
 /* return to the caller, packets returned from workers */
 int
@@ -504,6 +508,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, , 17.05);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -549,6 +554,7 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, , 17.05);
 
 /* clears the internal returns array in the distributor */
 void
@@ -565,6 +571,7 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, , 17.05);
 
 /* creates a distributor instance */
 struct rte_distributor *
@@ -638,3 +645,4 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, , 17.05);
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 09/18] lib: add symbol versioning to distributor
  2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
@ 2017-03-01 14:50  0%         ` Hunt, David
  0 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-03-01 14:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

ERROR:SPACING: space prohibited before that ',' (ctx:WxW)
#84: FILE: lib/librte_distributor/rte_distributor.c:172:
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
                                               ^

FYI, checkpatch does not like this regardless of whether there's
a space there or not. It complains either way. :)

Regards,
Dave.



On 1/3/2017 7:47 AM, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>   lib/librte_distributor/Makefile                    |  2 +-
>   lib/librte_distributor/rte_distributor.c           |  8 ++++++++
>   lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
>   lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
>   4 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 2b28eff..2f05cf3 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
>   
>   EXPORT_MAP := rte_distributor_version.map
>   
> -LIBABIVER := 1
> +LIBABIVER := 2
>   
>   # all source are stored in SRCS-y
>   SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
> diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
> index 6e1debf..2c5511d 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -36,6 +36,7 @@
>   #include <rte_mbuf.h>
>   #include <rte_memory.h>
>   #include <rte_cycles.h>
> +#include <rte_compat.h>
>   #include <rte_memzone.h>
>   #include <rte_errno.h>
>   #include <rte_string_fns.h>
> @@ -168,6 +169,7 @@ rte_distributor_get_pkt(struct rte_distributor *d,
>   	}
>   	return count;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
>   
>   int
>   rte_distributor_return_pkt(struct rte_distributor *d,
> @@ -197,6 +199,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
>   
>   	return 0;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, , 17.05);
>   
>   /**** APIs called on distributor core ***/
>   
> @@ -476,6 +479,7 @@ rte_distributor_process(struct rte_distributor *d,
>   
>   	return num_mbufs;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_process, , 17.05);
>   
>   /* return to the caller, packets returned from workers */
>   int
> @@ -504,6 +508,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
>   
>   	return retval;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, , 17.05);
>   
>   /*
>    * Return the number of packets in-flight in a distributor, i.e. packets
> @@ -549,6 +554,7 @@ rte_distributor_flush(struct rte_distributor *d)
>   
>   	return flushed;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_flush, , 17.05);
>   
>   /* clears the internal returns array in the distributor */
>   void
> @@ -565,6 +571,7 @@ rte_distributor_clear_returns(struct rte_distributor *d)
>   	for (wkr = 0; wkr < d->num_workers; wkr++)
>   		d->bufs[wkr].retptr64[0] = 0;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, , 17.05);
>   
>   /* creates a distributor instance */
>   struct rte_distributor *
> @@ -638,3 +645,4 @@ rte_distributor_create(const char *name,
>   
>   	return d;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_create, , 17.05);
> diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
> index 1f406c5..bb6c5d7 100644
> --- a/lib/librte_distributor/rte_distributor_v20.c
> +++ b/lib/librte_distributor/rte_distributor_v20.c
> @@ -38,6 +38,7 @@
>   #include <rte_memory.h>
>   #include <rte_memzone.h>
>   #include <rte_errno.h>
> +#include <rte_compat.h>
>   #include <rte_string_fns.h>
>   #include <rte_eal_memconfig.h>
>   #include "rte_distributor_v20.h"
> @@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
>   		rte_pause();
>   	buf->bufptr64 = req;
>   }
> +VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
>   
>   struct rte_mbuf *
>   rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
> @@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
>   	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
>   	return (struct rte_mbuf *)((uintptr_t)ret);
>   }
> +VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
>   
>   struct rte_mbuf *
>   rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
> @@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
>   		rte_pause();
>   	return ret;
>   }
> +VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
>   
>   int
>   rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
> @@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
>   	buf->bufptr64 = req;
>   	return 0;
>   }
> +VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
>   
>   /**** APIs called on distributor core ***/
>   
> @@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
>   	d->returns.count = ret_count;
>   	return num_mbufs;
>   }
> +VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
>   
>   /* return to the caller, packets returned from workers */
>   int
> @@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
>   
>   	return retval;
>   }
> +VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
>   
>   /* return the number of packets in-flight in a distributor, i.e. packets
>    * being workered on or queued up in a backlog. */
> @@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
>   
>   	return flushed;
>   }
> +VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
>   
>   /* clears the internal returns array in the distributor */
>   void
> @@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
>   	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
>   #endif
>   }
> +VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
>   
>   /* creates a distributor instance */
>   struct rte_distributor_v20 *
> @@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
>   
>   	return d;
>   }
> +VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
> diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
> index 73fdc43..3a285b3 100644
> --- a/lib/librte_distributor/rte_distributor_version.map
> +++ b/lib/librte_distributor/rte_distributor_version.map
> @@ -13,3 +13,17 @@ DPDK_2.0 {
>   
>   	local: *;
>   };
> +
> +DPDK_17.05 {
> +	global:
> +
> +	rte_distributor_clear_returns;
> +	rte_distributor_create;
> +	rte_distributor_flush;
> +	rte_distributor_get_pkt;
> +	rte_distributor_poll_pkt;
> +	rte_distributor_process;
> +	rte_distributor_request_pkt;
> +	rte_distributor_return_pkt;
> +	rte_distributor_returned_pkts;
> +} DPDK_2.0;

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 04/16] net/avp: add PMD version map file
  @ 2017-03-02  0:19  3%     ` Allain Legacy
    1 sibling, 0 replies; 200+ results
From: Allain Legacy @ 2017-03-02  0:19 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: ian.jolliffe, jerin.jacob, stephen, thomas.monjalon, dev

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 0/6] introduce prgdev abstraction library
@ 2017-03-02  4:03  3% Chen Jing D(Mark)
  2017-03-02  4:03  4% ` [dpdk-dev] [PATCH 5/6] prgdev: add ABI control info Chen Jing D(Mark)
  0 siblings, 1 reply; 200+ results
From: Chen Jing D(Mark) @ 2017-03-02  4:03 UTC (permalink / raw)
  To: dev
  Cc: cunming.liang, gerald.rogers, keith.wiles, bruce.richardson,
	Chen Jing D(Mark)

This patch set intends to introduce a generic DPDK programming device layer,
called prgdev, to provide abstract, generic APIs for applications to
program hardware without knowing the details of programmable devices. From
the driver's perspective, drivers adapt their functions to the abstract
APIs defined in prgdev.

The major purpose of prgdev is to help DPDK users dynamically load/upgrade
RTL images for FPGA devices, or upgrade firmware for programmable NICs, without
interrupting running DPDK applications.
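
At a very high level the intended usage would be something like the sketch
below. Only the function names are taken from the version map in patch 5/6;
the parameter lists and return conventions are assumptions made purely for
illustration, as the prototypes are not quoted in this cover letter:

    #include <stdint.h>
    #include <stddef.h>
    #include <rte_prgdev.h>

    static int
    upgrade_image(uint16_t dev_id, const void *img, size_t len)
    {
        int ret;

        ret = rte_prgdev_open(dev_id);                   /* assumed signature */
        if (ret < 0)
            return ret;

        ret = rte_prgdev_img_download(dev_id, img, len); /* assumed signature */
        if (ret == 0)
            ret = rte_prgdev_check_stat(dev_id);         /* assumed signature */

        rte_prgdev_close(dev_id);                        /* assumed signature */
        return ret;
    }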


Chen Jing D(Mark) (5):
  prgdev: introduce new library
  prgdev: add debug macro for prgdev
  prgdev: add bus probe and remove functions
  prgdev: add prgdev API exposed to application
  prgdev: add ABI control info

Chen, Jing D (1):
  doc: introduction to prgdev

 config/common_base                       |    7 +
 doc/guides/prog_guide/index.rst          |    1 +
 doc/guides/prog_guide/prgdev_lib.rst     |  465 ++++++++++++++++++++++++++++++
 lib/Makefile                             |    1 +
 lib/librte_eal/common/include/rte_log.h  |    1 +
 lib/librte_prgdev/Makefile               |   57 ++++
 lib/librte_prgdev/rte_prgdev.c           |  459 +++++++++++++++++++++++++++++
 lib/librte_prgdev/rte_prgdev.h           |  401 ++++++++++++++++++++++++++
 lib/librte_prgdev/rte_prgdev_version.map |   19 ++
 mk/rte.app.mk                            |    1 +
 10 files changed, 1412 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/prog_guide/prgdev_lib.rst
 create mode 100644 lib/librte_prgdev/Makefile
 create mode 100644 lib/librte_prgdev/rte_prgdev.c
 create mode 100644 lib/librte_prgdev/rte_prgdev.h
 create mode 100644 lib/librte_prgdev/rte_prgdev_version.map

-- 
1.7.7.6

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 5/6] prgdev: add ABI control info
  2017-03-02  4:03  3% [dpdk-dev] [PATCH 0/6] introduce prgdev abstraction library Chen Jing D(Mark)
@ 2017-03-02  4:03  4% ` Chen Jing D(Mark)
  0 siblings, 0 replies; 200+ results
From: Chen Jing D(Mark) @ 2017-03-02  4:03 UTC (permalink / raw)
  To: dev
  Cc: cunming.liang, gerald.rogers, keith.wiles, bruce.richardson,
	Chen Jing D(Mark)

Add rte_prgdev_version.map file.

Signed-off-by: Chen Jing D(Mark) <jing.d.chen@intel.com>
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
---
 lib/librte_prgdev/rte_prgdev_version.map |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)
 create mode 100644 lib/librte_prgdev/rte_prgdev_version.map

diff --git a/lib/librte_prgdev/rte_prgdev_version.map b/lib/librte_prgdev/rte_prgdev_version.map
new file mode 100644
index 0000000..51dc15a
--- /dev/null
+++ b/lib/librte_prgdev/rte_prgdev_version.map
@@ -0,0 +1,19 @@
+DPDK_17.05 {
+	global:
+
+	rte_prgdev_pci_probe;
+	rte_prgdev_pci_remove;
+	rte_prgdev_allocate;
+	rte_prgdev_release;
+	rte_prgdev_info_get;
+	rte_prgdev_is_valid_dev;
+	rte_prgdev_open;
+	rte_prgdev_img_download;
+	rte_prgdev_img_upload;
+	rte_prgdev_check_stat;
+	rte_prgdev_close;
+	rte_prgdev_bind;
+	rte_prgdev_unbind;
+
+	local: *;
+};
-- 
1.7.7.6

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 00/17] vhost: generic vhost API
@ 2017-03-03  9:51  4% Yuanhan Liu
  2017-03-03  9:51  3% ` [dpdk-dev] [PATCH 16/17] vhost: rename header file Yuanhan Liu
  0 siblings, 1 reply; 200+ results
From: Yuanhan Liu @ 2017-03-03  9:51 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

This is a first attempt to make the DPDK vhost library generic enough
that users can build their own vhost-user drivers on top of it. For
example, SPDK (Storage Performance Development Kit) is trying to enable
vhost-user SCSI.

The basic idea is to let DPDK vhost act as a vhost-user agent. It stores
all the info about the virtio device (i.e. vring addresses, negotiated
features, etc.) and lets the specific vhost-user driver fetch it (through
the API provided by the DPDK vhost lib). With that info, the vhost-user
driver can then get/put vring entries and thus exchange data between
the guest and the host.

The last patch demonstrates how to use these new APIs to implement a
very simple vhost-user net driver, without any fancy features enabled.


API/ABI Changes summary
=======================

- some renames
  * "struct virtio_net_device_ops" ==> "struct vhost_device_ops"
  * "rte_virtio_net.h"  ==> "rte_vhost.h"

- driver related APIs are bond with the socket file
  * rte_vhost_driver_set_features(socket_file, features);
  * rte_vhost_driver_get_features(socket_file, features);
  * rte_vhost_driver_enable_features(socket_file, features)
  * rte_vhost_driver_disable_features(socket_file, features)
  * rte_vhost_driver_callback_register(socket_file, notify_ops);

- new APIs to fetch guest and vring info
  * rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
  * rte_vhost_get_negotiated_features(int vid);
  * rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
			      struct rte_vhost_vring *vring);

- new exported structures 
  * struct rte_vhost_vring
  * struct rte_vhost_mem_region
  * struct rte_vhost_memory


Some design choices
===================

While making this patchset, I met quite few design choices and here are
two of them, with the issue and the reason I made such choices provided.
Please let me know if you have any comments (or better ideas).

Export public structures or not
-------------------------------

I made an ABI refactor last time (v16.07): all the structures were moved
internally and applications use a "vid" to reference the internal
struct. With that, I hoped we would never have to worry about the annoying
ABI issues again.

It has worked great (and as expected) since then, as long as we only support
virtio-net and can handle all the descs inside the vhost lib. It
becomes problematic when a user wants to implement a vhost-user driver
elsewhere. For example, it needs to do the GPA to VVA translation. Without
any structs exported, functions like gpa_to_vva() can't be inlined.
Calling them would be costly, especially as it's a function we have to invoke
for processing each vring desc.

For that reason, the guest memory regions are exported. With that,
gpa_to_vva() can be inlined.

  
Add helper functions to fetch/update descs or not
-------------------------------------------------

I intended to do it this way: introduce one function to get @count
of descs from a specific vring and another one to update the used descs.
It's something like:
    rte_vhost_vring_get_descs(vid, vring_idx, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vid, vring_idx, count, offset, descs);

With that, the vhost-user driver programmer's task would be easier, as he/she
wouldn't have to parse the descs any more (such as handling indirect descs).

But given that virtio 1.1 has just emerged and proposes a completely new
ring layout, and most importantly, the vring desc structure also changes,
I'd like to hold off on introducing these two functions. Otherwise, it's very
likely the two would become invalid when virtio 1.1 is out. Though I think it
could be addressed with a careful design, something like making the IOV
generic enough:

	struct rte_vhost_iov {
		uint64_t	gpa;
		uint64_t	vva;
		uint64_t	len;
	};

Instead, I went the other way: introduce a few APIs to export all the vring
info (vring size, vring addr, callfd, etc.), and let the vhost-user driver
read and update the descs. That info could be passed to the vhost-user driver
by introducing one API for each item, but to save a few APIs and calls for
the programmer, I packed a few key fields into a new structure, so
that they can be fetched with one call:
        struct rte_vhost_vring {
                struct vring_desc       *desc;
                struct vring_avail      *avail;
                struct vring_used       *used;
                uint64_t                log_guest_addr;
       
                int                     callfd;
                int                     kickfd;
                uint16_t                size;
        };

When virtio 1.1 comes out, a simple change like the following would
likely just work:
        struct rte_vhost_vring {
		union {
			struct {
                		struct vring_desc       *desc;
                		struct vring_avail      *avail;
                		struct vring_used       *used;
                		uint64_t                log_guest_addr;
			};
			struct desc	*desc_1_1;	/* vring addr for virtio 1.1 */
		};
       
                int                     callfd;
                int                     kickfd;
                uint16_t                size;
        };

AFAIK, it's not an ABI breakage. Even if it is, we could introduce a new
API to get the virtio 1.1 ring address.

Those fields are the minimum set I got for a specific vring, with the idea
that it brings the minimum chance of breaking the ABI for future extensions.
If we need more info, we could introduce a new API.

OTOH, to get the best performance, the two functions would also have to be
inlined (the "vid + vring_idx" combo is replaced with "vring"):
    rte_vhost_vring_get_descs(vring, count, offset, iov, descs);
    rte_vhost_vring_update_used_descs(vring, count, offset, descs);

That said, one way or another, we have to export the rte_vhost_vring struct.
For this reason, I didn't rush into introducing the two APIs.


TODOs
=====

This series still has a few small items to finish:
- update release note
- fill API comments
- set protocol features


	--yliu

---
Yuanhan Liu (17):
  vhost: introduce driver features related APIs
  net/vhost: remove feature related APIs
  vhost: use new APIs to handle features
  vhost: make notify ops per vhost driver
  vhost: export guest memory regions
  vhost: introduce API to fetch negotiated features
  vhost: export vhost vring info
  vhost: export API to translate gpa to vva
  vhost: turn queue pair to vring
  vhost: export the number of vrings
  vhost: move the device ready check at proper place
  vhost: drop the Rx and Tx queue macro
  vhost: do not include net specific headers
  vhost: rename device ops struct
  vhost: rename virtio-net to vhost
  vhost: rename header file
  examples/vhost: demonstrate the new generic vhost APIs

 doc/guides/rel_notes/deprecation.rst        |   9 -
 drivers/net/vhost/rte_eth_vhost.c           |  51 ++--
 drivers/net/vhost/rte_eth_vhost.h           |  32 +--
 drivers/net/vhost/rte_pmd_vhost_version.map |   3 -
 examples/tep_termination/main.c             |  11 +-
 examples/tep_termination/main.h             |   2 +
 examples/tep_termination/vxlan_setup.c      |   2 +-
 examples/vhost/Makefile                     |   2 +-
 examples/vhost/main.c                       |  88 ++++--
 examples/vhost/main.h                       |  33 ++-
 examples/vhost/virtio_net.c                 | 405 ++++++++++++++++++++++++++++
 lib/librte_vhost/Makefile                   |   4 +-
 lib/librte_vhost/rte_vhost.h                | 259 ++++++++++++++++++
 lib/librte_vhost/rte_vhost_version.map      |  18 +-
 lib/librte_vhost/rte_virtio_net.h           | 193 -------------
 lib/librte_vhost/socket.c                   | 143 ++++++++++
 lib/librte_vhost/vhost.c                    | 209 +++++++-------
 lib/librte_vhost/vhost.h                    |  82 +++---
 lib/librte_vhost/vhost_user.c               |  91 +++----
 lib/librte_vhost/vhost_user.h               |   2 +-
 lib/librte_vhost/virtio_net.c               |  35 +--
 21 files changed, 1140 insertions(+), 534 deletions(-)
 create mode 100644 examples/vhost/virtio_net.c
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h

-- 
1.9.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 16/17] vhost: rename header file
  2017-03-03  9:51  4% [dpdk-dev] [PATCH 00/17] vhost: generic vhost API Yuanhan Liu
@ 2017-03-03  9:51  3% ` Yuanhan Liu
  0 siblings, 0 replies; 200+ results
From: Yuanhan Liu @ 2017-03-03  9:51 UTC (permalink / raw)
  To: dev; +Cc: Maxime Coquelin, Harris James R, Liu Changpeng, Yuanhan Liu

Rename "rte_virtio_net.h" to "rte_vhost.h", to not let it be virtio
net specific.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 doc/guides/rel_notes/deprecation.rst   |   9 --
 drivers/net/vhost/rte_eth_vhost.c      |   2 +-
 drivers/net/vhost/rte_eth_vhost.h      |   2 +-
 examples/tep_termination/main.c        |   2 +-
 examples/tep_termination/vxlan_setup.c |   2 +-
 examples/vhost/main.c                  |   2 +-
 lib/librte_vhost/Makefile              |   2 +-
 lib/librte_vhost/rte_vhost.h           | 259 +++++++++++++++++++++++++++++++++
 lib/librte_vhost/rte_virtio_net.h      | 259 ---------------------------------
 lib/librte_vhost/vhost.c               |   2 +-
 lib/librte_vhost/vhost.h               |   2 +-
 lib/librte_vhost/vhost_user.h          |   2 +-
 lib/librte_vhost/virtio_net.c          |   2 +-
 13 files changed, 269 insertions(+), 278 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vhost.h
 delete mode 100644 lib/librte_vhost/rte_virtio_net.h

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9d4dfcc..84c8b9d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -104,15 +104,6 @@ Deprecation Notices
   Target release for removal of the legacy API will be defined once most
   PMDs have switched to rte_flow.
 
-* vhost: API/ABI changes are planned for 17.05, for making DPDK vhost library
-  generic enough so that applications can build different vhost-user drivers
-  (instead of vhost-user net only) on top of that.
-  Specifically, ``virtio_net_device_ops`` will be renamed to ``vhost_device_ops``.
-  Correspondingly, some API's parameter need be changed. Few more functions also
-  need be reworked to let it be device aware. For example, different virtio device
-  has different feature set, meaning functions like ``rte_vhost_feature_disable``
-  need be changed. Last, file rte_virtio_net.h will be renamed to rte_vhost.h.
-
 * kni: Remove :ref:`kni_vhost_backend-label` feature (KNI_VHOST) in 17.05 release.
   :doc:`Vhost Library </prog_guide/vhost_lib>` is currently preferred method for
   guest - host communication. Just for clarification, this is not to remove KNI
diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c
index df1e386..f7c370e 100644
--- a/drivers/net/vhost/rte_eth_vhost.c
+++ b/drivers/net/vhost/rte_eth_vhost.c
@@ -40,7 +40,7 @@
 #include <rte_memcpy.h>
 #include <rte_vdev.h>
 #include <rte_kvargs.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_spinlock.h>
 
 #include "rte_eth_vhost.h"
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index ea4bce4..39ca771 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -41,7 +41,7 @@
 #include <stdint.h>
 #include <stdbool.h>
 
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 /*
  * Event description.
diff --git a/examples/tep_termination/main.c b/examples/tep_termination/main.c
index fa1c7a4..63a5dd3 100644
--- a/examples/tep_termination/main.c
+++ b/examples/tep_termination/main.c
@@ -49,7 +49,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 #include "main.h"
 #include "vxlan.h"
diff --git a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c
index 8f1f15b..87de74d 100644
--- a/examples/tep_termination/vxlan_setup.c
+++ b/examples/tep_termination/vxlan_setup.c
@@ -49,7 +49,7 @@
 #include <rte_tcp.h>
 
 #include "main.h"
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 #include "vxlan.h"
 #include "vxlan_setup.h"
 
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 080c60b..a9b5352 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -49,7 +49,7 @@
 #include <rte_log.h>
 #include <rte_string_fns.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_ip.h>
 #include <rte_tcp.h>
 
diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 5cf4e93..4847069 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -51,7 +51,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c socket.c vhost.c vhost_user.c \
 				   virtio_net.c
 
 # install includes
-SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
 
 # dependencies
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VHOST) += lib/librte_eal
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
new file mode 100644
index 0000000..cfb3507
--- /dev/null
+++ b/lib/librte_vhost/rte_vhost.h
@@ -0,0 +1,259 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_VHOST_H_
+#define _RTE_VHOST_H_
+
+/**
+ * @file
+ * Interface to vhost-user
+ */
+
+#include <stdint.h>
+#include <linux/vhost.h>
+#include <linux/virtio_ring.h>
+#include <sys/eventfd.h>
+
+#include <rte_memory.h>
+#include <rte_mempool.h>
+
+#define RTE_VHOST_USER_CLIENT		(1ULL << 0)
+#define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
+#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
+
+/**
+ * Information relating to memory regions including offsets to
+ * addresses in QEMUs memory file.
+ */
+struct rte_vhost_mem_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void	 *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
+};
+
+/**
+ * Memory structure includes region and mapping information.
+ */
+struct rte_vhost_memory {
+	uint32_t nregions;
+	struct rte_vhost_mem_region regions[0];
+};
+
+struct rte_vhost_vring {
+	struct vring_desc	*desc;
+	struct vring_avail	*avail;
+	struct vring_used	*used;
+	uint64_t		log_guest_addr;
+
+	int			callfd;
+	int			kickfd;
+	uint16_t		size;
+};
+
+/**
+ * Device and vring operations.
+ */
+struct vhost_device_ops {
+	int (*new_device)(int vid);		/**< Add device. */
+	void (*destroy_device)(int vid);	/**< Remove device. */
+
+	int (*vring_state_changed)(int vid, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
+
+	void *reserved[5]; /**< Reserved for future extension */
+};
+
+/**
+ * Convert guest physical Address to host virtual address
+ */
+static inline uint64_t __attribute__((always_inline))
+rte_vhost_gpa_to_vva(struct rte_vhost_memory *mem, uint64_t gpa)
+{
+	struct rte_vhost_mem_region *reg;
+	uint32_t i;
+
+	for (i = 0; i < mem->nregions; i++) {
+		reg = &mem->regions[i];
+		if (gpa >= reg->guest_phys_addr &&
+		    gpa <  reg->guest_phys_addr + reg->size) {
+			return gpa - reg->guest_phys_addr +
+			       reg->host_user_addr;
+		}
+	}
+
+	return 0;
+}
+
+int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable);
+
+/**
+ * Register vhost driver. path could be different for multiple
+ * instance support.
+ */
+int rte_vhost_driver_register(const char *path, uint64_t flags);
+
+/* Unregister vhost driver. This is only meaningful to vhost user. */
+int rte_vhost_driver_unregister(const char *path);
+
+/**
+ * Set feature bits the vhost driver supports.
+ */
+int rte_vhost_driver_set_features(const char *path, uint64_t features);
+uint64_t rte_vhost_driver_get_features(const char *path);
+
+int rte_vhost_driver_enable_features(const char *path, uint64_t features);
+int rte_vhost_driver_disable_features(const char *path, uint64_t features);
+
+/* Register callbacks. */
+int rte_vhost_driver_callback_register(const char *path,
+	struct vhost_device_ops const * const);
+/* Start vhost driver session blocking loop. */
+int rte_vhost_driver_session_start(void);
+
+/**
+ * Get the numa node from which the virtio net device's memory
+ * is allocated.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The numa node, -1 on failure
+ */
+int rte_vhost_get_numa_node(int vid);
+
+/**
+ * @deprecated
+ * Get the number of queues the device supports.
+ *
+ * Note this function is deprecated, as it returns a queue pair number,
+ * which is vhost specific. Instead, rte_vhost_get_vring_num should
+ * be used.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The number of queues, 0 on failure
+ */
+__rte_deprecated
+uint32_t rte_vhost_get_queue_num(int vid);
+
+/**
+ * Get the number of vrings the device supports.
+ *
+ * @param vid
+ *  vhost device ID
+ *
+ * @return
+ *  The number of vrings, 0 on failure
+ */
+uint16_t rte_vhost_get_vring_num(int vid);
+
+/**
+ * Get the virtio net device's ifname, which is the vhost-user socket
+ * file path.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param buf
+ *  The buffer to stored the queried ifname
+ * @param len
+ *  The length of buf
+ *
+ * @return
+ *  0 on success, -1 on failure
+ */
+int rte_vhost_get_ifname(int vid, char *buf, size_t len);
+
+/**
+ * Get how many avail entries are left in the queue
+ *
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index
+ *
+ * @return
+ *  num of avail entires left
+ */
+uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id);
+
+/**
+ * This function adds buffers to the virtio devices RX virtqueue. Buffers can
+ * be received from the physical port or from another virtual device. A packet
+ * count is returned to indicate the number of packets that were succesfully
+ * added to the RX queue.
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param pkts
+ *  array to contain packets to be enqueued
+ * @param count
+ *  packets num to be enqueued
+ * @return
+ *  num of packets enqueued
+ */
+uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
+	struct rte_mbuf **pkts, uint16_t count);
+
+/**
+ * This function gets guest buffers from the virtio device TX virtqueue,
+ * construct host mbufs, copies guest buffer content to host mbufs and
+ * store them in pkts to be processed.
+ * @param vid
+ *  vhost device ID
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param mbuf_pool
+ *  mbuf_pool where host mbuf is allocated.
+ * @param pkts
+ *  array to contain packets to be dequeued
+ * @param count
+ *  packets num to be dequeued
+ * @return
+ *  num of packets dequeued
+ */
+uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
+	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
+
+int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
+uint64_t rte_vhost_get_negotiated_features(int vid);
+int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
+			      struct rte_vhost_vring *vring);
+
+#endif /* _RTE_VHOST_H_ */
diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
deleted file mode 100644
index 2f761da..0000000
--- a/lib/librte_vhost/rte_virtio_net.h
+++ /dev/null
@@ -1,259 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _VIRTIO_NET_H_
-#define _VIRTIO_NET_H_
-
-/**
- * @file
- * Interface to vhost net
- */
-
-#include <stdint.h>
-#include <linux/vhost.h>
-#include <linux/virtio_ring.h>
-#include <sys/eventfd.h>
-
-#include <rte_memory.h>
-#include <rte_mempool.h>
-
-#define RTE_VHOST_USER_CLIENT		(1ULL << 0)
-#define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
-#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
-
-/**
- * Information relating to memory regions including offsets to
- * addresses in QEMUs memory file.
- */
-struct rte_vhost_mem_region {
-	uint64_t guest_phys_addr;
-	uint64_t guest_user_addr;
-	uint64_t host_user_addr;
-	uint64_t size;
-	void	 *mmap_addr;
-	uint64_t mmap_size;
-	int fd;
-};
-
-/**
- * Memory structure includes region and mapping information.
- */
-struct rte_vhost_memory {
-	uint32_t nregions;
-	struct rte_vhost_mem_region regions[0];
-};
-
-struct rte_vhost_vring {
-	struct vring_desc	*desc;
-	struct vring_avail	*avail;
-	struct vring_used	*used;
-	uint64_t		log_guest_addr;
-
-	int			callfd;
-	int			kickfd;
-	uint16_t		size;
-};
-
-/**
- * Device and vring operations.
- */
-struct vhost_device_ops {
-	int (*new_device)(int vid);		/**< Add device. */
-	void (*destroy_device)(int vid);	/**< Remove device. */
-
-	int (*vring_state_changed)(int vid, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
-
-	void *reserved[5]; /**< Reserved for future extension */
-};
-
-/**
- * Convert guest physical Address to host virtual address
- */
-static inline uint64_t __attribute__((always_inline))
-rte_vhost_gpa_to_vva(struct rte_vhost_memory *mem, uint64_t gpa)
-{
-	struct rte_vhost_mem_region *reg;
-	uint32_t i;
-
-	for (i = 0; i < mem->nregions; i++) {
-		reg = &mem->regions[i];
-		if (gpa >= reg->guest_phys_addr &&
-		    gpa <  reg->guest_phys_addr + reg->size) {
-			return gpa - reg->guest_phys_addr +
-			       reg->host_user_addr;
-		}
-	}
-
-	return 0;
-}
-
-int rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable);
-
-/**
- * Register vhost driver. path could be different for multiple
- * instance support.
- */
-int rte_vhost_driver_register(const char *path, uint64_t flags);
-
-/* Unregister vhost driver. This is only meaningful to vhost user. */
-int rte_vhost_driver_unregister(const char *path);
-
-/**
- * Set feature bits the vhost driver supports.
- */
-int rte_vhost_driver_set_features(const char *path, uint64_t features);
-uint64_t rte_vhost_driver_get_features(const char *path);
-
-int rte_vhost_driver_enable_features(const char *path, uint64_t features);
-int rte_vhost_driver_disable_features(const char *path, uint64_t features);
-
-/* Register callbacks. */
-int rte_vhost_driver_callback_register(const char *path,
-	struct vhost_device_ops const * const);
-/* Start vhost driver session blocking loop. */
-int rte_vhost_driver_session_start(void);
-
-/**
- * Get the numa node from which the virtio net device's memory
- * is allocated.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The numa node, -1 on failure
- */
-int rte_vhost_get_numa_node(int vid);
-
-/**
- * @deprecated
- * Get the number of queues the device supports.
- *
- * Note this function is deprecated, as it returns a queue pair number,
- * which is vhost specific. Instead, rte_vhost_get_vring_num should
- * be used.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The number of queues, 0 on failure
- */
-__rte_deprecated
-uint32_t rte_vhost_get_queue_num(int vid);
-
-/**
- * Get the number of vrings the device supports.
- *
- * @param vid
- *  vhost device ID
- *
- * @return
- *  The number of vrings, 0 on failure
- */
-uint16_t rte_vhost_get_vring_num(int vid);
-
-/**
- * Get the virtio net device's ifname, which is the vhost-user socket
- * file path.
- *
- * @param vid
- *  vhost device ID
- * @param buf
- *  The buffer to stored the queried ifname
- * @param len
- *  The length of buf
- *
- * @return
- *  0 on success, -1 on failure
- */
-int rte_vhost_get_ifname(int vid, char *buf, size_t len);
-
-/**
- * Get how many avail entries are left in the queue
- *
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index
- *
- * @return
- *  num of avail entires left
- */
-uint16_t rte_vhost_avail_entries(int vid, uint16_t queue_id);
-
-/**
- * This function adds buffers to the virtio devices RX virtqueue. Buffers can
- * be received from the physical port or from another virtual device. A packet
- * count is returned to indicate the number of packets that were succesfully
- * added to the RX queue.
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index in mq case
- * @param pkts
- *  array to contain packets to be enqueued
- * @param count
- *  packets num to be enqueued
- * @return
- *  num of packets enqueued
- */
-uint16_t rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
-	struct rte_mbuf **pkts, uint16_t count);
-
-/**
- * This function gets guest buffers from the virtio device TX virtqueue,
- * construct host mbufs, copies guest buffer content to host mbufs and
- * store them in pkts to be processed.
- * @param vid
- *  vhost device ID
- * @param queue_id
- *  virtio queue index in mq case
- * @param mbuf_pool
- *  mbuf_pool where host mbuf is allocated.
- * @param pkts
- *  array to contain packets to be dequeued
- * @param count
- *  packets num to be dequeued
- * @return
- *  num of packets dequeued
- */
-uint16_t rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
-	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);
-
-int rte_vhost_get_vhost_memory(int vid, struct rte_vhost_memory **mem);
-uint64_t rte_vhost_get_negotiated_features(int vid);
-int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
-			      struct rte_vhost_vring *vring);
-
-#endif /* _VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 0a27888..e0548fe 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -45,7 +45,7 @@
 #include <rte_string_fns.h>
 #include <rte_memory.h>
 #include <rte_malloc.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 
 #include "vhost.h"
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index fc9e431..29132f3 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -46,7 +46,7 @@
 #include <rte_log.h>
 #include <rte_ether.h>
 
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
 
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 179e441..f1a7823 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -37,7 +37,7 @@
 #include <stdint.h>
 #include <linux/vhost.h>
 
-#include "rte_virtio_net.h"
+#include "rte_vhost.h"
 
 /* refer to hw/virtio/vhost-user.c */
 
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8ed2b93..6287c7a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -39,7 +39,7 @@
 #include <rte_memcpy.h>
 #include <rte_ether.h>
 #include <rte_ip.h>
-#include <rte_virtio_net.h>
+#include <rte_vhost.h>
 #include <rte_tcp.h>
 #include <rte_udp.h>
 #include <rte_sctp.h>
-- 
1.9.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character
  @ 2017-03-03 12:10  4%           ` Bruce Richardson
  2017-03-03 12:17  0%             ` Legacy, Allain
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2017-03-03 12:10 UTC (permalink / raw)
  To: Legacy, Allain
  Cc: Dumitrescu, Cristian, Yuanhan Liu, dev, Jolliffe, Ian (Wind River)

On Fri, Mar 03, 2017 at 11:31:11AM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Dumitrescu, Cristian [mailto:cristian.dumitrescu@intel.com]
> > Possible options that I see:
> > 1. Add a new parameters argument to the load functions (e.g. struct
> > cfgfile_params *p), whit the comment char as one (and currently only) field
> > of this struct. Drawbacks: API change that might have to be announced one
> > release before the actual API change.
> 
> I would prefer this option as it provides more flexibility.  We can leave the existing API as is and add a wrapper that accepts additional parameters.  Something like the following (with implementations in the .c, obviously, rather than inline in the header like I have it here).  There are several examples of this pattern already in the DPDK (i.e., ring APIs, mempool APIs, etc.) where we use a common function invoked by higher level functions that pass in additional parameters to customize behavior.
> 
> struct rte_cfgfile *_rte_cfgfile_load(const char *filename,
>                                           const struct rte_cfgfile_params *params);
> 
> struct rte_cfgfile *rte_cfgfile_load(const char *filename, int flags)
> {
>         struct rte_cfgfile_params params;
> 
>         rte_cfgfile_set_default_params(&params);
>         params |= flags;
>         return _rte_cfgfile_load(filename, &params);
> }
> 
> struct rte_cfgfile *rte_cfgfile_load_with_params(const char *filename,
>                                                     const struct rte_cfgfile_params *params)
> {
>         return _rte_cfgfile_load(filename, params);
> }

No need for a new API. Just add the extra parameter to the existing load
function and use function versioning for ABI compatibility. Since it's
only one function, I don't think using versioning is a big deal, and
that's what it is there for.
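
For instance (a sketch only -- the _v1702/_v1705 suffixes and version nodes
are illustrative, not something already in librte_cfgfile), the versioning
could use the helpers from rte_compat.h:

	/* in rte_cfgfile.c, with rte_compat.h included */
	struct rte_cfgfile *
	rte_cfgfile_load_v1702(const char *filename, int flags);
	struct rte_cfgfile *
	rte_cfgfile_load_v1705(const char *filename, int flags, char comment_char);

	/* existing binaries keep resolving to the old two-argument symbol */
	VERSION_SYMBOL(rte_cfgfile_load, _v1702, 17.02);
	/* newly linked applications bind to the extended version */
	BIND_DEFAULT_SYMBOL(rte_cfgfile_load, _v1705, 17.05);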

Also, for a single parameter like a comment char, I don't think we need
to go creating a separate structure. The current flags parameter is
unused, so just replace it with the comment char one. With using the
structure, any additions to the struct would be an ABI change anyway, so
I see little point in using it, unless we already know of additional
parameters we will be adding in future. [It's an ABI change even when
adding to the end, since the struct is allocated in the app itself, not
the library.]

/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character
  2017-03-03 12:10  4%           ` Bruce Richardson
@ 2017-03-03 12:17  0%             ` Legacy, Allain
  2017-03-03 13:10  0%               ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Legacy, Allain @ 2017-03-03 12:17 UTC (permalink / raw)
  To: RICHARDSON, BRUCE
  Cc: DUMITRESCU, CRISTIAN FLORIN, Yuanhan Liu, dev, Jolliffe, Ian

> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
 > Also, for a single parameter like a comment char, I don't think we need to go
> creating a separate structure. The current flags parameter is unused, so just
> replace it with the comment char one. With using the structure, any additions
In my earlier patch, I proposed using a "global" flag to indicate that an unnamed section exists, so the flags argument would still be needed.

> to the struct would be an ABI change anyway, so I see little point in using it,
> unless we already know of additional parameters we will be adding in future.
We already have 2 parameters in mind - flags and comment char.  I don't feel that combining the two in a single enum is particularly good, since it would be better to allow the application the freedom to set an arbitrary comment character rather than being locked into any static list that we choose (see my previous email response).

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character
  2017-03-03 12:17  0%             ` Legacy, Allain
@ 2017-03-03 13:10  0%               ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-03 13:10 UTC (permalink / raw)
  To: Legacy, Allain
  Cc: Dumitrescu, Cristian, Yuanhan Liu, dev, Jolliffe, Ian (Wind River)

On Fri, Mar 03, 2017 at 12:17:47PM +0000, Legacy, Allain wrote:
> > -----Original Message-----
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
>  > Also, for a single parameter like a comment char, I don't think we need to go
> > creating a separate structure. The current flags parameter is unused, so just
> > replace it with the comment char one. With using the structure, any additions
> In my earlier patch, I proprose using a "global" flag to indicate that an unnamed section exists so the flags argument would still be needed.

Ok, good point, I missed that.

> 
> > to the struct would be an ABI change anyway, so I see little point in using it,
> > unless we already know of additional parameters we will be adding in future.
> We already have 2 parameters in mind - flags, and comment char.  I don't feel that combining the two in a single enum is particularly good since it would be better to allow the application the freedom to set an arbitrary comment character and not be locked in to any static list that we choose (see my previous email response).
>
I also agree on not using enums and not limiting comment chars.

I don't particularly like config structs, and would prefer individual
flags and comment char parameters - given it's not a huge list of
params, just 2 - but no big deal either way.

/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API
  @ 2017-03-04  1:10  1% ` Cristian Dumitrescu
    1 sibling, 0 replies; 200+ results
From: Cristian Dumitrescu @ 2017-03-04  1:10 UTC (permalink / raw)
  To: dev
  Cc: thomas.monjalon, jerin.jacob, balasubramanian.manoharan,
	hemant.agrawal, shreyansh.jain

This patch introduces the generic ethdev API for the traffic manager
capability, which includes: hierarchical scheduling, traffic shaping,
congestion management, packet marking.

Main features:
- Exposed as ethdev plugin capability (similar to rte_flow approach)
- Capability query API per port, per hierarchy level and per hierarchy node
- Scheduling algorithms: Strict Priority (SP), Weighted Fair Queuing (WFQ),
  Weighted Round Robin (WRR)
- Traffic shaping: single/dual rate, private (per node) and shared (by multiple
  nodes) shapers
- Congestion management for hierarchy leaf nodes: algorithms of tail drop,
  head drop, WRED; private (per node) and shared (by multiple nodes) WRED
  contexts
- Packet marking: IEEE 802.1q (VLAN DEI), IETF RFC 3168 (IPv4/IPv6 ECN for
  TCP and SCTP), IETF RFC 2597 (IPv4 / IPv6 DSCP)
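
As a usage illustration only (not part of this patch; the port, node and
profile IDs below are made up, the parameter structs are left zeroed for
brevity, and return values/error handling are omitted), building a minimal
root-plus-leaves hierarchy with the proposed API could look like:

	struct rte_tm_shaper_params sp = { 0 };
	struct rte_tm_node_params np = { 0 };
	struct rte_tm_error err;
	uint8_t port_id = 0;
	uint32_t q;

	/* Register a shaper profile (id 0). */
	rte_tm_shaper_profile_add(port_id, 0, &sp, &err);

	/* Add the root node (its parent is RTE_TM_NODE_ID_NULL); 100 is an
	 * arbitrary non-leaf node ID outside the TX queue ID range. */
	rte_tm_node_add(port_id, 100, RTE_TM_NODE_ID_NULL, 0, 1, &np, &err);

	/* Leaf node IDs are predefined as the Ethernet TX queue IDs. */
	for (q = 0; q < 4; q++)
		rte_tm_node_add(port_id, q, 100, 0, 1, &np, &err);

	/* Freeze and commit the hierarchy to the driver. */
	rte_tm_hierarchy_set(port_id, 1 /* clear_on_fail */, &err);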

Changes in v3:
- Implemented feedback from Jerin [5]
- Changed naming convention: scheddev -> tm
- Improvements on the capability API:
	- Specification of marking capabilities per color
	- WFQ/WRR groups: sp_n_children_max -> wfq_wrr_n_children_per_group_max,
	  added wfq_wrr_n_groups_max, improved description of both, improved
	  description of wfq_wrr_weight_max
	- Dynamic updates: added KEEP_LEVEL and CHANGE_LEVEL for parent update
- Enforced/documented restrictions for root node (node_add() and update())
- Enforced/documented shaper profile restrictions on PIR: PIR != 0, PIR >= CIR
- Turned repetitive code in rte_tm.c into macro
- Removed dependency on rte_red.h file (added RED params to rte_tm.h)
- Color: removed "e_" from color names enum
- Fixed small Doxygen style issues

Changes in v2:
- Implemented feedback from Hemant [4]
- Improvements on the capability API
	- Added capability API for hierarchy level
	- Merged stats capability into the capability API
	- Added dynamic updates
	- Added non-leaf/leaf union to the node capability structure
	- Renamed sp_priority_min to sp_n_priorities_max, added clarifications
	- Fixed description for sp_n_children_max
- Clarified and enforced rule on node ID range for leaf and non-leaf nodes
	- Added API functions to get node type (i.e. leaf/non-leaf):
	  get_leaf_nodes(), node_type_get()
- Added clarification for the root node: its creation, its parent, its role
	- Macro NODE_ID_NULL as root node's parent
	- Description of the node_add() and node_parent_update() API functions
- Added clarification for the first time add vs. subsequent updates rule
	- Cleaned up the description for the node_add() function
- Statistics API improvements
	- Merged stats capability into the capability API
	- Added API function node_stats_update()
	- Added more stats per packet color
- Added more error types
- Fixed small Doxygen style issues

Changes in v1 (since RFC [1]):
- Implemented as ethdev plugin (similar to rte_flow) as opposed to more
  monolithic additions to ethdev itself
- Implemented feedback from Jerin [2] and Hemant [3]. Implemented all the
  suggested items with only one exception, see the long list below, hopefully
  nothing was forgotten.
    - The item not done (hopefully for a good reason): driver-generated object
      IDs. IMO the choice to have application-generated object IDs adds marginal
      complexity to the driver (search ID function required), but it provides
      huge simplification for the application. The app does not need to worry
      about building & managing tree-like structure for storing driver-generated
      object IDs, the app can use its own convention for node IDs depending on
      the specific hierarchy that it needs. Trivial example: identify all
      level-2 nodes with IDs like 100, 200, 300, … and the level-3 nodes based
      on their level-2 parents: 110, 120, 130, 140, …, 210, 220, 230, 240, …,
      310, 320, 330, … and level-4 nodes based on their level-3 parents: 111,
      112, 113, 114, …, 121, 122, 123, 124, …). Moreover, see the change log for
      the other related simplification that was implemented: leaf nodes now have
      predefined IDs that are the same with their Ethernet TX queue ID (
      therefore no translation is required for leaf nodes).
- Capability API. Done per port and per node as well.
- Dual rate shapers
- Added configuration of private shaper (per node) directly from the shaper
  profile as part of node API (no shaper ID needed for private shapers), while
  the shared shapers are configured outside of the node API using shaper profile
  and communicated to the node using shared shaper ID. So there is no
  configuration overhead for shared shapers if the app does not use any of them.
- Leaf nodes now have predefined IDs that are the same with their Ethernet TX
  queue ID (therefore no translation is required for leaf nodes). This is also
  used to differentiate between a leaf node and a non-leaf node.
- Domain-specific errors to give a precise indication of the error cause (same
  as done by rte_flow)
- Packet marking API
- Packet length optional adjustment for shapers, positive (e.g. for adding
  Ethernet framing overhead of 20 bytes) or negative (e.g. for rate limiting
  based on IP packet bytes)

[1] RFC: http://dpdk.org/ml/archives/dev/2016-November/050956.html
[2] Jerin’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054484.html
[3] Hemant’s feedback on RFC: http://www.dpdk.org/ml/archives/dev/2017-January/054866.html
[4] Hemant's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-February/058033.html
[5] Jerin's feedback on v1: http://www.dpdk.org/ml/archives/dev/2017-March/058895.html

Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
---
 MAINTAINERS                            |    4 +
 lib/librte_ether/Makefile              |    5 +-
 lib/librte_ether/rte_ether_version.map |   30 +
 lib/librte_ether/rte_tm.c              |  436 ++++++++++
 lib/librte_ether/rte_tm.h              | 1466 ++++++++++++++++++++++++++++++++
 lib/librte_ether/rte_tm_driver.h       |  365 ++++++++
 6 files changed, 2305 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_ether/rte_tm.c
 create mode 100644 lib/librte_ether/rte_tm.h
 create mode 100644 lib/librte_ether/rte_tm_driver.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 5030c1c..7893ac6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -247,6 +247,10 @@ Flow API
 M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
 F: lib/librte_ether/rte_flow*
 
+Traffic Manager API
+M: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
+F: lib/librte_ether/rte_tm*
+
 Crypto API
 M: Declan Doherty <declan.doherty@intel.com>
 F: lib/librte_cryptodev/
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 1d095a9..82faa67 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -1,6 +1,6 @@
 #   BSD LICENSE
 #
-#   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+#   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
 #   All rights reserved.
 #
 #   Redistribution and use in source and binary forms, with or without
@@ -45,6 +45,7 @@ LIBABIVER := 6
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
+SRCS-y += rte_tm.c
 
 #
 # Export include files
@@ -54,6 +55,8 @@ SYMLINK-y-include += rte_eth_ctrl.h
 SYMLINK-y-include += rte_dev_info.h
 SYMLINK-y-include += rte_flow.h
 SYMLINK-y-include += rte_flow_driver.h
+SYMLINK-y-include += rte_tm.h
+SYMLINK-y-include += rte_tm_driver.h
 
 # this lib depends upon:
 DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf
diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map
index 637317c..42ad3fb 100644
--- a/lib/librte_ether/rte_ether_version.map
+++ b/lib/librte_ether/rte_ether_version.map
@@ -159,5 +159,35 @@ DPDK_17.05 {
 	global:
 
 	rte_eth_dev_capability_ops_get;
+	rte_tm_get_leaf_nodes;
+	rte_tm_node_type_get;
+	rte_tm_capabilities_get;
+	rte_tm_level_capabilities_get;
+	rte_tm_node_capabilities_get;
+	rte_tm_wred_profile_add;
+	rte_tm_wred_profile_delete;
+	rte_tm_shared_wred_context_add_update;
+	rte_tm_shared_wred_context_delete;
+	rte_tm_shaper_profile_add;
+	rte_tm_shaper_profile_delete;
+	rte_tm_shared_shaper_add_update;
+	rte_tm_shared_shaper_delete;
+	rte_tm_node_add;
+	rte_tm_node_delete;
+	rte_tm_node_suspend;
+	rte_tm_node_resume;
+	rte_tm_hierarchy_set;
+	rte_tm_node_parent_update;
+	rte_tm_node_shaper_update;
+	rte_tm_node_shared_shaper_update;
+	rte_tm_node_stats_update;
+	rte_tm_node_scheduling_mode_update;
+	rte_tm_node_cman_update;
+	rte_tm_node_wred_context_update;
+	rte_tm_node_shared_wred_context_update;
+	rte_tm_node_stats_read;
+	rte_tm_mark_vlan_dei;
+	rte_tm_mark_ip_ecn;
+	rte_tm_mark_ip_dscp;
 
 } DPDK_17.02;
diff --git a/lib/librte_ether/rte_tm.c b/lib/librte_ether/rte_tm.c
new file mode 100644
index 0000000..f8bd491
--- /dev/null
+++ b/lib/librte_ether/rte_tm.c
@@ -0,0 +1,436 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm_driver.h"
+#include "rte_tm.h"
+
+/* Get generic traffic manager operations structure from a port. */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops;
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_tm_error_set(error,
+			ENODEV,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENODEV));
+		return NULL;
+	}
+
+	if ((dev->dev_ops->cap_ops_get == NULL) ||
+		(dev->dev_ops->cap_ops_get(dev, RTE_ETH_CAPABILITY_TM,
+		&ops) != 0) || (ops == NULL)) {
+		rte_tm_error_set(error,
+			ENOSYS,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(ENOSYS));
+		return NULL;
+	}
+
+	return ops;
+}
+
+#define RTE_TM_FUNC(port_id, func)				\
+({								\
+	const struct rte_tm_ops *ops =			\
+		rte_tm_ops_get(port_id, error);		\
+	if (ops == NULL)						\
+		return -rte_errno;				\
+								\
+	if (ops->func == NULL)					\
+		return -rte_tm_error_set(error,		\
+			ENOSYS,					\
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,	\
+			NULL,					\
+			rte_strerror(ENOSYS));			\
+								\
+	ops->func;						\
+})
+
+/* Get number of leaf nodes */
+int
+rte_tm_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	const struct rte_tm_ops *ops =
+		rte_tm_ops_get(port_id, error);
+
+	if (ops == NULL)
+		return -rte_errno;
+
+	if (n_leaf_nodes == NULL) {
+		rte_tm_error_set(error,
+			EINVAL,
+			RTE_TM_ERROR_TYPE_UNSPECIFIED,
+			NULL,
+			rte_strerror(EINVAL));
+		return -rte_errno;
+	}
+
+	*n_leaf_nodes = dev->data->nb_tx_queues;
+	return 0;
+}
+
+/* Check node ID type (leaf or non-leaf) */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_type_get)(dev,
+		node_id, is_leaf, error);
+}
+
+/* Get capabilities */
+int rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, capabilities_get)(dev,
+		cap, error);
+}
+
+/* Get level capabilities */
+int rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, level_capabilities_get)(dev,
+		level_id, cap, error);
+}
+
+/* Get node capabilities */
+int rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_capabilities_get)(dev,
+		node_id, cap, error);
+}
+
+/* Add WRED profile */
+int rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_add)(dev,
+		wred_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, wred_profile_delete)(dev,
+		wred_profile_id, error);
+}
+
+/* Add/update shared WRED context */
+int rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_add_update)(dev,
+		shared_wred_context_id, wred_profile_id, error);
+}
+
+/* Delete shared WRED context */
+int rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_wred_context_delete)(dev,
+		shared_wred_context_id, error);
+}
+
+/* Add shaper profile */
+int rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_add)(dev,
+		shaper_profile_id, profile, error);
+}
+
+/* Delete WRED profile */
+int rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shaper_profile_delete)(dev,
+		shaper_profile_id, error);
+}
+
+/* Add shared shaper */
+int rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_add_update)(dev,
+		shared_shaper_id, shaper_profile_id, error);
+}
+
+/* Delete shared shaper */
+int rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, shared_shaper_delete)(dev,
+		shared_shaper_id, error);
+}
+
+/* Add node to port traffic manager hierarchy */
+int rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_add)(dev,
+		node_id, parent_node_id, priority, weight, params, error);
+}
+
+/* Delete node from traffic manager hierarchy */
+int rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_delete)(dev,
+		node_id, error);
+}
+
+/* Suspend node */
+int rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_suspend)(dev,
+		node_id, error);
+}
+
+/* Resume node */
+int rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_resume)(dev,
+		node_id, error);
+}
+
+/* Set the initial port traffic manager hierarchy */
+int rte_tm_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, hierarchy_set)(dev,
+		clear_on_fail, error);
+}
+
+/* Update node parent  */
+int rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_parent_update)(dev,
+		node_id, parent_node_id, priority, weight, error);
+}
+
+/* Update node private shaper */
+int rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shaper_update)(dev,
+		node_id, shaper_profile_id, error);
+}
+
+/* Update node shared shapers */
+int rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_shaper_update)(dev,
+		node_id, shared_shaper_id, add, error);
+}
+
+/* Update node stats */
+int rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_update)(dev,
+		node_id, stats_mask, error);
+}
+
+/* Update scheduling mode */
+int rte_tm_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_scheduling_mode_update)(dev,
+		node_id, scheduling_mode_per_priority, n_priorities, error);
+}
+
+/* Update node congestion management mode */
+int rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_cman_update)(dev,
+		node_id, cman, error);
+}
+
+/* Update node private WRED context */
+int rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_wred_context_update)(dev,
+		node_id, wred_profile_id, error);
+}
+
+/* Update node shared WRED context */
+int rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_shared_wred_context_update)(dev,
+		node_id, shared_wred_context_id, add, error);
+}
+
+/* Read and/or clear stats counters for specific node */
+int rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, node_stats_read)(dev,
+		node_id, stats, stats_mask, clear, error);
+}
+
+/* Packet marking - VLAN DEI */
+int rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_vlan_dei)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 ECN */
+int rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_ecn)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
+
+/* Packet marking - IPv4/IPv6 DSCP */
+int rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	return RTE_TM_FUNC(port_id, mark_ip_dscp)(dev,
+		mark_green, mark_yellow, mark_red, error);
+}
diff --git a/lib/librte_ether/rte_tm.h b/lib/librte_ether/rte_tm.h
new file mode 100644
index 0000000..64ef5dd
--- /dev/null
+++ b/lib/librte_ether/rte_tm.h
@@ -0,0 +1,1466 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_H__
+#define __INCLUDE_RTE_TM_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API
+ *
+ * This interface provides the ability to configure the traffic manager in a
+ * generic way. It includes features such as: hierarchical scheduling,
+ * traffic shaping, congestion management, packet marking, etc.
+ */
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** Ethernet framing overhead
+ *
+ * Overhead fields per Ethernet frame:
+ * 1. Preamble:                                            7 bytes;
+ * 2. Start of Frame Delimiter (SFD):                      1 byte;
+ * 3. Inter-Frame Gap (IFG):                              12 bytes.
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD                  20
+
+/**
+ * Ethernet framing overhead plus Frame Check Sequence (FCS). Useful when FCS
+ * is generated and added at the end of the Ethernet frame on TX side without
+ * any SW intervention.
+ */
+#define RTE_TM_ETH_FRAMING_OVERHEAD_FCS              24
+
+/**< Invalid WRED profile ID */
+#define RTE_TM_WRED_PROFILE_ID_NONE                  UINT32_MAX
+
+/**< Invalid shaper profile ID */
+#define RTE_TM_SHAPER_PROFILE_ID_NONE                UINT32_MAX
+
+/**< Node ID for the parent of the root node */
+#define RTE_TM_NODE_ID_NULL                          UINT32_MAX
+
+/**
+ * Color
+ */
+enum rte_tm_color {
+	RTE_TM_GREEN = 0, /**< Green */
+	RTE_TM_YELLOW, /**< Yellow */
+	RTE_TM_RED, /**< Red */
+	RTE_TM_COLORS /**< Number of colors */
+};
+
+/**
+ * Node statistics counter type
+ */
+enum rte_tm_stats_type {
+	/**< Number of packets scheduled from current node. */
+	RTE_TM_STATS_N_PKTS = 1 << 0,
+
+	/**< Number of bytes scheduled from current node. */
+	RTE_TM_STATS_N_BYTES = 1 << 1,
+
+	/**< Number of green packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_GREEN_DROPPED = 1 << 2,
+
+	/**< Number of yellow packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_YELLOW_DROPPED = 1 << 3,
+
+	/**< Number of red packets dropped by current leaf node.  */
+	RTE_TM_STATS_N_PKTS_RED_DROPPED = 1 << 4,
+
+	/**< Number of green bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_GREEN_DROPPED = 1 << 5,
+
+	/**< Number of yellow bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_YELLOW_DROPPED = 1 << 6,
+
+	/**< Number of red bytes dropped by current leaf node.  */
+	RTE_TM_STATS_N_BYTES_RED_DROPPED = 1 << 7,
+
+	/**< Number of packets currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_PKTS_QUEUED = 1 << 8,
+
+	/**< Number of bytes currently waiting in the packet queue of current
+	 * leaf node.
+	 */
+	RTE_TM_STATS_N_BYTES_QUEUED = 1 << 9,
+};
+
+/**
+ * Node statistics counters
+ */
+struct rte_tm_node_stats {
+	/**< Number of packets scheduled from current node. */
+	uint64_t n_pkts;
+
+	/**< Number of bytes scheduled from current node. */
+	uint64_t n_bytes;
+
+	/**< Statistics counters for leaf nodes only. */
+	struct {
+		/**< Number of packets dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_pkts_dropped[RTE_TM_COLORS];
+
+		/**< Number of bytes dropped by current leaf node per each
+		 * color.
+		 */
+		uint64_t n_bytes_dropped[RTE_TM_COLORS];
+
+		/**< Number of packets currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_pkts_queued;
+
+		/**< Number of bytes currently waiting in the packet queue of
+		 * current leaf node.
+		 */
+		uint64_t n_bytes_queued;
+	} leaf;
+};
+
+/**
+ * Traffic manager dynamic updates
+ */
+enum rte_tm_dynamic_update_type {
+	/**< Dynamic parent node update. The new parent node is located on the same
+	 * hierarchy level as the former parent node. Consequently, the node
+	 * whose parent is changed preserves its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_KEEP_LEVEL = 1 << 0,
+
+	/**< Dynamic parent node update. The new parent node is located on
+	 * a different hierarchy level than the former parent node. Consequently,
+	 * the node whose parent is changed also changes its hierarchy level.
+	 */
+	RTE_TM_UPDATE_NODE_PARENT_CHANGE_LEVEL = 1 << 1,
+
+	/**< Dynamic node add/delete. */
+	RTE_TM_UPDATE_NODE_ADD_DELETE = 1 << 2,
+
+	/**< Suspend/resume nodes. */
+	RTE_TM_UPDATE_NODE_SUSPEND_RESUME = 1 << 3,
+
+	/**< Dynamic switch between WFQ and WRR per node SP priority level. */
+	RTE_TM_UPDATE_NODE_SCHEDULING_MODE = 1 << 4,
+
+	/**< Dynamic update of the set of enabled stats counter types. */
+	RTE_TM_UPDATE_NODE_STATS = 1 << 5,
+
+	/**< Dynamic update of congestion management mode for leaf nodes. */
+	RTE_TM_UPDATE_NODE_CMAN = 1 << 6,
+};
+
+/**
+ * Traffic manager node capabilities
+ */
+struct rte_tm_node_capabilities {
+	/**< Private shaper support. */
+	int shaper_private_supported;
+
+	/**< Dual rate shaping support for private shaper. Valid only when
+	 * private shaper is supported.
+	 */
+	int shaper_private_dual_rate_supported;
+
+	/**< Minimum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for private
+	 * shaper. Valid only when private shaper is supported.
+	 */
+	uint64_t shaper_private_rate_max;
+
+	/**< Maximum number of supported shared shapers. The value of zero
+	 * indicates that shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Mask of supported statistics counter types. */
+	uint64_t stats_mask;
+
+	union {
+		/**< Items valid only for non-leaf nodes. */
+		struct {
+			/**< Maximum number of children nodes. */
+			uint32_t n_children_max;
+
+			/**< Maximum number of supported priority levels. The
+			 * value of zero is invalid. The value of 1 indicates
+			 * that only priority 0 is supported, which essentially
+			 * means that Strict Priority (SP) algorithm is not
+			 * supported.
+			 */
+			uint32_t sp_n_priorities_max;
+
+			/**< Maximum number of sibling nodes that can have the
+			 * same priority at any given time, i.e. maximum size
+			 * of the WFQ/WRR sibling node group. The value of zero
+			 * is invalid. The value of 1 indicates that WFQ/WRR
+			 * algorithms are not supported. The maximum value is
+			 * *n_children_max*.
+			 */
+			uint32_t wfq_wrr_n_children_per_group_max;
+
+			/**< Maximum number of priority levels that can have
+			 * more than one child node at any given time, i.e.
+			 * maximum number of WFQ/WRR sibling node groups that
+			 * have two or more members. The value of zero states
+			 * that WFQ/WRR algorithms are not supported. The value
+			 * of 1 indicates that (*sp_n_priorities_max* - 1)
+			 * priority levels have at most one child node, so
+			 * there can be only one priority level with two or
+			 * more sibling nodes making up a WFQ/WRR group. The
+			 * maximum value is: min(floor(*n_children_max* / 2),
+			 * *sp_n_priorities_max*).
+			 */
+			uint32_t wfq_wrr_n_groups_max;
+
+			/**< WFQ algorithm support. */
+			int wfq_supported;
+
+			/**< WRR algorithm support. */
+			int wrr_supported;
+
+			/**< Maximum WFQ/WRR weight. The value of 1 indicates
+			 * that all sibling nodes with same priority have the
+			 * same WFQ/WRR weight, so WFQ/WRR is reduced to FQ/RR.
+			 */
+			uint32_t wfq_wrr_weight_max;
+		} nonleaf;
+
+		/**< Items valid only for leaf nodes. */
+		struct {
+			/**< Head drop algorithm support. */
+			int cman_head_drop_supported;
+
+			/**< Private WRED context support. */
+			int cman_wred_context_private_supported;
+
+			/**< Maximum number of shared WRED contexts supported.
+			 * The value of zero indicates that shared WRED
+			 * contexts are not supported.
+			 */
+			uint32_t cman_wred_context_shared_n_max;
+		} leaf;
+	};
+};
+
+/**
+ * Traffic manager level capabilities
+ */
+struct rte_tm_level_capabilities {
+	/**< Maximum number of nodes for the current hierarchy level. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of non-leaf nodes for the current hierarchy level.
+	 * The value of 0 indicates that current level only supports leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_nonleaf_max;
+
+	/**< Maximum number of leaf nodes for the current hierarchy level. The
+	 * value of 0 indicates that current level only supports non-leaf
+	 * nodes. The maximum value is *n_nodes_max*.
+	 */
+	uint32_t n_nodes_leaf_max;
+
+	/**< Summary of node-level capabilities across all the non-leaf nodes
+	 * of the current hierarchy level. Valid only when
+	 * *n_nodes_nonleaf_max* is greater than 0.
+	 */
+	struct rte_tm_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all the leaf nodes of
+	 * the current hierarchy level. Valid only when *n_nodes_leaf_max* is
+	 * greater than 0.
+	 */
+	struct rte_tm_node_capabilities leaf;
+};
+
+/**
+ * Traffic manager capabilities
+ */
+struct rte_tm_capabilities {
+	/**< Maximum number of nodes. */
+	uint32_t n_nodes_max;
+
+	/**< Maximum number of levels (i.e. number of nodes connecting the root
+	 * node with any leaf node, including the root and the leaf).
+	 */
+	uint32_t n_levels_max;
+
+	/**< Maximum number of shapers, either private or shared. In case the
+	 * implementation does not share any resource between private and
+	 * shared shapers, it is typically equal to the sum between
+	 * *shaper_private_n_max* and *shaper_shared_n_max*.
+	 */
+	uint32_t shaper_n_max;
+
+	/**< Maximum number of private shapers. Indicates the maximum number of
+	 * nodes that can concurrently have the private shaper enabled.
+	 */
+	uint32_t shaper_private_n_max;
+
+	/**< Maximum number of shared shapers. The value of zero indicates that
+	 * shared shapers are not supported.
+	 */
+	uint32_t shaper_shared_n_max;
+
+	/**< Maximum number of nodes that can share the same shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint32_t shaper_shared_n_nodes_max;
+
+	/**< Maximum number of shared shapers that can be configured with dual
+	 * rate shaping. The value of zero indicates that dual rate shaping
+	 * support is not available for shared shapers.
+	 */
+	uint32_t shaper_shared_dual_rate_n_max;
+
+	/**< Minimum committed/peak rate (bytes per second) for shared shapers.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_min;
+
+	/**< Maximum committed/peak rate (bytes per second) for shared shaper.
+	 * Only valid when shared shapers are supported.
+	 */
+	uint64_t shaper_shared_rate_max;
+
+	/**< Minimum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_min;
+
+	/**< Maximum value allowed for packet length adjustment for
+	 * private/shared shapers.
+	 */
+	int shaper_pkt_length_adjust_max;
+
+	/**< Maximum number of WRED contexts. */
+	uint32_t cman_wred_context_n_max;
+
+	/**< Maximum number of private WRED contexts. Indicates the maximum
+	 * number of leaf nodes that can concurrently have the private WRED
+	 * context enabled.
+	 */
+	uint32_t cman_wred_context_private_n_max;
+
+	/**< Maximum number of shared WRED contexts. The value of zero
+	 * indicates that shared WRED contexts are not supported.
+	 */
+	uint32_t cman_wred_context_shared_n_max;
+
+	/**< Maximum number of leaf nodes that can share the same WRED context.
+	 * Only valid when shared WRED contexts are supported.
+	 */
+	uint32_t cman_wred_context_shared_n_nodes_max;
+
+	/**< Support for VLAN DEI packet marking (per color). */
+	int mark_vlan_dei_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of TCP packets (per color). */
+	int mark_ip_ecn_tcp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 ECN marking of SCTP packets (per color). */
+	int mark_ip_ecn_sctp_supported[RTE_TM_COLORS];
+
+	/**< Support for IPv4/IPv6 DSCP packet marking (per color). */
+	int mark_ip_dscp_supported[RTE_TM_COLORS];
+
+	/**< Set of supported dynamic update operations
+	 * (see enum rte_tm_dynamic_update_type).
+	 */
+	uint64_t dynamic_update_mask;
+
+	/**< Summary of node-level capabilities across all non-leaf nodes. */
+	struct rte_tm_node_capabilities nonleaf;
+
+	/**< Summary of node-level capabilities across all leaf nodes. */
+	struct rte_tm_node_capabilities leaf;
+};
+
+/**
+ * Congestion management (CMAN) mode
+ *
+ * This is used for controlling the admission of packets into a packet queue
+ * or group of packet queues on congestion. When a new packet needs to be
+ * written to a queue that is already full, the *tail drop* algorithm drops
+ * the new packet and leaves the queue unmodified, as opposed to the *head
+ * drop* algorithm, which drops the packet at the head of the queue (the
+ * oldest packet waiting in the queue) and admits the new packet at the tail
+ * of the queue.
+ *
+ * The *Random Early Detection (RED)* algorithm works by proactively dropping
+ * more and more input packets as the queue occupancy builds up. When the queue
+ * is full or almost full, RED effectively works as *tail drop*. The *Weighted
+ * RED* algorithm uses a separate set of RED thresholds for each packet color.
+ */
+enum rte_tm_cman_mode {
+	RTE_TM_CMAN_TAIL_DROP = 0, /**< Tail drop */
+	RTE_TM_CMAN_HEAD_DROP, /**< Head drop */
+	RTE_TM_CMAN_WRED, /**< Weighted Random Early Detection (WRED) */
+};
+
+/**
+ * Random Early Detection (RED) profile
+ */
+struct rte_tm_red_params {
+	/**< Minimum queue threshold */
+	uint16_t min_th;
+
+	/**< Maximum queue threshold */
+	uint16_t max_th;
+
+	/**< Inverse of packet marking probability maximum value (maxp), i.e.
+	 * maxp_inv = 1 / maxp
+	 */
+	uint16_t maxp_inv;
+
+	/**< Negated log2 of queue weight (wq), i.e. wq = 1 / (2 ^ wq_log2) */
+	uint16_t wq_log2;
+};
+
+/**
+ * Weighted RED (WRED) profile
+ *
+ * Multiple WRED contexts can share the same WRED profile. Each leaf node with
+ * WRED enabled as its congestion management mode has zero or one private WRED
+ * context (only one leaf node using it) and/or zero, one or several shared
+ * WRED contexts (multiple leaf nodes use the same WRED context). A private
+ * WRED context is used to perform congestion management for a single leaf
+ * node, while a shared WRED context is used to perform congestion management
+ * for a group of leaf nodes.
+ */
+struct rte_tm_wred_params {
+	/**< One set of RED parameters per packet color */
+	struct rte_tm_red_params red_params[RTE_TM_COLORS];
+};
+
+/**
+ * Token bucket
+ */
+struct rte_tm_token_bucket {
+	/**< Token bucket rate (bytes per second) */
+	uint64_t rate;
+
+	/**< Token bucket size (bytes), a.k.a. max burst size */
+	uint64_t size;
+};
+
+/**
+ * Shaper (rate limiter) profile
+ *
+ * Multiple shaper instances can share the same shaper profile. Each node has
+ * zero or one private shaper (only one node using it) and/or zero, one or
+ * several shared shapers (multiple nodes use the same shaper instance).
+ * A private shaper is used to perform traffic shaping for a single node, while
+ * a shared shaper is used to perform traffic shaping for a group of nodes.
+ *
+ * Single rate shapers use a single token bucket. A single rate shaper can be
+ * configured by setting the rate of the committed bucket to zero, which
+ * effectively disables this bucket. The peak bucket is used to limit the rate
+ * and the burst size for the current shaper.
+ *
+ * Dual rate shapers use both the committed and the peak token buckets. The
+ * rate of the peak bucket has to be non-zero and greater than or equal to
+ * the rate of the committed bucket.
+ */
+struct rte_tm_shaper_params {
+	/**< Committed token bucket */
+	struct rte_tm_token_bucket committed;
+
+	/**< Peak token bucket */
+	struct rte_tm_token_bucket peak;
+
+	/**< Signed value to be added to the length of each packet for the
+	 * purpose of shaping. Can be used to correct the packet length with
+	 * the framing overhead bytes that are also consumed on the wire (e.g.
+	 * RTE_TM_ETH_FRAMING_OVERHEAD_FCS).
+	 */
+	int32_t pkt_length_adjust;
+};
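
For illustration, a minimal application-side sketch of a single rate shaper
profile: the committed bucket is disabled by setting its rate to zero, the
peak bucket enforces 1 Gbps, and the framing plus FCS overhead is added to
the shaped packet length. The helper name and the profile ID 0 used below
are arbitrary choices, not mandated by the API.

static int
app_add_1g_shaper_profile(uint8_t port_id)
{
	struct rte_tm_error err;
	struct rte_tm_shaper_params sp = {
		.committed = { .rate = 0, .size = 0 },	/* disabled bucket */
		.peak = {
			.rate = 125000000,	/* 1 Gbps in bytes per second */
			.size = 4096,		/* max burst size (bytes) */
		},
		.pkt_length_adjust = RTE_TM_ETH_FRAMING_OVERHEAD_FCS,
	};

	/* Register the profile under the (assumed unused) profile ID 0. */
	return rte_tm_shaper_profile_add(port_id, 0, &sp, &err);
}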
+
+/**
+ * Node parameters
+ *
+ * Each hierarchy node has multiple inputs (children nodes of the current
+ * parent node) and a single output (which is input to its parent node). The
+ * current node arbitrates its inputs using Strict Priority (SP), Weighted Fair
+ * Queuing (WFQ) and Weighted Round Robin (WRR) algorithms to schedule input
+ * packets on its output while observing its shaping (rate limiting)
+ * constraints.
+ *
+ * Algorithms such as byte-level WRR, Deficit WRR (DWRR), etc. are considered
+ * approximations of ideal WFQ and are assimilated to WFQ, although an
+ * implementation-dependent trade-off between accuracy, performance and
+ * resource usage might exist.
+ *
+ * Children nodes with different priorities are scheduled using the SP
+ * algorithm, based on their priority, with zero (0) as the highest priority.
+ * Children with same priority are scheduled using the WFQ or WRR algorithm,
+ * based on their weight, which is relative to the sum of the weights of all
+ * siblings with same priority, with one (1) as the lowest weight.
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the leaf nodes are predefined with the node IDs of 0 .. (N-1),
+ * where N is the number of TX queues configured for the current Ethernet port.
+ * The non-leaf nodes have their IDs generated by the application.
+ */
+struct rte_tm_node_params {
+	/**< Shaper profile for the private shaper. The absence of the private
+	 * shaper for the current node is indicated by setting this parameter
+	 * to RTE_TM_SHAPER_PROFILE_ID_NONE.
+	 */
+	uint32_t shaper_profile_id;
+
+	/**< User allocated array of valid shared shaper IDs. */
+	uint32_t *shared_shaper_id;
+
+	/**< Number of shared shaper IDs in the *shared_shaper_id* array. */
+	uint32_t n_shared_shapers;
+
+	/**< Mask of statistics counter types to be enabled for this node. This
+	 * needs to be a subset of the statistics counter types available for
+	 * the current node. Any statistics counter type not included in this
+	 * set is to be disabled for the current node.
+	 */
+	uint64_t stats_mask;
+
+	union {
+		/**< Parameters only valid for non-leaf nodes. */
+		struct {
+			/**< For each priority, indicates whether the children
+			 * nodes sharing the same priority are to be scheduled
+			 * by WFQ or by WRR. When NULL, it indicates that WFQ
+			 * is to be used for all priorities. When non-NULL, it
			 * points to a pre-allocated array of *n_priorities*
+			 * elements, with a non-zero value element indicating
+			 * WFQ and a zero value element for WRR.
+			 */
+			int *scheduling_mode_per_priority;
+
+			/**< Number of priorities. */
+			uint32_t n_priorities;
+		} nonleaf;
+
+		/**< Parameters only valid for leaf nodes. */
+		struct {
+			/**< Congestion management mode */
+			enum rte_tm_cman_mode cman;
+
+			/**< WRED parameters (valid when *cman* is WRED). */
+			struct {
+				/**< WRED profile for private WRED context.
+				 * The absence of a private WRED context for
+				 * the current leaf node is indicated by value
+				 * RTE_TM_WRED_PROFILE_ID_NONE.
+				 */
+				uint32_t wred_profile_id;
+
+				/**< User allocated array of shared WRED
+				 * context IDs.
+				 */
+				uint32_t *shared_wred_context_id;
+
+				/**< Number of shared WRED context IDs in the
+				 * *shared_wred_context_id* array.
+				 */
+				uint32_t n_shared_wred_contexts;
+			} wred;
+		} leaf;
+	};
+};
+
+/**
+ * Verbose error types.
+ *
+ * Most of them provide the type of the object referenced by struct
+ * rte_tm_error::cause.
+ */
+enum rte_tm_error_type {
+	RTE_TM_ERROR_TYPE_NONE, /**< No error. */
+	RTE_TM_ERROR_TYPE_UNSPECIFIED, /**< Cause unspecified. */
+	RTE_TM_ERROR_TYPE_CAPABILITIES,
+	RTE_TM_ERROR_TYPE_LEVEL_ID,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_GREEN,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
+	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_RATE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PEAK_SIZE,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_PKT_ADJUST_LEN,
+	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARENT_NODE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PRIORITY,
+	RTE_TM_ERROR_TYPE_NODE_WEIGHT,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHAPER_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_SHAPER_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_SHAPERS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_STATS,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SCHEDULING_MODE,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_PRIORITIES,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_CMAN,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_WRED_PROFILE_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_SHARED_WRED_CONTEXT_ID,
+	RTE_TM_ERROR_TYPE_NODE_PARAMS_N_SHARED_WRED_CONTEXTS,
+	RTE_TM_ERROR_TYPE_NODE_ID,
+};
+
+/**
+ * Verbose error structure definition.
+ *
+ * This object is normally allocated by applications and set by PMDs; the
+ * message points to a constant string which does not need to be freed by
+ * the application, however its pointer can be considered valid only as long
+ * as its associated DPDK port remains configured. Closing the underlying
+ * device or unloading the PMD invalidates it.
+ *
+ * Both cause and message may be NULL regardless of the error type.
+ */
+struct rte_tm_error {
+	enum rte_tm_error_type type; /**< Cause field and error type. */
+	const void *cause; /**< Object responsible for the error. */
+	const char *message; /**< Human-readable error message. */
+};
+
+/**
+ * Traffic manager get number of leaf nodes
+ *
+ * Each leaf node sits on top of a TX queue of the current Ethernet port.
+ * Therefore, the set of leaf nodes is predefined, their number is always equal
+ * to N (where N is the number of TX queues configured for the current port)
+ * and their IDs are 0 .. (N-1).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param n_leaf_nodes
+ *   Pointer filled in with the number of leaf nodes for the current port.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_get_leaf_nodes(uint8_t port_id,
+	uint32_t *n_leaf_nodes,
+	struct rte_tm_error *error);
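
A hypothetical helper (assuming <stdio.h> and rte_tm.h are included) showing
the call together with the error convention used throughout this API: a
non-zero return signals failure and *error*, when provided, carries the
details.

static uint32_t
app_query_leaf_nodes(uint8_t port_id)
{
	struct rte_tm_error err;
	uint32_t n_leaf_nodes = 0;

	if (rte_tm_get_leaf_nodes(port_id, &n_leaf_nodes, &err) != 0) {
		/* err.message may be NULL, so check it before printing */
		printf("port %u: %s\n", (unsigned int)port_id,
			err.message ? err.message : "traffic manager error");
		return 0;
	}

	return n_leaf_nodes;	/* equals the number of configured TX queues */
}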
+
+/**
+ * Traffic manager node type (i.e. leaf or non-leaf) get
+ *
+ * The leaf nodes have predefined IDs in the range of 0 .. (N-1), where N is
+ * the number of TX queues of the current Ethernet port. The non-leaf nodes
+ * have their IDs generated by the application outside of the above range,
+ * which is reserved for leaf nodes.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID value. Needs to be valid.
+ * @param is_leaf
+ *   Set to non-zero value when node is leaf and to zero otherwise (non-leaf).
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_type_get(uint8_t port_id,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param cap
+ *   Traffic manager capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_capabilities_get(uint8_t port_id,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager level capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param level_id
+ *   The hierarchy level identifier. The value of 0 identifies the level of the
+ *   root node.
+ * @param cap
+ *   Traffic manager level capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_level_capabilities_get(uint8_t port_id,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node capabilities get
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param cap
+ *   Traffic manager node capabilities. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_capabilities_get(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager WRED profile add
+ *
+ * Create a new WRED profile with ID set to *wred_profile_id*. The new profile
+ * is used to create one or several WRED contexts.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   WRED profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_add(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
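
A sketch of one possible WRED profile, reusing the same RED curve for all
three colors; the threshold values below are illustrative only.

static int
app_add_wred_profile(uint8_t port_id, uint32_t wred_profile_id)
{
	struct rte_tm_error err;
	struct rte_tm_red_params red = {
		.min_th = 32,	/* lower queue occupancy threshold */
		.max_th = 128,	/* upper queue occupancy threshold */
		.maxp_inv = 10,	/* maximum drop probability = 1/10 */
		.wq_log2 = 9,	/* queue weight = 1/512 */
	};
	struct rte_tm_wred_params wp;

	/* Same RED parameters for green, yellow and red packets. */
	wp.red_params[RTE_TM_GREEN] = red;
	wp.red_params[RTE_TM_YELLOW] = red;
	wp.red_params[RTE_TM_RED] = red;

	return rte_tm_wred_profile_add(port_id, wred_profile_id, &wp, &err);
}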
+
+/**
+ * Traffic manager WRED profile delete
+ *
+ * Delete an existing WRED profile. This operation fails when there is
+ * currently at least one user (i.e. WRED context) of this WRED profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_wred_profile_delete(uint8_t port_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context add or update
+ *
+ * When *shared_wred_context_id* is invalid, a new WRED context with this ID is
+ * created by using the WRED profile identified by *wred_profile_id*.
+ *
+ * When *shared_wred_context_id* is valid, this WRED context is no longer using
+ * the profile previously assigned to it and is updated to use the profile
+ * identified by *wred_profile_id*.
+ *
+ * A valid shared WRED context can be assigned to several hierarchy leaf nodes
+ * configured to use WRED as the congestion management mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID
+ * @param wred_profile_id
+ *   WRED profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_add_update(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared WRED context delete
+ *
+ * Delete an existing shared WRED context. This operation fails when there is
+ * currently at least one user (i.e. hierarchy leaf node) of this shared WRED
+ * context.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_wred_context_delete(uint8_t port_id,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile add
+ *
+ * Create a new shaper profile with ID set to *shaper_profile_id*. The new
+ * shaper profile is used to create one or several shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the new profile. Needs to be unused.
+ * @param profile
+ *   Shaper profile parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_add(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shaper profile delete
+ *
+ * Delete an existing shaper profile. This operation fails when there is
+ * currently at least one user (i.e. shaper) of this shaper profile.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shaper_profile_delete(uint8_t port_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper add or update
+ *
+ * When *shared_shaper_id* is not a valid shared shaper ID, a new shared shaper
+ * with this ID is created using the shaper profile identified by
+ * *shaper_profile_id*.
+ *
+ * When *shared_shaper_id* is a valid shared shaper ID, this shared shaper is
+ * no longer using the shaper profile previously assigned to it and is updated
+ * to use the shaper profile identified by *shaper_profile_id*.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID
+ * @param shaper_profile_id
+ *   Shaper profile ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_add_update(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager shared shaper delete
+ *
+ * Delete an existing shared shaper. This operation fails when there is
+ * currently at least one user (i.e. hierarchy node) of this shared shaper.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_shared_shaper_delete(uint8_t port_id,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node add
+ *
+ * Create new node and connect it as child of an existing node. The new node is
+ * further identified by *node_id*, which needs to be unused by any of the
+ * existing nodes. The parent node is identified by *parent_node_id*, which
+ * needs to be the valid ID of an existing non-leaf node. The parent node is
+ * going to use the provided SP *priority* and WFQ/WRR *weight* to schedule its
+ * new child node.
+ *
+ * This function has to be called for both leaf and non-leaf nodes. In the case
+ * of leaf nodes (i.e. *node_id* is within the range of 0 .. (N-1), with N as
+ * the number of configured TX queues of the current port), the leaf node is
+ * configured rather than created (as the set of leaf nodes is predefined) and
+ * it is also connected as child of an existing node.
+ *
+ * The first node that is added becomes the root node and all the nodes that
+ * are subsequently added have to be added as descendants of the root node. The
+ * parent of the root node has to be specified as RTE_TM_NODE_ID_NULL and there
+ * can only be one node with this parent ID (i.e. the root node). Further
+ * restrictions for root node: needs to be non-leaf, its private shaper profile
+ * needs to be valid and single rate, cannot use any shared shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be unused by any of the existing nodes.
+ * @param parent_node_id
+ *   Parent node ID. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the WFQ/WRR
+ *   algorithm running on the parent of the current node for scheduling this
+ *   child node.
+ * @param params
+ *   Node parameters. Needs to be pre-allocated and valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_add(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
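
A sketch of the smallest possible hierarchy: one non-leaf root node plus leaf
node 0 (i.e. TX queue 0) attached to it. The node ID 1000 is an arbitrary
application choice, the shaper profile ID passed in is assumed to be a valid
single rate profile, and <string.h> is assumed to be included for memset().

static int
app_build_hierarchy(uint8_t port_id, uint32_t root_shaper_profile_id)
{
	struct rte_tm_error err;
	struct rte_tm_node_params np;
	uint32_t root_id = 1000;	/* any ID outside 0 .. (N-1) */
	int ret;

	/* Root node: non-leaf, valid private shaper, no shared shapers. */
	memset(&np, 0, sizeof(np));
	np.shaper_profile_id = root_shaper_profile_id;
	np.nonleaf.n_priorities = 1;	/* NULL mode array: WFQ everywhere */
	ret = rte_tm_node_add(port_id, root_id, RTE_TM_NODE_ID_NULL,
		0, 1, &np, &err);
	if (ret != 0)
		return ret;

	/* Leaf node 0: child of the root, priority 0, weight 1. */
	memset(&np, 0, sizeof(np));
	np.shaper_profile_id = RTE_TM_SHAPER_PROFILE_ID_NONE;
	np.leaf.cman = RTE_TM_CMAN_TAIL_DROP;
	return rte_tm_node_add(port_id, 0, root_id, 0, 1, &np, &err);
}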
+
+/**
+ * Traffic manager node delete
+ *
+ * Delete an existing node. This operation fails when this node currently has
+ * at least one user (i.e. child node).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_delete(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node suspend
+ *
+ * Suspend an existing node.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_suspend(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node resume
+ *
+ * Resume an existing node that was previously suspended.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_resume(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager hierarchy set
+ *
+ * This function is called during the port initialization phase (before the
+ * Ethernet port is started) to freeze the start-up hierarchy.
+ *
+ * This function fails when the currently configured hierarchy is not supported
+ * by the Ethernet port, in which case the user can abort or try out another
+ * hierarchy configuration (e.g. a hierarchy with fewer leaf nodes), which can
+ * be built from scratch (when *clear_on_fail* is enabled) or by modifying the
+ * existing hierarchy configuration (when *clear_on_fail* is disabled).
+ *
+ * Note that, even when the configured hierarchy is supported (so this function
+ * is successful), the Ethernet port start might still fail due to e.g. not
+ * enough memory being available in the system, etc.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param clear_on_fail
+ *   On function call failure, hierarchy is cleared when this parameter is
+ *   non-zero and preserved when this parameter is equal to zero.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_hierarchy_set(uint8_t port_id,
+	int clear_on_fail,
+	struct rte_tm_error *error);
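
A short sketch of the expected call sequence: the hierarchy is frozen after
all nodes have been added and before rte_eth_dev_start(). Keeping
*clear_on_fail* at zero preserves the configuration on failure so it can be
amended rather than rebuilt; this choice is application policy, not an API
requirement.

static int
app_commit_hierarchy(uint8_t port_id)
{
	struct rte_tm_error err;
	int clear_on_fail = 0;	/* keep the hierarchy if the PMD rejects it */

	/* On success, the application can proceed to rte_eth_dev_start(). */
	return rte_tm_hierarchy_set(port_id, clear_on_fail, &err);
}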
+
+/**
+ * Traffic manager node parent update
+ *
+ * Restriction for root node: its parent cannot be changed.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param parent_node_id
+ *   Node ID for the new parent. Needs to be valid.
+ * @param priority
+ *   Node priority. The highest node priority is zero. Used by the SP algorithm
+ *   running on the parent of the current node for scheduling this child node.
+ * @param weight
+ *   Node weight. The node weight is relative to the weight sum of all siblings
+ *   that have the same priority. The lowest weight is one. Used by the
+ *   WFQ/WRR algorithm running on the parent of the current node for scheduling
+ *   this child node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_parent_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private shaper update
+ *
+ * Restriction for root node: its private shaper profile needs to be valid and
+ * single rate.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shaper_profile_id
+ *   Shaper profile ID for the private shaper of the current node. Needs to be
+ *   either valid shaper profile ID or RTE_TM_SHAPER_PROFILE_ID_NONE, with
+ *   the latter disabling the private shaper of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared shapers update
+ *
+ * Restriction for root node: cannot use any shared rate shapers.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param shared_shaper_id
+ *   Shared shaper ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared shaper to current node or to zero
+ *   to delete this shared shaper from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_shaper_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node enabled statistics counters update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats_mask
+ *   Mask of statistics counter types to be enabled for the current node. This
+ *   needs to be a subset of the statistics counter types available for the
+ *   current node. Any statistics counter type not included in this set is to
+ *   be disabled for the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_stats_update(uint8_t port_id,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node scheduling mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be a valid non-leaf node ID.
+ * @param scheduling_mode_per_priority
+ *   For each priority, indicates whether the children nodes sharing the same
+ *   priority are to be scheduled by WFQ or by WRR. When NULL, it indicates
+ *   that WFQ is to be used for all priorities. When non-NULL, it points to a
+ *   pre-allocated array of *n_priorities* elements, with a non-zero value
+ *   element indicating WFQ and a zero value element for WRR.
+ * @param n_priorities
+ *   Number of priorities.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_scheduling_mode_update(uint8_t port_id,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node congestion management mode update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param cman
+ *   Congestion management mode.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_cman_update(uint8_t port_id,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node private WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param wred_profile_id
+ *   WRED profile ID for the private WRED context of the current node. Needs to
+ *   be either valid WRED profile ID or RTE_TM_WRED_PROFILE_ID_NONE, with
+ *   the latter disabling the private WRED context of the current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node shared WRED context update
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid leaf node ID.
+ * @param shared_wred_context_id
+ *   Shared WRED context ID. Needs to be valid.
+ * @param add
+ *   Set to non-zero value to add this shared WRED context to current node or
+ *   to zero to delete this shared WRED context from current node.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_shared_wred_context_update(uint8_t port_id,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager node statistics counters read
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param node_id
+ *   Node ID. Needs to be valid.
+ * @param stats
+ *   When non-NULL, it contains the current value for the statistics counters
+ *   enabled for the current node.
+ * @param stats_mask
+ *   When non-NULL, it contains the mask of statistics counter types that are
+ *   currently enabled for this node, indicating which of the counters
+ *   retrieved with the *stats* structure are valid.
+ * @param clear
+ *   When this parameter has a non-zero value, the statistics counters are
+ *   cleared (i.e. set to zero) immediately after they have been read,
+ *   otherwise the statistics counters are left untouched.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_node_stats_read(uint8_t port_id,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
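
A sketch of reading and clearing the counters of one leaf node, assuming
<stdio.h> and <inttypes.h> are included; only the counters reported back in
*stats_mask* are trusted.

static void
app_print_leaf_stats(uint8_t port_id, uint32_t leaf_node_id)
{
	struct rte_tm_error err;
	struct rte_tm_node_stats stats;
	uint64_t mask = 0;

	/* Read whatever counters are enabled on this node, then clear them. */
	if (rte_tm_node_stats_read(port_id, leaf_node_id, &stats, &mask,
			1, &err) != 0)
		return;

	if (mask & RTE_TM_STATS_N_PKTS)
		printf("node %" PRIu32 ": %" PRIu64 " packets scheduled\n",
			leaf_node_id, stats.n_pkts);
	if (mask & RTE_TM_STATS_N_PKTS_GREEN_DROPPED)
		printf("node %" PRIu32 ": %" PRIu64 " green packets dropped\n",
			leaf_node_id, stats.leaf.n_pkts_dropped[RTE_TM_GREEN]);
}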
+
+/**
+ * Traffic manager packet marking - VLAN DEI (IEEE 802.1Q)
+ *
+ * IEEE 802.1p maps the traffic class to the VLAN Priority Code Point (PCP)
+ * field (3 bits), while IEEE 802.1q maps the drop priority to the VLAN Drop
+ * Eligible Indicator (DEI) field (1 bit), which was previously named Canonical
+ * Format Indicator (CFI).
+ *
+ * All VLAN frames of a given color get their DEI bit set if marking is enabled
+ * for this color; otherwise, their DEI bit is left as is (either set or not).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_vlan_dei(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 ECN (IETF RFC 3168)
+ *
+ * IETF RFCs 2474 and 3168 reorganize the IPv4 Type of Service (TOS) field
+ * (8 bits) and the IPv6 Traffic Class (TC) field (8 bits) into Differentiated
+ * Services Codepoint (DSCP) field (6 bits) and Explicit Congestion
+ * Notification (ECN) field (2 bits). The DSCP field is typically used to
+ * encode the traffic class and/or drop priority (RFC 2597), while the ECN
+ * field is used by RFC 3168 to implement a congestion notification mechanism
+ * to be leveraged by transport layer protocols such as TCP and SCTP that have
+ * congestion control mechanisms.
+ *
+ * When congestion is experienced, as alternative to dropping the packet,
+ * routers can change the ECN field of input packets from 2'b01 or 2'b10
+ * (values indicating that source endpoint is ECN-capable) to 2'b11 (meaning
+ * that congestion is experienced). The destination endpoint can use the
+ * ECN-Echo (ECE) TCP flag to relay the congestion indication back to the
+ * source endpoint, which acknowledges it back to the destination endpoint with
+ * the Congestion Window Reduced (CWR) TCP flag.
+ *
+ * All IPv4/IPv6 packets of a given color with ECN set to 2'b01 or 2'b10
+ * carrying TCP or SCTP have their ECN set to 2'b11 if the marking feature is
+ * enabled for the current color, otherwise the ECN field is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_ip_ecn(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+
+/**
+ * Traffic manager packet marking - IPv4 / IPv6 DSCP (IETF RFC 2597)
+ *
+ * IETF RFC 2597 maps the traffic class and the drop priority to the IPv4/IPv6
+ * Differentiated Services Codepoint (DSCP) field (6 bits). Here are the DSCP
+ * values proposed by this RFC:
+ *
+ *                       Class 1    Class 2    Class 3    Class 4
+ *                     +----------+----------+----------+----------+
+ *    Low Drop Prec    |  001010  |  010010  |  011010  |  100010  |
+ *    Medium Drop Prec |  001100  |  010100  |  011100  |  100100  |
+ *    High Drop Prec   |  001110  |  010110  |  011110  |  100110  |
+ *                     +----------+----------+----------+----------+
+ *
+ * There are 4 traffic classes (classes 1 .. 4) encoded by DSCP bits 1 and 2,
+ * as well as 3 drop priorities (low/medium/high) encoded by DSCP bits 3 and 4.
+ *
+ * All IPv4/IPv6 packets have their color marked into DSCP bits 3 and 4 as
+ * follows: green mapped to Low Drop Precedence (2'b01), yellow to Medium
+ * (2'b10) and red to High (2'b11). Marking needs to be explicitly enabled
+ * for each color; when not enabled for a given color, the DSCP field of all
+ * packets with that color is left as is.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param mark_green
+ *   Set to non-zero value to enable marking of green packets and to zero to
+ *   disable it.
+ * @param mark_yellow
+ *   Set to non-zero value to enable marking of yellow packets and to zero to
+ *   disable it.
+ * @param mark_red
+ *   Set to non-zero value to enable marking of red packets and to zero to
+ *   disable it.
+ * @param error
+ *   Error details. Filled in only on error, when not NULL.
+ * @return
+ *   0 on success, non-zero error code otherwise.
+ */
+int
+rte_tm_mark_ip_dscp(uint8_t port_id,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
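
A small sketch of one possible policy: leave green packets untouched and
remark only the yellow and red traffic.

static int
app_enable_dscp_marking(uint8_t port_id)
{
	struct rte_tm_error err;

	return rte_tm_mark_ip_dscp(port_id,
		0,	/* green: leave DSCP as is */
		1,	/* yellow: mark as Medium Drop Precedence */
		1,	/* red: mark as High Drop Precedence */
		&err);
}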
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_H__ */
diff --git a/lib/librte_ether/rte_tm_driver.h b/lib/librte_ether/rte_tm_driver.h
new file mode 100644
index 0000000..b3c9c15
--- /dev/null
+++ b/lib/librte_ether/rte_tm_driver.h
@@ -0,0 +1,365 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __INCLUDE_RTE_TM_DRIVER_H__
+#define __INCLUDE_RTE_TM_DRIVER_H__
+
+/**
+ * @file
+ * RTE Generic Traffic Manager API (Driver Side)
+ *
+ * This file provides implementation helpers for internal use by PMDs; they
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ */
+
+#include <stdint.h>
+
+#include <rte_errno.h>
+#include "rte_ethdev.h"
+#include "rte_tm.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef int (*rte_tm_node_type_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *is_leaf,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node type get */
+
+typedef int (*rte_tm_capabilities_get_t)(struct rte_eth_dev *dev,
+	struct rte_tm_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager capabilities get */
+
+typedef int (*rte_tm_level_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t level_id,
+	struct rte_tm_level_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager level capabilities get */
+
+typedef int (*rte_tm_node_capabilities_get_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_capabilities *cap,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node capabilities get */
+
+typedef int (*rte_tm_wred_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_wred_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile add */
+
+typedef int (*rte_tm_wred_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager WRED profile delete */
+
+typedef int (*rte_tm_shared_wred_context_add_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context add */
+
+typedef int (*rte_tm_shared_wred_context_delete_t)(
+	struct rte_eth_dev *dev,
+	uint32_t shared_wred_context_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared WRED context delete */
+
+typedef int (*rte_tm_shaper_profile_add_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_shaper_params *profile,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile add */
+
+typedef int (*rte_tm_shaper_profile_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shaper profile delete */
+
+typedef int (*rte_tm_shared_shaper_add_update_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper add/update */
+
+typedef int (*rte_tm_shared_shaper_delete_t)(struct rte_eth_dev *dev,
+	uint32_t shared_shaper_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager shared shaper delete */
+
+typedef int (*rte_tm_node_add_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_node_params *params,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node add */
+
+typedef int (*rte_tm_node_delete_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node delete */
+
+typedef int (*rte_tm_node_suspend_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node suspend */
+
+typedef int (*rte_tm_node_resume_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node resume */
+
+typedef int (*rte_tm_hierarchy_set_t)(struct rte_eth_dev *dev,
+	int clear_on_fail,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager hierarchy set */
+
+typedef int (*rte_tm_node_parent_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t parent_node_id,
+	uint32_t priority,
+	uint32_t weight,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node parent update */
+
+typedef int (*rte_tm_node_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shaper_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shaper update */
+
+typedef int (*rte_tm_node_shared_shaper_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_shaper_id,
	int add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared shaper update */
+
+typedef int (*rte_tm_node_stats_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint64_t stats_mask,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node stats update */
+
+typedef int (*rte_tm_node_scheduling_mode_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	int *scheduling_mode_per_priority,
+	uint32_t n_priorities,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node scheduling mode update */
+
+typedef int (*rte_tm_node_cman_update_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	enum rte_tm_cman_mode cman,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node congestion management mode update */
+
+typedef int (*rte_tm_node_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t wred_profile_id,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node WRED context update */
+
+typedef int (*rte_tm_node_shared_wred_context_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t node_id,
+	uint32_t shared_wred_context_id,
+	int add,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager node shared WRED context update */
+
+typedef int (*rte_tm_node_stats_read_t)(struct rte_eth_dev *dev,
+	uint32_t node_id,
+	struct rte_tm_node_stats *stats,
+	uint64_t *stats_mask,
+	int clear,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager read stats counters for specific node */
+
+typedef int (*rte_tm_mark_vlan_dei_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - VLAN DEI */
+
+typedef int (*rte_tm_mark_ip_ecn_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 ECN */
+
+typedef int (*rte_tm_mark_ip_dscp_t)(struct rte_eth_dev *dev,
+	int mark_green,
+	int mark_yellow,
+	int mark_red,
+	struct rte_tm_error *error);
+/**< @internal Traffic manager packet marking - IPv4/IPv6 DSCP */
+
+struct rte_tm_ops {
+	/** Traffic manager node type get */
+	rte_tm_node_type_get_t node_type_get;
+
	/** Traffic manager capabilities get */
+	rte_tm_capabilities_get_t capabilities_get;
	/** Traffic manager level capabilities get */
+	rte_tm_level_capabilities_get_t level_capabilities_get;
+	/** Traffic manager node capabilities get */
+	rte_tm_node_capabilities_get_t node_capabilities_get;
+
+	/** Traffic manager WRED profile add */
+	rte_tm_wred_profile_add_t wred_profile_add;
+	/** Traffic manager WRED profile delete */
+	rte_tm_wred_profile_delete_t wred_profile_delete;
+	/** Traffic manager shared WRED context add/update */
+	rte_tm_shared_wred_context_add_update_t
+		shared_wred_context_add_update;
+	/** Traffic manager shared WRED context delete */
+	rte_tm_shared_wred_context_delete_t
+		shared_wred_context_delete;
+
+	/** Traffic manager shaper profile add */
+	rte_tm_shaper_profile_add_t shaper_profile_add;
+	/** Traffic manager shaper profile delete */
+	rte_tm_shaper_profile_delete_t shaper_profile_delete;
+	/** Traffic manager shared shaper add/update */
+	rte_tm_shared_shaper_add_update_t shared_shaper_add_update;
+	/** Traffic manager shared shaper delete */
+	rte_tm_shared_shaper_delete_t shared_shaper_delete;
+
+	/** Traffic manager node add */
+	rte_tm_node_add_t node_add;
+	/** Traffic manager node delete */
+	rte_tm_node_delete_t node_delete;
+	/** Traffic manager node suspend */
+	rte_tm_node_suspend_t node_suspend;
+	/** Traffic manager node resume */
+	rte_tm_node_resume_t node_resume;
+	/** Traffic manager hierarchy set */
+	rte_tm_hierarchy_set_t hierarchy_set;
+
+	/** Traffic manager node parent update */
+	rte_tm_node_parent_update_t node_parent_update;
+	/** Traffic manager node shaper update */
+	rte_tm_node_shaper_update_t node_shaper_update;
+	/** Traffic manager node shared shaper update */
+	rte_tm_node_shared_shaper_update_t node_shared_shaper_update;
+	/** Traffic manager node stats update */
+	rte_tm_node_stats_update_t node_stats_update;
+	/** Traffic manager node scheduling mode update */
+	rte_tm_node_scheduling_mode_update_t node_scheduling_mode_update;
+	/** Traffic manager node congestion management mode update */
+	rte_tm_node_cman_update_t node_cman_update;
+	/** Traffic manager node WRED context update */
+	rte_tm_node_wred_context_update_t node_wred_context_update;
+	/** Traffic manager node shared WRED context update */
+	rte_tm_node_shared_wred_context_update_t
+		node_shared_wred_context_update;
+	/** Traffic manager read statistics counters for current node */
+	rte_tm_node_stats_read_t node_stats_read;
+
+	/** Traffic manager packet marking - VLAN DEI */
+	rte_tm_mark_vlan_dei_t mark_vlan_dei;
+	/** Traffic manager packet marking - IPv4/IPv6 ECN */
+	rte_tm_mark_ip_ecn_t mark_ip_ecn;
+	/** Traffic manager packet marking - IPv4/IPv6 DSCP */
+	rte_tm_mark_ip_dscp_t mark_ip_dscp;
+};
+
+/**
+ * Initialize generic error structure.
+ *
+ * This function also sets rte_errno to a given value.
+ *
+ * @param error
+ *   Pointer to error structure (may be NULL).
+ * @param code
+ *   Related error code (rte_errno).
+ * @param type
+ *   Cause field and error type.
+ * @param cause
+ *   Object responsible for the error.
+ * @param message
+ *   Human-readable error message.
+ *
+ * @return
+ *   Error code.
+ */
+static inline int
+rte_tm_error_set(struct rte_tm_error *error,
+		   int code,
+		   enum rte_tm_error_type type,
+		   const void *cause,
+		   const char *message)
+{
+	if (error) {
+		*error = (struct rte_tm_error){
+			.type = type,
+			.cause = cause,
+			.message = message,
+		};
+	}
+	rte_errno = code;
+	return code;
+}
+
+/**
+ * Get generic traffic manager operations structure from a port
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param error
+ *   Error details
+ *
+ * @return
+ *   The traffic manager operations structure associated with port_id on
+ *   success, NULL otherwise.
+ */
+const struct rte_tm_ops *
+rte_tm_ops_get(uint8_t port_id, struct rte_tm_error *error);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __INCLUDE_RTE_TM_DRIVER_H__ */
-- 
2.5.0

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements
  2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-06  9:10  2%         ` David Hunt
  2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
  0 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
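
A minimal sketch of the per-worker exchange buffer implied by this (names
and exact layout are illustrative, not the library's internal definition):

   #include <stdint.h>
   #include <rte_memory.h>

   #define DIST_BURST_SIZE 8    /* 64-byte cache line / 8-byte slots */

   struct dist_burst_buffer {
           /* up to 8 mbuf pointers; the spare low bits of each 64-bit
            * slot carry the request/return handshake flags */
           volatile int64_t bufptr64[DIST_BURST_SIZE] __rte_cache_aligned;
   };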

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms,
the scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
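
A minimal sketch of that run-time selection using the generic CPU-flag API
(the enum and helper below are illustrative, not the library's internal
symbols):

   #include <rte_cpuflags.h>

   enum dist_match_fn { DIST_MATCH_SCALAR, DIST_MATCH_VECTOR };

   static enum dist_match_fn
   select_match_fn(void)
   {
   #ifdef RTE_ARCH_X86
           /* use the SSE2 implementation only when the CPU supports it */
           if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
                   return DIST_MATCH_VECTOR;
   #endif
           return DIST_MATCH_SCALAR;
   }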

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical order of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split down the updates to example app more
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them down into easier to review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance
     (see the short snippet just below this list).
   * Added symbol versioning for old API so that ABI is preserved.
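
     A short illustration of selecting the legacy behaviour at create time
     (flag name as described above; the other arguments are placeholders):

        struct rte_distributor *d;

        /* single-packet (legacy) mode, local NUMA node, 4 workers */
        d = rte_distributor_create("dist_single", rte_socket_id(), 4,
                        RTE_DISTRIBUTOR_SINGLE);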

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker
   at a time (see the worker sketch just below).
   For performance in matching, flow IDs are limited to 15 bits.
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
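
   A hedged sketch of a worker loop under the new burst API (signatures as
   used in this series; quit_signal and handle_packet() are placeholders
   that an application would provide):

      #include <rte_distributor.h>
      #include <rte_mbuf.h>

      static volatile int quit_signal;
      extern void handle_packet(struct rte_mbuf *m);

      static void
      worker_loop(struct rte_distributor *d, unsigned int worker_id)
      {
              struct rte_mbuf *bufs[8];
              unsigned int count = 0, i;

              while (!quit_signal) {
                      /* return the previous burst and request a new one */
                      count = rte_distributor_get_pkt(d, worker_id,
                                      bufs, bufs, count);
                      for (i = 0; i < count; i++)
                              handle_packet(bufs[i]);
              }
              /* hand back any packets still held before exiting */
              rte_distributor_return_pkt(d, worker_id, bufs, count);
      }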

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new burst oriented distributor structs
[04/18] lib: add new distributor code
[05/18] lib: add SIMD flow matching to distributor
[06/18] test/distributor: extra params for autotests
[07/18] lib: switch distributor over to new API
[08/18] lib: make v20 header file private
[09/18] lib: add symbol versioning to distributor
[10/18] test: test single and burst distributor API
[11/18] test: add perf test for distributor burst mode
[12/18] examples/distributor: allow for extra stats
[13/18] sample: distributor: wait for ports to come up
[14/18] examples/distributor: give distributor a core
[15/18] examples/distributor: limit number of Tx rings
[16/18] examples/distributor: give Rx thread a core
[17/18] doc: distributor library changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files
  2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
@ 2017-03-06  9:10  1%           ` David Hunt
  2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
  2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
  2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-06  9:10  2%           ` David Hunt
  2017-03-10 16:22  0%             ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..c4128a0 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3 1/2] ethdev: add capability control API
  @ 2017-03-06 20:41  3%       ` Wiles, Keith
  0 siblings, 0 replies; 200+ results
From: Wiles, Keith @ 2017-03-06 20:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Dumitrescu, Cristian, DPDK, jerin.jacob,
	balasubramanian.manoharan, hemant.agrawal, shreyansh.jain,
	Richardson, Bruce


> On Mar 6, 2017, at 2:21 PM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
>> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
>>> 2017-03-06 16:35, Dumitrescu, Cristian:
>>>>>> +int rte_eth_dev_capability_ops_get(uint8_t port_id,
>>>>>> +	enum rte_eth_capability cap, void *arg);
>>>>> 
>>>>> What is the benefit of getting different kind of capabilities with
>>>>> the same function?
>>>>> enum + void* = ioctl
>>>>> A self-explanatory API should have a dedicated function for each kind
>>>>> of features with different argument types.
>>>> 
>>>> The advantage is providing a standard interface to query the capabilities of
>>> the device rather than having each capability provide its own mechanism in a
>>> slightly different way.
>>>> 
>>>> IMO this mechanism is of great help to guide the developers of future
>>> ethdev features on the clean path to add new features in a modular way,
>>> extending the ethdev functionality while doing so in a separate name space
>>> and file (that's why I tend to call this a plugin-like mechanism), as opposed to
>>> the current monolithic approach for ethdev, where we have 100+ API
>>> functions in a single name space and that are split into functional groups just
>>> by blank lines in the header file. It is simply the generalization of the
>>> mechanism introduced by rte_flow in release 17.02 (so all the credit should
>>> go to Adrien and not me).
>>>> 
>>>> IMO, having a standard function as above it cleaner than having a separate
>>> and slightly different function per feature. People can quickly see the set of
>>> standard ethdev capabilities and which ones are supported by a specific
>>> device. Between A) and B) below, I definitely prefer A):
>>>> A) status = rte_eth_dev_capability_ops_get(port_id,
>>> RTE_ETH_CABABILITY_TM, &tm_ops);
>>>> B) status = rte_eth_dev_tm_ops_get(port_id, &tm_ops);
>>> 
>>> I prefer B because instead of tm_ops, you can use some specific tm
>>> arguments,
>>> show their types and properly document each parameter.
>> 
>> Note that rte_flow already returns the flow ops as a void * with no strong argument type checking (approach A from above). Are you saying this is wrong?
>> 
>> 	rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_GET, void *eth_flow_ops);
>> 
>> Personally, I am in favour of allowing the standard interface at the expense of strong build-time type checking. Especially that this API function is between ethdev and the drivers, as opposed to between app and ethdev.
> 
> rte_eth_dev_filter_ctrl is going to be specialized in rte_flow operations.
> I agree with you on having independent API blocks in ethdev like rte_flow.
> But this function rte_eth_dev_capability_ops_get that you propose would be
> cross-blocks. I don't see the benefit.
> I especially don't think there is a sense in the enum
> 	enum rte_eth_capability {
> 		RTE_ETH_CAPABILITY_FLOW = 0, /**< Flow */
> 		RTE_ETH_CAPABILITY_TM, /**< Traffic Manager */
> 		RTE_ETH_CAPABILITY_MAX
> 	}
> 
> I won't debate more on this. We have to read opinions of other reviewers.

The benefit is providing a generic API, which we do not need to alter in the future (avoiding ABI breakage). The PMD can add a capability to the list if not already present and then provide an API structure for the feature.

Being able to add features without having to change DPDK may be a strong feature for companies that have special needs for their applications. They just need to add an rte_eth_capability enum value in a range that they want to control (which does not mean they need to change the above enum) and they can provide private features to the application, especially features that are very specific to some HW. I do not like private features, but I also do not want to stick just any old API in DPDK for any given special feature.

Today the structure is just APIs, but it could also provide some special or specific information to the application in that structure or via an API call.
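
A minimal sketch of the lookup pattern under discussion, based on the
function and enum proposed earlier in this thread (use_tm_ops() is just a
placeholder for whatever the caller does with the returned ops):

   const struct rte_tm_ops *tm_ops = NULL;

   if (rte_eth_dev_capability_ops_get(port_id, RTE_ETH_CAPABILITY_TM,
                   &tm_ops) == 0 && tm_ops != NULL)
           use_tm_ops(tm_ops);     /* port supports traffic management */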

Regards,
Keith

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting
    2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting Bruce Richardson
  2017-03-07 11:32  3%   ` [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
  2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

The debug option only provided statistics to the user, most of
which could be tracked by the application itself. Remove this as a
compile time option, and feature, simplifying the code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                     |   1 -
 doc/guides/prog_guide/ring_lib.rst     |   7 -
 doc/guides/rel_notes/release_17_05.rst |   1 +
 lib/librte_ring/rte_ring.c             |  41 ----
 lib/librte_ring/rte_ring.h             |  97 +-------
 test/test/test_ring.c                  | 410 ---------------------------------
 6 files changed, 13 insertions(+), 544 deletions(-)

diff --git a/config/common_base b/config/common_base
index 099ffda..b3d8272 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_LIBRTE_RING_DEBUG=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index 9f69753..d4ab502 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -110,13 +110,6 @@ Once an enqueue operation reaches the high water mark, the producer is notified,
 
 This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
 
-Debug
-~~~~~
-
-When debug is enabled (CONFIG_RTE_LIBRTE_RING_DEBUG is set),
-the library stores some per-ring statistic counters about the number of enqueues/dequeues.
-These statistics are per-core to avoid concurrent accesses or atomic operations.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index ea45e0c..e0ebd71 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -116,6 +116,7 @@ API Changes
   have been made to it:
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
+  * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 80fc356..90ee63f 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -131,12 +131,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 			  RTE_CACHE_LINE_MASK) != 0);
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_LIBRTE_RING_DEBUG
-	RTE_BUILD_BUG_ON((sizeof(struct rte_ring_debug_stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, stats) &
-			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 
 	/* init the ring structure */
 	memset(r, 0, sizeof(*r));
@@ -284,11 +278,6 @@ rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
 {
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats sum;
-	unsigned lcore_id;
-#endif
-
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
@@ -302,36 +291,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		fprintf(f, "  watermark=0\n");
 	else
 		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
-
-	/* sum and dump statistics */
-#ifdef RTE_LIBRTE_RING_DEBUG
-	memset(&sum, 0, sizeof(sum));
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-		sum.enq_success_bulk += r->stats[lcore_id].enq_success_bulk;
-		sum.enq_success_objs += r->stats[lcore_id].enq_success_objs;
-		sum.enq_quota_bulk += r->stats[lcore_id].enq_quota_bulk;
-		sum.enq_quota_objs += r->stats[lcore_id].enq_quota_objs;
-		sum.enq_fail_bulk += r->stats[lcore_id].enq_fail_bulk;
-		sum.enq_fail_objs += r->stats[lcore_id].enq_fail_objs;
-		sum.deq_success_bulk += r->stats[lcore_id].deq_success_bulk;
-		sum.deq_success_objs += r->stats[lcore_id].deq_success_objs;
-		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
-		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
-	}
-	fprintf(f, "  size=%"PRIu32"\n", r->size);
-	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
-	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
-	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
-	fprintf(f, "  enq_quota_objs=%"PRIu64"\n", sum.enq_quota_objs);
-	fprintf(f, "  enq_fail_bulk=%"PRIu64"\n", sum.enq_fail_bulk);
-	fprintf(f, "  enq_fail_objs=%"PRIu64"\n", sum.enq_fail_objs);
-	fprintf(f, "  deq_success_bulk=%"PRIu64"\n", sum.deq_success_bulk);
-	fprintf(f, "  deq_success_objs=%"PRIu64"\n", sum.deq_success_objs);
-	fprintf(f, "  deq_fail_bulk=%"PRIu64"\n", sum.deq_fail_bulk);
-	fprintf(f, "  deq_fail_objs=%"PRIu64"\n", sum.deq_fail_objs);
-#else
-	fprintf(f, "  no statistics available\n");
-#endif
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 61c0982..af7b7d4 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -109,24 +109,6 @@ enum rte_ring_queue_behavior {
 	RTE_RING_QUEUE_VARIABLE   /* Enq/Deq as many items as possible from ring */
 };
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-/**
- * A structure that stores the ring statistics (per-lcore).
- */
-struct rte_ring_debug_stats {
-	uint64_t enq_success_bulk; /**< Successful enqueues number. */
-	uint64_t enq_success_objs; /**< Objects successfully enqueued. */
-	uint64_t enq_quota_bulk;   /**< Successful enqueues above watermark. */
-	uint64_t enq_quota_objs;   /**< Objects enqueued above watermark. */
-	uint64_t enq_fail_bulk;    /**< Failed enqueues number. */
-	uint64_t enq_fail_objs;    /**< Objects that failed to be enqueued. */
-	uint64_t deq_success_bulk; /**< Successful dequeues number. */
-	uint64_t deq_success_objs; /**< Objects successfully dequeued. */
-	uint64_t deq_fail_bulk;    /**< Failed dequeues number. */
-	uint64_t deq_fail_objs;    /**< Objects that failed to be dequeued. */
-} __rte_cache_aligned;
-#endif
-
 #define RTE_RING_MZ_PREFIX "RG_"
 /**< The maximum length of a ring name. */
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
@@ -187,10 +169,6 @@ struct rte_ring {
 	/** Ring consumer status. */
 	struct rte_ring_headtail cons __rte_aligned(CONS_ALIGN);
 
-#ifdef RTE_LIBRTE_RING_DEBUG
-	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-#endif
-
 	void *ring[] __rte_cache_aligned;   /**< Memory space of ring starts here.
 	                                     * not volatile so need to be careful
 	                                     * about compiler re-ordering */
@@ -202,27 +180,6 @@ struct rte_ring {
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
- * @internal When debug is enabled, store ring statistics.
- * @param r
- *   A pointer to the ring.
- * @param name
- *   The name of the statistics field to increment in the ring.
- * @param n
- *   The number to add to the object-oriented statistics.
- */
-#ifdef RTE_LIBRTE_RING_DEBUG
-#define __RING_STAT_ADD(r, name, n) do {                        \
-		unsigned __lcore_id = rte_lcore_id();           \
-		if (__lcore_id < RTE_MAX_LCORE) {               \
-			r->stats[__lcore_id].name##_objs += n;  \
-			r->stats[__lcore_id].name##_bulk += 1;  \
-		}                                               \
-	} while(0)
-#else
-#define __RING_STAT_ADD(r, name, n) do {} while(0)
-#endif
-
-/**
  * Calculate the memory size needed for a ring
  *
  * This function returns the number of bytes needed for a ring, given
@@ -463,17 +420,12 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOBUFS;
-			}
 			else {
 				/* No free entry available */
-				if (unlikely(free_entries == 0)) {
-					__RING_STAT_ADD(r, enq_fail, n);
+				if (unlikely(free_entries == 0))
 					return 0;
-				}
-
 				n = free_entries;
 			}
 		}
@@ -488,15 +440,11 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	/*
 	 * If there are other enqueues in progress that preceded us,
@@ -560,17 +508,12 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, enq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOBUFS;
-		}
 		else {
 			/* No free entry available */
-			if (unlikely(free_entries == 0)) {
-				__RING_STAT_ADD(r, enq_fail, n);
+			if (unlikely(free_entries == 0))
 				return 0;
-			}
-
 			n = free_entries;
 		}
 	}
@@ -583,15 +526,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
-		__RING_STAT_ADD(r, enq_quota, n);
-	}
-	else {
+	else
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-		__RING_STAT_ADD(r, enq_success, n);
-	}
 
 	r->prod.tail = prod_next;
 	return ret;
@@ -655,16 +594,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED) {
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (behavior == RTE_RING_QUEUE_FIXED)
 				return -ENOENT;
-			}
 			else {
-				if (unlikely(entries == 0)){
-					__RING_STAT_ADD(r, deq_fail, n);
+				if (unlikely(entries == 0))
 					return 0;
-				}
-
 				n = entries;
 			}
 		}
@@ -694,7 +628,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 			sched_yield();
 		}
 	}
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -741,16 +674,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	entries = prod_tail - cons_head;
 
 	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED) {
-			__RING_STAT_ADD(r, deq_fail, n);
+		if (behavior == RTE_RING_QUEUE_FIXED)
 			return -ENOENT;
-		}
 		else {
-			if (unlikely(entries == 0)){
-				__RING_STAT_ADD(r, deq_fail, n);
+			if (unlikely(entries == 0))
 				return 0;
-			}
-
 			n = entries;
 		}
 	}
@@ -762,7 +690,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	DEQUEUE_PTRS();
 	rte_smp_rmb();
 
-	__RING_STAT_ADD(r, deq_success, n);
 	r->cons.tail = cons_next;
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
 }
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 5f09097..3891f5d 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -763,412 +763,6 @@ test_ring_burst_basic(void)
 	return -1;
 }
 
-static int
-test_ring_stats(void)
-{
-
-#ifndef RTE_LIBRTE_RING_DEBUG
-	printf("Enable RTE_LIBRTE_RING_DEBUG to test ring stats.\n");
-	return 0;
-#else
-	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
-	int ret;
-	unsigned i;
-	unsigned num_items            = 0;
-	unsigned failed_enqueue_ops   = 0;
-	unsigned failed_enqueue_items = 0;
-	unsigned failed_dequeue_ops   = 0;
-	unsigned failed_dequeue_items = 0;
-	unsigned last_enqueue_ops     = 0;
-	unsigned last_enqueue_items   = 0;
-	unsigned last_quota_ops       = 0;
-	unsigned last_quota_items     = 0;
-	unsigned lcore_id = rte_lcore_id();
-	struct rte_ring_debug_stats *ring_stats = &r->stats[lcore_id];
-
-	printf("Test the ring stats.\n");
-
-	/* Reset the watermark in case it was set in another test. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Allocate some dummy object pointers. */
-	src = malloc(RING_SIZE*2*sizeof(void *));
-	if (src == NULL)
-		goto fail;
-
-	for (i = 0; i < RING_SIZE*2 ; i++) {
-		src[i] = (void *)(unsigned long)i;
-	}
-
-	/* Allocate some memory for copied objects. */
-	dst = malloc(RING_SIZE*2*sizeof(void *));
-	if (dst == NULL)
-		goto fail;
-
-	memset(dst, 0, RING_SIZE*2*sizeof(void *));
-
-	/* Set the head and tail pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	/* Do Enqueue tests. */
-	printf("Test the dequeue stats.\n");
-
-	/* Fill the ring up to RING_SIZE -1. */
-	printf("Fill the ring.\n");
-	for (i = 0; i< (RING_SIZE/MAX_BULK); i++) {
-		rte_ring_sp_enqueue_burst(r, cur_src, MAX_BULK);
-		cur_src += MAX_BULK;
-	}
-
-	/* Adjust for final enqueue = MAX_BULK -1. */
-	cur_src--;
-
-	printf("Verify that the ring is full.\n");
-	if (rte_ring_full(r) != 1)
-		goto fail;
-
-
-	printf("Verify the enqueue success stats.\n");
-	/* Stats should match above enqueue operations to fill the ring. */
-	if (ring_stats->enq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Current max objects is RING_SIZE -1. */
-	if (ring_stats->enq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any failures yet. */
-	if (ring_stats->enq_fail_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_fail_objs != 0)
-		goto fail;
-
-
-	printf("Test stats for SP burst enqueue to a full ring.\n");
-	num_items = 2;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for SP bulk enqueue to a full ring.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP burst enqueue to a full ring.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	printf("Test stats for MP bulk enqueue to a full ring.\n");
-	num_items = 16;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -ENOBUFS)
-		goto fail;
-
-	failed_enqueue_ops   += 1;
-	failed_enqueue_items += num_items;
-
-	/* The enqueue should have failed. */
-	if (ring_stats->enq_fail_bulk != failed_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_fail_objs != failed_enqueue_items)
-		goto fail;
-
-
-	/* Do Dequeue tests. */
-	printf("Test the dequeue stats.\n");
-
-	printf("Empty the ring.\n");
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* There was only RING_SIZE -1 objects to dequeue. */
-	cur_dst++;
-
-	printf("Verify ring is empty.\n");
-	if (1 != rte_ring_empty(r))
-		goto fail;
-
-	printf("Verify the dequeue success stats.\n");
-	/* Stats should match above dequeue operations. */
-	if (ring_stats->deq_success_bulk != (RING_SIZE/MAX_BULK))
-		goto fail;
-
-	/* Objects dequeued is RING_SIZE -1. */
-	if (ring_stats->deq_success_objs != RING_SIZE -1)
-		goto fail;
-
-	/* Shouldn't have any dequeue failure stats yet. */
-	if (ring_stats->deq_fail_bulk != 0)
-		goto fail;
-
-	printf("Test stats for SC burst dequeue with an empty ring.\n");
-	num_items = 2;
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for SC bulk dequeue with an empty ring.\n");
-	num_items = 4;
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC burst dequeue with an empty ring.\n");
-	num_items = 8;
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != 0)
-		goto fail;
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test stats for MC bulk dequeue with an empty ring.\n");
-	num_items = 16;
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, num_items);
-	if (ret != -ENOENT)
-		goto fail;
-
-	failed_dequeue_ops   += 1;
-	failed_dequeue_items += num_items;
-
-	/* The dequeue should have failed. */
-	if (ring_stats->deq_fail_bulk != failed_dequeue_ops)
-		goto fail;
-	if (ring_stats->deq_fail_objs != failed_dequeue_items)
-		goto fail;
-
-
-	printf("Test total enqueue/dequeue stats.\n");
-	/* At this point the enqueue and dequeue stats should be the same. */
-	if (ring_stats->enq_success_bulk != ring_stats->deq_success_bulk)
-		goto fail;
-	if (ring_stats->enq_success_objs != ring_stats->deq_success_objs)
-		goto fail;
-	if (ring_stats->enq_fail_bulk    != ring_stats->deq_fail_bulk)
-		goto fail;
-	if (ring_stats->enq_fail_objs    != ring_stats->deq_fail_objs)
-		goto fail;
-
-
-	/* Watermark Tests. */
-	printf("Test the watermark/quota stats.\n");
-
-	printf("Verify the initial watermark stats.\n");
-	/* Watermark stats should be 0 since there is no watermark. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Set a watermark. */
-	rte_ring_set_water_mark(r, 16);
-
-	/* Reset pointers. */
-	cur_src = src;
-	cur_dst = dst;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue below watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should still be 0. */
-	if (ring_stats->enq_quota_bulk != 0)
-		goto fail;
-	if (ring_stats->enq_quota_objs != 0)
-		goto fail;
-
-	/* Success stats should have increased. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops + 1)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items + num_items)
-		goto fail;
-
-	last_enqueue_ops   = ring_stats->enq_success_bulk;
-	last_enqueue_items = ring_stats->enq_success_objs;
-
-
-	printf("Test stats for SP burst enqueue at watermark.\n");
-	num_items = 8;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != 1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP burst enqueue above watermark.\n");
-	num_items = 1;
-	ret = rte_ring_sp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP burst enqueue above watermark.\n");
-	num_items = 2;
-	ret = rte_ring_mp_enqueue_burst(r, cur_src, num_items);
-	if ((ret & RTE_RING_SZ_MASK) != num_items)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for SP bulk enqueue above watermark.\n");
-	num_items = 4;
-	ret = rte_ring_sp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	last_quota_ops   = ring_stats->enq_quota_bulk;
-	last_quota_items = ring_stats->enq_quota_objs;
-
-
-	printf("Test stats for MP bulk enqueue above watermark.\n");
-	num_items = 8;
-	ret = rte_ring_mp_enqueue_bulk(r, cur_src, num_items);
-	if (ret != -EDQUOT)
-		goto fail;
-
-	/* Watermark stats should have changed. */
-	if (ring_stats->enq_quota_bulk != last_quota_ops +1)
-		goto fail;
-	if (ring_stats->enq_quota_objs != last_quota_items + num_items)
-		goto fail;
-
-	printf("Test watermark success stats.\n");
-	/* Success stats should be same as last non-watermarked enqueue. */
-	if (ring_stats->enq_success_bulk != last_enqueue_ops)
-		goto fail;
-	if (ring_stats->enq_success_objs != last_enqueue_items)
-		goto fail;
-
-
-	/* Cleanup. */
-
-	/* Empty the ring. */
-	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
-		rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
-		cur_dst += MAX_BULK;
-	}
-
-	/* Reset the watermark. */
-	rte_ring_set_water_mark(r, 0);
-
-	/* Reset the ring stats. */
-	memset(&r->stats[lcore_id], 0, sizeof(r->stats[lcore_id]));
-
-	/* Free memory before test completed */
-	free(src);
-	free(dst);
-	return 0;
-
-fail:
-	free(src);
-	free(dst);
-	return -1;
-#endif
-}
-
 /*
  * it will always fail to create ring with a wrong ring size number in this function
  */
@@ -1335,10 +929,6 @@ test_ring(void)
 	if (test_ring_basic() < 0)
 		return -1;
 
-	/* ring stats */
-	if (test_ring_stats() < 0)
-		return -1;
-
 	/* basic operations */
 	if (test_live_watermark_change() < 0)
 		return -1;
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update
                       ` (2 preceding siblings ...)
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting Bruce Richardson
@ 2017-03-07 11:32  4%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support Bruce Richardson
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

There was a compile-time setting that made multi-producer and
multi-consumer rings yield the CPU when spinning in the loop that waits
for the tail pointer update. Build-time settings are not recommended
for enabling or disabling features, and since this one was off by
default, remove it completely. If needed, a runtime-enabled equivalent
can be used.
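
As a rough illustration only, the pattern the removed setting compiled in
looked like the sketch below: spin with rte_pause() and, after a configured
number of repetitions, call sched_yield() so a preempted thread holding the
pending head/tail update gets a chance to run. The helper name and the
header choices are assumptions for this sketch (rte_pause() has moved
between headers across DPDK releases); this is not code from the patch.

#include <stdint.h>
#include <sched.h>
#include <rte_cycles.h>	/* rte_pause(); exact header varies by release */

static inline void
wait_for_tail_update(volatile uint32_t *tail, uint32_t expected,
		     unsigned int pause_rep) /* 0 disables yielding */
{
	unsigned int rep = 0;

	while (*tail != expected) {
		rte_pause();
		if (pause_rep && ++rep == pause_rep) {
			rep = 0;
			sched_yield();
		}
	}
}

With pause_rep fixed at 0, the compiler drops the yield branch entirely,
which is what the default build already did before this removal.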

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/common_base                              |  1 -
 doc/guides/prog_guide/env_abstraction_layer.rst |  5 ----
 doc/guides/rel_notes/release_17_05.rst          |  1 +
 lib/librte_ring/rte_ring.h                      | 35 +++++--------------------
 4 files changed, 7 insertions(+), 35 deletions(-)

diff --git a/config/common_base b/config/common_base
index b3d8272..d5beadd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -447,7 +447,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 # Compile librte_ring
 #
 CONFIG_RTE_LIBRTE_RING=y
-CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
 # Compile librte_mempool
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 10a10a8..7c39cd2 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -352,11 +352,6 @@ Known Issues
 
   3. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
 
-  ``RTE_RING_PAUSE_REP_COUNT`` is defined for rte_ring to reduce contention. It's mainly for case 2, a yield is issued after number of times pause repeat.
-
-  It adds a sched_yield() syscall if the thread spins for too long while waiting on the other thread to finish its operations on the ring.
-  This gives the preempted thread a chance to proceed and finish with the ring enqueue/dequeue operation.
-
 + rte_timer
 
   Running  ``rte_timer_manager()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e0ebd71..c69ca8f 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -117,6 +117,7 @@ API Changes
 
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
+  * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index af7b7d4..2177954 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -114,11 +114,6 @@ enum rte_ring_queue_behavior {
 #define RTE_RING_NAMESIZE (RTE_MEMZONE_NAMESIZE - \
 			   sizeof(RTE_RING_MZ_PREFIX) + 1)
 
-#ifndef RTE_RING_PAUSE_REP_COUNT
-#define RTE_RING_PAUSE_REP_COUNT 0 /**< Yield after pause num of times, no yield
-                                    *   if RTE_RING_PAUSE_REP not defined. */
-#endif
-
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
 #if RTE_CACHE_LINE_SIZE < 128
@@ -396,7 +391,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t cons_tail, free_entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -450,18 +445,9 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->prod.tail != prod_head)) {
+	while (unlikely(r->prod.tail != prod_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->prod.tail = prod_next;
 	return ret;
 }
@@ -494,7 +480,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 {
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 	int ret;
 
@@ -571,7 +557,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_next, entries;
 	const unsigned max = n;
 	int success;
-	unsigned i, rep = 0;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -616,18 +602,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * If there are other dequeues in progress that preceded us,
 	 * we need to wait for them to complete
 	 */
-	while (unlikely(r->cons.tail != cons_head)) {
+	while (unlikely(r->cons.tail != cons_head))
 		rte_pause();
 
-		/* Set RTE_RING_PAUSE_REP_COUNT to avoid spin too long waiting
-		 * for other thread finish. It gives pre-empted thread a chance
-		 * to proceed and finish with ring dequeue operation. */
-		if (RTE_RING_PAUSE_REP_COUNT &&
-		    ++rep == RTE_RING_PAUSE_REP_COUNT) {
-			rep = 0;
-			sched_yield();
-		}
-	}
 	r->cons.tail = cons_next;
 
 	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
@@ -662,7 +639,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
-	unsigned i;
+	unsigned int i;
 	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting
  @ 2017-03-07 11:32  4%   ` Bruce Richardson
  2017-03-07 11:32  3%   ` [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

Users compiling DPDK should not need to know or care about the arrangement
of cachelines in the rte_ring structure. Therefore just remove the build
option and make the structures always split. On platforms with 64B
cachelines, use 128B rather than 64B alignment for improved performance,
since it stops the producer and consumer data from sitting on adjacent
cachelines.
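
A minimal sketch of the resulting layout, assuming a platform with 64B
cachelines; the struct and macro names (toy_ring, HT_ALIGN) are
illustrative and the field lists are abbreviated, so this is not the full
rte_ring definition from the patch:

#include <stdint.h>
#include <rte_common.h>	/* __rte_aligned */
#include <rte_memory.h>	/* RTE_CACHE_LINE_SIZE */

#if RTE_CACHE_LINE_SIZE < 128
#define HT_ALIGN (RTE_CACHE_LINE_SIZE * 2)	/* 128B on 64B-line CPUs */
#else
#define HT_ALIGN RTE_CACHE_LINE_SIZE
#endif

struct toy_ring {
	struct {
		volatile uint32_t head;
		volatile uint32_t tail;
	} prod __rte_aligned(HT_ALIGN);

	struct {
		volatile uint32_t head;
		volatile uint32_t tail;
	} cons __rte_aligned(HT_ALIGN);
};

With the doubled alignment the producer and consumer blocks sit at least
two 64B lines apart, so an adjacent-cacheline prefetcher touching one does
not also pull in the other.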

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---

V2: Limit the cacheline * 2 alignment to platforms with < 128B line size
---
 config/common_base                     |  1 -
 doc/guides/rel_notes/release_17_05.rst |  6 ++++++
 lib/librte_ring/rte_ring.c             |  2 --
 lib/librte_ring/rte_ring.h             | 16 ++++++++++------
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..099ffda 100644
--- a/config/common_base
+++ b/config/common_base
@@ -448,7 +448,6 @@ CONFIG_RTE_LIBRTE_PMD_NULL_CRYPTO=y
 #
 CONFIG_RTE_LIBRTE_RING=y
 CONFIG_RTE_LIBRTE_RING_DEBUG=n
-CONFIG_RTE_RING_SPLIT_PROD_CONS=n
 CONFIG_RTE_RING_PAUSE_REP_COUNT=0
 
 #
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..ea45e0c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -110,6 +110,12 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Reworked rte_ring library**
+
+  The rte_ring library has been reworked and updated. The following changes
+  have been made to it:
+
+  * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
 
 ABI Changes
 -----------
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index ca0a108..4bc6da1 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -127,10 +127,8 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	/* compilation-time checks */
 	RTE_BUILD_BUG_ON((sizeof(struct rte_ring) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#ifdef RTE_RING_SPLIT_PROD_CONS
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, cons) &
 			  RTE_CACHE_LINE_MASK) != 0);
-#endif
 	RTE_BUILD_BUG_ON((offsetof(struct rte_ring, prod) &
 			  RTE_CACHE_LINE_MASK) != 0);
 #ifdef RTE_LIBRTE_RING_DEBUG
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 72ccca5..399ae3b 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -139,6 +139,14 @@ struct rte_ring_debug_stats {
 
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
+#if RTE_CACHE_LINE_SIZE < 128
+#define PROD_ALIGN (RTE_CACHE_LINE_SIZE * 2)
+#define CONS_ALIGN (RTE_CACHE_LINE_SIZE * 2)
+#else
+#define PROD_ALIGN RTE_CACHE_LINE_SIZE
+#define CONS_ALIGN RTE_CACHE_LINE_SIZE
+#endif
+
 /**
  * An RTE ring structure.
  *
@@ -168,7 +176,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Producer head. */
 		volatile uint32_t tail;  /**< Producer tail. */
-	} prod __rte_cache_aligned;
+	} prod __rte_aligned(PROD_ALIGN);
 
 	/** Ring consumer status. */
 	struct cons {
@@ -177,11 +185,7 @@ struct rte_ring {
 		uint32_t mask;           /**< Mask (size-1) of ring. */
 		volatile uint32_t head;  /**< Consumer head. */
 		volatile uint32_t tail;  /**< Consumer tail. */
-#ifdef RTE_RING_SPLIT_PROD_CONS
-	} cons __rte_cache_aligned;
-#else
-	} cons;
-#endif
+	} cons __rte_aligned(CONS_ALIGN);
 
 #ifdef RTE_LIBRTE_RING_DEBUG
 	struct rte_ring_debug_stats stats[RTE_MAX_LCORE];
-- 
2.9.3

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields
    2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting Bruce Richardson
@ 2017-03-07 11:32  3%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting Bruce Richardson
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

The size and mask fields are duplicated in both the producer and
consumer data structures. Move them out of those structures and into
the top-level structure so they are stored only once.
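
An abbreviated before/after sketch (struct names are illustrative and
field lists are trimmed to the parts relevant here):

#include <stdint.h>

/* Before: each head/tail structure carried its own copy. */
struct headtail_before {
	volatile uint32_t head;
	volatile uint32_t tail;
	uint32_t size;
	uint32_t mask;
};

/* After: a single copy lives in the enclosing ring structure. */
struct ring_after {
	uint32_t size;	/* Size of ring. */
	uint32_t mask;	/* Mask (size-1) of ring. */
	struct {
		volatile uint32_t head;
		volatile uint32_t tail;
	} prod, cons;
};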

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_ring/rte_ring.c | 20 ++++++++++----------
 lib/librte_ring/rte_ring.h | 32 ++++++++++++++++----------------
 test/test/test_ring.c      |  6 +++---
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 4bc6da1..80fc356 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -144,11 +144,11 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.watermark = count;
+	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
-	r->prod.size = r->cons.size = count;
-	r->prod.mask = r->cons.mask = count-1;
+	r->size = count;
+	r->mask = count - 1;
 	r->prod.head = r->cons.head = 0;
 	r->prod.tail = r->cons.tail = 0;
 
@@ -269,14 +269,14 @@ rte_ring_free(struct rte_ring *r)
 int
 rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
 {
-	if (count >= r->prod.size)
+	if (count >= r->size)
 		return -EINVAL;
 
 	/* if count is 0, disable the watermarking */
 	if (count == 0)
-		count = r->prod.size;
+		count = r->size;
 
-	r->prod.watermark = count;
+	r->watermark = count;
 	return 0;
 }
 
@@ -291,17 +291,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 
 	fprintf(f, "ring <%s>@%p\n", r->name, r);
 	fprintf(f, "  flags=%x\n", r->flags);
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
 	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
 	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->prod.watermark == r->prod.size)
+	if (r->watermark == r->size)
 		fprintf(f, "  watermark=0\n");
 	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->prod.watermark);
+		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_RING_DEBUG
@@ -318,7 +318,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 		sum.deq_fail_bulk += r->stats[lcore_id].deq_fail_bulk;
 		sum.deq_fail_objs += r->stats[lcore_id].deq_fail_objs;
 	}
-	fprintf(f, "  size=%"PRIu32"\n", r->prod.size);
+	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  enq_success_bulk=%"PRIu64"\n", sum.enq_success_bulk);
 	fprintf(f, "  enq_success_objs=%"PRIu64"\n", sum.enq_success_objs);
 	fprintf(f, "  enq_quota_bulk=%"PRIu64"\n", sum.enq_quota_bulk);
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 659c6d0..61c0982 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -151,13 +151,10 @@ struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 struct rte_ring_headtail {
 	volatile uint32_t head;  /**< Prod/consumer head. */
 	volatile uint32_t tail;  /**< Prod/consumer tail. */
-	uint32_t size;           /**< Size of ring. */
-	uint32_t mask;           /**< Mask (size-1) of ring. */
 	union {
 		uint32_t sp_enqueue; /**< True, if single producer. */
 		uint32_t sc_dequeue; /**< True, if single consumer. */
 	};
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 };
 
 /**
@@ -177,9 +174,12 @@ struct rte_ring {
 	 * next time the ABI changes
 	 */
 	char name[RTE_MEMZONE_NAMESIZE];    /**< Name of the ring. */
-	int flags;                       /**< Flags supplied at creation. */
+	int flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
+	uint32_t size;           /**< Size of ring. */
+	uint32_t mask;           /**< Mask (size-1) of ring. */
+	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_headtail prod __rte_aligned(PROD_ALIGN);
@@ -358,7 +358,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * Placed here since identical code needed in both
  * single and multi producer enqueue functions */
 #define ENQUEUE_PTRS() do { \
-	const uint32_t size = r->prod.size; \
+	const uint32_t size = r->size; \
 	uint32_t idx = prod_head & mask; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & ((~(unsigned)0x3))); i+=4, idx+=4) { \
@@ -385,7 +385,7 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  * single and multi consumer dequeue functions */
 #define DEQUEUE_PTRS() do { \
 	uint32_t idx = cons_head & mask; \
-	const uint32_t size = r->cons.size; \
+	const uint32_t size = r->size; \
 	if (likely(idx + n < size)) { \
 		for (i = 0; i < (n & (~(unsigned)0x3)); i+=4, idx+=4) {\
 			obj_table[i] = r->ring[idx]; \
@@ -440,7 +440,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
@@ -488,7 +488,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 				(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -547,7 +547,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_head, cons_tail;
 	uint32_t prod_next, free_entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 	int ret;
 
 	prod_head = r->prod.head;
@@ -583,7 +583,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->prod.watermark)) {
+	if (unlikely(((mask + 1) - free_entries + n) > r->watermark)) {
 		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
 			(int)(n | RTE_RING_QUOT_EXCEED);
 		__RING_STAT_ADD(r, enq_quota, n);
@@ -633,7 +633,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	const unsigned max = n;
 	int success;
 	unsigned i, rep = 0;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -730,7 +730,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
 	unsigned i;
-	uint32_t mask = r->prod.mask;
+	uint32_t mask = r->mask;
 
 	cons_head = r->cons.head;
 	prod_tail = r->prod.tail;
@@ -1059,7 +1059,7 @@ rte_ring_full(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return ((cons_tail - prod_tail - 1) & r->prod.mask) == 0;
+	return ((cons_tail - prod_tail - 1) & r->mask) == 0;
 }
 
 /**
@@ -1092,7 +1092,7 @@ rte_ring_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (prod_tail - cons_tail) & r->prod.mask;
+	return (prod_tail - cons_tail) & r->mask;
 }
 
 /**
@@ -1108,7 +1108,7 @@ rte_ring_free_count(const struct rte_ring *r)
 {
 	uint32_t prod_tail = r->prod.tail;
 	uint32_t cons_tail = r->cons.tail;
-	return (cons_tail - prod_tail - 1) & r->prod.mask;
+	return (cons_tail - prod_tail - 1) & r->mask;
 }
 
 /**
@@ -1122,7 +1122,7 @@ rte_ring_free_count(const struct rte_ring *r)
 static inline unsigned int
 rte_ring_get_size(const struct rte_ring *r)
 {
-	return r->prod.size;
+	return r->size;
 }
 
 /**
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index ebcb896..5f09097 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -148,7 +148,7 @@ check_live_watermark_change(__attribute__((unused)) void *dummy)
 		}
 
 		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->prod.watermark;
+		watermark = r->watermark;
 		if (watermark != watermark_old &&
 		    (watermark_old != 16 || watermark != 32)) {
 			printf("Bad watermark change %u -> %u\n", watermark_old,
@@ -213,7 +213,7 @@ test_set_watermark( void ){
 		printf( " ring lookup failed\n" );
 		goto error;
 	}
-	count = r->prod.size*2;
+	count = r->size * 2;
 	setwm = rte_ring_set_water_mark(r, count);
 	if (setwm != -EINVAL){
 		printf("Test failed to detect invalid watermark count value\n");
@@ -222,7 +222,7 @@ test_set_watermark( void ){
 
 	count = 0;
 	rte_ring_set_water_mark(r, count);
-	if (r->prod.watermark != r->prod.size) {
+	if (r->watermark != r->size) {
 		printf("Test failed to detect invalid watermark count value\n");
 		goto error;
 	}
-- 
2.9.3

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support
                       ` (3 preceding siblings ...)
  2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

Remove the watermark support. A future commit will add support for having
the enqueue functions return the amount of free space in the ring, which
will allow applications to implement their own watermark checks while
being more generally useful to the application.
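
For applications that relied on the -EDQUOT notification, a rough
application-side replacement can already be built from existing APIs. This
is only a sketch: the occupancy check after the enqueue is approximate
under concurrent producers, and the bulk return-value convention used here
changes again later in this series.

#include <stdbool.h>
#include <rte_ring.h>

static bool
enqueue_check_watermark(struct rte_ring *r, void **objs, unsigned int n,
			unsigned int watermark)
{
	if (rte_ring_enqueue_bulk(r, objs, n) != 0)
		return false;	/* ring full, nothing enqueued */

	/* Occupancy test replaces the removed -EDQUOT quota signal. */
	return rte_ring_count(r) > watermark;
}

Once the later commit makes the enqueue calls report remaining free space
directly, the extra rte_ring_count() call can be dropped.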

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
V2: fix missed references to watermarks in v1

---
 doc/guides/prog_guide/ring_lib.rst     |   8 --
 doc/guides/rel_notes/release_17_05.rst |   2 +
 examples/Makefile                      |   2 +-
 lib/librte_ring/rte_ring.c             |  23 -----
 lib/librte_ring/rte_ring.h             |  58 +------------
 test/test/autotest_test_funcs.py       |   7 --
 test/test/commands.c                   |  52 ------------
 test/test/test_ring.c                  | 149 +--------------------------------
 8 files changed, 8 insertions(+), 293 deletions(-)

diff --git a/doc/guides/prog_guide/ring_lib.rst b/doc/guides/prog_guide/ring_lib.rst
index d4ab502..b31ab7a 100644
--- a/doc/guides/prog_guide/ring_lib.rst
+++ b/doc/guides/prog_guide/ring_lib.rst
@@ -102,14 +102,6 @@ Name
 A ring is identified by a unique name.
 It is not possible to create two rings with the same name (rte_ring_create() returns NULL if this is attempted).
 
-Water Marking
-~~~~~~~~~~~~~
-
-The ring can have a high water mark (threshold).
-Once an enqueue operation reaches the high water mark, the producer is notified, if the water mark is configured.
-
-This mechanism can be used, for example, to exert a back pressure on I/O to inform the LAN to PAUSE.
-
 Use Cases
 ---------
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index c69ca8f..4e748dc 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -118,6 +118,8 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_SPLIT_PROD_CONS``
   * removed the build-time setting ``CONFIG_RTE_LIBRTE_RING_DEBUG``
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
+  * removed the function ``rte_ring_set_water_mark`` as part of a general
+    removal of watermarks support in the library.
 
 ABI Changes
 -----------
diff --git a/examples/Makefile b/examples/Makefile
index da2bfdd..19cd5ad 100644
--- a/examples/Makefile
+++ b/examples/Makefile
@@ -81,7 +81,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += packet_ordering
 DIRS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ptpclient
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += qos_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += qos_sched
-DIRS-y += quota_watermark
+#DIRS-y += quota_watermark
 DIRS-$(CONFIG_RTE_ETHDEV_RXTX_CALLBACKS) += rxtx_callbacks
 DIRS-y += skeleton
 ifeq ($(CONFIG_RTE_LIBRTE_HASH),y)
diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index 90ee63f..18fb644 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -138,7 +138,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->watermark = count;
 	r->prod.sp_enqueue = !!(flags & RING_F_SP_ENQ);
 	r->cons.sc_dequeue = !!(flags & RING_F_SC_DEQ);
 	r->size = count;
@@ -256,24 +255,6 @@ rte_ring_free(struct rte_ring *r)
 	rte_free(te);
 }
 
-/*
- * change the high water mark. If *count* is 0, water marking is
- * disabled
- */
-int
-rte_ring_set_water_mark(struct rte_ring *r, unsigned count)
-{
-	if (count >= r->size)
-		return -EINVAL;
-
-	/* if count is 0, disable the watermarking */
-	if (count == 0)
-		count = r->size;
-
-	r->watermark = count;
-	return 0;
-}
-
 /* dump the status of the ring on the console */
 void
 rte_ring_dump(FILE *f, const struct rte_ring *r)
@@ -287,10 +268,6 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
-	if (r->watermark == r->size)
-		fprintf(f, "  watermark=0\n");
-	else
-		fprintf(f, "  watermark=%"PRIu32"\n", r->watermark);
 }
 
 /* dump the status of all rings on the console */
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 2177954..e7061be 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -156,7 +156,6 @@ struct rte_ring {
 			/**< Memzone, if any, containing the rte_ring */
 	uint32_t size;           /**< Size of ring. */
 	uint32_t mask;           /**< Mask (size-1) of ring. */
-	uint32_t watermark;      /**< Max items before EDQUOT in producer. */
 
 	/** Ring producer status. */
 	struct rte_ring_headtail prod __rte_aligned(PROD_ALIGN);
@@ -171,7 +170,6 @@ struct rte_ring {
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
 #define RING_F_SC_DEQ 0x0002 /**< The default dequeue is "single-consumer". */
-#define RTE_RING_QUOT_EXCEED (1 << 31)  /**< Quota exceed for burst ops */
 #define RTE_RING_SZ_MASK  (unsigned)(0x0fffffff) /**< Ring size mask */
 
 /**
@@ -277,26 +275,6 @@ struct rte_ring *rte_ring_create(const char *name, unsigned count,
 void rte_ring_free(struct rte_ring *r);
 
 /**
- * Change the high water mark.
- *
- * If *count* is 0, water marking is disabled. Otherwise, it is set to the
- * *count* value. The *count* value must be greater than 0 and less
- * than the ring size.
- *
- * This function can be called at any time (not necessarily at
- * initialization).
- *
- * @param r
- *   A pointer to the ring structure.
- * @param count
- *   The new water mark value.
- * @return
- *   - 0: Success; water mark changed.
- *   - -EINVAL: Invalid water mark value.
- */
-int rte_ring_set_water_mark(struct rte_ring *r, unsigned count);
-
-/**
  * Dump the status of the ring to a file.
  *
  * @param f
@@ -377,8 +355,6 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -393,7 +369,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	int success;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	/* Avoid the unnecessary cmpset operation below, which is also
 	 * potentially harmful when n equals 0. */
@@ -434,13 +409,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-				(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	/*
 	 * If there are other enqueues in progress that preceded us,
 	 * we need to wait for them to complete
@@ -449,7 +417,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -468,8 +436,6 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   Depend on the behavior value
  *   if behavior = RTE_RING_QUEUE_FIXED
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  *   if behavior = RTE_RING_QUEUE_VARIABLE
  *   - n: Actual number of objects enqueued.
@@ -482,7 +448,6 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	uint32_t prod_next, free_entries;
 	unsigned int i;
 	uint32_t mask = r->mask;
-	int ret;
 
 	prod_head = r->prod.head;
 	cons_tail = r->cons.tail;
@@ -511,15 +476,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	ENQUEUE_PTRS();
 	rte_smp_wmb();
 
-	/* if we exceed the watermark */
-	if (unlikely(((mask + 1) - free_entries + n) > r->watermark))
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? -EDQUOT :
-			(int)(n | RTE_RING_QUOT_EXCEED);
-	else
-		ret = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
-
 	r->prod.tail = prod_next;
-	return ret;
+	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
 }
 
 /**
@@ -685,8 +643,6 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueue.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -707,8 +663,6 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -733,8 +687,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   The number of objects to add in the ring from the obj_table.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -759,8 +711,6 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -778,8 +728,6 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
@@ -801,8 +749,6 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
  *   A pointer to the object to be added.
  * @return
  *   - 0: Success; objects enqueued.
- *   - -EDQUOT: Quota exceeded. The objects have been enqueued, but the
- *     high water mark is exceeded.
  *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
  */
 static inline int __attribute__((always_inline))
diff --git a/test/test/autotest_test_funcs.py b/test/test/autotest_test_funcs.py
index 1c5f390..8da8fcd 100644
--- a/test/test/autotest_test_funcs.py
+++ b/test/test/autotest_test_funcs.py
@@ -292,11 +292,4 @@ def ring_autotest(child, test_name):
     elif index == 2:
         return -1, "Fail [Timeout]"
 
-    child.sendline("set_watermark test 100")
-    child.sendline("dump_ring test")
-    index = child.expect(["  watermark=100",
-                          pexpect.TIMEOUT], timeout=1)
-    if index != 0:
-        return -1, "Fail [Bad watermark]"
-
     return 0, "Success"
diff --git a/test/test/commands.c b/test/test/commands.c
index 2df46b0..551c81d 100644
--- a/test/test/commands.c
+++ b/test/test/commands.c
@@ -228,57 +228,6 @@ cmdline_parse_inst_t cmd_dump_one = {
 
 /****************/
 
-struct cmd_set_ring_result {
-	cmdline_fixed_string_t set;
-	cmdline_fixed_string_t name;
-	uint32_t value;
-};
-
-static void cmd_set_ring_parsed(void *parsed_result, struct cmdline *cl,
-				__attribute__((unused)) void *data)
-{
-	struct cmd_set_ring_result *res = parsed_result;
-	struct rte_ring *r;
-	int ret;
-
-	r = rte_ring_lookup(res->name);
-	if (r == NULL) {
-		cmdline_printf(cl, "Cannot find ring\n");
-		return;
-	}
-
-	if (!strcmp(res->set, "set_watermark")) {
-		ret = rte_ring_set_water_mark(r, res->value);
-		if (ret != 0)
-			cmdline_printf(cl, "Cannot set water mark\n");
-	}
-}
-
-cmdline_parse_token_string_t cmd_set_ring_set =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, set,
-				 "set_watermark");
-
-cmdline_parse_token_string_t cmd_set_ring_name =
-	TOKEN_STRING_INITIALIZER(struct cmd_set_ring_result, name, NULL);
-
-cmdline_parse_token_num_t cmd_set_ring_value =
-	TOKEN_NUM_INITIALIZER(struct cmd_set_ring_result, value, UINT32);
-
-cmdline_parse_inst_t cmd_set_ring = {
-	.f = cmd_set_ring_parsed,  /* function to call */
-	.data = NULL,      /* 2nd arg of func */
-	.help_str = "set watermark: "
-			"set_watermark <ring_name> <value>",
-	.tokens = {        /* token list, NULL terminated */
-		(void *)&cmd_set_ring_set,
-		(void *)&cmd_set_ring_name,
-		(void *)&cmd_set_ring_value,
-		NULL,
-	},
-};
-
-/****************/
-
 struct cmd_quit_result {
 	cmdline_fixed_string_t quit;
 };
@@ -419,7 +368,6 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_autotest,
 	(cmdline_parse_inst_t *)&cmd_dump,
 	(cmdline_parse_inst_t *)&cmd_dump_one,
-	(cmdline_parse_inst_t *)&cmd_set_ring,
 	(cmdline_parse_inst_t *)&cmd_quit,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx,
 	(cmdline_parse_inst_t *)&cmd_set_rxtx_anchor,
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 3891f5d..666a451 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -78,21 +78,6 @@
  *      - Dequeue one object, two objects, MAX_BULK objects
  *      - Check that dequeued pointers are correct
  *
- *    - Test watermark and default bulk enqueue/dequeue:
- *
- *      - Set watermark
- *      - Set default bulk value
- *      - Enqueue objects, check that -EDQUOT is returned when
- *        watermark is exceeded
- *      - Check that dequeued pointers are correct
- *
- * #. Check live watermark change
- *
- *    - Start a loop on another lcore that will enqueue and dequeue
- *      objects in a ring. It will monitor the value of watermark.
- *    - At the same time, change the watermark on the master lcore.
- *    - The slave lcore will check that watermark changes from 16 to 32.
- *
  * #. Performance tests.
  *
  * Tests done in test_ring_perf.c
@@ -115,123 +100,6 @@ static struct rte_ring *r;
 
 #define	TEST_RING_FULL_EMTPY_ITER	8
 
-static int
-check_live_watermark_change(__attribute__((unused)) void *dummy)
-{
-	uint64_t hz = rte_get_timer_hz();
-	void *obj_table[MAX_BULK];
-	unsigned watermark, watermark_old = 16;
-	uint64_t cur_time, end_time;
-	int64_t diff = 0;
-	int i, ret;
-	unsigned count = 4;
-
-	/* init the object table */
-	memset(obj_table, 0, sizeof(obj_table));
-	end_time = rte_get_timer_cycles() + (hz / 4);
-
-	/* check that bulk and watermark are 4 and 32 (respectively) */
-	while (diff >= 0) {
-
-		/* add in ring until we reach watermark */
-		ret = 0;
-		for (i = 0; i < 16; i ++) {
-			if (ret != 0)
-				break;
-			ret = rte_ring_enqueue_bulk(r, obj_table, count);
-		}
-
-		if (ret != -EDQUOT) {
-			printf("Cannot enqueue objects, or watermark not "
-			       "reached (ret=%d)\n", ret);
-			return -1;
-		}
-
-		/* read watermark, the only change allowed is from 16 to 32 */
-		watermark = r->watermark;
-		if (watermark != watermark_old &&
-		    (watermark_old != 16 || watermark != 32)) {
-			printf("Bad watermark change %u -> %u\n", watermark_old,
-			       watermark);
-			return -1;
-		}
-		watermark_old = watermark;
-
-		/* dequeue objects from ring */
-		while (i--) {
-			ret = rte_ring_dequeue_bulk(r, obj_table, count);
-			if (ret != 0) {
-				printf("Cannot dequeue (ret=%d)\n", ret);
-				return -1;
-			}
-		}
-
-		cur_time = rte_get_timer_cycles();
-		diff = end_time - cur_time;
-	}
-
-	if (watermark_old != 32 ) {
-		printf(" watermark was not updated (wm=%u)\n",
-		       watermark_old);
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-test_live_watermark_change(void)
-{
-	unsigned lcore_id = rte_lcore_id();
-	unsigned lcore_id2 = rte_get_next_lcore(lcore_id, 0, 1);
-
-	printf("Test watermark live modification\n");
-	rte_ring_set_water_mark(r, 16);
-
-	/* launch a thread that will enqueue and dequeue, checking
-	 * watermark and quota */
-	rte_eal_remote_launch(check_live_watermark_change, NULL, lcore_id2);
-
-	rte_delay_ms(100);
-	rte_ring_set_water_mark(r, 32);
-	rte_delay_ms(100);
-
-	if (rte_eal_wait_lcore(lcore_id2) < 0)
-		return -1;
-
-	return 0;
-}
-
-/* Test for catch on invalid watermark values */
-static int
-test_set_watermark( void ){
-	unsigned count;
-	int setwm;
-
-	struct rte_ring *r = rte_ring_lookup("test_ring_basic_ex");
-	if(r == NULL){
-		printf( " ring lookup failed\n" );
-		goto error;
-	}
-	count = r->size * 2;
-	setwm = rte_ring_set_water_mark(r, count);
-	if (setwm != -EINVAL){
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-
-	count = 0;
-	rte_ring_set_water_mark(r, count);
-	if (r->watermark != r->size) {
-		printf("Test failed to detect invalid watermark count value\n");
-		goto error;
-	}
-	return 0;
-
-error:
-	return -1;
-}
-
 /*
  * helper routine for test_ring_basic
  */
@@ -418,8 +286,7 @@ test_ring_basic(void)
 	cur_src = src;
 	cur_dst = dst;
 
-	printf("test watermark and default bulk enqueue / dequeue\n");
-	rte_ring_set_water_mark(r, 20);
+	printf("test default bulk enqueue / dequeue\n");
 	num_elems = 16;
 
 	cur_src = src;
@@ -433,8 +300,8 @@ test_ring_basic(void)
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != -EDQUOT) {
-		printf("Watermark not exceeded\n");
+	if (ret != 0) {
+		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
@@ -930,16 +797,6 @@ test_ring(void)
 		return -1;
 
 	/* basic operations */
-	if (test_live_watermark_change() < 0)
-		return -1;
-
-	if ( test_set_watermark() < 0){
-		printf ("Test failed to detect invalid parameter\n");
-		return -1;
-	}
-	else
-		printf ( "Test detected forced bad watermark values\n");
-
 	if ( test_create_count_odd() < 0){
 			printf ("Test failed to detect odd count\n");
 			return -1;
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent
                       ` (4 preceding siblings ...)
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

The bulk functions for rings return 0 when all elements are enqueued and
a negative value when there is no space. Change that to make them
consistent with the burst functions by returning the number of elements
enqueued/dequeued, i.e. 0 or N. This change also allows the return value
from enqueue/dequeue to be used directly, without a branch for error
checking.
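
A usage sketch of the new convention; the helper below and its
free-everything-on-failure policy are illustrative only, not code from the
patch:

#include <rte_mbuf.h>
#include <rte_ring.h>

/* Enqueue a burst of mbufs; free them all if nothing could be enqueued. */
static void
send_or_drop(struct rte_ring *r, struct rte_mbuf **bufs, unsigned int n)
{
	unsigned int i;

	/*
	 * Old convention: 0 on success, -ENOBUFS on a full ring, so the
	 * failure check was "!= 0". New convention: the number of objects
	 * enqueued (n or 0), so "== 0" now means nothing went in.
	 */
	if (rte_ring_sp_enqueue_bulk(r, (void **)bufs, n) == 0) {
		for (i = 0; i < n; i++)
			rte_pktmbuf_free(bufs[i]);
	}
}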

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/rel_notes/release_17_05.rst             |  11 +++
 doc/guides/sample_app_ug/server_node_efd.rst       |   2 +-
 examples/load_balancer/runtime.c                   |  16 ++-
 .../client_server_mp/mp_client/client.c            |   8 +-
 .../client_server_mp/mp_server/main.c              |   2 +-
 examples/qos_sched/app_thread.c                    |   8 +-
 examples/server_node_efd/node/node.c               |   2 +-
 examples/server_node_efd/server/main.c             |   2 +-
 lib/librte_mempool/rte_mempool_ring.c              |  12 ++-
 lib/librte_ring/rte_ring.h                         | 109 +++++++--------------
 test/test-pipeline/pipeline_hash.c                 |   2 +-
 test/test-pipeline/runtime.c                       |   8 +-
 test/test/test_ring.c                              |  46 +++++----
 test/test/test_ring_perf.c                         |   8 +-
 14 files changed, 106 insertions(+), 130 deletions(-)

diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4e748dc..2b11765 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -120,6 +120,17 @@ API Changes
   * removed the build-time setting ``CONFIG_RTE_RING_PAUSE_REP_COUNT``
   * removed the function ``rte_ring_set_water_mark`` as part of a general
     removal of watermarks support in the library.
+  * changed the return value of the enqueue and dequeue bulk functions to
+    match that of the burst equivalents. In all cases, ring functions which
+    operate on multiple packets now return the number of elements enqueued
+    or dequeued, as appropriate. The updated functions are:
+
+    - ``rte_ring_mp_enqueue_bulk``
+    - ``rte_ring_sp_enqueue_bulk``
+    - ``rte_ring_enqueue_bulk``
+    - ``rte_ring_mc_dequeue_bulk``
+    - ``rte_ring_sc_dequeue_bulk``
+    - ``rte_ring_dequeue_bulk``
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/server_node_efd.rst b/doc/guides/sample_app_ug/server_node_efd.rst
index 9b69cfe..e3a63c8 100644
--- a/doc/guides/sample_app_ug/server_node_efd.rst
+++ b/doc/guides/sample_app_ug/server_node_efd.rst
@@ -286,7 +286,7 @@ repeated infinitely.
 
         cl = &nodes[node];
         if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-                cl_rx_buf[node].count) != 0){
+                cl_rx_buf[node].count) != cl_rx_buf[node].count){
             for (j = 0; j < cl_rx_buf[node].count; j++)
                 rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
             cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 6944325..82b10bc 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -146,7 +146,7 @@ app_lcore_io_rx_buffer_to_send (
 		(void **) lp->rx.mbuf_out[worker].array,
 		bsz);
 
-	if (unlikely(ret == -ENOBUFS)) {
+	if (unlikely(ret == 0)) {
 		uint32_t k;
 		for (k = 0; k < bsz; k ++) {
 			struct rte_mbuf *m = lp->rx.mbuf_out[worker].array[k];
@@ -312,7 +312,7 @@ app_lcore_io_rx_flush(struct app_lcore_params_io *lp, uint32_t n_workers)
 			(void **) lp->rx.mbuf_out[worker].array,
 			lp->rx.mbuf_out[worker].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->rx.mbuf_out[worker].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->rx.mbuf_out[worker].array[k];
@@ -349,9 +349,8 @@ app_lcore_io_tx(
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
 				bsz_rd);
 
-			if (unlikely(ret == -ENOENT)) {
+			if (unlikely(ret == 0))
 				continue;
-			}
 
 			n_mbufs += bsz_rd;
 
@@ -505,9 +504,8 @@ app_lcore_worker(
 			(void **) lp->mbuf_in.array,
 			bsz_rd);
 
-		if (unlikely(ret == -ENOENT)) {
+		if (unlikely(ret == 0))
 			continue;
-		}
 
 #if APP_WORKER_DROP_ALL_PACKETS
 		for (j = 0; j < bsz_rd; j ++) {
@@ -559,7 +557,7 @@ app_lcore_worker(
 
 #if APP_STATS
 			lp->rings_out_iters[port] ++;
-			if (ret == 0) {
+			if (ret > 0) {
 				lp->rings_out_count[port] += 1;
 			}
 			if (lp->rings_out_iters[port] == APP_STATS){
@@ -572,7 +570,7 @@ app_lcore_worker(
 			}
 #endif
 
-			if (unlikely(ret == -ENOBUFS)) {
+			if (unlikely(ret == 0)) {
 				uint32_t k;
 				for (k = 0; k < bsz_wr; k ++) {
 					struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
@@ -609,7 +607,7 @@ app_lcore_worker_flush(struct app_lcore_params_worker *lp)
 			(void **) lp->mbuf_out[port].array,
 			lp->mbuf_out[port].n_mbufs);
 
-		if (unlikely(ret < 0)) {
+		if (unlikely(ret == 0)) {
 			uint32_t k;
 			for (k = 0; k < lp->mbuf_out[port].n_mbufs; k ++) {
 				struct rte_mbuf *pkt_to_free = lp->mbuf_out[port].array[k];
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index d4f9ca3..dca9eb9 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -276,14 +276,10 @@ main(int argc, char *argv[])
 	printf("[Press Ctrl-C to quit ...]\n");
 
 	for (;;) {
-		uint16_t i, rx_pkts = PKT_READ_SIZE;
+		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		/* try dequeuing max possible packets first, if that fails, get the
-		 * most we can. Loop body should only execute once, maximum */
-		while (rx_pkts > 0 &&
-				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts, rx_pkts) != 0))
-			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring), PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/multi_process/client_server_mp/mp_server/main.c b/examples/multi_process/client_server_mp/mp_server/main.c
index a6dc12d..19c95b2 100644
--- a/examples/multi_process/client_server_mp/mp_server/main.c
+++ b/examples/multi_process/client_server_mp/mp_server/main.c
@@ -227,7 +227,7 @@ flush_rx_queue(uint16_t client)
 
 	cl = &clients[client];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[client].buffer,
-			cl_rx_buf[client].count) != 0){
+			cl_rx_buf[client].count) == 0){
 		for (j = 0; j < cl_rx_buf[client].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[client].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[client].count;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 70fdcdb..dab4594 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -107,7 +107,7 @@ app_rx_thread(struct thread_conf **confs)
 			}
 
 			if (unlikely(rte_ring_sp_enqueue_bulk(conf->rx_ring,
-								(void **)rx_mbufs, nb_rx) != 0)) {
+					(void **)rx_mbufs, nb_rx) == 0)) {
 				for(i = 0; i < nb_rx; i++) {
 					rte_pktmbuf_free(rx_mbufs[i]);
 
@@ -180,7 +180,7 @@ app_tx_thread(struct thread_conf **confs)
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
 					burst_conf.qos_dequeue);
-		if (likely(retval == 0)) {
+		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
 			conf->counter = 0; /* reset empty read loop counter */
@@ -230,7 +230,9 @@ app_worker_thread(struct thread_conf **confs)
 		nb_pkt = rte_sched_port_dequeue(conf->sched_port, mbufs,
 					burst_conf.qos_dequeue);
 		if (likely(nb_pkt > 0))
-			while (rte_ring_sp_enqueue_bulk(conf->tx_ring, (void **)mbufs, nb_pkt) != 0);
+			while (rte_ring_sp_enqueue_bulk(conf->tx_ring,
+					(void **)mbufs, nb_pkt) == 0)
+				; /* empty body */
 
 		conf_idx++;
 		if (confs[conf_idx] == NULL)
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index a6c0c70..9ec6a05 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) != 0))
+					rx_pkts) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/examples/server_node_efd/server/main.c b/examples/server_node_efd/server/main.c
index 1a54d1b..3eb7fac 100644
--- a/examples/server_node_efd/server/main.c
+++ b/examples/server_node_efd/server/main.c
@@ -247,7 +247,7 @@ flush_rx_queue(uint16_t node)
 
 	cl = &nodes[node];
 	if (rte_ring_enqueue_bulk(cl->rx_q, (void **)cl_rx_buf[node].buffer,
-			cl_rx_buf[node].count) != 0){
+			cl_rx_buf[node].count) != cl_rx_buf[node].count){
 		for (j = 0; j < cl_rx_buf[node].count; j++)
 			rte_pktmbuf_free(cl_rx_buf[node].buffer[j]);
 		cl->stats.rx_drop += cl_rx_buf[node].count;
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index b9aa64d..409b860 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -42,26 +42,30 @@ static int
 common_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_mp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sp_enqueue(struct rte_mempool *mp, void * const *obj_table,
 		unsigned n)
 {
-	return rte_ring_sp_enqueue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sp_enqueue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_mc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_mc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
-	return rte_ring_sc_dequeue_bulk(mp->pool_data, obj_table, n);
+	return rte_ring_sc_dequeue_bulk(mp->pool_data,
+			obj_table, n) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index e7061be..5f6589f 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -352,14 +352,10 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -391,7 +387,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		/* check that we have enough room in ring */
 		if (unlikely(n > free_entries)) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOBUFS;
+				return 0;
 			else {
 				/* No free entry available */
 				if (unlikely(free_entries == 0))
@@ -417,7 +413,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 		rte_pause();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -433,14 +429,10 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects enqueued.
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 			 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -460,7 +452,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	/* check that we have enough room in ring */
 	if (unlikely(n > free_entries)) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOBUFS;
+			return 0;
 		else {
 			/* No free entry available */
 			if (unlikely(free_entries == 0))
@@ -477,7 +469,7 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 	rte_smp_wmb();
 
 	r->prod.tail = prod_next;
-	return (behavior == RTE_RING_QUEUE_FIXED) ? 0 : n;
+	return n;
 }
 
 /**
@@ -498,16 +490,11 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
 
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -539,7 +526,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		/* Set the actual entries for dequeue */
 		if (n > entries) {
 			if (behavior == RTE_RING_QUEUE_FIXED)
-				return -ENOENT;
+				return 0;
 			else {
 				if (unlikely(entries == 0))
 					return 0;
@@ -565,7 +552,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	r->cons.tail = cons_next;
 
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -583,15 +570,10 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
  *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items a possible from ring
  * @return
- *   Depend on the behavior value
- *   if behavior = RTE_RING_QUEUE_FIXED
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
- *   if behavior = RTE_RING_QUEUE_VARIABLE
- *   - n: Actual number of objects dequeued.
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 		 unsigned n, enum rte_ring_queue_behavior behavior)
 {
@@ -610,7 +592,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 
 	if (n > entries) {
 		if (behavior == RTE_RING_QUEUE_FIXED)
-			return -ENOENT;
+			return 0;
 		else {
 			if (unlikely(entries == 0))
 				return 0;
@@ -626,7 +608,7 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
-	return behavior == RTE_RING_QUEUE_FIXED ? 0 : n;
+	return n;
 }
 
 /**
@@ -642,10 +624,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueue.
- *   - -ENOBUFS: Not enough room in the ring to enqueue, no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -662,10 +643,9 @@ rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned n)
 {
@@ -686,10 +666,9 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
  * @param n
  *   The number of objects to add in the ring from the obj_table.
  * @return
- *   - 0: Success; objects enqueued.
- *   - -ENOBUFS: Not enough room in the ring to enqueue; no object is enqueued.
+ *   The number of objects enqueued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned n)
 {
@@ -716,7 +695,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static inline int __attribute__((always_inline))
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_mp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -733,7 +712,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1);
+	return rte_ring_sp_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -754,10 +733,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static inline int __attribute__((always_inline))
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-	if (r->prod.sp_enqueue)
-		return rte_ring_sp_enqueue(r, obj);
-	else
-		return rte_ring_mp_enqueue(r, obj);
+	return rte_ring_enqueue_bulk(r, &obj, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -773,11 +749,9 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -794,11 +768,9 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects to dequeue from the ring to the obj_table,
  *   must be strictly positive.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue; no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
@@ -818,11 +790,9 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  * @param n
  *   The number of objects to dequeue from the ring to the obj_table.
  * @return
- *   - 0: Success; objects dequeued.
- *   - -ENOENT: Not enough entries in the ring to dequeue, no object is
- *     dequeued.
+ *   The number of objects dequeued, either 0 or n
  */
-static inline int __attribute__((always_inline))
+static inline unsigned int __attribute__((always_inline))
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 {
 	if (r->cons.sc_dequeue)
@@ -849,7 +819,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -867,7 +837,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1);
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -889,10 +859,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue(r, obj_p);
-	else
-		return rte_ring_mc_dequeue(r, obj_p);
+	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
 }
 
 /**
diff --git a/test/test-pipeline/pipeline_hash.c b/test/test-pipeline/pipeline_hash.c
index 10d2869..1ac0aa8 100644
--- a/test/test-pipeline/pipeline_hash.c
+++ b/test/test-pipeline/pipeline_hash.c
@@ -547,6 +547,6 @@ app_main_loop_rx_metadata(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
diff --git a/test/test-pipeline/runtime.c b/test/test-pipeline/runtime.c
index 42a6142..4e20669 100644
--- a/test/test-pipeline/runtime.c
+++ b/test/test-pipeline/runtime.c
@@ -98,7 +98,7 @@ app_main_loop_rx(void) {
 				app.rings_rx[i],
 				(void **) app.mbuf_rx.array,
 				n_mbufs);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -123,7 +123,7 @@ app_main_loop_worker(void) {
 			(void **) worker_mbuf->array,
 			app.burst_size_worker_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		do {
@@ -131,7 +131,7 @@ app_main_loop_worker(void) {
 				app.rings_tx[i ^ 1],
 				(void **) worker_mbuf->array,
 				app.burst_size_worker_write);
-		} while (ret < 0);
+		} while (ret == 0);
 	}
 }
 
@@ -152,7 +152,7 @@ app_main_loop_tx(void) {
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
 			app.burst_size_tx_read);
 
-		if (ret == -ENOENT)
+		if (ret == 0)
 			continue;
 
 		n_mbufs += app.burst_size_tx_read;
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 666a451..112433b 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -117,20 +117,18 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		rand = RTE_MAX(rte_rand() % RING_SIZE, 1UL);
 		printf("%s: iteration %u, random shift: %u;\n",
 		    __func__, i, rand);
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rand));
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rand));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand) != 0);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
 
 		/* fill the ring */
-		TEST_RING_VERIFY(-ENOBUFS != rte_ring_enqueue_bulk(r, src,
-		    rsz));
+		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz) != 0);
 		TEST_RING_VERIFY(0 == rte_ring_free_count(r));
 		TEST_RING_VERIFY(rsz == rte_ring_count(r));
 		TEST_RING_VERIFY(rte_ring_full(r));
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(0 == rte_ring_dequeue_bulk(r, dst, rsz));
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -171,37 +169,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_sp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -217,37 +215,37 @@ test_ring_basic(void)
 	printf("enqueue 1 obj\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 1);
 	cur_src += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue 2 objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, 2);
 	cur_src += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("enqueue MAX_BULK objs\n");
 	ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 	cur_src += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
 	cur_dst += 1;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
 	cur_dst += 2;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
 	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 	cur_dst += MAX_BULK;
-	if (ret != 0)
+	if (ret == 0)
 		goto fail;
 
 	/* check data */
@@ -264,11 +262,11 @@ test_ring_basic(void)
 	for (i = 0; i<RING_SIZE/MAX_BULK; i++) {
 		ret = rte_ring_mp_enqueue_bulk(r, cur_src, MAX_BULK);
 		cur_src += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
 		cur_dst += MAX_BULK;
-		if (ret != 0)
+		if (ret == 0)
 			goto fail;
 	}
 
@@ -294,25 +292,25 @@ test_ring_basic(void)
 
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_enqueue_bulk(r, cur_src, num_elems);
 	cur_src += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
 	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
 	cur_dst += num_elems;
-	if (ret != 0) {
+	if (ret == 0) {
 		printf("Cannot dequeue2\n");
 		goto fail;
 	}
diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index 320c20c..8ccbdef 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -195,13 +195,13 @@ enqueue_bulk(void *p)
 
 	const uint64_t sp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_sp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sp_end = rte_rdtsc();
 
 	const uint64_t mp_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mp_enqueue_bulk(r, burst, size) != 0)
+		while (rte_ring_mp_enqueue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mp_end = rte_rdtsc();
 
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) != 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2 09/14] ring: allow dequeue fns to return remaining entry count
                       ` (5 preceding siblings ...)
  2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
@ 2017-03-07 11:32  2%   ` Bruce Richardson
  6 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2017-03-07 11:32 UTC (permalink / raw)
  To: olivier.matz; +Cc: jerin.jacob, dev, Bruce Richardson

Add an extra parameter to the ring dequeue burst/bulk functions so that
those functions can optionally return the number of objects remaining in
the ring. This information can be used by applications in a number of
ways; for instance, with single-consumer queues it provides a maximum
dequeue size which is guaranteed to succeed.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
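A rough usage sketch (illustration only, not part of this patch): with a
single consumer, the count returned through the new parameter lets the
reader keep draining until the ring is known to be empty, rather than
probing with one extra call that returns zero. The ring pointer, mbuf
type and burst size below are placeholder assumptions.

#include <rte_mbuf.h>
#include <rte_ring.h>

#define SKETCH_BURST 32	/* assumed burst size, not mandated by the API */

/* Drain a single-consumer ring; 'avail' reports what is still queued. */
static unsigned int
sketch_drain(struct rte_ring *rx_ring)
{
	struct rte_mbuf *bufs[SKETCH_BURST];
	unsigned int avail = 0;
	unsigned int nb, total = 0;

	do {
		nb = rte_ring_sc_dequeue_burst(rx_ring, (void **)bufs,
				SKETCH_BURST, &avail);
		total += nb;
		/* ... hand the 'nb' dequeued mbufs off for processing ... */
	} while (avail > 0);

	return total;
}
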
 app/pdump/main.c                                   |  2 +-
 doc/guides/rel_notes/release_17_05.rst             |  8 ++
 drivers/crypto/null/null_crypto_pmd.c              |  2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |  3 +-
 drivers/net/ring/rte_eth_ring.c                    |  2 +-
 examples/distributor/main.c                        |  2 +-
 examples/load_balancer/runtime.c                   |  6 +-
 .../client_server_mp/mp_client/client.c            |  3 +-
 examples/packet_ordering/main.c                    |  6 +-
 examples/qos_sched/app_thread.c                    |  6 +-
 examples/quota_watermark/qw/main.c                 |  5 +-
 examples/server_node_efd/node/node.c               |  2 +-
 lib/librte_hash/rte_cuckoo_hash.c                  |  3 +-
 lib/librte_mempool/rte_mempool_ring.c              |  4 +-
 lib/librte_port/rte_port_frag.c                    |  3 +-
 lib/librte_port/rte_port_ring.c                    |  6 +-
 lib/librte_ring/rte_ring.h                         | 90 +++++++++++-----------
 test/test-pipeline/runtime.c                       |  6 +-
 test/test/test_link_bonding_mode4.c                |  3 +-
 test/test/test_pmd_ring_perf.c                     |  7 +-
 test/test/test_ring.c                              | 54 ++++++-------
 test/test/test_ring_perf.c                         | 20 +++--
 test/test/test_table_acl.c                         |  2 +-
 test/test/test_table_pipeline.c                    |  2 +-
 test/test/test_table_ports.c                       |  8 +-
 test/test/virtual_pmd.c                            |  4 +-
 26 files changed, 145 insertions(+), 114 deletions(-)

diff --git a/app/pdump/main.c b/app/pdump/main.c
index b88090d..3b13753 100644
--- a/app/pdump/main.c
+++ b/app/pdump/main.c
@@ -496,7 +496,7 @@ pdump_rxtx(struct rte_ring *ring, uint8_t vdev_id, struct pdump_stats *stats)
 
 	/* first dequeue packets from ring of primary process */
 	const uint16_t nb_in_deq = rte_ring_dequeue_burst(ring,
-			(void *)rxtx_bufs, BURST_SIZE);
+			(void *)rxtx_bufs, BURST_SIZE, NULL);
 	stats->dequeue_pkts += nb_in_deq;
 
 	if (nb_in_deq) {
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 249ad6e..563a74c 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -123,6 +123,8 @@ API Changes
   * added an extra parameter to the burst/bulk enqueue functions to
     return the number of free spaces in the ring after enqueue. This can
     be used by an application to implement its own watermark functionality.
+  * added an extra parameter to the burst/bulk dequeue functions to return
+    the number of elements remaining in the ring after dequeue.
   * changed the return value of the enqueue and dequeue bulk functions to
     match that of the burst equivalents. In all cases, ring functions which
     operate on multiple packets now return the number of elements enqueued
@@ -135,6 +137,12 @@ API Changes
     - ``rte_ring_sc_dequeue_bulk``
     - ``rte_ring_dequeue_bulk``
 
+    NOTE: the above functions all have different parameters as well as
+    different return values, due to the other listed changes above. This
+    means that all instances of the functions in existing code will be
+    flagged by the compiler. The return value usage should be checked
+    while fixing the compiler error due to the extra parameter.
+
 ABI Changes
 -----------
 
diff --git a/drivers/crypto/null/null_crypto_pmd.c b/drivers/crypto/null/null_crypto_pmd.c
index ed5a9fc..f68ec8d 100644
--- a/drivers/crypto/null/null_crypto_pmd.c
+++ b/drivers/crypto/null/null_crypto_pmd.c
@@ -155,7 +155,7 @@ null_crypto_pmd_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
 	unsigned nb_dequeued;
 
 	nb_dequeued = rte_ring_dequeue_burst(qp->processed_pkts,
-			(void **)ops, nb_ops);
+			(void **)ops, nb_ops, NULL);
 	qp->qp_stats.dequeued_count += nb_dequeued;
 
 	return nb_dequeued;
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index f3ac9e2..96638af 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1008,7 +1008,8 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs,
 		struct port *port = &mode_8023ad_ports[slaves[i]];
 
 		slave_slow_nb_pkts[i] = rte_ring_dequeue_burst(port->tx_ring,
-				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS);
+				slow_pkts, BOND_MODE_8023AX_SLAVE_TX_PKTS,
+				NULL);
 		slave_nb_pkts[i] = slave_slow_nb_pkts[i];
 
 		for (j = 0; j < slave_slow_nb_pkts[i]; j++)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index adbf478..77ef3a1 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -88,7 +88,7 @@ eth_ring_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
 	void **ptrs = (void *)&bufs[0];
 	struct ring_queue *r = q;
 	const uint16_t nb_rx = (uint16_t)rte_ring_dequeue_burst(r->rng,
-			ptrs, nb_bufs);
+			ptrs, nb_bufs, NULL);
 	if (r->rng->flags & RING_F_SC_DEQ)
 		r->rx_pkts.cnt += nb_rx;
 	else
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cfd360b..5cb6185 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -330,7 +330,7 @@ lcore_tx(struct rte_ring *in_r)
 
 			struct rte_mbuf *bufs[BURST_SIZE];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE, NULL);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 1645994..8192c08 100644
--- a/examples/load_balancer/runtime.c
+++ b/examples/load_balancer/runtime.c
@@ -349,7 +349,8 @@ app_lcore_io_tx(
 			ret = rte_ring_sc_dequeue_bulk(
 				ring,
 				(void **) &lp->tx.mbuf_out[port].array[n_mbufs],
-				bsz_rd);
+				bsz_rd,
+				NULL);
 
 			if (unlikely(ret == 0))
 				continue;
@@ -504,7 +505,8 @@ app_lcore_worker(
 		ret = rte_ring_sc_dequeue_bulk(
 			ring_in,
 			(void **) lp->mbuf_in.array,
-			bsz_rd);
+			bsz_rd,
+			NULL);
 
 		if (unlikely(ret == 0))
 			continue;
diff --git a/examples/multi_process/client_server_mp/mp_client/client.c b/examples/multi_process/client_server_mp/mp_client/client.c
index dca9eb9..01b535c 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -279,7 +279,8 @@ main(int argc, char *argv[])
 		uint16_t i, rx_pkts;
 		uint8_t port;
 
-		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts, PKT_READ_SIZE);
+		rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts,
+				PKT_READ_SIZE, NULL);
 
 		if (unlikely(rx_pkts == 0)){
 			if (need_flush)
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index d268350..7719dad 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -462,7 +462,7 @@ worker_thread(void *args_ptr)
 
 		/* dequeue the mbufs from rx_to_workers ring */
 		burst_size = rte_ring_dequeue_burst(ring_in,
-				(void *)burst_buffer, MAX_PKTS_BURST);
+				(void *)burst_buffer, MAX_PKTS_BURST, NULL);
 		if (unlikely(burst_size == 0))
 			continue;
 
@@ -510,7 +510,7 @@ send_thread(struct send_thread_args *args)
 
 		/* deque the mbufs from workers_to_tx ring */
 		nb_dq_mbufs = rte_ring_dequeue_burst(args->ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(nb_dq_mbufs == 0))
 			continue;
@@ -595,7 +595,7 @@ tx_thread(struct rte_ring *ring_in)
 
 		/* deque the mbufs from workers_to_tx ring */
 		dqnum = rte_ring_dequeue_burst(ring_in,
-				(void *)mbufs, MAX_PKTS_BURST);
+				(void *)mbufs, MAX_PKTS_BURST, NULL);
 
 		if (unlikely(dqnum == 0))
 			continue;
diff --git a/examples/qos_sched/app_thread.c b/examples/qos_sched/app_thread.c
index 0c81a15..15f117f 100644
--- a/examples/qos_sched/app_thread.c
+++ b/examples/qos_sched/app_thread.c
@@ -179,7 +179,7 @@ app_tx_thread(struct thread_conf **confs)
 
 	while ((conf = confs[conf_idx])) {
 		retval = rte_ring_sc_dequeue_bulk(conf->tx_ring, (void **)mbufs,
-					burst_conf.qos_dequeue);
+					burst_conf.qos_dequeue, NULL);
 		if (likely(retval != 0)) {
 			app_send_packets(conf, mbufs, burst_conf.qos_dequeue);
 
@@ -218,7 +218,7 @@ app_worker_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
@@ -254,7 +254,7 @@ app_mixed_thread(struct thread_conf **confs)
 
 		/* Read packet from the ring */
 		nb_pkt = rte_ring_sc_dequeue_burst(conf->rx_ring, (void **)mbufs,
-					burst_conf.ring_burst);
+					burst_conf.ring_burst, NULL);
 		if (likely(nb_pkt)) {
 			int nb_sent = rte_sched_port_enqueue(conf->sched_port, mbufs,
 					nb_pkt);
diff --git a/examples/quota_watermark/qw/main.c b/examples/quota_watermark/qw/main.c
index 57df8ef..2dcddea 100644
--- a/examples/quota_watermark/qw/main.c
+++ b/examples/quota_watermark/qw/main.c
@@ -247,7 +247,8 @@ pipeline_stage(__attribute__((unused)) void *args)
 			}
 
 			/* Dequeue up to quota mbuf from rx */
-			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts, *quota);
+			nb_dq_pkts = rte_ring_dequeue_burst(rx, pkts,
+					*quota, NULL);
 			if (unlikely(nb_dq_pkts < 0))
 				continue;
 
@@ -305,7 +306,7 @@ send_stage(__attribute__((unused)) void *args)
 
 			/* Dequeue packets from tx and send them */
 			nb_dq_pkts = (uint16_t) rte_ring_dequeue_burst(tx,
-					(void *) tx_pkts, *quota);
+					(void *) tx_pkts, *quota, NULL);
 			rte_eth_tx_burst(dest_port_id, 0, tx_pkts, nb_dq_pkts);
 
 			/* TODO: Check if nb_dq_pkts == nb_tx_pkts? */
diff --git a/examples/server_node_efd/node/node.c b/examples/server_node_efd/node/node.c
index 9ec6a05..f780b92 100644
--- a/examples/server_node_efd/node/node.c
+++ b/examples/server_node_efd/node/node.c
@@ -392,7 +392,7 @@ main(int argc, char *argv[])
 		 */
 		while (rx_pkts > 0 &&
 				unlikely(rte_ring_dequeue_bulk(rx_ring, pkts,
-					rx_pkts) == 0))
+					rx_pkts, NULL) == 0))
 			rx_pkts = (uint16_t)RTE_MIN(rte_ring_count(rx_ring),
 					PKT_READ_SIZE);
 
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index 6552199..645c0cf 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -536,7 +536,8 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
 		if (cached_free_slots->len == 0) {
 			/* Need to get another burst of free slots from global ring */
 			n_slots = rte_ring_mc_dequeue_burst(h->free_slots,
-					cached_free_slots->objs, LCORE_CACHE_SIZE);
+					cached_free_slots->objs,
+					LCORE_CACHE_SIZE, NULL);
 			if (n_slots == 0)
 				return -ENOSPC;
 
diff --git a/lib/librte_mempool/rte_mempool_ring.c b/lib/librte_mempool/rte_mempool_ring.c
index 9b8fd2b..5c132bf 100644
--- a/lib/librte_mempool/rte_mempool_ring.c
+++ b/lib/librte_mempool/rte_mempool_ring.c
@@ -58,14 +58,14 @@ static int
 common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_mc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static int
 common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
 {
 	return rte_ring_sc_dequeue_bulk(mp->pool_data,
-			obj_table, n) == 0 ? -ENOBUFS : 0;
+			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
 }
 
 static unsigned
diff --git a/lib/librte_port/rte_port_frag.c b/lib/librte_port/rte_port_frag.c
index 0fcace9..320407e 100644
--- a/lib/librte_port/rte_port_frag.c
+++ b/lib/librte_port/rte_port_frag.c
@@ -186,7 +186,8 @@ rte_port_ring_reader_frag_rx(void *port,
 		/* If "pkts" buffer is empty, read packet burst from ring */
 		if (p->n_pkts == 0) {
 			p->n_pkts = rte_ring_sc_dequeue_burst(p->ring,
-				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX);
+				(void **) p->pkts, RTE_PORT_IN_BURST_SIZE_MAX,
+				NULL);
 			RTE_PORT_RING_READER_FRAG_STATS_PKTS_IN_ADD(p, p->n_pkts);
 			if (p->n_pkts == 0)
 				return n_pkts_out;
diff --git a/lib/librte_port/rte_port_ring.c b/lib/librte_port/rte_port_ring.c
index 9fadac7..492b0e7 100644
--- a/lib/librte_port/rte_port_ring.c
+++ b/lib/librte_port/rte_port_ring.c
@@ -111,7 +111,8 @@ rte_port_ring_reader_rx(void *port, struct rte_mbuf **pkts, uint32_t n_pkts)
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_sc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
@@ -124,7 +125,8 @@ rte_port_ring_multi_reader_rx(void *port, struct rte_mbuf **pkts,
 	struct rte_port_ring_reader *p = (struct rte_port_ring_reader *) port;
 	uint32_t nb_rx;
 
-	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts, n_pkts);
+	nb_rx = rte_ring_mc_dequeue_burst(p->ring, (void **) pkts,
+			n_pkts, NULL);
 	RTE_PORT_RING_READER_STATS_PKTS_IN_ADD(p, nb_rx);
 
 	return nb_rx;
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 73b1c26..ca25dd7 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -491,7 +491,8 @@ __rte_ring_sp_do_enqueue(struct rte_ring *r, void * const *obj_table,
 
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -500,11 +501,6 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 	unsigned int i;
 	uint32_t mask = r->mask;
 
-	/* Avoid the unnecessary cmpset operation below, which is also
-	 * potentially harmful when n equals 0. */
-	if (n == 0)
-		return 0;
-
 	/* move cons.head atomically */
 	do {
 		/* Restore n as it may change every loop */
@@ -519,15 +515,11 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		entries = (prod_tail - cons_head);
 
 		/* Set the actual entries for dequeue */
-		if (n > entries) {
-			if (behavior == RTE_RING_QUEUE_FIXED)
-				return 0;
-			else {
-				if (unlikely(entries == 0))
-					return 0;
-				n = entries;
-			}
-		}
+		if (n > entries)
+			n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+		if (unlikely(n == 0))
+			goto end;
 
 		cons_next = cons_head + n;
 		success = rte_atomic32_cmpset(&r->cons.head, cons_head,
@@ -546,7 +538,9 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
 		rte_pause();
 
 	r->cons.tail = cons_next;
-
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -570,7 +564,8 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
  */
 static inline unsigned int __attribute__((always_inline))
 __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned n, enum rte_ring_queue_behavior behavior)
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int *available)
 {
 	uint32_t cons_head, prod_tail;
 	uint32_t cons_next, entries;
@@ -585,15 +580,11 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	 * and size(ring)-1. */
 	entries = prod_tail - cons_head;
 
-	if (n > entries) {
-		if (behavior == RTE_RING_QUEUE_FIXED)
-			return 0;
-		else {
-			if (unlikely(entries == 0))
-				return 0;
-			n = entries;
-		}
-	}
+	if (n > entries)
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : entries;
+
+	if (unlikely(entries == 0))
+		goto end;
 
 	cons_next = cons_head + n;
 	r->cons.head = cons_next;
@@ -603,6 +594,9 @@ __rte_ring_sc_do_dequeue(struct rte_ring *r, void **obj_table,
 	rte_smp_rmb();
 
 	r->cons.tail = cons_next;
+end:
+	if (available != NULL)
+		*available = entries - n;
 	return n;
 }
 
@@ -749,9 +743,11 @@ rte_ring_enqueue(struct rte_ring *r, void *obj)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -768,9 +764,11 @@ rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
+			available);
 }
 
 /**
@@ -790,12 +788,13 @@ rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
  *   The number of objects dequeued, either 0 or n
  */
 static inline unsigned int __attribute__((always_inline))
-rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
+		unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n);
+		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
 }
 
 /**
@@ -816,7 +815,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned n)
 static inline int __attribute__((always_inline))
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1)  ? 0 : -ENOBUFS;
+	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOBUFS;
 }
 
 /**
@@ -834,7 +833,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -856,7 +855,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static inline int __attribute__((always_inline))
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_dequeue_bulk(r, obj_p, 1) ? 0 : -ENOBUFS;
+	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOBUFS;
 }
 
 /**
@@ -1046,9 +1045,11 @@ rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_mc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_mc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1066,9 +1067,11 @@ rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - n: Actual number of objects dequeued, 0 if ring is empty
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_sc_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE);
+	return __rte_ring_sc_do_dequeue(r, obj_table, n,
+			RTE_RING_QUEUE_VARIABLE, available);
 }
 
 /**
@@ -1088,12 +1091,13 @@ rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
  *   - Number of objects dequeued
  */
 static inline unsigned __attribute__((always_inline))
-rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table, unsigned n)
+rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
+		unsigned int n, unsigned int *available)
 {
 	if (r->cons.sc_dequeue)
-		return rte_ring_sc_dequeue_burst(r, obj_table, n);
+		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
 	else
-		return rte_ring_mc_dequeue_burst(r, obj_table, n);
+		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
 }
 
 #ifdef __cplusplus
diff --git a/test/test-pipeline/runtime.c b/test/test-pipeline/runtime.c
index c06ff54..8970e1c 100644
--- a/test/test-pipeline/runtime.c
+++ b/test/test-pipeline/runtime.c
@@ -121,7 +121,8 @@ app_main_loop_worker(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_rx[i],
 			(void **) worker_mbuf->array,
-			app.burst_size_worker_read);
+			app.burst_size_worker_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
@@ -151,7 +152,8 @@ app_main_loop_tx(void) {
 		ret = rte_ring_sc_dequeue_bulk(
 			app.rings_tx[i],
 			(void **) &app.mbuf_tx[i].array[n_mbufs],
-			app.burst_size_tx_read);
+			app.burst_size_tx_read,
+			NULL);
 
 		if (ret == 0)
 			continue;
diff --git a/test/test/test_link_bonding_mode4.c b/test/test/test_link_bonding_mode4.c
index 8df28b4..15091b1 100644
--- a/test/test/test_link_bonding_mode4.c
+++ b/test/test/test_link_bonding_mode4.c
@@ -193,7 +193,8 @@ static uint8_t lacpdu_rx_count[RTE_MAX_ETHPORTS] = {0, };
 static int
 slave_get_pkts(struct slave_conf *slave, struct rte_mbuf **buf, uint16_t size)
 {
-	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf, size);
+	return rte_ring_dequeue_burst(slave->tx_queue, (void **)buf,
+			size, NULL);
 }
 
 /*
diff --git a/test/test/test_pmd_ring_perf.c b/test/test/test_pmd_ring_perf.c
index 045a7f2..004882a 100644
--- a/test/test/test_pmd_ring_perf.c
+++ b/test/test/test_pmd_ring_perf.c
@@ -67,7 +67,7 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t eth_start = rte_rdtsc();
@@ -99,7 +99,7 @@ test_single_enqueue_dequeue(void)
 	rte_compiler_barrier();
 	for (i = 0; i < iterations; i++) {
 		rte_ring_enqueue_bulk(r, &burst, 1, NULL);
-		rte_ring_dequeue_bulk(r, &burst, 1);
+		rte_ring_dequeue_bulk(r, &burst, 1, NULL);
 	}
 	const uint64_t sc_end = rte_rdtsc_precise();
 	rte_compiler_barrier();
@@ -133,7 +133,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, (void *)burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, (void *)burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, (void *)burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index b0ca88b..858ebc1 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -119,7 +119,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		    __func__, i, rand);
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rand,
 				NULL) != 0);
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand) == rand);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rand,
+				NULL) == rand);
 
 		/* fill the ring */
 		TEST_RING_VERIFY(rte_ring_enqueue_bulk(r, src, rsz, NULL) != 0);
@@ -129,7 +130,8 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 		TEST_RING_VERIFY(0 == rte_ring_empty(r));
 
 		/* empty the ring */
-		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz) == rsz);
+		TEST_RING_VERIFY(rte_ring_dequeue_bulk(r, dst, rsz,
+				NULL) == rsz);
 		TEST_RING_VERIFY(rsz == rte_ring_free_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_count(r));
 		TEST_RING_VERIFY(0 == rte_ring_full(r));
@@ -186,19 +188,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -232,19 +234,19 @@ test_ring_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret == 0)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if (ret == 0)
 		goto fail;
@@ -265,7 +267,7 @@ test_ring_basic(void)
 		cur_src += MAX_BULK;
 		if (ret == 0)
 			goto fail;
-		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_bulk(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if (ret == 0)
 			goto fail;
@@ -303,13 +305,13 @@ test_ring_basic(void)
 		printf("Cannot enqueue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue\n");
 		goto fail;
 	}
-	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems);
+	ret = rte_ring_dequeue_bulk(r, cur_dst, num_elems, NULL);
 	cur_dst += num_elems;
 	if (ret == 0) {
 		printf("Cannot dequeue2\n");
@@ -390,19 +392,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1) ;
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -451,19 +453,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available memory space for the exact MAX_BULK entries */
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_sc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -505,19 +507,19 @@ test_ring_burst_basic(void)
 		goto fail;
 
 	printf("dequeue 1 obj\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 1, NULL);
 	cur_dst += 1;
 	if ((ret & RTE_RING_SZ_MASK) != 1)
 		goto fail;
 
 	printf("dequeue 2 objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
 	printf("dequeue MAX_BULK objs\n");
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 		goto fail;
@@ -539,7 +541,7 @@ test_ring_burst_basic(void)
 		cur_src += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
@@ -578,19 +580,19 @@ test_ring_burst_basic(void)
 
 	printf("Test dequeue without enough objects \n");
 	for (i = 0; i<RING_SIZE/MAX_BULK - 1; i++) {
-		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+		ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 		cur_dst += MAX_BULK;
 		if ((ret & RTE_RING_SZ_MASK) != MAX_BULK)
 			goto fail;
 	}
 
 	/* Available objects - the exact MAX_BULK */
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK);
+	ret = rte_ring_mc_dequeue_burst(r, cur_dst, MAX_BULK, NULL);
 	cur_dst += MAX_BULK - 3;
 	if ((ret & RTE_RING_SZ_MASK) != MAX_BULK - 3)
 		goto fail;
@@ -613,7 +615,7 @@ test_ring_burst_basic(void)
 	if ((ret & RTE_RING_SZ_MASK) != 2)
 		goto fail;
 
-	ret = rte_ring_dequeue_burst(r, cur_dst, 2);
+	ret = rte_ring_dequeue_burst(r, cur_dst, 2, NULL);
 	cur_dst += 2;
 	if (ret != 2)
 		goto fail;
@@ -753,7 +755,7 @@ test_ring_basic_ex(void)
 		goto fail_test;
 	}
 
-	ret = rte_ring_dequeue_burst(rp, obj, 2);
+	ret = rte_ring_dequeue_burst(rp, obj, 2, NULL);
 	if (ret != 2) {
 		printf("test_ring_basic_ex: rte_ring_dequeue_burst fails \n");
 		goto fail_test;
diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index f95a8e9..ed89896 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -152,12 +152,12 @@ test_empty_dequeue(void)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0]);
+		rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[0], NULL);
 	const uint64_t mc_end = rte_rdtsc();
 
 	printf("SC empty dequeue: %.2F\n",
@@ -230,13 +230,13 @@ dequeue_bulk(void *p)
 
 	const uint64_t sc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_sc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_sc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t sc_end = rte_rdtsc();
 
 	const uint64_t mc_start = rte_rdtsc();
 	for (i = 0; i < iterations; i++)
-		while (rte_ring_mc_dequeue_bulk(r, burst, size) == 0)
+		while (rte_ring_mc_dequeue_bulk(r, burst, size, NULL) == 0)
 			rte_pause();
 	const uint64_t mc_end = rte_rdtsc();
 
@@ -325,7 +325,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -333,7 +334,8 @@ test_burst_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_burst(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_burst(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_burst(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
@@ -361,7 +363,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_sp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_sc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_sc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t sc_end = rte_rdtsc();
 
@@ -369,7 +372,8 @@ test_bulk_enqueue_dequeue(void)
 		for (i = 0; i < iterations; i++) {
 			rte_ring_mp_enqueue_bulk(r, burst,
 					bulk_sizes[sz], NULL);
-			rte_ring_mc_dequeue_bulk(r, burst, bulk_sizes[sz]);
+			rte_ring_mc_dequeue_bulk(r, burst,
+					bulk_sizes[sz], NULL);
 		}
 		const uint64_t mc_end = rte_rdtsc();
 
diff --git a/test/test/test_table_acl.c b/test/test/test_table_acl.c
index b3bfda4..4d43be7 100644
--- a/test/test/test_table_acl.c
+++ b/test/test/test_table_acl.c
@@ -713,7 +713,7 @@ test_pipeline_single_filter(int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0) {
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/test/test/test_table_pipeline.c b/test/test/test_table_pipeline.c
index 36bfeda..b58aa5d 100644
--- a/test/test/test_table_pipeline.c
+++ b/test/test/test_table_pipeline.c
@@ -494,7 +494,7 @@ test_pipeline_single_filter(int test_type, int expected_count)
 		void *objs[RING_TX_SIZE];
 		struct rte_mbuf *mbuf;
 
-		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10);
+		ret = rte_ring_sc_dequeue_burst(rings_tx[i], objs, 10, NULL);
 		if (ret <= 0)
 			printf("Got no objects from ring %d - error code %d\n",
 				i, ret);
diff --git a/test/test/test_table_ports.c b/test/test/test_table_ports.c
index 395f4f3..39592ce 100644
--- a/test/test/test_table_ports.c
+++ b/test/test/test_table_ports.c
@@ -163,7 +163,7 @@ test_port_ring_writer(void)
 	rte_port_ring_writer_ops.f_flush(port);
 	expected_pkts = 1;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -7;
@@ -178,7 +178,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -193,7 +193,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -8;
@@ -208,7 +208,7 @@ test_port_ring_writer(void)
 
 	expected_pkts = RTE_PORT_IN_BURST_SIZE_MAX;
 	received_pkts = rte_ring_sc_dequeue_burst(port_ring_writer_params.ring,
-		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz);
+		(void **)res_mbuf, port_ring_writer_params.tx_burst_sz, NULL);
 
 	if (received_pkts < expected_pkts)
 		return -9;
diff --git a/test/test/virtual_pmd.c b/test/test/virtual_pmd.c
index 39e070c..b209355 100644
--- a/test/test/virtual_pmd.c
+++ b/test/test/virtual_pmd.c
@@ -342,7 +342,7 @@ virtual_ethdev_rx_burst_success(void *queue __rte_unused,
 	dev_private = vrtl_eth_dev->data->dev_private;
 
 	rx_count = rte_ring_dequeue_burst(dev_private->rx_queue, (void **) bufs,
-			nb_pkts);
+			nb_pkts, NULL);
 
 	/* increments ipackets count */
 	dev_private->eth_stats.ipackets += rx_count;
@@ -508,7 +508,7 @@ virtual_ethdev_get_mbufs_from_tx_queue(uint8_t port_id,
 
 	dev_private = vrtl_eth_dev->data->dev_private;
 	return rte_ring_dequeue_burst(dev_private->tx_queue, (void **)pkt_burst,
-		burst_length);
+		burst_length, NULL);
 }
 
 static uint8_t
-- 
2.9.3

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] Issues with ixgbe  and rte_flow
  @ 2017-03-08 15:41  3%     ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2017-03-08 15:41 UTC (permalink / raw)
  To: Le Scouarnec Nicolas
  Cc: Lu, Wenzhuo, dev, users, Jan Medala, Evgeny Schemeilin,
	Stephen Hurd, Jerin Jacob, Rahul Lakkireddy, John Daley,
	Matej Vido, Helin Zhang, Konstantin Ananyev, Jingjing Wu,
	Jing Chen, Alejandro Lucero, Harish Patil, Rasesh Mody,
	Andrew Rybchenko, Nelio Laranjeiro, Vasily Philipov,
	Pascal Mazon, Thomas Monjalon

CC'ing users@dpdk.org since this issue primarily affects rte_flow users, and
several PMD maintainers to get their opinion on the matter, see below.

On Wed, Mar 08, 2017 at 09:24:26AM +0000, Le Scouarnec Nicolas wrote:
> My response is inline below, and further comments on the code excerpt as well
> 
> 
> From: Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Sent: Wednesday, March 8, 2017 4:16 AM
> To: Le Scouarnec Nicolas; dev@dpdk.org; Adrien Mazarguil (adrien.mazarguil@6wind.com)
> Cc: Yigit, Ferruh
> Subject: RE: Issues with ixgbe and rte_flow
>     
> >> I have been using the new API rte_flow to program filtering on an X540 (ixgbe)
> >> NIC. My goal is to send packets from different VLANs to different queues
> >> (filtering which should be supported by flow director as far as I understand). I
> >> enclosed the setup code at the bottom of this email.
> >> For reference, here is the setup code I use
> >>
> >>       vlan_spec.tci = vlan_be;
> >>       vlan_spec.tpid = 0;
> >>
> >>       vlan_mask.tci = rte_cpu_to_be_16(0x0fff);
> >>       vlan_mask.tpid =  0;
> 
> >To my opinion, this setting is not right. As we know, vlan tag is inserted between MAC source address and Ether type.
> >So if we have a MAC+VLAN+IPv4 packet, the vlan_spec.tpid should be 0x8100, the eth_spec.type should be 0x0800.
> >+ Adrien, the author. He can correct me if I'm wrong.

That's right; however, the confusion is understandable, and perhaps the
documentation should be clearer. It currently states what follows without
describing the reason:

 /**
  * RTE_FLOW_ITEM_TYPE_VLAN
  *
  * Matches an 802.1Q/ad VLAN tag.
  *
  * This type normally follows either RTE_FLOW_ITEM_TYPE_ETH or
  * RTE_FLOW_ITEM_TYPE_VLAN.
  */

> Ok, I apologize, you're right. Being more used to the software-side than to the hardware-side, I misunderstood struct rte_flow_item_vlan and thought it was the "equivalent" of struct vlan_hdr, in which case the vlan_hdr contains the type of the encapsulated frame.
> 
> (  /**
>  * Ethernet VLAN Header.
>  * Contains the 16-bit VLAN Tag Control Identifier and the Ethernet type
>  * of the encapsulated frame.
>  */
> struct vlan_hdr {
> 	uint16_t vlan_tci; /**< Priority (3) + CFI (1) + Identifier Code (12) */
> 	uint16_t eth_proto;/**< Ethernet type of encapsulated frame. */
> } __attribute__((__packed__));        )

Indeed, struct vlan_hdr and struct rte_flow_item_vlan are not mapped at the
same offset; the former includes EtherType of the inner packet (eth_proto),
while the latter describes the inserted VLAN header itself starting with
TPID.

This approach was chosen for rte_flow for consistency with the fact each
pattern item describes exactly one protocol header, even though in the case
of VLAN and other layer 2.5 protocols, some happen to be embedded.
IPv4/IPv6 options will be provided as separate items in a similar fashion.

It also allows adding/removing VLAN tags to an existing rule without
modifying the EtherType of the inner frame.
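
For illustration, here is a minimal C sketch (not from the original mail;
the helper name is arbitrary and it assumes the 17.02 rte_flow
definitions) of a rule matching VLAN-tagged IPv4/UDP and steering it to
queue 1 under the current layout, where the ETH item carries the inner
EtherType and the VLAN item starts at the TPID:

 #include <rte_byteorder.h>
 #include <rte_flow.h>

 /* Sketch only: match VLAN (TPID 0x8100) + IPv4 (inner type 0x0800) + UDP
  * and send matching packets to RX queue 1 of the given port.
  */
 static struct rte_flow *
 vlan_ipv4_udp_to_queue1(uint8_t port_id, struct rte_flow_error *error)
 {
 	struct rte_flow_attr attr = { .ingress = 1 };
 	struct rte_flow_item_eth eth_spec = { .type = rte_cpu_to_be_16(0x0800) };
 	struct rte_flow_item_eth eth_mask = { .type = rte_cpu_to_be_16(0xffff) };
 	struct rte_flow_item_vlan vlan_spec = { .tpid = rte_cpu_to_be_16(0x8100) };
 	struct rte_flow_item_vlan vlan_mask = { .tpid = rte_cpu_to_be_16(0xffff) };
 	struct rte_flow_action_queue queue = { .index = 1 };
 	struct rte_flow_item pattern[] = {
 		{ .type = RTE_FLOW_ITEM_TYPE_ETH,
 		  .spec = &eth_spec, .mask = &eth_mask },
 		{ .type = RTE_FLOW_ITEM_TYPE_VLAN,
 		  .spec = &vlan_spec, .mask = &vlan_mask },
 		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
 		{ .type = RTE_FLOW_ITEM_TYPE_UDP },
 		{ .type = RTE_FLOW_ITEM_TYPE_END },
 	};
 	struct rte_flow_action actions[] = {
 		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
 		{ .type = RTE_FLOW_ACTION_TYPE_END },
 	};

 	return rte_flow_create(port_id, &attr, pattern, actions, error);
 }

Under the alternative semantics discussed below, only the two EtherType
assignments would move: eth.type would carry the TPID seen on the wire
and the VLAN item would carry the inner EtherType.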

Now assuming you're not the only one facing that issue, if the current
definition does not make sense, perhaps we can update the API before it's
too late. I'll attempt to summarize it with an example below.

In any case, matching nonspecific VLAN-tagged and QinQ UDPv4 packets in
testpmd is written as:

 flow create 0 pattern eth / vlan / ipv4 / udp / end actions queue 1 / end
 flow create 0 pattern eth / vlan / vlan / ipv4 / udp / end actions queue 1 / end

However, with the current API described above, specifying inner/outer
EtherTypes for the above packets yields (as a reminder, 0x8100 stands for
VLAN, 0x0800 for IPv4 and 0x88A8 for QinQ):

#1

 flow create 0 pattern eth type is 0x0800 / vlan tpid is 0x8100 / ipv4 / udp / end actions queue 1 / end
 flow create 0 pattern eth type is 0x0800 / vlan tpid is 0x88A8 / vlan tpid is 0x8100 / ipv4 / udp / end actions queue 1 / end

Instead of the arguably more accurate (renaming "tpid" to "inner_type" for
clarity):

#2

 flow create 0 pattern eth type is 0x8100 / vlan inner_type is 0x0800 / ipv4 / udp / end actions queue 1 / end
 flow create 0 pattern eth type is 0x88A8 / vlan inner_type is 0x8100 / vlan inner_type is 0x0800 / ipv4 / udp / end actions queue 1 / end

So, should the VLAN item be updated to behave as described in #2?

Note: doing so will cause a serious API/ABI breakage, I know it was not
supposed to happen according to the rte_flow sales pitch, but hey.

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
  @ 2017-03-09  5:59  3%     ` Xing, Beilei
  2017-03-09 10:01  0%       ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Xing, Beilei @ 2017-03-09  5:59 UTC (permalink / raw)
  To: Yigit, Ferruh, Wu, Jingjing
  Cc: Zhang, Helin, dev, Iremonger, Bernard, Stroe, Laura



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, March 8, 2017 11:50 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
> <laura.stroe@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
> 
> On 3/3/2017 9:31 AM, Beilei Xing wrote:
> > Add new admin queue function and extended fields in DCR 288:
> >  - Add admin queue function for Replace filter
> >    command (Opcode: 0x025F)
> >  - Add General fields for Add/Remove Cloud filters
> >    command
> >
> > This patch will be removed to base driver in future.
> >
> > Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> > Signed-off-by: Stroe Laura <laura.stroe@intel.com>
> > Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> > ---
> >  drivers/net/i40e/i40e_ethdev.h | 106 ++++++++++++++++++++++++++++
> >  drivers/net/i40e/i40e_flow.c   | 152
> +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 258 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_ethdev.h
> > b/drivers/net/i40e/i40e_ethdev.h index f545850..3a49865 100644
> > --- a/drivers/net/i40e/i40e_ethdev.h
> > +++ b/drivers/net/i40e/i40e_ethdev.h
> > @@ -729,6 +729,100 @@ struct i40e_valid_pattern {
> >  	parse_filter_t parse_filter;
> >  };
> >
> > +/* Support replace filter */
> > +
> > +/* i40e_aqc_add_remove_cloud_filters_element_big_data is used when
> > + * I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER flag is set. refer to
> > + * DCR288
> 
> Please do not refer to DCR, unless you can provide a public link for it.
OK, got it.

> 
> > + */
> > +struct i40e_aqc_add_remove_cloud_filters_element_big_data {
> > +	struct i40e_aqc_add_remove_cloud_filters_element_data element;
> 
> What is the difference between
> "i40e_aqc_add_remove_cloud_filters_element_big_data" and
> "i40e_aqc_add_remove_cloud_filters_element_data", why need big_data
> one?

As ' Add/Remove Cloud filters -command buffer ' is changed in the DCR288, 'general fields' exists only when big_buffer is set.
But we don't want to change the  " i40e_aqc_add_remove_cloud_filters_element_data " as it will cause ABI/API change in kernel driver.

> 
> > +	uint16_t     general_fields[32];
> 
> Not very useful variable name.

It's the name from DCR.

> 
> <...>
> 
> > +/* Replace filter Command 0x025F
> > + * uses the i40e_aqc_replace_cloud_filters,
> > + * and the generic indirect completion structure  */ struct
> > +i40e_filter_data {
> > +	uint8_t filter_type;
> > +	uint8_t input[3];
> > +};
> > +
> > +struct i40e_aqc_replace_cloud_filters_cmd {
> 
> Is replace does something different than remove old and add new cloud
> filter?

It's just like removing an old filter and adding a new one.
It can replace both L1 filters and cloud filters.

> 
> <...>
> 
> > +enum i40e_status_code i40e_aq_add_cloud_filters_big_buffer(struct
> i40e_hw *hw,
> > +	   uint16_t seid,
> > +	   struct i40e_aqc_add_remove_cloud_filters_element_big_data
> *filters,
> > +	   uint8_t filter_count);
> > +enum i40e_status_code i40e_aq_remove_cloud_filters_big_buffer(
> > +	struct i40e_hw *hw, uint16_t seid,
> > +	struct i40e_aqc_add_remove_cloud_filters_element_big_data
> *filters,
> > +	uint8_t filter_count);
> > +enum i40e_status_code i40e_aq_replace_cloud_filters(struct i40e_hw
> *hw,
> > +		    struct i40e_aqc_replace_cloud_filters_cmd *filters,
> > +		    struct i40e_aqc_replace_cloud_filters_cmd_buf
> *cmd_buf);
> > +
> 
> Do you need these function declarations?
We can remove them if we define the functions as "static".

> 
> >  #define I40E_DEV_TO_PCI(eth_dev) \
> >  	RTE_DEV_TO_PCI((eth_dev)->device)
> >
> > diff --git a/drivers/net/i40e/i40e_flow.c
> > b/drivers/net/i40e/i40e_flow.c index f163ce5..3c49228 100644
> > --- a/drivers/net/i40e/i40e_flow.c
> > +++ b/drivers/net/i40e/i40e_flow.c
> > @@ -1874,3 +1874,155 @@ i40e_flow_flush_tunnel_filter(struct i40e_pf
> > *pf)
> >
> >  	return ret;
> >  }
> > +
> > +#define i40e_aqc_opc_replace_cloud_filters 0x025F #define
> > +I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER 1
> > +/**
> > + * i40e_aq_add_cloud_filters_big_buffer
> > + * @hw: pointer to the hardware structure
> > + * @seid: VSI seid to add cloud filters from
> > + * @filters: Buffer which contains the filters in big buffer to be
> > +added
> > + * @filter_count: number of filters contained in the buffer
> > + *
> > + * Set the cloud filters for a given VSI.  The contents of the
> > + * i40e_aqc_add_remove_cloud_filters_element_big_data are filled
> > + * in by the caller of the function.
> > + *
> > + **/
> > +enum i40e_status_code i40e_aq_add_cloud_filters_big_buffer(
> 
> There are already non big_buffer versions of these functions, like
> "i40e_aq_add_cloud_filters()" why big_data version required, what it does
> differently?

The parameters are different.
We add i40e_aq_add_cloud_filters_big_buffer to handle the structure "i40e_aqc_add_remove_cloud_filters_element_big_data", which includes general_fields.

> 
> And is there a reason that these functions are not static? (For this patch they
> are not used at all and will cause build error, but my question is after they
> started to be used)

No. The same as with the patch for Pipeline Personalization Profile: it's designed according to the base code style.

> 
> <...>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 4/4] net/i40e: refine consistent tunnel filter
  @ 2017-03-09  6:11  3%     ` Xing, Beilei
  0 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2017-03-09  6:11 UTC (permalink / raw)
  To: Yigit, Ferruh, Wu, Jingjing; +Cc: Zhang, Helin, dev



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, March 8, 2017 11:51 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 4/4] net/i40e: refine consistent tunnel filter
> 
> On 3/3/2017 9:31 AM, Beilei Xing wrote:
> > Add i40e_tunnel_type enumeration type to refine consistent tunnel
> > filter, it will be esay to add new tunnel type for
> 
> s/esay/easy
> 
> > i40e.
> >
> > Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> 
> <...>
> 
> >  /**
> > + * Tunnel type.
> > + */
> > +enum i40e_tunnel_type {
> > +	I40E_TUNNEL_TYPE_NONE = 0,
> > +	I40E_TUNNEL_TYPE_VXLAN,
> > +	I40E_TUNNEL_TYPE_GENEVE,
> > +	I40E_TUNNEL_TYPE_TEREDO,
> > +	I40E_TUNNEL_TYPE_NVGRE,
> > +	I40E_TUNNEL_TYPE_IP_IN_GRE,
> > +	I40E_L2_TUNNEL_TYPE_E_TAG,
> > +	I40E_TUNNEL_TYPE_MAX,
> > +};
> 
> Same question here, there is already "rte_eth_tunnel_type", why driver is
> duplicating the structure?
> 

Same with " struct i40e_tunnel_filter_conf ", to avoid ABI change, we create it in PMD to add new tunnel type easily, like MPLS.

> <...>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
  2017-03-09  5:59  3%     ` Xing, Beilei
@ 2017-03-09 10:01  0%       ` Ferruh Yigit
  2017-03-09 10:43  0%         ` Xing, Beilei
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2017-03-09 10:01 UTC (permalink / raw)
  To: Xing, Beilei, Wu, Jingjing
  Cc: Zhang, Helin, dev, Iremonger, Bernard, Stroe, Laura

On 3/9/2017 5:59 AM, Xing, Beilei wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Wednesday, March 8, 2017 11:50 PM
>> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
>> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
>> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
>> <laura.stroe@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
>>
>> On 3/3/2017 9:31 AM, Beilei Xing wrote:
>>> Add new admin queue function and extended fields in DCR 288:
>>>  - Add admin queue function for Replace filter
>>>    command (Opcode: 0x025F)
>>>  - Add General fields for Add/Remove Cloud filters
>>>    command
>>>
>>> This patch will be removed to base driver in future.
>>>
>>> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
>>> Signed-off-by: Stroe Laura <laura.stroe@intel.com>
>>> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
>>> Signed-off-by: Beilei Xing <beilei.xing@intel.com>
>>> ---
>>>  drivers/net/i40e/i40e_ethdev.h | 106 ++++++++++++++++++++++++++++
>>>  drivers/net/i40e/i40e_flow.c   | 152
>> +++++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 258 insertions(+)
>>>
>>> diff --git a/drivers/net/i40e/i40e_ethdev.h
>>> b/drivers/net/i40e/i40e_ethdev.h index f545850..3a49865 100644
>>> --- a/drivers/net/i40e/i40e_ethdev.h
>>> +++ b/drivers/net/i40e/i40e_ethdev.h
>>> @@ -729,6 +729,100 @@ struct i40e_valid_pattern {
>>>  	parse_filter_t parse_filter;
>>>  };
>>>
>>> +/* Support replace filter */
>>> +
>>> +/* i40e_aqc_add_remove_cloud_filters_element_big_data is used when
>>> + * I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER flag is set. refer to
>>> + * DCR288
>>
>> Please do not refer to DCR, unless you can provide a public link for it.
> OK, got it.
> 
>>
>>> + */
>>> +struct i40e_aqc_add_remove_cloud_filters_element_big_data {
>>> +	struct i40e_aqc_add_remove_cloud_filters_element_data element;
>>
>> What is the difference between
>> "i40e_aqc_add_remove_cloud_filters_element_big_data" and
>> "i40e_aqc_add_remove_cloud_filters_element_data", why need big_data
>> one?
> 
> As ' Add/Remove Cloud filters -command buffer ' is changed in the DCR288, 'general fields' exists only when big_buffer is set.

What does it mean having "big_buffer" set? What changes functionally
being big_buffer set or not?

> But we don't want to change the  " i40e_aqc_add_remove_cloud_filters_element_data " as it will cause ABI/API change in kernel driver.
> 
<...>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
  2017-03-09 10:01  0%       ` Ferruh Yigit
@ 2017-03-09 10:43  0%         ` Xing, Beilei
  0 siblings, 0 replies; 200+ results
From: Xing, Beilei @ 2017-03-09 10:43 UTC (permalink / raw)
  To: Yigit, Ferruh, Wu, Jingjing
  Cc: Zhang, Helin, dev, Iremonger, Bernard, Stroe, Laura



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Thursday, March 9, 2017 6:02 PM
> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
> <laura.stroe@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type
> 
> On 3/9/2017 5:59 AM, Xing, Beilei wrote:
> >
> >
> >> -----Original Message-----
> >> From: Yigit, Ferruh
> >> Sent: Wednesday, March 8, 2017 11:50 PM
> >> To: Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> >> <jingjing.wu@intel.com>
> >> Cc: Zhang, Helin <helin.zhang@intel.com>; dev@dpdk.org; Iremonger,
> >> Bernard <bernard.iremonger@intel.com>; Stroe, Laura
> >> <laura.stroe@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter
> >> type
> >>
> >> On 3/3/2017 9:31 AM, Beilei Xing wrote:
> >>> Add new admin queue function and extended fields in DCR 288:
> >>>  - Add admin queue function for Replace filter
> >>>    command (Opcode: 0x025F)
> >>>  - Add General fields for Add/Remove Cloud filters
> >>>    command
> >>>
> >>> This patch will be removed to base driver in future.
> >>>
> >>> Signed-off-by: Bernard Iremonger <bernard.iremonger@intel.com>
> >>> Signed-off-by: Stroe Laura <laura.stroe@intel.com>
> >>> Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
> >>> Signed-off-by: Beilei Xing <beilei.xing@intel.com>
> >>> ---
> >>>  drivers/net/i40e/i40e_ethdev.h | 106
> ++++++++++++++++++++++++++++
> >>>  drivers/net/i40e/i40e_flow.c   | 152
> >> +++++++++++++++++++++++++++++++++++++++++
> >>>  2 files changed, 258 insertions(+)
> >>>
> >>> diff --git a/drivers/net/i40e/i40e_ethdev.h
> >>> b/drivers/net/i40e/i40e_ethdev.h index f545850..3a49865 100644
> >>> --- a/drivers/net/i40e/i40e_ethdev.h
> >>> +++ b/drivers/net/i40e/i40e_ethdev.h
> >>> @@ -729,6 +729,100 @@ struct i40e_valid_pattern {
> >>>  	parse_filter_t parse_filter;
> >>>  };
> >>>
> >>> +/* Support replace filter */
> >>> +
> >>> +/* i40e_aqc_add_remove_cloud_filters_element_big_data is used
> when
> >>> + * I40E_AQC_ADD_REM_CLOUD_CMD_BIG_BUFFER flag is set. refer to
> >>> + * DCR288
> >>
> >> Please do not refer to DCR, unless you can provide a public link for it.
> > OK, got it.
> >
> >>
> >>> + */
> >>> +struct i40e_aqc_add_remove_cloud_filters_element_big_data {
> >>> +	struct i40e_aqc_add_remove_cloud_filters_element_data element;
> >>
> >> What is the difference between
> >> "i40e_aqc_add_remove_cloud_filters_element_big_data" and
> >> "i40e_aqc_add_remove_cloud_filters_element_data", why need
> big_data
> >> one?
> >
> > As ' Add/Remove Cloud filters -command buffer ' is changed in the DCR288,
> 'general fields' exists only when big_buffer is set.
> 
> What does it mean having "big_buffer" set? What changes functionally being
> big_buffer set or not?

According to DCR288, the "Add/Remove Cloud Filter Command" should add 'Big Buffer' at byte 20, but we can't change 'struct i40e_aqc_add_remove_cloud_filters' in the base code:
struct i40e_aqc_add_remove_cloud_filters {
        u8      num_filters;
        u8      reserved;
        __le16  seid;
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT   0
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_MASK    (0x3FF << \
                                        I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT)
        u8      reserved2[4];
        __le32  addr_high;
        __le32  addr_low;
};

So we use reserved2[0] for 'Big Buffer' here. In the patch for ND, we changed the above structure to the following:

struct i40e_aqc_add_remove_cloud_filters {
        u8      num_filters;
        u8      reserved;
        __le16  seid;
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT   0
#define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_MASK    (0x3FF << \
                                        I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT)
        u8      big_buffer;
        u8      reserved2[3];
        __le32  addr_high;
        __le32  addr_low;
};


> 
> > But we don't want to change the  "
> i40e_aqc_add_remove_cloud_filters_element_data " as it will cause ABI/API
> change in kernel driver.
> >
> <...>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v11 1/7] lib: add information metrics library
  @ 2017-03-09 16:25  1% ` Remy Horton
  2017-03-09 16:25  2% ` [dpdk-dev] [PATCH v11 3/7] lib: add bitrate statistics library Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-03-09 16:25 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a new information metrics library. This Metrics
library implements a mechanism by which producers can publish
numeric information for later querying by consumers. Metrics
themselves are statistics that are not generated by PMDs, and
hence are not reported via ethdev extended statistics.

Metric information is populated using a push model, where
producers update the values contained within the metric
library by calling an update function on the relevant metrics.
Consumers receive metric information by querying the central
metric data, which is held in shared memory.
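
As an illustration (not part of the patch itself; the helper names are
hypothetical), a minimal producer-side sketch built only from the API
declared in rte_metrics.h below could look as follows, with error
handling omitted:

#include <stdint.h>
#include <rte_lcore.h>
#include <rte_metrics.h>

static int set_key; /* index key returned for the registered metric set */

/* Register a two-metric set once; rte_metrics_init() must run in the
 * primary process.
 */
void metrics_producer_init(void)
{
	const char * const names[] = { "mean_bits_in", "mean_bits_out" };

	rte_metrics_init(rte_socket_id());
	set_key = rte_metrics_reg_names(names, 2);
}

/* Push fresh values for one port; both members of the set are updated
 * in a single call.
 */
void metrics_producer_update(int port_id, uint64_t bits_in, uint64_t bits_out)
{
	const uint64_t values[2] = { bits_in, bits_out };

	rte_metrics_update_values(port_id, set_key, values, 2);
}

A consumer mirrors this with rte_metrics_get_names() and
rte_metrics_get_values(), as shown in the programmer's guide section
added by this patch.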

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                |   4 +
 config/common_base                         |   5 +
 doc/api/doxy-api-index.md                  |   1 +
 doc/api/doxy-api.conf                      |   1 +
 doc/guides/prog_guide/index.rst            |   1 +
 doc/guides/prog_guide/metrics_lib.rst      | 180 +++++++++++++++++
 doc/guides/rel_notes/release_17_02.rst     |   1 +
 doc/guides/rel_notes/release_17_05.rst     |   8 +
 lib/Makefile                               |   1 +
 lib/librte_metrics/Makefile                |  51 +++++
 lib/librte_metrics/rte_metrics.c           | 299 +++++++++++++++++++++++++++++
 lib/librte_metrics/rte_metrics.h           | 240 +++++++++++++++++++++++
 lib/librte_metrics/rte_metrics_version.map |  13 ++
 mk/rte.app.mk                              |   2 +
 14 files changed, 807 insertions(+)
 create mode 100644 doc/guides/prog_guide/metrics_lib.rst
 create mode 100644 lib/librte_metrics/Makefile
 create mode 100644 lib/librte_metrics/rte_metrics.c
 create mode 100644 lib/librte_metrics/rte_metrics.h
 create mode 100644 lib/librte_metrics/rte_metrics_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 5030c1c..66478f3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -635,6 +635,10 @@ F: lib/librte_jobstats/
 F: examples/l2fwd-jobstats/
 F: doc/guides/sample_app_ug/l2_forward_job_stats.rst
 
+Metrics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_metrics/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index aeee13e..cea055f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -501,6 +501,11 @@ CONFIG_RTE_LIBRTE_EFD=y
 CONFIG_RTE_LIBRTE_JOBSTATS=y
 
 #
+# Compile the device metrics library
+#
+CONFIG_RTE_LIBRTE_METRICS=y
+
+#
 # Compile librte_lpm
 #
 CONFIG_RTE_LIBRTE_LPM=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index eb39f69..26a26b7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -156,4 +156,5 @@ There are many libraries, so their headers may be grouped by topics:
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
+  [device metrics]     (@ref rte_metrics.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fdcf13c..fbbcf8e 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -52,6 +52,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_mbuf \
                           lib/librte_mempool \
                           lib/librte_meter \
+                          lib/librte_metrics \
                           lib/librte_net \
                           lib/librte_pdump \
                           lib/librte_pipeline \
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 77f427e..2a69844 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -62,6 +62,7 @@ Programmer's Guide
     packet_classif_access_ctrl
     packet_framework
     vhost_lib
+    metrics_lib
     port_hotplug_framework
     source_org
     dev_kit_build_system
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
new file mode 100644
index 0000000..87f806d
--- /dev/null
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -0,0 +1,180 @@
+..  BSD LICENSE
+    Copyright(c) 2017 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+    notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+    notice, this list of conditions and the following disclaimer in
+    the documentation and/or other materials provided with the
+    distribution.
+    * Neither the name of Intel Corporation nor the names of its
+    contributors may be used to endorse or promote products derived
+    from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+.. _Metrics_Library:
+
+Metrics Library
+===============
+
+The Metrics library implements a mechanism by which *producers* can
+publish numeric information for later querying by *consumers*. In
+practice producers will typically be other libraries or primary
+processes, whereas consumers will typically be applications.
+
+Metrics themselves are statistics that are not generated by PMDs. Metric
+information is populated using a push model, where producers update the
+values contained within the metric library by calling an update function
+on the relevant metrics. Consumers receive metric information by querying
+the central metric data, which is held in shared memory.
+
+For each metric, a separate value is maintained for each port id, and
+when publishing metric values the producers need to specify which port is
+being updated. In addition there is a special id ``RTE_METRICS_GLOBAL``
+that is intended for global statistics that are not associated with any
+individual device. Since the metrics library is self-contained, the only
+restriction on port numbers is that they are less than ``RTE_MAX_ETHPORTS``
+- there is no requirement for the ports to actually exist.
+
+Initialising the library
+------------------------
+
+Before the library can be used, it has to be initialized by calling
+``rte_metrics_init()`` which sets up the metric store in shared memory.
+This is where producers will publish metric information to, and where
+consumers will query it from.
+
+.. code-block:: c
+
+    rte_metrics_init(rte_socket_id());
+
+This function **must** be called from a primary process, but otherwise
+producers and consumers can be in either primary or secondary processes.
+
+Registering metrics
+-------------------
+
+Metrics must first be *registered*, which is the way producers declare
+the names of the metrics they will be publishing. Registration can either
+be done individually, or a set of metrics can be registered as a group.
+Individual registration is done using ``rte_metrics_reg_name()``:
+
+.. code-block:: c
+
+    id_1 = rte_metrics_reg_name("mean_bits_in");
+    id_2 = rte_metrics_reg_name("mean_bits_out");
+    id_3 = rte_metrics_reg_name("peak_bits_in");
+    id_4 = rte_metrics_reg_name("peak_bits_out");
+
+or alternatively, a set of metrics can be registered together using
+``rte_metrics_reg_names()``:
+
+.. code-block:: c
+
+    const char * const names[] = {
+        "mean_bits_in", "mean_bits_out",
+        "peak_bits_in", "peak_bits_out",
+    };
+    id_set = rte_metrics_reg_names(&names[0], 4);
+
+If the return value is negative, it means registration failed. Otherwise
+the return value is the *key* for the metric, which is used when updating
+values. A table mapping together these key values and the metrics' names
+can be obtained using ``rte_metrics_get_names()``.
+
+Updating metric values
+----------------------
+
+Once registered, producers can update the metric for a given port using
+the ``rte_metrics_update_value()`` function. This uses the metric key
+that is returned when registering the metric, and can also be looked up
+using ``rte_metrics_get_names()``.
+
+.. code-block:: c
+
+    rte_metrics_update_value(port_id, id_1, values[0]);
+    rte_metrics_update_value(port_id, id_2, values[1]);
+    rte_metrics_update_value(port_id, id_3, values[2]);
+    rte_metrics_update_value(port_id, id_4, values[3]);
+
+If metrics were registered as a single set, they can either be updated
+individually using ``rte_metrics_update_value()``, or updated together
+using the ``rte_metrics_update_values()`` function:
+
+.. code-block:: c
+
+    rte_metrics_update_value(port_id, id_set, values[0]);
+    rte_metrics_update_value(port_id, id_set + 1, values[1]);
+    rte_metrics_update_value(port_id, id_set + 2, values[2]);
+    rte_metrics_update_value(port_id, id_set + 3, values[3]);
+
+    rte_metrics_update_values(port_id, id_set, values, 4);
+
+Note that ``rte_metrics_update_values()`` cannot be used to update
+metric values from *multiple* *sets*, as there is no guarantee two
+sets registered one after the other have contiguous id values.
+
+Querying metrics
+----------------
+
+Consumers can obtain metric values by querying the metrics library using
+the ``rte_metrics_get_values()`` function that return an array of
+``struct rte_metric_value``. Each entry within this array contains a metric
+value and its associated key. A key-name mapping can be obtained using the
+``rte_metrics_get_names()`` function that returns an array of
+``struct rte_metric_name`` that is indexed by the key. The following will
+print out all metrics for a given port:
+
+.. code-block:: c
+
+    void print_metrics(int port_id) {
+        struct rte_metric_name *names; struct rte_metric_value *metrics;
+        int len, ret, i;
+
+        len = rte_metrics_get_names(NULL, 0);
+        if (len < 0) {
+            printf("Cannot get metrics count\n");
+            return;
+        }
+        if (len == 0) {
+            printf("No metrics to display (none have been registered)\n");
+            return;
+        }
+        metrics = malloc(sizeof(struct rte_metric_value) * len);
+        names =  malloc(sizeof(struct rte_metric_name) * len);
+        if (metrics == NULL || names == NULL) {
+            printf("Cannot allocate memory\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        ret = rte_metrics_get_values(port_id, metrics, len);
+        if (ret < 0 || ret > len) {
+            printf("Cannot get metrics values\n");
+            free(metrics);
+            free(names);
+            return;
+        }
+        printf("Metrics for port %i:\n", port_id);
+        for (i = 0; i < len; i++)
+            printf("  %s: %"PRIu64"\n",
+                names[metrics[i].key].name, metrics[i].value);
+        free(metrics);
+        free(names);
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 357965a..8bd706f 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -368,6 +368,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_mbuf.so.2
      librte_mempool.so.2
      librte_meter.so.1
+   + librte_metrics.so.1
      librte_net.so.1
      librte_pdump.so.1
      librte_pipeline.so.3
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index e25ea9f..3ed809e 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -61,6 +61,14 @@ Resolved Issues
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Added information metric library.**
+
+  A library that allows information metrics to be added and updated
+  by producers, typically other libraries, for later retrieval by
+  consumers such as applications. It is intended to provide a
+  reporting mechanism that is independent of other libraries such
+  as ethdev.
+
 
 EAL
 ~~~
diff --git a/lib/Makefile b/lib/Makefile
index 4178325..29f6a81 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -49,6 +49,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
+DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_metrics/Makefile b/lib/librte_metrics/Makefile
new file mode 100644
index 0000000..8d6e23a
--- /dev/null
+++ b/lib/librte_metrics/Makefile
@@ -0,0 +1,51 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_metrics.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_metrics_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_METRICS) := rte_metrics.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_METRICS)-include += rte_metrics.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_METRICS) += lib/librte_eal
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_metrics/rte_metrics.c b/lib/librte_metrics/rte_metrics.c
new file mode 100644
index 0000000..aa9ec50
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.c
@@ -0,0 +1,299 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <string.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_lcore.h>
+#include <rte_memzone.h>
+#include <rte_spinlock.h>
+
+#define RTE_METRICS_MAX_METRICS 256
+#define RTE_METRICS_MEMZONE_NAME "RTE_METRICS"
+
+/**
+ * Internal stats metadata and value entry.
+ *
+ * @internal
+ */
+struct rte_metrics_meta_s {
+	/** Name of metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+	/** Current value for metric */
+	uint64_t value[RTE_MAX_ETHPORTS];
+	/** Used for global metrics */
+	uint64_t global_value;
+	/** Index of next root element (zero for none) */
+	uint16_t idx_next_set;
+	/** Index of next metric in set (zero for none) */
+	uint16_t idx_next_stat;
+};
+
+/**
+ * Internal stats info structure.
+ *
+ * @internal
+ * Offsets into metadata are used instead of pointers because ASLR
+ * means that having the same physical addresses in different
+ * processes is not guaranteed.
+ */
+struct rte_metrics_data_s {
+	/**   Index of last metadata entry with valid data.
+	 * This value is not valid if cnt_stats is zero.
+	 */
+	uint16_t idx_last_set;
+	/**   Number of metrics. */
+	uint16_t cnt_stats;
+	/** Metric data memory block. */
+	struct rte_metrics_meta_s metadata[RTE_METRICS_MAX_METRICS];
+	/** Metric data access lock */
+	rte_spinlock_t lock;
+};
+
+void
+rte_metrics_init(int socket_id)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone != NULL)
+		return;
+	memzone = rte_memzone_reserve(RTE_METRICS_MEMZONE_NAME,
+		sizeof(struct rte_metrics_data_s), socket_id, 0);
+	if (memzone == NULL)
+		rte_exit(EXIT_FAILURE, "Unable to allocate stats memzone\n");
+	stats = memzone->addr;
+	memset(stats, 0, sizeof(struct rte_metrics_data_s));
+	rte_spinlock_init(&stats->lock);
+}
+
+int
+rte_metrics_reg_name(const char *name)
+{
+	const char * const list_names[] = {name};
+
+	return rte_metrics_reg_names(list_names, 1);
+}
+
+int
+rte_metrics_reg_names(const char * const *names, uint16_t cnt_names)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	uint16_t idx_base;
+
+	/* Some sanity checks */
+	if (cnt_names < 1 || names == NULL)
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	if (stats->cnt_stats + cnt_names >= RTE_METRICS_MAX_METRICS)
+		return -ENOMEM;
+
+	rte_spinlock_lock(&stats->lock);
+
+	/* Overwritten later if this is actually first set.. */
+	stats->metadata[stats->idx_last_set].idx_next_set = stats->cnt_stats;
+
+	stats->idx_last_set = idx_base = stats->cnt_stats;
+
+	for (idx_name = 0; idx_name < cnt_names; idx_name++) {
+		entry = &stats->metadata[idx_name + stats->cnt_stats];
+		strncpy(entry->name, names[idx_name],
+			RTE_METRICS_MAX_NAME_LEN);
+		memset(entry->value, 0, sizeof(entry->value));
+		entry->idx_next_stat = idx_name + stats->cnt_stats + 1;
+	}
+	entry->idx_next_stat = 0;
+	entry->idx_next_set = 0;
+	stats->cnt_stats += cnt_names;
+
+	rte_spinlock_unlock(&stats->lock);
+
+	return idx_base;
+}
+
+int
+rte_metrics_update_value(int port_id, uint16_t key, const uint64_t value)
+{
+	return rte_metrics_update_values(port_id, key, &value, 1);
+}
+
+int
+rte_metrics_update_values(int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_metric;
+	uint16_t idx_value;
+	uint16_t cnt_setsize;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	if (memzone == NULL)
+		return -EIO;
+	stats = memzone->addr;
+
+	rte_spinlock_lock(&stats->lock);
+	idx_metric = key;
+	cnt_setsize = 1;
+	while (idx_metric < stats->cnt_stats) {
+		entry = &stats->metadata[idx_metric];
+		if (entry->idx_next_stat == 0)
+			break;
+		cnt_setsize++;
+		idx_metric++;
+	}
+	/* Check update does not cross set border */
+	if (count > cnt_setsize) {
+		rte_spinlock_unlock(&stats->lock);
+		return -ERANGE;
+	}
+
+	if (port_id == RTE_METRICS_GLOBAL)
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].global_value =
+				values[idx_value];
+		}
+	else
+		for (idx_value = 0; idx_value < count; idx_value++) {
+			idx_metric = key + idx_value;
+			stats->metadata[idx_metric].value[port_id] =
+				values[idx_value];
+		}
+	rte_spinlock_unlock(&stats->lock);
+	return 0;
+}
+
+int
+rte_metrics_get_names(struct rte_metric_name *names,
+	uint16_t capacity)
+{
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+	if (names != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		for (idx_name = 0; idx_name < stats->cnt_stats; idx_name++)
+			strncpy(names[idx_name].name,
+				stats->metadata[idx_name].name,
+				RTE_METRICS_MAX_NAME_LEN);
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
+
+int
+rte_metrics_get_values(int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity)
+{
+	struct rte_metrics_meta_s *entry;
+	struct rte_metrics_data_s *stats;
+	const struct rte_memzone *memzone;
+	uint16_t idx_name;
+	int return_value;
+
+	if (port_id != RTE_METRICS_GLOBAL &&
+			(port_id < 0 || port_id > RTE_MAX_ETHPORTS))
+		return -EINVAL;
+
+	memzone = rte_memzone_lookup(RTE_METRICS_MEMZONE_NAME);
+	/* If not allocated, fail silently */
+	if (memzone == NULL)
+		return 0;
+	stats = memzone->addr;
+	rte_spinlock_lock(&stats->lock);
+
+	if (values != NULL) {
+		if (capacity < stats->cnt_stats) {
+			return_value = stats->cnt_stats;
+			rte_spinlock_unlock(&stats->lock);
+			return return_value;
+		}
+		if (port_id == RTE_METRICS_GLOBAL)
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->global_value;
+			}
+		else
+			for (idx_name = 0;
+					idx_name < stats->cnt_stats;
+					idx_name++) {
+				entry = &stats->metadata[idx_name];
+				values[idx_name].key = idx_name;
+				values[idx_name].value = entry->value[port_id];
+			}
+	}
+	return_value = stats->cnt_stats;
+	rte_spinlock_unlock(&stats->lock);
+	return return_value;
+}
diff --git a/lib/librte_metrics/rte_metrics.h b/lib/librte_metrics/rte_metrics.h
new file mode 100644
index 0000000..7458328
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics.h
@@ -0,0 +1,240 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/**
+ * @file
+ *
+ * DPDK Metrics module
+ *
+ * Metrics are statistics that are not generated by PMDs, and hence
+ * are better reported through a mechanism that is independent from
+ * the ethdev-based extended statistics. Providers will typically
+ * be other libraries and consumers will typically be applications.
+ *
+ * Metric information is populated using a push model, where producers
+ * update the values contained within the metric library by calling
+ * an update function on the relevant metrics. Consumers receive
+ * metric information by querying the central metric data, which is
+ * held in shared memory. Currently only bulk querying of metrics
+ * by consumers is supported.
+ */
+
+#ifndef _RTE_METRICS_H_
+#define _RTE_METRICS_H_
+
+/** Maximum length of metric name (including null-terminator) */
+#define RTE_METRICS_MAX_NAME_LEN 64
+
+/**
+ * Global metric special id.
+ *
+ * When used for the port_id parameter when calling
+ * rte_metrics_update_value() or rte_metrics_update_values(),
+ * the global metrics, which are not associated with any specific
+ * port (i.e. device), are updated.
+ */
+#define RTE_METRICS_GLOBAL -1
+
+
+/**
+ * A name-key lookup for metrics.
+ *
+ * An array of this structure is returned by rte_metrics_get_names().
+ * The struct rte_metric_value references these names via their array index.
+ */
+struct rte_metric_name {
+	/** String describing metric */
+	char name[RTE_METRICS_MAX_NAME_LEN];
+};
+
+
+/**
+ * Metric value structure.
+ *
+ * This structure is used by rte_metrics_get_values() to return metrics,
+ * which are statistics that are not generated by PMDs. It maps a name key,
+ * which corresponds to an index in the array returned by
+ * rte_metrics_get_names().
+ */
+struct rte_metric_value {
+	/** Numeric identifier of metric. */
+	uint16_t key;
+	/** Value for metric */
+	uint64_t value;
+};
+
+
+/**
+ * Initializes metric module. This function must be called from
+ * a primary process before metrics are used.
+ *
+ * @param socket_id
+ *   Socket to use for shared memory allocation.
+ */
+void rte_metrics_init(int socket_id);
+
+/**
+ * Register a metric, making it available as a reporting parameter.
+ *
+ * Registering a metric is the way producers declare a parameter
+ * that they wish to be reported. Once registered, the associated
+ * numeric key can be obtained via rte_metrics_get_names(), which
+ * is required for updating said metric's value.
+ *
+ * @param name
+ *   Metric name
+ *
+ * @return
+ *  - Zero or positive: Success (index key of new metric)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_name(const char *name);
+
+/**
+ * Register a set of metrics.
+ *
+ * This is a bulk version of rte_metrics_reg_name() and aside from
+ * handling multiple keys at once is functionally identical.
+ *
+ * @param names
+ *   List of metric names
+ *
+ * @param cnt_names
+ *   Number of metrics in set
+ *
+ * @return
+ *  - Zero or positive: Success (index key of start of set)
+ *  - -EIO: Error, unable to access metrics shared memory
+ *    (rte_metrics_init() not called)
+ *  - -EINVAL: Error, invalid parameters
+ *  - -ENOMEM: Error, maximum metrics reached
+ */
+int rte_metrics_reg_names(const char * const *names, uint16_t cnt_names);
+
+/**
+ * Get metric name-key lookup table.
+ *
+ * @param names
+ *   A struct rte_metric_name array of at least *capacity* in size to
+ *   receive key names. If this is NULL, function returns the required
+ *   number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_name array.
+ *   Disregarded if names is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *names* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_names(
+	struct rte_metric_name *names,
+	uint16_t capacity);
+
+/**
+ * Get metric value table.
+ *
+ * @param port_id
+ *   Port id to query
+ *
+ * @param values
+ *   A struct rte_metric_value array of at least *capacity* in size to
+ *   receive metric ids and values. If this is NULL, function returns
+ *   the required number of elements for this array.
+ *
+ * @param capacity
+ *   Size (number of elements) of struct rte_metric_value array.
+ *   Disregarded if values is NULL.
+ *
+ * @return
+ *   - Positive value above capacity: error, *values* is too small.
+ *     Return value is required size.
+ *   - Positive value equal or less than capacity: Success. Return
+ *     value is number of elements filled in.
+ *   - Negative value: error.
+ */
+int rte_metrics_get_values(
+	int port_id,
+	struct rte_metric_value *values,
+	uint16_t capacity);
+
+/**
+ * Updates a metric
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Id of metric to update
+ * @param value
+ *   New value
+ *
+ * @return
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_value(
+	int port_id,
+	uint16_t key,
+	const uint64_t value);
+
+/**
+ * Updates a metric set. Note that it is an error to try to
+ * update across a set boundary.
+ *
+ * @param port_id
+ *   Port to update metrics for
+ * @param key
+ *   Base id of metrics set to update
+ * @param values
+ *   Set of new values
+ * @param count
+ *   Number of new values
+ *
+ * @return
+ *   - -ERANGE if count exceeds metric set size
+ *   - -EIO if unable to access shared metrics memory
+ *   - Zero on success
+ */
+int rte_metrics_update_values(
+	int port_id,
+	uint16_t key,
+	const uint64_t *values,
+	uint32_t count);
+
+#endif
diff --git a/lib/librte_metrics/rte_metrics_version.map b/lib/librte_metrics/rte_metrics_version.map
new file mode 100644
index 0000000..4c5234c
--- /dev/null
+++ b/lib/librte_metrics/rte_metrics_version.map
@@ -0,0 +1,13 @@
+DPDK_17.05 {
+	global:
+
+	rte_metrics_get_names;
+	rte_metrics_get_values;
+	rte_metrics_init;
+	rte_metrics_reg_name;
+	rte_metrics_reg_names;
+	rte_metrics_update_value;
+	rte_metrics_update_values;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index d46a33e..98eb052 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -99,6 +99,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
 # plugins (link only if static libraries)
-- 
2.5.5

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v11 3/7] lib: add bitrate statistics library
    2017-03-09 16:25  1% ` [dpdk-dev] [PATCH v11 1/7] lib: add information metrics library Remy Horton
@ 2017-03-09 16:25  2% ` Remy Horton
  1 sibling, 0 replies; 200+ results
From: Remy Horton @ 2017-03-09 16:25 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

This patch adds a library that calculates peak and average data-rate
statistics for Ethernet devices. These statistics are reported using
the metrics library.
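
As a rough usage sketch (not part of the patch; only
rte_stats_bitrate_create() and rte_stats_bitrate_reg() appear in the
documentation below, and the name of the per-port sampling call is an
assumption to be checked against rte_bitrate.h), the library is meant
to be driven along these lines:

#include <rte_lcore.h>
#include <rte_metrics.h>
#include <rte_bitrate.h>

static struct rte_stats_bitrates *bitrate_data;

void bitrate_init(void)
{
	/* The metrics store must exist before the bit-rate metrics
	 * can be registered with it.
	 */
	rte_metrics_init(rte_socket_id());
	bitrate_data = rte_stats_bitrate_create();
	rte_stats_bitrate_reg(bitrate_data);
}

/* Call periodically (e.g. from an alarm callback) for each active port.
 * Assumption: the sampling function is named rte_stats_bitrate_calc(),
 * as in the final library.
 */
void bitrate_sample(uint8_t port_id)
{
	rte_stats_bitrate_calc(bitrate_data, port_id);
}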

Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/prog_guide/metrics_lib.rst              |  65 ++++++++++
 doc/guides/rel_notes/release_17_02.rst             |   1 +
 doc/guides/rel_notes/release_17_05.rst             |   5 +
 lib/Makefile                                       |   1 +
 lib/librte_bitratestats/Makefile                   |  53 ++++++++
 lib/librte_bitratestats/rte_bitrate.c              | 141 +++++++++++++++++++++
 lib/librte_bitratestats/rte_bitrate.h              |  80 ++++++++++++
 .../rte_bitratestats_version.map                   |   9 ++
 mk/rte.app.mk                                      |   1 +
 13 files changed, 367 insertions(+)
 create mode 100644 lib/librte_bitratestats/Makefile
 create mode 100644 lib/librte_bitratestats/rte_bitrate.c
 create mode 100644 lib/librte_bitratestats/rte_bitrate.h
 create mode 100644 lib/librte_bitratestats/rte_bitratestats_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 66478f3..8abf4fd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -639,6 +639,10 @@ Metrics
 M: Remy Horton <remy.horton@intel.com>
 F: lib/librte_metrics/
 
+Bit-rate statistics
+M: Remy Horton <remy.horton@intel.com>
+F: lib/librte_bitratestats/
+
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index cea055f..d700ee0 100644
--- a/config/common_base
+++ b/config/common_base
@@ -630,3 +630,8 @@ CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS=n
 # Compile the crypto performance application
 #
 CONFIG_RTE_APP_CRYPTO_PERF=y
+
+#
+# Compile the bitrate statistics library
+#
+CONFIG_RTE_LIBRTE_BITRATE=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 26a26b7..8492bce 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -157,4 +157,5 @@ There are many libraries, so their headers may be grouped by topics:
   [ABI compat]         (@ref rte_compat.h),
   [keepalive]          (@ref rte_keepalive.h),
   [device metrics]     (@ref rte_metrics.h),
+  [bitrate statistics] (@ref rte_bitrate.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index fbbcf8e..c4b3b68 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -36,6 +36,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_eal/common/include \
                           lib/librte_eal/common/include/generic \
                           lib/librte_acl \
+                          lib/librte_bitratestats \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/metrics_lib.rst b/doc/guides/prog_guide/metrics_lib.rst
index 87f806d..1c2a28f 100644
--- a/doc/guides/prog_guide/metrics_lib.rst
+++ b/doc/guides/prog_guide/metrics_lib.rst
@@ -178,3 +178,68 @@ print out all metrics for a given port:
         free(metrics);
         free(names);
     }
+
+
+Bit-rate statistics library
+---------------------------
+
+The bit-rate library calculates the exponentially-weighted moving
+average and peak bit-rates for each active port (i.e. network device).
+These statistics are reported via the metrics library using the
+following names:
+
+    - ``mean_bits_in``: Average inbound bit-rate
+    - ``mean_bits_out``: Average outbound bit-rate
+    - ``ewma_bits_in``: Average inbound bit-rate (EWMA smoothed)
+    - ``ewma_bits_out``: Average outbound bit-rate (EWMA smoothed)
+    - ``peak_bits_in``: Peak inbound bit-rate
+    - ``peak_bits_out``: Peak outbound bit-rate
+
+Once initialised and clocked at the appropriate frequency, these
+statistics can be obtained by querying the metrics library.
+
+Initialization
+~~~~~~~~~~~~~~
+
+Before it is used, the bit-rate statistics library has to be initialised
+by calling ``rte_stats_bitrate_create()``, which will return a bit-rate
+calculation object. Since the bit-rate library uses the metrics library
+to report the calculated statistics, the bit-rate library then needs to
+register the calculated statistics with the metrics library. This is
+done using the helper function ``rte_stats_bitrate_reg()``.
+
+.. code-block:: c
+
+    struct rte_stats_bitrates *bitrate_data;
+
+    bitrate_data = rte_stats_bitrate_create();
+    if (bitrate_data == NULL)
+        rte_exit(EXIT_FAILURE, "Could not allocate bit-rate data.\n");
+    rte_stats_bitrate_reg(bitrate_data);
+
+Controlling the sampling rate
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since the library works by periodic sampling but does not use an
+internal thread, the application has to periodically call
+``rte_stats_bitrate_calc()``. The frequency at which this function
+is called should be the intended sampling rate required for the
+calculated statistics. For instance, if per-second statistics are
+desired, this function should be called once a second.
+
+.. code-block:: c
+
+    tics_datum = rte_rdtsc();
+    tics_per_1sec = rte_get_timer_hz();
+
+    while (1) {
+        /* ... */
+        tics_current = rte_rdtsc();
+        if (tics_current - tics_datum >= tics_per_1sec) {
+            /* Periodic bitrate calculation */
+            for (idx_port = 0; idx_port < cnt_ports; idx_port++)
+                rte_stats_bitrate_calc(bitrate_data, idx_port);
+            tics_datum = tics_current;
+        }
+        /* ... */
+    }
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 8bd706f..63786df 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -353,6 +353,7 @@ The libraries prepended with a plus sign were incremented in this version.
 .. code-block:: diff
 
      librte_acl.so.2
+   + librte_bitratestats.so.1
      librte_cfgfile.so.2
      librte_cmdline.so.2
      librte_cryptodev.so.2
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 3ed809e..83c83b2 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -69,6 +69,11 @@ Resolved Issues
   reporting mechanism that is independent of other libraries such
   as ethdev.
 
+* **Added bit-rate calculation library.**
+
+  A library that can be used to calculate device bit-rates. Calculated
+  bitrates are reported using the metrics library.
+
 
 EAL
 ~~~
diff --git a/lib/Makefile b/lib/Makefile
index 29f6a81..ecc54c0 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -50,6 +50,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_IP_FRAG) += librte_ip_frag
 DIRS-$(CONFIG_RTE_LIBRTE_JOBSTATS) += librte_jobstats
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
+DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
 DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
diff --git a/lib/librte_bitratestats/Makefile b/lib/librte_bitratestats/Makefile
new file mode 100644
index 0000000..743b62c
--- /dev/null
+++ b/lib/librte_bitratestats/Makefile
@@ -0,0 +1,53 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bitratestats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_bitratestats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BITRATE) := rte_bitrate.c
+
+# Install header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_BITRATE)-include += rte_bitrate.h
+
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bitratestats/rte_bitrate.c b/lib/librte_bitratestats/rte_bitrate.c
new file mode 100644
index 0000000..3252598
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.c
@@ -0,0 +1,141 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_common.h>
+#include <rte_ethdev.h>
+#include <rte_malloc.h>
+#include <rte_metrics.h>
+#include <rte_bitrate.h>
+
+/*
+ * Persistent bit-rate data.
+ * @internal
+ */
+struct rte_stats_bitrate {
+	uint64_t last_ibytes;
+	uint64_t last_obytes;
+	uint64_t peak_ibits;
+	uint64_t peak_obits;
+	uint64_t mean_ibits;
+	uint64_t mean_obits;
+	uint64_t ewma_ibits;
+	uint64_t ewma_obits;
+};
+
+struct rte_stats_bitrates {
+	struct rte_stats_bitrate port_stats[RTE_MAX_ETHPORTS];
+	uint16_t id_stats_set;
+};
+
+struct rte_stats_bitrates *
+rte_stats_bitrate_create(void)
+{
+	return rte_zmalloc(NULL, sizeof(struct rte_stats_bitrates),
+		RTE_CACHE_LINE_SIZE);
+}
+
+int
+rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data)
+{
+	const char * const names[] = {
+		"ewma_bits_in", "ewma_bits_out",
+		"mean_bits_in", "mean_bits_out",
+		"peak_bits_in", "peak_bits_out",
+	};
+	int return_value;
+
+	return_value = rte_metrics_reg_names(&names[0], 6);
+	if (return_value >= 0)
+		bitrate_data->id_stats_set = return_value;
+	return return_value;
+}
+
+int
+rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id)
+{
+	struct rte_stats_bitrate *port_data;
+	struct rte_eth_stats eth_stats;
+	int ret_code;
+	uint64_t cnt_bits;
+	int64_t delta;
+	const int64_t alpha_percent = 20;
+	uint64_t values[6];
+
+	ret_code = rte_eth_stats_get(port_id, &eth_stats);
+	if (ret_code != 0)
+		return ret_code;
+
+	port_data = &bitrate_data->port_stats[port_id];
+
+	/* Incoming bitrate. This is an iteratively calculated EWMA
+	 * (Exponentially Weighted Moving Average) that uses a
+	 * weighting factor of alpha_percent. An unsmoothed mean
+	 * for just the current time delta is also calculated for the
+	 * benefit of people who don't understand signal processing.
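+	 * The update applied below is equivalent to
+	 * ewma += (alpha_percent / 100) * (sample - ewma).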
+	 */
+	cnt_bits = (eth_stats.ibytes - port_data->last_ibytes) << 3;
+	port_data->last_ibytes = eth_stats.ibytes;
+	if (cnt_bits > port_data->peak_ibits)
+		port_data->peak_ibits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_ibits;
+	/* The +-50 fixes integer rounding during division */
+	if (delta > 0)
+		delta = (delta * alpha_percent + 50) / 100;
+	else
+		delta = (delta * alpha_percent - 50) / 100;
+	port_data->ewma_ibits += delta;
+	port_data->mean_ibits = cnt_bits;
+
+	/* Outgoing bitrate (also EWMA) */
+	cnt_bits = (eth_stats.obytes - port_data->last_obytes) << 3;
+	port_data->last_obytes = eth_stats.obytes;
+	if (cnt_bits > port_data->peak_obits)
+		port_data->peak_obits = cnt_bits;
+	delta = cnt_bits;
+	delta -= port_data->ewma_obits;
+	delta = (delta * alpha_percent + 50) / 100;
+	port_data->ewma_obits += delta;
+	port_data->mean_obits = cnt_bits;
+
+	values[0] = port_data->ewma_ibits;
+	values[1] = port_data->ewma_obits;
+	values[2] = port_data->mean_ibits;
+	values[3] = port_data->mean_obits;
+	values[4] = port_data->peak_ibits;
+	values[5] = port_data->peak_obits;
+	rte_metrics_update_values(port_id, bitrate_data->id_stats_set,
+		values, 6);
+	return 0;
+}
diff --git a/lib/librte_bitratestats/rte_bitrate.h b/lib/librte_bitratestats/rte_bitrate.h
new file mode 100644
index 0000000..564e4f7
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitrate.h
@@ -0,0 +1,80 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+/**
+ *  Bitrate statistics data structure.
+ *  This data structure is intentionally opaque.
+ */
+struct rte_stats_bitrates;
+
+
+/**
+ * Allocate a bitrate statistics structure
+ *
+ * @return
+ *   - Pointer to structure on success
+ *   - NULL on error (zmalloc failure)
+ */
+struct rte_stats_bitrates *rte_stats_bitrate_create(void);
+
+
+/**
+ * Register bitrate statistics with the metric library.
+ *
+ * @param bitrate_data
+ *   Pointer allocated by rte_stats_bitrate_create()
+ *
+ * @return
+ *   Zero on success
+ *   Negative on error
+ */
+int rte_stats_bitrate_reg(struct rte_stats_bitrates *bitrate_data);
+
+
+/**
+ * Calculate statistics for current time window. The period with which
+ * this function is called should be the intended sampling window width.
+ *
+ * @param bitrate_data
+ *   Bitrate statistics data pointer
+ *
+ * @param port_id
+ *   Port id to calculate statistics for
+ *
+ * @return
+ *  - Zero on success
+ *  - Negative value on error
+ */
+int rte_stats_bitrate_calc(struct rte_stats_bitrates *bitrate_data,
+	uint8_t port_id);
diff --git a/lib/librte_bitratestats/rte_bitratestats_version.map b/lib/librte_bitratestats/rte_bitratestats_version.map
new file mode 100644
index 0000000..fe74544
--- /dev/null
+++ b/lib/librte_bitratestats/rte_bitratestats_version.map
@@ -0,0 +1,9 @@
+DPDK_17.05 {
+	global:
+
+	rte_stats_bitrate_calc;
+	rte_stats_bitrate_create;
+	rte_stats_bitrate_reg;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 98eb052..39c988a 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -100,6 +100,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
 
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
-- 
2.5.5

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
@ 2017-03-10 16:22  0%             ` Bruce Richardson
  2017-03-13 10:17  0%               ` Hunt, David
  2017-03-13 10:28  0%               ` Hunt, David
  0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2017-03-10 16:22 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++

A file named rte_distributor_v1705.h was added in patch 4, then deleted
in patch 7, and now added again here. Seems a lot of churn.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-10 16:22  0%             ` Bruce Richardson
@ 2017-03-13 10:17  0%               ` Hunt, David
  2017-03-13 10:28  0%               ` Hunt, David
  1 sibling, 0 replies; 200+ results
From: Hunt, David @ 2017-03-13 10:17 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev



-----Original Message-----
From: Richardson, Bruce 
Sent: Friday, 10 March, 2017 4:22 PM
To: Hunt, David <david.hunt@intel.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH v9 09/18] lib: add symbol versioning to distributor

On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++

A file named rte_distributor_v1705.h was added in patch 4, then deleted in patch 7, and now added again here. Seems a lot of churn.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-10 16:22  0%             ` Bruce Richardson
  2017-03-13 10:17  0%               ` Hunt, David
@ 2017-03-13 10:28  0%               ` Hunt, David
  2017-03-13 11:01  0%                 ` Van Haaren, Harry
  1 sibling, 1 reply; 200+ results
From: Hunt, David @ 2017-03-13 10:28 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev


On 10/3/2017 4:22 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
>> Also bumped up the ABI version number in the Makefile
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/Makefile                    |  2 +-
>>   lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>>   lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
> A file named rte_distributor_v1705.h was added in patch 4, then deleted
> in patch 7, and now added again here. Seems a lot of churn.
>
> /Bruce
>

The first introduction of this file is what will become the public 
header. For successful compilation,
this cannot be called rte_distributor.h until the symbol versioning 
patch, at which stage I will
rename the file, and introduce the symbol versioned header at the same 
time. In the next patch
I'll rename this version of the files as rte_distributor_public.h to 
make this clearer.

Regards,
Dave.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-13 10:28  0%               ` Hunt, David
@ 2017-03-13 11:01  0%                 ` Van Haaren, Harry
  2017-03-13 11:02  0%                   ` Hunt, David
  0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2017-03-13 11:01 UTC (permalink / raw)
  To: Hunt, David, Richardson, Bruce; +Cc: dev

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hunt, David
> Subject: Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
> 
> On 10/3/2017 4:22 PM, Bruce Richardson wrote:
> > On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> >> Also bumped up the ABI version number in the Makefile
> >>
> >> Signed-off-by: David Hunt <david.hunt@intel.com>
> >> ---
> >>   lib/librte_distributor/Makefile                    |  2 +-
> >>   lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
> >>   lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
> > A file named rte_distributor_v1705.h was added in patch 4, then deleted
> > in patch 7, and now added again here. Seems a lot of churn.
> >
> > /Bruce
> >
> 
> The first introduction of this file is what will become the public
> header. For successful compilation,
> this cannot be called rte_distributor.h until the symbol versioning
> patch, at which stage I will
> rename the file, and introduce the symbol versioned header at the same
> time. In the next patch
> I'll rename this version of the files as rte_distributor_public.h to
> make this clearer.


Suggestion to use rte_distributor_next.h instead of public?
Public doesn't indicate if its old or new, while next would make that clearer IMO :)

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-13 11:01  0%                 ` Van Haaren, Harry
@ 2017-03-13 11:02  0%                   ` Hunt, David
  0 siblings, 0 replies; 200+ results
From: Hunt, David @ 2017-03-13 11:02 UTC (permalink / raw)
  To: Van Haaren, Harry, Richardson, Bruce; +Cc: dev


On 13/3/2017 11:01 AM, Van Haaren, Harry wrote:
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hunt, David
>> Subject: Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
>>
>> On 10/3/2017 4:22 PM, Bruce Richardson wrote:
>>> On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
>>>> Also bumped up the ABI version number in the Makefile
>>>>
>>>> Signed-off-by: David Hunt <david.hunt@intel.com>
>>>> ---
>>>>    lib/librte_distributor/Makefile                    |  2 +-
>>>>    lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>>>>    lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
>>> A file named rte_distributor_v1705.h was added in patch 4, then deleted
>>> in patch 7, and now added again here. Seems a lot of churn.
>>>
>>> /Bruce
>>>
>> The first introduction of this file is what will become the public
>> header. For successful compilation,
>> this cannot be called rte_distributor.h until the symbol versioning
>> patch, at which stage I will
>> rename the file, and introduce the symbol versioned header at the same
>> time. In the next patch
>> I'll rename this version of the files as rte_distributor_public.h to
>> make this clearer.
>
> Suggestion to use rte_distributor_next.h instead of public?
> Public doesn't indicate if its old or new, while next would make that clearer IMO :)

Good call, will use "_next". Its clearer.
Thanks,
Dave.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file
  @ 2017-03-13 19:16  3%       ` Allain Legacy
  2017-03-16 14:52  0%         ` Ferruh Yigit
    1 sibling, 1 reply; 200+ results
From: Allain Legacy @ 2017-03-13 19:16 UTC (permalink / raw)
  To: ferruh.yigit
  Cc: dev, ian.jolliffe, bruce.richardson, john.mcnamara, keith.wiles,
	thomas.monjalon, vincent.jardin, jerin.jacob, stephen, 3chas3

Adds a default ABI version file for the AVP PMD.

Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
Signed-off-by: Matt Peters <matt.peters@windriver.com>
---
 drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 drivers/net/avp/rte_pmd_avp_version.map

diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
new file mode 100644
index 0000000..af8f3f4
--- /dev/null
+++ b/drivers/net/avp/rte_pmd_avp_version.map
@@ -0,0 +1,4 @@
+DPDK_17.05 {
+
+    local: *;
+};
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3] lpm: extend IPv6 next hop field
  @ 2017-03-14 17:17  4% ` Vladyslav Buslov
  0 siblings, 0 replies; 200+ results
From: Vladyslav Buslov @ 2017-03-14 17:17 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: bruce.richardson, dev

This patch extends the next_hop field from 8 bits to 21 bits in the LPM
library for IPv6.

Added versioning symbols to functions and updated
library and applications that have a dependency on LPM library.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---

Fixed compilation error in l3fwd_lpm.h

 doc/guides/prog_guide/lpm6_lib.rst              |   2 +-
 doc/guides/rel_notes/release_17_05.rst          |   5 +
 examples/ip_fragmentation/main.c                |  17 +--
 examples/ip_reassembly/main.c                   |  17 +--
 examples/ipsec-secgw/ipsec-secgw.c              |   2 +-
 examples/l3fwd/l3fwd_lpm.h                      |   2 +-
 examples/l3fwd/l3fwd_lpm_sse.h                  |  24 ++---
 examples/performance-thread/l3fwd-thread/main.c |  11 +-
 lib/librte_lpm/rte_lpm6.c                       | 134 +++++++++++++++++++++---
 lib/librte_lpm/rte_lpm6.h                       |  32 +++++-
 lib/librte_lpm/rte_lpm_version.map              |  10 ++
 lib/librte_table/rte_table_lpm_ipv6.c           |   9 +-
 test/test/test_lpm6.c                           | 115 ++++++++++++++------
 test/test/test_lpm6_perf.c                      |   4 +-
 14 files changed, 293 insertions(+), 91 deletions(-)

diff --git a/doc/guides/prog_guide/lpm6_lib.rst b/doc/guides/prog_guide/lpm6_lib.rst
index 0aea5c5..f791507 100644
--- a/doc/guides/prog_guide/lpm6_lib.rst
+++ b/doc/guides/prog_guide/lpm6_lib.rst
@@ -53,7 +53,7 @@ several thousand IPv6 rules, but the number can vary depending on the case.
 An LPM prefix is represented by a pair of parameters (128-bit key, depth), with depth in the range of 1 to 128.
 An LPM rule is represented by an LPM prefix and some user data associated with the prefix.
 The prefix serves as the unique identifier for the LPM rule.
-In this implementation, the user data is 1-byte long and is called "next hop",
+In this implementation, the user data is 21-bits long and is called "next hop",
 which corresponds to its main use of storing the ID of the next hop in a routing table entry.
 
 The main methods exported for the LPM component are:
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 4b90036..918f483 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -41,6 +41,9 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Increased number of next hops for LPM IPv6 to 2^21.**
+
+  The next_hop field is extended from 8 bits to 21 bits for IPv6.
 
 * **Added powerpc support in pci probing for vfio-pci devices.**
 
@@ -114,6 +117,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* The LPM ``next_hop`` field is extended from 8 bits to 21 bits for IPv6
+  while keeping ABI compatibility.
 
 ABI Changes
 -----------
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 9e9ecae..1b005b5 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -265,8 +265,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		uint8_t queueid, uint8_t port_in)
 {
 	struct rx_queue *rxq;
-	uint32_t i, len, next_hop_ipv4;
-	uint8_t next_hop_ipv6, port_out, ipv6;
+	uint32_t i, len, next_hop;
+	uint8_t port_out, ipv6;
 	int32_t len2;
 
 	ipv6 = 0;
@@ -290,9 +290,9 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			port_out = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
@@ -326,9 +326,10 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct lcore_queue_conf *qconf,
 		ip_hdr = rte_pktmbuf_mtod(m, struct ipv6_hdr *);
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			port_out = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			port_out = next_hop;
 
 			/* Build transmission burst for new port */
 			len = qconf->tx_mbufs[port_out].len;
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index e62674c..b641576 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -346,8 +346,8 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 	struct rte_ip_frag_death_row *dr;
 	struct rx_queue *rxq;
 	void *d_addr_bytes;
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6, dst_port;
+	uint32_t next_hop;
+	uint8_t dst_port;
 
 	rxq = &qconf->rx_queue_list[queue];
 
@@ -390,9 +390,9 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		ip_dst = rte_be_to_cpu_32(ip_hdr->dst_addr);
 
 		/* Find destination port */
-		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop_ipv4) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv4) != 0) {
-			dst_port = next_hop_ipv4;
+		if (rte_lpm_lookup(rxq->lpm, ip_dst, &next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
@@ -427,9 +427,10 @@ reassemble(struct rte_mbuf *m, uint8_t portid, uint32_t queue,
 		}
 
 		/* Find destination port */
-		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr, &next_hop_ipv6) == 0 &&
-				(enabled_port_mask & 1 << next_hop_ipv6) != 0) {
-			dst_port = next_hop_ipv6;
+		if (rte_lpm6_lookup(rxq->lpm6, ip_hdr->dst_addr,
+						&next_hop) == 0 &&
+				(enabled_port_mask & 1 << next_hop) != 0) {
+			dst_port = next_hop;
 		}
 
 		eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv6);
diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index 685feec..d3c229a 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -618,7 +618,7 @@ route4_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 static inline void
 route6_pkts(struct rt_ctx *rt_ctx, struct rte_mbuf *pkts[], uint8_t nb_pkts)
 {
-	int16_t hop[MAX_PKT_BURST * 2];
+	int32_t hop[MAX_PKT_BURST * 2];
 	uint8_t dst_ip[MAX_PKT_BURST * 2][16];
 	uint8_t *ip6_dst;
 	uint16_t i, offset;
diff --git a/examples/l3fwd/l3fwd_lpm.h b/examples/l3fwd/l3fwd_lpm.h
index a43c507..258a82f 100644
--- a/examples/l3fwd/l3fwd_lpm.h
+++ b/examples/l3fwd/l3fwd_lpm.h
@@ -49,7 +49,7 @@ lpm_get_ipv4_dst_port(void *ipv4_hdr,  uint8_t portid, void *lookup_struct)
 static inline uint8_t
 lpm_get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid, void *lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 	struct rte_lpm6 *ipv6_l3fwd_lookup_struct =
 		(struct rte_lpm6 *)lookup_struct;
 
diff --git a/examples/l3fwd/l3fwd_lpm_sse.h b/examples/l3fwd/l3fwd_lpm_sse.h
index 538fe3d..aa06b6d 100644
--- a/examples/l3fwd/l3fwd_lpm_sse.h
+++ b/examples/l3fwd/l3fwd_lpm_sse.h
@@ -40,8 +40,7 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ipv4_hdr *ipv4_hdr;
 	struct ether_hdr *eth_hdr;
@@ -51,9 +50,11 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
 		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
 
-		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct,
-				rte_be_to_cpu_32(ipv4_hdr->dst_addr), &next_hop_ipv4) == 0) ?
-						next_hop_ipv4 : portid);
+		return (uint16_t) (
+			(rte_lpm_lookup(qconf->ipv4_lookup_struct,
+					rte_be_to_cpu_32(ipv4_hdr->dst_addr),
+					&next_hop) == 0) ?
+						next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -61,8 +62,8 @@ lpm_get_dst_port(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
@@ -78,14 +79,13 @@ static inline __attribute__((always_inline)) uint16_t
 lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 	uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(qconf->ipv4_lookup_struct, dst_ipv4,
-			&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+			&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -93,8 +93,8 @@ lpm_get_dst_port_with_ipv4(const struct lcore_conf *qconf, struct rte_mbuf *pkt,
 		ipv6_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
 
 		return (uint16_t) ((rte_lpm6_lookup(qconf->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0)
-				? next_hop_ipv6 : portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0)
+				? next_hop : portid);
 
 	}
 
diff --git a/examples/performance-thread/l3fwd-thread/main.c b/examples/performance-thread/l3fwd-thread/main.c
index 6845e28..bf92582 100644
--- a/examples/performance-thread/l3fwd-thread/main.c
+++ b/examples/performance-thread/l3fwd-thread/main.c
@@ -909,7 +909,7 @@ static inline uint8_t
 get_ipv6_dst_port(void *ipv6_hdr,  uint8_t portid,
 		lookup6_struct_t *ipv6_l3fwd_lookup_struct)
 {
-	uint8_t next_hop;
+	uint32_t next_hop;
 
 	return (uint8_t) ((rte_lpm6_lookup(ipv6_l3fwd_lookup_struct,
 			((struct ipv6_hdr *)ipv6_hdr)->dst_addr, &next_hop) == 0) ?
@@ -1396,15 +1396,14 @@ rfc1812_process(struct ipv4_hdr *ipv4_hdr, uint16_t *dp, uint32_t ptype)
 static inline __attribute__((always_inline)) uint16_t
 get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 {
-	uint32_t next_hop_ipv4;
-	uint8_t next_hop_ipv6;
+	uint32_t next_hop;
 	struct ipv6_hdr *ipv6_hdr;
 	struct ether_hdr *eth_hdr;
 
 	if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) {
 		return (uint16_t) ((rte_lpm_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv4_lookup_struct, dst_ipv4,
-				&next_hop_ipv4) == 0) ? next_hop_ipv4 : portid);
+				&next_hop) == 0) ? next_hop : portid);
 
 	} else if (RTE_ETH_IS_IPV6_HDR(pkt->packet_type)) {
 
@@ -1413,8 +1412,8 @@ get_dst_port(struct rte_mbuf *pkt, uint32_t dst_ipv4, uint8_t portid)
 
 		return (uint16_t) ((rte_lpm6_lookup(
 				RTE_PER_LCORE(lcore_conf)->ipv6_lookup_struct,
-				ipv6_hdr->dst_addr, &next_hop_ipv6) == 0) ? next_hop_ipv6 :
-						portid);
+				ipv6_hdr->dst_addr, &next_hop) == 0) ?
+				next_hop : portid);
 
 	}
 
diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c
index 32fdba0..9cc7be7 100644
--- a/lib/librte_lpm/rte_lpm6.c
+++ b/lib/librte_lpm/rte_lpm6.c
@@ -97,7 +97,7 @@ struct rte_lpm6_tbl_entry {
 /** Rules tbl entry structure. */
 struct rte_lpm6_rule {
 	uint8_t ip[RTE_LPM6_IPV6_ADDR_SIZE]; /**< Rule IP address. */
-	uint8_t next_hop; /**< Rule next hop. */
+	uint32_t next_hop; /**< Rule next hop. */
 	uint8_t depth; /**< Rule depth. */
 };
 
@@ -297,7 +297,7 @@ rte_lpm6_free(struct rte_lpm6 *lpm)
  * the nexthop if so. Otherwise it adds a new rule if enough space is available.
  */
 static inline int32_t
-rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
+rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint32_t next_hop, uint8_t depth)
 {
 	uint32_t rule_index;
 
@@ -340,7 +340,7 @@ rule_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t next_hop, uint8_t depth)
  */
 static void
 expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
-		uint8_t next_hop)
+		uint32_t next_hop)
 {
 	uint32_t tbl8_group_end, tbl8_gindex_next, j;
 
@@ -377,7 +377,7 @@ expand_rule(struct rte_lpm6 *lpm, uint32_t tbl8_gindex, uint8_t depth,
 static inline int
 add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
 		struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip, uint8_t bytes,
-		uint8_t first_byte, uint8_t depth, uint8_t next_hop)
+		uint8_t first_byte, uint8_t depth, uint32_t next_hop)
 {
 	uint32_t tbl_index, tbl_range, tbl8_group_start, tbl8_group_end, i;
 	int32_t tbl8_gindex;
@@ -507,9 +507,17 @@ add_step(struct rte_lpm6 *lpm, struct rte_lpm6_tbl_entry *tbl,
  * Add a route
  */
 int
-rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop)
 {
+	return rte_lpm6_add_v1705(lpm, ip, depth, next_hop);
+}
+VERSION_SYMBOL(rte_lpm6_add, _v20, 2.0);
+
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop)
+{
 	struct rte_lpm6_tbl_entry *tbl;
 	struct rte_lpm6_tbl_entry *tbl_next;
 	int32_t rule_index;
@@ -560,6 +568,10 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_add, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip,
+				uint8_t depth, uint32_t next_hop),
+		rte_lpm6_add_v1705);
 
 /*
  * Takes a pointer to a table entry and inspect one level.
@@ -569,7 +581,7 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 static inline int
 lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		const struct rte_lpm6_tbl_entry **tbl_next, uint8_t *ip,
-		uint8_t first_byte, uint8_t *next_hop)
+		uint8_t first_byte, uint32_t *next_hop)
 {
 	uint32_t tbl8_index, tbl_entry;
 
@@ -589,7 +601,7 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
 		return 1;
 	} else {
 		/* If not extended then we can have a match. */
-		*next_hop = (uint8_t)tbl_entry;
+		*next_hop = ((uint32_t)tbl_entry & RTE_LPM6_TBL8_BITMASK);
 		return (tbl_entry & RTE_LPM6_LOOKUP_SUCCESS) ? 0 : -ENOENT;
 	}
 }
@@ -598,7 +610,26 @@ lookup_step(const struct rte_lpm6 *lpm, const struct rte_lpm6_tbl_entry *tbl,
  * Looks up an IP
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_lookup_v1705(lpm, ip, &next_hop32);
+	if (status == 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+}
+VERSION_SYMBOL(rte_lpm6_lookup, _v20, 2.0);
+
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop)
 {
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
@@ -625,20 +656,23 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop)
 
 	return status;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip,
+				uint32_t *next_hop), rte_lpm6_lookup_v1705);
 
 /*
  * Looks up a group of IP addresses
  */
 int
-rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
 		int16_t * next_hops, unsigned n)
 {
 	unsigned i;
 	const struct rte_lpm6_tbl_entry *tbl;
 	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
-	uint32_t tbl24_index;
-	uint8_t first_byte, next_hop;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
 	int status;
 
 	/* DEBUG: Check user input arguments. */
@@ -664,11 +698,59 @@ rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		if (status < 0)
 			next_hops[i] = -1;
 		else
-			next_hops[i] = next_hop;
+			next_hops[i] = (int16_t)next_hop;
+	}
+
+	return 0;
+}
+VERSION_SYMBOL(rte_lpm6_lookup_bulk_func, _v20, 2.0);
+
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n)
+{
+	unsigned int i;
+	const struct rte_lpm6_tbl_entry *tbl;
+	const struct rte_lpm6_tbl_entry *tbl_next = NULL;
+	uint32_t tbl24_index, next_hop;
+	uint8_t first_byte;
+	int status;
+
+	/* DEBUG: Check user input arguments. */
+	if ((lpm == NULL) || (ips == NULL) || (next_hops == NULL))
+		return -EINVAL;
+
+	for (i = 0; i < n; i++) {
+		first_byte = LOOKUP_FIRST_BYTE;
+		tbl24_index = (ips[i][0] << BYTES2_SIZE) |
+				(ips[i][1] << BYTE_SIZE) | ips[i][2];
+
+		/* Calculate pointer to the first entry to be inspected */
+		tbl = &lpm->tbl24[tbl24_index];
+
+		do {
+			/* Continue inspecting following levels
+			 * until success or failure
+			 */
+			status = lookup_step(lpm, tbl, &tbl_next, ips[i],
+					first_byte++, &next_hop);
+			tbl = tbl_next;
+		} while (status == 1);
+
+		if (status < 0)
+			next_hops[i] = -1;
+		else
+			next_hops[i] = (int32_t)next_hop;
 	}
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_lookup_bulk_func, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
+				uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+				int32_t *next_hops, unsigned int n),
+		rte_lpm6_lookup_bulk_func_v1705);
 
 /*
  * Finds a rule in rule table.
@@ -698,8 +780,28 @@ rule_find(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
  * Look for a rule in the high-level rules table
  */
 int
-rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop)
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop)
+{
+	uint32_t next_hop32 = 0;
+	int32_t status;
+
+	/* DEBUG: Check user input arguments. */
+	if (next_hop == NULL)
+		return -EINVAL;
+
+	status = rte_lpm6_is_rule_present_v1705(lpm, ip, depth, &next_hop32);
+	if (status > 0)
+		*next_hop = (uint8_t)next_hop32;
+
+	return status;
+
+}
+VERSION_SYMBOL(rte_lpm6_is_rule_present, _v20, 2.0);
+
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop)
 {
 	uint8_t ip_masked[RTE_LPM6_IPV6_ADDR_SIZE];
 	int32_t rule_index;
@@ -724,6 +826,10 @@ uint8_t *next_hop)
 	/* If rule is not found return 0. */
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_lpm6_is_rule_present, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_lpm6_is_rule_present(struct rte_lpm6 *lpm,
+				uint8_t *ip, uint8_t depth, uint32_t *next_hop),
+		rte_lpm6_is_rule_present_v1705);
 
 /*
  * Delete a rule from the rule table.
diff --git a/lib/librte_lpm/rte_lpm6.h b/lib/librte_lpm/rte_lpm6.h
index 13d027f..3a3342d 100644
--- a/lib/librte_lpm/rte_lpm6.h
+++ b/lib/librte_lpm/rte_lpm6.h
@@ -39,6 +39,7 @@
  */
 
 #include <stdint.h>
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -123,7 +124,13 @@ rte_lpm6_free(struct rte_lpm6 *lpm);
  */
 int
 rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
+int
+rte_lpm6_add_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
 		uint8_t next_hop);
+int
+rte_lpm6_add_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t next_hop);
 
 /**
  * Check if a rule is present in the LPM table,
@@ -142,7 +149,13 @@ rte_lpm6_add(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
  */
 int
 rte_lpm6_is_rule_present(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
-uint8_t *next_hop);
+		uint32_t *next_hop);
+int
+rte_lpm6_is_rule_present_v20(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint8_t *next_hop);
+int
+rte_lpm6_is_rule_present_v1705(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth,
+		uint32_t *next_hop);
 
 /**
  * Delete a rule from the LPM table.
@@ -199,7 +212,12 @@ rte_lpm6_delete_all(struct rte_lpm6 *lpm);
  *   -EINVAL for incorrect arguments, -ENOENT on lookup miss, 0 on lookup hit
  */
 int
-rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint32_t *next_hop);
+int
+rte_lpm6_lookup_v20(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
+int
+rte_lpm6_lookup_v1705(const struct rte_lpm6 *lpm, uint8_t *ip,
+		uint32_t *next_hop);
 
 /**
  * Lookup multiple IP addresses in an LPM table.
@@ -220,7 +238,15 @@ rte_lpm6_lookup(const struct rte_lpm6 *lpm, uint8_t *ip, uint8_t *next_hop);
 int
 rte_lpm6_lookup_bulk_func(const struct rte_lpm6 *lpm,
 		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
-		int16_t * next_hops, unsigned n);
+		int32_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v20(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int16_t *next_hops, unsigned int n);
+int
+rte_lpm6_lookup_bulk_func_v1705(const struct rte_lpm6 *lpm,
+		uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE],
+		int32_t *next_hops, unsigned int n);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 239b371..90beac8 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -34,3 +34,13 @@ DPDK_16.04 {
 	rte_lpm_delete_all;
 
 } DPDK_2.0;
+
+DPDK_17.05 {
+	global:
+
+	rte_lpm6_add;
+	rte_lpm6_is_rule_present;
+	rte_lpm6_lookup;
+	rte_lpm6_lookup_bulk_func;
+
+} DPDK_16.04;
diff --git a/lib/librte_table/rte_table_lpm_ipv6.c b/lib/librte_table/rte_table_lpm_ipv6.c
index 836f4cf..1e1a173 100644
--- a/lib/librte_table/rte_table_lpm_ipv6.c
+++ b/lib/librte_table/rte_table_lpm_ipv6.c
@@ -211,9 +211,8 @@ rte_table_lpm_ipv6_entry_add(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint32_t nht_pos, nht_pos0_valid;
+	uint32_t nht_pos, nht_pos0, nht_pos0_valid;
 	int status;
-	uint8_t nht_pos0;
 
 	/* Check input parameters */
 	if (lpm == NULL) {
@@ -256,7 +255,7 @@ rte_table_lpm_ipv6_entry_add(
 
 	/* Add rule to low level LPM table */
 	if (rte_lpm6_add(lpm->lpm, ip_prefix->ip, ip_prefix->depth,
-		(uint8_t) nht_pos) < 0) {
+		nht_pos) < 0) {
 		RTE_LOG(ERR, TABLE, "%s: LPM IPv6 rule add failed\n", __func__);
 		return -1;
 	}
@@ -280,7 +279,7 @@ rte_table_lpm_ipv6_entry_delete(
 	struct rte_table_lpm_ipv6 *lpm = (struct rte_table_lpm_ipv6 *) table;
 	struct rte_table_lpm_ipv6_key *ip_prefix =
 		(struct rte_table_lpm_ipv6_key *) key;
-	uint8_t nht_pos;
+	uint32_t nht_pos;
 	int status;
 
 	/* Check input parameters */
@@ -356,7 +355,7 @@ rte_table_lpm_ipv6_lookup(
 			uint8_t *ip = RTE_MBUF_METADATA_UINT8_PTR(pkt,
 				lpm->offset);
 			int status;
-			uint8_t nht_pos;
+			uint32_t nht_pos;
 
 			status = rte_lpm6_lookup(lpm->lpm, ip, &nht_pos);
 			if (status == 0) {
diff --git a/test/test/test_lpm6.c b/test/test/test_lpm6.c
index 61134f7..e0e7bf0 100644
--- a/test/test/test_lpm6.c
+++ b/test/test/test_lpm6.c
@@ -79,6 +79,7 @@ static int32_t test24(void);
 static int32_t test25(void);
 static int32_t test26(void);
 static int32_t test27(void);
+static int32_t test28(void);
 
 rte_lpm6_test tests6[] = {
 /* Test Cases */
@@ -110,6 +111,7 @@ rte_lpm6_test tests6[] = {
 	test25,
 	test26,
 	test27,
+	test28,
 };
 
 #define NUM_LPM6_TESTS                (sizeof(tests6)/sizeof(tests6[0]))
@@ -354,7 +356,7 @@ test6(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -392,7 +394,7 @@ test7(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[10][16];
-	int16_t next_hop_return[10];
+	int32_t next_hop_return[10];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -469,7 +471,8 @@ test9(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 16, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 	uint8_t i;
 
@@ -513,7 +516,8 @@ test10(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -557,7 +561,8 @@ test11(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -617,7 +622,8 @@ test12(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -655,7 +661,8 @@ test13(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add = 100;
+	uint8_t depth;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 
 	config.max_rules = 2;
@@ -702,7 +709,8 @@ test14(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 25, next_hop_add = 100;
+	uint8_t depth = 25;
+	uint32_t next_hop_add = 100;
 	int32_t status = 0;
 	int i;
 
@@ -748,7 +756,8 @@ test15(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 24, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 24;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -784,7 +793,8 @@ test16(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[] = {12,12,1,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth = 128, next_hop_add = 100, next_hop_return = 0;
+	uint8_t depth = 128;
+	uint32_t next_hop_add = 100, next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -828,7 +838,8 @@ test17(void)
 	uint8_t ip1[] = {127,255,255,255,255,255,255,255,255,
 			255,255,255,255,255,255,255};
 	uint8_t ip2[] = {128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -857,7 +868,7 @@ test17(void)
 
 	/* Loop with rte_lpm6_delete. */
 	for (depth = 16; depth >= 1; depth--) {
-		next_hop_add = (uint8_t) (depth - 1);
+		next_hop_add = (depth - 1);
 
 		status = rte_lpm6_delete(lpm, ip2, depth);
 		TEST_LPM_ASSERT(status == 0);
@@ -893,8 +904,9 @@ test18(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16], ip_1[16], ip_2[16];
-	uint8_t depth, depth_1, depth_2, next_hop_add, next_hop_add_1,
-		next_hop_add_2, next_hop_return;
+	uint8_t depth, depth_1, depth_2;
+	uint32_t next_hop_add, next_hop_add_1,
+			next_hop_add_2, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1055,7 +1067,8 @@ test19(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1253,7 +1266,8 @@ test20(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1320,8 +1334,9 @@ test21(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[4][16];
-	uint8_t depth, next_hop_add;
-	int16_t next_hop_return[4];
+	uint8_t depth;
+	uint32_t next_hop_add;
+	int32_t next_hop_return[4];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1378,8 +1393,9 @@ test22(void)
 	struct rte_lpm6 *lpm = NULL;
 	struct rte_lpm6_config config;
 	uint8_t ip_batch[5][16];
-	uint8_t depth[5], next_hop_add;
-	int16_t next_hop_return[5];
+	uint8_t depth[5];
+	uint32_t next_hop_add;
+	int32_t next_hop_return[5];
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1495,7 +1511,8 @@ test23(void)
 	struct rte_lpm6_config config;
 	uint32_t i;
 	uint8_t ip[16];
-	uint8_t depth, next_hop_add, next_hop_return;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1579,7 +1596,8 @@ test25(void)
 	struct rte_lpm6_config config;
 	uint8_t ip[16];
 	uint32_t i;
-	uint8_t depth, next_hop_add, next_hop_return, next_hop_expected;
+	uint8_t depth;
+	uint32_t next_hop_add, next_hop_return, next_hop_expected;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1632,10 +1650,10 @@ test26(void)
 	uint8_t d_ip_10_32 = 32;
 	uint8_t	d_ip_10_24 = 24;
 	uint8_t	d_ip_20_25 = 25;
-	uint8_t next_hop_ip_10_32 = 100;
-	uint8_t	next_hop_ip_10_24 = 105;
-	uint8_t	next_hop_ip_20_25 = 111;
-	uint8_t next_hop_return = 0;
+	uint32_t next_hop_ip_10_32 = 100;
+	uint32_t next_hop_ip_10_24 = 105;
+	uint32_t next_hop_ip_20_25 = 111;
+	uint32_t next_hop_return = 0;
 	int32_t status = 0;
 
 	config.max_rules = MAX_RULES;
@@ -1650,7 +1668,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_32, &next_hop_return);
-	uint8_t test_hop_10_32 = next_hop_return;
+	uint32_t test_hop_10_32 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_32);
 
@@ -1659,7 +1677,7 @@ test26(void)
 			return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_10_24, &next_hop_return);
-	uint8_t test_hop_10_24 = next_hop_return;
+	uint32_t test_hop_10_24 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_10_24);
 
@@ -1668,7 +1686,7 @@ test26(void)
 		return -1;
 
 	status = rte_lpm6_lookup(lpm, ip_20_25, &next_hop_return);
-	uint8_t test_hop_20_25 = next_hop_return;
+	uint32_t test_hop_20_25 = next_hop_return;
 	TEST_LPM_ASSERT(status == 0);
 	TEST_LPM_ASSERT(next_hop_return == next_hop_ip_20_25);
 
@@ -1707,7 +1725,8 @@ test27(void)
 		struct rte_lpm6 *lpm = NULL;
 		struct rte_lpm6_config config;
 		uint8_t ip[] = {128,128,128,128,128,128,128,128,128,128,128,128,128,128,0,0};
-		uint8_t depth = 128, next_hop_add = 100, next_hop_return;
+		uint8_t depth = 128;
+		uint32_t next_hop_add = 100, next_hop_return;
 		int32_t status = 0;
 		int i, j;
 
@@ -1746,6 +1765,42 @@ test27(void)
 }
 
 /*
+ * Call add, lookup and delete for a single rule with maximum 21bit next_hop
+ * size.
+ * Check that next_hop returned from lookup is equal to provisioned value.
+ * Delete the rule and check that the same test returns a miss.
+ */
+int32_t
+test28(void)
+{
+	struct rte_lpm6 *lpm = NULL;
+	struct rte_lpm6_config config;
+	uint8_t ip[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+	uint8_t depth = 16;
+	uint32_t next_hop_add = 0x001FFFFF, next_hop_return = 0;
+	int32_t status = 0;
+
+	config.max_rules = MAX_RULES;
+	config.number_tbl8s = NUMBER_TBL8S;
+	config.flags = 0;
+
+	lpm = rte_lpm6_create(__func__, SOCKET_ID_ANY, &config);
+	TEST_LPM_ASSERT(lpm != NULL);
+
+	status = rte_lpm6_add(lpm, ip, depth, next_hop_add);
+	TEST_LPM_ASSERT(status == 0);
+
+	status = rte_lpm6_lookup(lpm, ip, &next_hop_return);
+	TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add));
+
+	status = rte_lpm6_delete(lpm, ip, depth);
+	TEST_LPM_ASSERT(status == 0);
+	rte_lpm6_free(lpm);
+
+	return PASS;
+}
+
+/*
  * Do all unit tests.
  */
 static int
diff --git a/test/test/test_lpm6_perf.c b/test/test/test_lpm6_perf.c
index 0723081..30be430 100644
--- a/test/test/test_lpm6_perf.c
+++ b/test/test/test_lpm6_perf.c
@@ -86,7 +86,7 @@ test_lpm6_perf(void)
 	struct rte_lpm6_config config;
 	uint64_t begin, total_time;
 	unsigned i, j;
-	uint8_t next_hop_add = 0xAA, next_hop_return = 0;
+	uint32_t next_hop_add = 0xAA, next_hop_return = 0;
 	int status = 0;
 	int64_t count = 0;
 
@@ -148,7 +148,7 @@ test_lpm6_perf(void)
 	count = 0;
 
 	uint8_t ip_batch[NUM_IPS_ENTRIES][16];
-	int16_t next_hops[NUM_IPS_ENTRIES];
+	int32_t next_hops[NUM_IPS_ENTRIES];
 
 	for (i = 0; i < NUM_IPS_ENTRIES; i++)
 		memcpy(ip_batch[i], large_ips_table[i].ip, 16);
-- 
2.1.4

^ permalink raw reply	[relevance 4%]
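
For applications consuming the change above, the visible difference is the
widened next-hop type in the LPM6 API. A minimal sketch of a lookup call
site after the update is shown below; the helper name and error handling
are illustrative assumptions, only the types follow the patch.

#include <stdint.h>
#include <rte_lpm6.h>

/* Illustrative helper: resolve one IPv6 address to a next-hop id. */
static int
lpm6_route_one(struct rte_lpm6 *lpm, uint8_t ip[16])
{
	uint32_t next_hop = 0;	/* was uint8_t before this change */

	if (rte_lpm6_lookup(lpm, ip, &next_hop) != 0)
		return -1;	/* lookup miss */

	/* next-hop ids up to 0x001FFFFF (21 bits) can now be stored */
	return (int)next_hop;
}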

* [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements
  2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-15  6:19  2%             ` David Hunt
  2017-03-15  6:19  1%               ` [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-15  6:19  2%               ` [dpdk-dev] [PATCH v10 08/18] lib: add symbol versioning to distributor David Hunt
  0 siblings, 2 replies; 200+ results
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms,
the scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
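
To make the worker-side usage concrete, a minimal sketch of a burst worker
loop is shown below. It assumes an application-defined stop flag and follows
the prototypes added later in this series; the processing step itself is
left as a comment.

#include <rte_mbuf.h>
#include <rte_distributor.h>

#define BURST_MAX 8	/* up to 8 mbufs exchanged per handshake */

static volatile int quit;	/* assumed application-level stop flag */

static int
burst_worker(struct rte_distributor *d, unsigned int worker_id)
{
	struct rte_mbuf *pkts[BURST_MAX];
	unsigned int nb = 0;

	while (!quit) {
		/* hand back the previous burst and receive up to 8 new mbufs */
		nb = rte_distributor_get_pkt(d, worker_id, pkts, pkts, nb);
		/* ... process pkts[0] .. pkts[nb - 1]; flow pinning is kept ... */
	}
	/* on shutdown, return anything still held */
	return rte_distributor_return_pkt(d, worker_id, pkts, nb);
}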

v10 changes:
   * Addressed all review comments from v9 (thanks, Bruce)
   * Squashed the two patches containing distributor structs and code
   * Renamed confusing rte_distributor_v1705.h to rte_distributor_next.h
   * Added usleep in main so as to be a little more gentle with that core
   * Fixed some patch titles and improved some descriptions
   * Updated sample app guide documentation
   * Removed un-needed code limiting Tx rings and cleaned up patch
   * Inherited v9 series Ack by Bruce, except new suggested addition
     for example app documentation (17/18)

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical order of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split down the updates to example app more
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them down into easier to review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   For performance in matching, Flow IDs are 15 bits
   If 32-bit Flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new distributor code
[04/18] lib: add SIMD flow matching to distributor
[05/18] test/distributor: extra params for autotests
[06/18] lib: switch distributor over to new API
[07/18] lib: make v20 header file private
[08/18] lib: add symbol versioning to distributor
[09/18] test: test single and burst distributor API
[10/18] test: add perf test for distributor burst mode
[11/18] examples/distributor: allow for extra stats
[12/18] examples/distributor: wait for ports to come up
[13/18] examples/distributor: add dedicated core for dist
[14/18] examples/distributor: tweaks for performance
[15/18] examples/distributor: give Rx thread a core
[16/18] doc: distributor library changes for new burst API
[17/18] doc: distributor app changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files
  2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
@ 2017-03-15  6:19  1%               ` David Hunt
  2017-03-15  6:19  2%               ` [dpdk-dev] [PATCH v10 08/18] lib: add symbol versioning to distributor David Hunt
  1 sibling, 0 replies; 200+ results
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply	[relevance 1%]
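
As the header comments above note, a worker that must not block can pair
rte_distributor_request_pkt() with rte_distributor_poll_pkt() instead of
calling rte_distributor_get_pkt(). A small sketch of that pattern against
the legacy single-packet API follows; the "other work" step is an
application-specific assumption.

#include <stddef.h>
#include <rte_mbuf.h>
#include <rte_distributor.h>

static void
worker_poll_once(struct rte_distributor *d, unsigned int worker_id,
		struct rte_mbuf *prev_pkt)
{
	struct rte_mbuf *pkt;

	/* return the previous packet (may be NULL) and request a new one */
	rte_distributor_request_pkt(d, worker_id, prev_pkt);

	do {
		/* ... perform other per-iteration work while waiting ... */
		pkt = rte_distributor_poll_pkt(d, worker_id);
	} while (pkt == NULL);

	/* process pkt here; it is handed back on the next request */
}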

* [dpdk-dev] [PATCH v10 08/18] lib: add symbol versioning to distributor
  2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
  2017-03-15  6:19  1%               ` [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-15  6:19  2%               ` David Hunt
  1 sibling, 0 replies; 200+ results
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..06df13d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply	[relevance 2%]
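
Condensed to its essentials, the versioning idiom used throughout the patch
above looks like the sketch below. rte_foo is a hypothetical function used
purely for illustration; the matching version.map must export the name under
both the DPDK_2.0 and DPDK_17.05 nodes, as in the map-file diff above.

#include <rte_compat.h>

/* old behaviour, kept for applications linked against the 2.0 ABI */
int rte_foo_v20(int x);
int
rte_foo_v20(int x)
{
	return x;
}
VERSION_SYMBOL(rte_foo, _v20, 2.0);

/* new behaviour, the default binding from DPDK 17.05 onwards */
int rte_foo_v1705(int x);
int
rte_foo_v1705(int x)
{
	return x + 1;
}
BIND_DEFAULT_SYMBOL(rte_foo, _v1705, 17.05);
MAP_STATIC_SYMBOL(int rte_foo(int x), rte_foo_v1705);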

* Re: [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file
  2017-03-13 19:16  3%       ` [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file Allain Legacy
@ 2017-03-16 14:52  0%         ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2017-03-16 14:52 UTC (permalink / raw)
  To: Allain Legacy
  Cc: dev, ian.jolliffe, bruce.richardson, john.mcnamara, keith.wiles,
	thomas.monjalon, vincent.jardin, jerin.jacob, stephen, 3chas3

On 3/13/2017 7:16 PM, Allain Legacy wrote:
> Adds a default ABI version file for the AVP PMD.
> 
> Signed-off-by: Allain Legacy <allain.legacy@windriver.com>
> Signed-off-by: Matt Peters <matt.peters@windriver.com>
> ---
>  drivers/net/avp/rte_pmd_avp_version.map | 4 ++++
>  1 file changed, 4 insertions(+)
>  create mode 100644 drivers/net/avp/rte_pmd_avp_version.map
> 
> diff --git a/drivers/net/avp/rte_pmd_avp_version.map b/drivers/net/avp/rte_pmd_avp_version.map
> new file mode 100644
> index 0000000..af8f3f4
> --- /dev/null
> +++ b/drivers/net/avp/rte_pmd_avp_version.map
> @@ -0,0 +1,4 @@
> +DPDK_17.05 {
> +
> +    local: *;
> +};
> 

Hi Allain,

Instead of adding files per patch, may I suggest a different ordering:
First add skeleton files in a patch, later add functional pieces one by
one, like:

Merge patch 1/17, 3/17, this patch (4/17), 6/17 (removing SYMLINK), into
single patch and make it AVP first patch. This will be skeleton patch.

Second patch can be introducing public headers (2/17) and updating
Makefile to include them.

Third, debug log patch (5/17)

Patch 7/17 and later can stay same.

What do you think?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] mk: Provide option to set Major ABI version
  2017-03-01 14:35  4%             ` Jan Blunck
@ 2017-03-16 17:19  4%               ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2017-03-16 17:19 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: dev, Jan Blunck, cjcollier, ricardo.salveti, Luca Boccassi

2017-03-01 15:35, Jan Blunck:
> On Wed, Mar 1, 2017 at 10:34 AM, Christian Ehrhardt
> <christian.ehrhardt@canonical.com> wrote:
> > Downstreams might want to provide different DPDK releases at the same
> > time to support multiple consumers of DPDK linked against older and newer
> > sonames.
> >
> > Also due to the interdependencies that DPDK libraries can have applications
> > might end up with an executable space in which multiple versions of a
> > library are mapped by ld.so.
> >
> > Think of LibA that got an ABI bump and LibB that did not get an ABI bump
> > but is depending on LibA.
> >
> >     Application
> >     \-> LibA.old
> >     \-> LibB.new -> LibA.new
> >
> > That is a conflict which can be avoided by setting CONFIG_RTE_MAJOR_ABI.
> > If set CONFIG_RTE_MAJOR_ABI overwrites any LIBABIVER value.
> > An example might be ``CONFIG_RTE_MAJOR_ABI=16.11`` which will make all
> > libraries librte<?>.so.16.11 instead of librte<?>.so.<LIBABIVER>.
[...]
> >
> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> 
> Reviewed-by: Jan Blunck <jblunck@infradead.org>
> Tested-by: Jan Blunck <jblunck@infradead.org>

Not sure how it can be used in distributions, but it does not hurt
to provide the config option.
Are you going to link applications against a fixed DPDK version for
all libraries?

Applied, thanks

^ permalink raw reply	[relevance 4%]
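
For reference, the effect described in the quoted commit message boils down
to a small conditional in the library build rules. A sketch of the idea is
below; its exact placement in the DPDK makefiles is not shown in this thread
and should be treated as an assumption.

# If a major ABI version is forced, it overrides each library's own LIBABIVER,
# e.g. CONFIG_RTE_MAJOR_ABI=16.11 gives librte_eal.so.16.11, librte_ring.so.16.11, ...
ifneq ($(CONFIG_RTE_MAJOR_ABI),)
LIBABIVER := $(CONFIG_RTE_MAJOR_ABI)
endif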

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio?
  @ 2017-03-16 23:17  3%           ` Stephen Hemminger
  2017-03-16 23:41  0%             ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back Vincent JARDIN
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2017-03-16 23:17 UTC (permalink / raw)
  To: O'Driscoll, Tim
  Cc: Vincent JARDIN, Legacy, Allain (Wind River),
	Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, Wiles, Keith, thomas.monjalon,
	jerin.jacob, 3chas3

On Wed, 15 Mar 2017 04:10:56 +0000
"O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:

> I've included a couple of specific comments inline below, and a general comment here.
> 
> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
> 
> I'd suggest that this is a good topic for the next Tech Board meeting.

This is a virtualization driver for supporting DPDK on a platform that provides an alternative
virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
ABI for drivers, supporting out-of-tree DPDK drivers is difficult. DPDK should try
to be inclusive and support as many environments as possible.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back
  2017-03-16 23:17  3%           ` Stephen Hemminger
@ 2017-03-16 23:41  0%             ` Vincent JARDIN
  2017-03-17  0:08  0%               ` Wiles, Keith
  0 siblings, 1 reply; 200+ results
From: Vincent JARDIN @ 2017-03-16 23:41 UTC (permalink / raw)
  To: Stephen Hemminger, O'Driscoll, Tim, Legacy, Allain (Wind River)
  Cc: Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, Wiles, Keith, thomas.monjalon,
	jerin.jacob, 3chas3, stefanha, Markus Armbruster

Let's be back to 2014 with Qemu's thoughts on it,
+Stefan
 
https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026767.html

and
+Markus
 
https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026713.html

> 6. Device models belong into QEMU
>
>    Say you build an actual interface on top of ivshmem.  Then ivshmem in
>    QEMU together with the supporting host code outside QEMU (see 3.) and
>    the lower layer of the code using it in guests (kernel + user space)
>    provide something that to me very much looks like a device model.
>
>    Device models belong into QEMU.  It's what QEMU does.


Le 17/03/2017 à 00:17, Stephen Hemminger a écrit :
> On Wed, 15 Mar 2017 04:10:56 +0000
> "O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:
>
>> I've included a couple of specific comments inline below, and a general comment here.
>>
>> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
>>
>> I'd suggest that this is a good topic for the next Tech Board meeting.
>
> This is a virtualization driver for supporting DPDK on platform that provides an alternative
> virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
> ABI for drivers, supporting out of tree DPDK drivers is difficult. The DPDK should try
> to be inclusive and support as many environments as possible.
>

On the Qemu mailing list, back in 2014, I did push to build models of
devices over ivshmem, like AVP, but folks did not want us to abuse it.
The Qemu community wants to stay focused. So, by being too inclusive,
we would be abusing Qemu's capabilities.

So, allowing any case in the name of being "inclusive" is not a
proper way to make sure that virtio gets all the focus it deserves.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back
  2017-03-16 23:41  0%             ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back Vincent JARDIN
@ 2017-03-17  0:08  0%               ` Wiles, Keith
  0 siblings, 0 replies; 200+ results
From: Wiles, Keith @ 2017-03-17  0:08 UTC (permalink / raw)
  To: Vincent JARDIN
  Cc: Stephen Hemminger, O'Driscoll, Tim, Legacy,
	Allain (Wind River),
	Yigit, Ferruh, dev, Jolliffe, Ian (Wind River),
	Richardson, Bruce, Mcnamara, John, thomas.monjalon, jerin.jacob,
	3chas3, stefanha, Markus Armbruster


> On Mar 17, 2017, at 7:41 AM, Vincent JARDIN <vincent.jardin@6wind.com> wrote:
> 
> Let's be back to 2014 with Qemu's thoughts on it,
> +Stefan
> https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026767.html
> 
> and
> +Markus
> https://lists.linuxfoundation.org/pipermail/virtualization/2014-June/026713.html
> 
>> 6. Device models belong into QEMU
>> 
>>   Say you build an actual interface on top of ivshmem.  Then ivshmem in
>>   QEMU together with the supporting host code outside QEMU (see 3.) and
>>   the lower layer of the code using it in guests (kernel + user space)
>>   provide something that to me very much looks like a device model.
>> 
>>   Device models belong into QEMU.  It's what QEMU does.
> 
> 
> Le 17/03/2017 à 00:17, Stephen Hemminger a écrit :
>> On Wed, 15 Mar 2017 04:10:56 +0000
>> "O'Driscoll, Tim" <tim.odriscoll@intel.com> wrote:
>> 
>>> I've included a couple of specific comments inline below, and a general comment here.
>>> 
>>> We have somebody proposing to add a new driver to DPDK. It's standalone and doesn't affect any of the core libraries. They're willing to maintain the driver and have included a patch to update the maintainers file. They've also included the relevant documentation changes. I haven't seen any negative comment on the patches themselves except for a request from John McNamara for an update to the Release Notes that was addressed in a later version. I think we should be welcoming this into DPDK rather than questioning/rejecting it.
>>> 
>>> I'd suggest that this is a good topic for the next Tech Board meeting.
>> 
>> This is a virtualization driver for supporting DPDK on platform that provides an alternative
>> virtual network driver. I see no reason it shouldn't be part of DPDK. Given the unstable
>> ABI for drivers, supporting out of tree DPDK drivers is difficult. The DPDK should try
>> to be inclusive and support as many environments as possible.


+2!! for Stephen’s comment.

>> 
> 
> On Qemu mailing list, back to 2014, I did push to build models of devices over ivshmem, like AVP, but folks did not want that we abuse of it. The Qemu community wants that we avoid unfocusing. So, by being too much inclusive, we abuse of the Qemu's capabilities.
> 
> So, because of being "inclusive", we should allow any cases, it is not a proper way to make sure that virtio gets all the focuses it deserves.
> 
> 

Regards,
Keith


^ permalink raw reply	[relevance 0%]

2016-11-07  7:38     [dpdk-dev] [PATCH] maintainers: claim responsability for xen Jianfeng Tan
2016-11-10 18:59     ` Konrad Rzeszutek Wilk
2016-11-10 20:49       ` Tan, Jianfeng
2017-02-16 11:06         ` Thomas Monjalon
2017-02-16 13:36           ` Konrad Rzeszutek Wilk
2017-02-16 21:51             ` Vincent JARDIN
2017-02-17 16:07               ` Konrad Rzeszutek Wilk
2017-02-20  9:56                 ` Jan Blunck
2017-02-20 17:36  3%               ` Joao Martins
2017-01-05 10:44     [dpdk-dev] [PATCH v1] doc: announce API and ABI change for ethdev Bernard Iremonger
2017-01-05 15:25     ` [dpdk-dev] [PATCH v2] " Bernard Iremonger
2017-02-13 17:57  4%   ` Thomas Monjalon
2017-02-14  3:17  4%     ` Jerin Jacob
2017-02-14 10:33  4%       ` Iremonger, Bernard
2017-02-14 19:37  4%   ` Thomas Monjalon
2017-01-16 16:19     [dpdk-dev] [PATCH v7 0/6] Expanded statistics reporting Remy Horton
2017-01-16 16:19     ` [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency stats Remy Horton
2017-01-17  4:29       ` Jerin Jacob
2017-01-17 11:19         ` Mcnamara, John
2017-01-17 12:34           ` Jerin Jacob
2017-01-17 14:53             ` Mcnamara, John
2017-01-17 16:25               ` Jerin Jacob
2017-01-18 20:11                 ` Olivier Matz
2017-01-24 15:24  3%               ` Olivier MATZ
2017-01-18 15:05     [dpdk-dev] [PATCH v9 0/7] Expanded statistics reporting Remy Horton
2017-01-18 15:05     ` [dpdk-dev] [PATCH v9 1/7] lib: add information metrics library Remy Horton
2017-01-30 15:50  0%   ` Thomas Monjalon
2017-01-19  5:34     [dpdk-dev] [PATCH] doc: announce ABI change for cloud filter Yong Liu
2017-01-19 18:45     ` Adrien Mazarguil
2017-01-20  2:14       ` Lu, Wenzhuo
2017-01-20 14:57  4%     ` Thomas Monjalon
2017-02-14  3:19  4%       ` Jerin Jacob
2017-01-20  9:51     [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior Zhiyong Yang
2017-01-20 10:26  3% ` Andrew Rybchenko
     [not found]       ` <2601191342CEEE43887BDE71AB9772583F108924@irsmsx105.ger.corp.intel.com>
2017-01-20 11:24  0%     ` Ananyev, Konstantin
2017-01-20 11:48  0%       ` Bruce Richardson
2017-01-23 16:36  0%         ` Adrien Mazarguil
2017-02-07  7:50  0%           ` Yang, Zhiyong
2017-01-21  4:07  0%       ` Yang, Zhiyong
2017-01-21  4:13  2%   ` Yang, Zhiyong
2017-01-23  9:24     [dpdk-dev] [PATCH v6 1/6] lib: distributor performance enhancements David Hunt
2017-02-21  3:17  2% ` [dpdk-dev] [PATCH v7 0/17] distributor library " David Hunt
2017-02-21  3:17  1%   ` [dpdk-dev] [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
2017-02-21 10:27  0%     ` Hunt, David
2017-02-24 14:03  0%     ` Bruce Richardson
2017-03-01  9:55  0%       ` Hunt, David
2017-03-01  7:47  2%     ` [dpdk-dev] [PATCH v8 0/18] distributor library performance enhancements David Hunt
2017-03-01  7:47  1%       ` [dpdk-dev] [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-06  9:10  2%         ` [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements David Hunt
2017-03-06  9:10  1%           ` [dpdk-dev] [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-15  6:19  2%             ` [dpdk-dev] [PATCH v10 0/18] distributor library performance enhancements David Hunt
2017-03-15  6:19  1%               ` [dpdk-dev] [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-15  6:19  2%               ` [dpdk-dev] [PATCH v10 08/18] lib: add symbol versioning to distributor David Hunt
2017-03-06  9:10  2%           ` [dpdk-dev] [PATCH v9 09/18] " David Hunt
2017-03-10 16:22  0%             ` Bruce Richardson
2017-03-13 10:17  0%               ` Hunt, David
2017-03-13 10:28  0%               ` Hunt, David
2017-03-13 11:01  0%                 ` Van Haaren, Harry
2017-03-13 11:02  0%                   ` Hunt, David
2017-03-01  7:47  3%       ` [dpdk-dev] [PATCH v8 " David Hunt
2017-03-01 14:50  0%         ` Hunt, David
2017-02-24 14:01  0%   ` [dpdk-dev] [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
2017-01-23 13:04 12% [dpdk-dev] [PATCH] doc: announce API/ABI changes for vhost Yuanhan Liu
2017-02-13 18:02  4% ` Thomas Monjalon
2017-02-14  3:21  4%   ` Jerin Jacob
2017-02-14 13:54  4% ` Maxime Coquelin
2017-02-14 20:28  4% ` Thomas Monjalon
2017-01-24  7:34  4% [dpdk-dev] [PATCH 0/3] doc upates Jianfeng Tan
2017-01-24  7:34 17% ` [dpdk-dev] [PATCH 3/3] doc: remove ABI changes in igb_uio Jianfeng Tan
2017-01-24 13:35  4%   ` Ferruh Yigit
2017-01-30 17:52  4%     ` Thomas Monjalon
2017-02-01  7:24  4%       ` Tan, Jianfeng
2017-02-09 14:45  0% ` [dpdk-dev] [PATCH 0/3] doc upates Thomas Monjalon
2017-02-09 16:06  4% ` [dpdk-dev] [PATCH v2 " Jianfeng Tan
2017-02-09 16:06 12%   ` [dpdk-dev] [PATCH v2 3/3] doc: postpone ABI changes in igb_uio Jianfeng Tan
2017-02-09 17:40  4%     ` Ferruh Yigit
2017-02-10 10:44  4%       ` Thomas Monjalon
2017-02-10 11:20  4%         ` Tan, Jianfeng
2017-01-24 10:39     [dpdk-dev] [PATCH RFCv2 0/4] generalise rte_ring to allow different datatypes Bruce Richardson
2017-01-24 10:39  2% ` [dpdk-dev] [PATCH RFCv2 1/4] ring: create common ring files Bruce Richardson
2017-01-24 10:39  1% ` [dpdk-dev] [PATCH RFCv2 2/4] ring: separate common and rte_ring specific functions Bruce Richardson
2017-01-25 12:14     [dpdk-dev] rte_ring features in use (or not) Bruce Richardson
2017-01-25 13:20  3% ` Olivier MATZ
2017-01-25 13:54  0%   ` Bruce Richardson
2017-01-25 14:48         ` Bruce Richardson
2017-01-25 15:59  3%       ` Wiles, Keith
2017-01-25 16:57  3%         ` Bruce Richardson
2017-01-25 17:29  0%           ` Ananyev, Konstantin
2017-01-31 10:53  0%             ` Olivier Matz
2017-01-31 11:41  0%               ` Bruce Richardson
2017-01-31 12:10  0%                 ` Bruce Richardson
2017-01-31 13:27  0%                   ` Olivier Matz
2017-01-31 13:46  0%                     ` Bruce Richardson
2017-01-25 22:27  0%           ` Wiles, Keith
2017-02-07 14:12  2% ` [dpdk-dev] [PATCH RFCv3 00/19] ring cleanup and generalization Bruce Richardson
2017-02-14  8:32  3%   ` Olivier Matz
2017-02-14  9:39  0%     ` Bruce Richardson
2017-02-07 14:12  3% ` [dpdk-dev] [PATCH RFCv3 06/19] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-01-27 12:27  4% [dpdk-dev] [PATCH] doc: add PMD specific API Ferruh Yigit
2017-01-30 17:57  0% ` Thomas Monjalon
2017-01-27 14:56     [dpdk-dev] [PATCH 00/24] linux/eal: Remove most causes of panic on init Aaron Conole
2017-01-27 14:57     ` [dpdk-dev] [PATCH 25/25] rte_eal_init: add info about rte_errno codes Aaron Conole
2017-01-27 16:33  3%   ` Stephen Hemminger
2017-01-27 16:47  0%     ` Bruce Richardson
2017-01-27 17:37  0%       ` Stephen Hemminger
2017-01-30 18:38  0%         ` Aaron Conole
2017-01-30 20:19  0%           ` Thomas Monjalon
2017-02-01 10:54  3%             ` Adrien Mazarguil
2017-02-01 12:06  0%               ` Jan Blunck
2017-02-01 14:18  0%                 ` Bruce Richardson
2017-01-31  9:33  0%           ` Bruce Richardson
2017-01-31 16:56  0%             ` Stephen Hemminger
2017-02-01 16:53  3% [dpdk-dev] bugs and glitches in rte_cryptodev_devices_get Stephen Hemminger
2017-02-02 13:55  0% ` Mrozowicz, SlawomirX
2017-02-03 12:26  0% ` Mrozowicz, SlawomirX
2017-02-03 10:33     [dpdk-dev] [PATCH v10 0/7] Expanded statistics reporting Remy Horton
2017-02-03 10:33  1% ` [dpdk-dev] [PATCH v10 1/7] lib: add information metrics library Remy Horton
2017-02-03 10:33  2% ` [dpdk-dev] [PATCH v10 3/7] lib: add bitrate statistics library Remy Horton
2017-02-06 13:35     [dpdk-dev] cryptodev - Session and queue pair relationship Akhil Goyal
2017-02-07 20:52     ` Declan Doherty
2017-02-13 14:38       ` Akhil Goyal
2017-02-13 14:44         ` Trahe, Fiona
2017-02-13 15:09  3%       ` Trahe, Fiona
2017-02-08 22:56  3% [dpdk-dev] Kill off PCI dependencies Stephen Hemminger
2017-02-09 16:26  3% ` Thomas Monjalon
2017-02-10 11:39  9% [dpdk-dev] [PATCH] doc: announce ABI change for cryptodev ops structure Fan Zhang
2017-02-10 13:59  4% ` Trahe, Fiona
2017-02-13 16:07  7%   ` Zhang, Roy Fan
2017-02-13 17:34  4%     ` Trahe, Fiona
2017-02-14  0:21  4%   ` Hemant Agrawal
2017-02-14  5:11  4%     ` Hemant Agrawal
2017-02-14 10:41  9% ` [dpdk-dev] [PATCH v2] " Fan Zhang
2017-02-14 10:48  4%   ` Doherty, Declan
2017-02-14 11:03  4%     ` De Lara Guarch, Pablo
2017-02-14 20:37  4%   ` Thomas Monjalon
2017-02-10 14:05     [dpdk-dev] [PATCH 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
2017-02-10 14:05  1% ` [dpdk-dev] [PATCH 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
2017-02-21 10:35  0%   ` Hemant Agrawal
2017-02-13 10:56  9% [dpdk-dev] [PATCH] doc: remove announce of Tx preparation Thomas Monjalon
2017-02-13 14:22  0% ` Thomas Monjalon
2017-02-13 11:05 19% [dpdk-dev] [PATCH] doc: postpone ABI changes to 17.05 Olivier Matz
2017-02-13 14:21  4% ` Thomas Monjalon
2017-02-13 11:55  9% [dpdk-dev] [PATCH] doc: add deprecation note for rework of PCI in EAL Shreyansh Jain
2017-02-13 12:00  0% ` Shreyansh Jain
2017-02-13 14:44  0%   ` Thomas Monjalon
2017-02-13 21:56  0%   ` Jan Blunck
2017-02-14  5:18  0%     ` Shreyansh Jain
2017-02-13 11:55  5% [dpdk-dev] [PATCH] doc: remove deprecation notice for rte_bus Shreyansh Jain
2017-02-13 14:36  0% ` Thomas Monjalon
2017-02-13 13:25     [dpdk-dev] crypto drivers in the API Thomas Monjalon
2017-02-14 10:44  4% ` Doherty, Declan
2017-02-14 11:04  0%   ` Thomas Monjalon
2017-02-14 14:46  4%     ` Doherty, Declan
2017-02-14 15:47  0%       ` Thomas Monjalon
2017-02-13 14:26  4% [dpdk-dev] [PATCH] doc: postpone API change in ethdev Thomas Monjalon
2017-02-13 16:02  3% [dpdk-dev] doc: deprecation notice for ethdev ops? Dumitrescu, Cristian
2017-02-13 16:09  0% ` Thomas Monjalon
2017-02-13 16:46  4%   ` Ferruh Yigit
2017-02-13 17:21  0%     ` Dumitrescu, Cristian
2017-02-13 17:36  0%       ` Ferruh Yigit
2017-02-13 17:38  3%     ` Thomas Monjalon
2017-02-13 17:38  9% [dpdk-dev] [PATCH] doc: add ABI change notification for ring library Bruce Richardson
2017-02-14  0:32  4% ` Mcnamara, John
2017-02-14  3:25  4% ` Jerin Jacob
2017-02-14  8:33  4% ` Olivier Matz
2017-02-14 11:43  4%   ` Hemant Agrawal
2017-02-14 18:42  4% ` [dpdk-dev] " Thomas Monjalon
2017-02-14 10:52  8% [dpdk-dev] Further fun with ABI tracking Christian Ehrhardt
2017-02-14 16:19  4% ` Bruce Richardson
2017-02-14 20:31  9% ` Jan Blunck
2017-02-22 13:12  7%   ` Christian Ehrhardt
2017-02-22 13:24 20%     ` [dpdk-dev] [PATCH] mk: Provide option to set Major ABI version Christian Ehrhardt
2017-02-28  8:34  4%       ` Jan Blunck
2017-03-01  9:31  4%         ` Christian Ehrhardt
2017-03-01  9:34 20%           ` [dpdk-dev] [PATCH v2] " Christian Ehrhardt
2017-03-01 14:35  4%             ` Jan Blunck
2017-03-16 17:19  4%               ` Thomas Monjalon
2017-02-23 18:48  4%     ` [dpdk-dev] Further fun with ABI tracking Ferruh Yigit
2017-02-24  7:32  8%       ` Christian Ehrhardt
2017-02-14 15:32  4% [dpdk-dev] [PATCH v1] doc: update release notes for 17.02 John McNamara
2017-02-14 16:26  2% ` [dpdk-dev] [PATCH v2] " John McNamara
2017-02-15 10:02     [dpdk-dev] [PATCH 0/7] Rework vdev probing to use rte_bus infrastructure Jan Blunck
2017-02-20 14:17     ` [dpdk-dev] [PATCH v2 1/8] eal: use different constructor priorities for initcalls Jan Blunck
2017-02-21 12:30  3%   ` Ferruh Yigit
2017-02-15 12:38  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 17.05 John McNamara
2017-02-15 13:15  1% [dpdk-dev] [PATCH] kni: remove KNI vhost support Ferruh Yigit
2017-02-20 14:30  5% ` [dpdk-dev] [PATCH v2 1/2] doc: add removed items section to release notes Ferruh Yigit
2017-02-20 14:30  1%   ` [dpdk-dev] [PATCH v2 2/2] kni: remove KNI vhost support Ferruh Yigit
2017-02-17 12:00     [dpdk-dev] [PATCH 0/3] cryptodev: change device configuration API Fan Zhang
2017-02-17 12:01  5% ` [dpdk-dev] [PATCH 3/3] doc: remove deprecation notice Fan Zhang
2017-02-19 17:14  4% [dpdk-dev] [PATCH] lpm: extend IPv6 next hop field Vladyslav Buslov
2017-02-21 14:46  4% ` [dpdk-dev] [PATCH v2] " Vladyslav Buslov
2017-02-21 10:22 16% [dpdk-dev] [PATCH] maintainers: fix script paths Thomas Monjalon
2017-02-22 16:09     [dpdk-dev] [PATCH 0/4] net/mlx5 add TSO support Shahaf Shuler
2017-03-01 11:11  3% ` [dpdk-dev] [PATCH v2 0/1] net/mlx5: " Shahaf Shuler
2017-02-23 17:23     [dpdk-dev] [PATCH v1 00/14] refactor and cleanup of rte_ring Bruce Richardson
2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 01/14] ring: remove split cacheline build setting Bruce Richardson
2017-02-28 11:35  0%   ` Jerin Jacob
2017-02-28 11:57  0%     ` Bruce Richardson
2017-02-28 12:08  0%       ` Jerin Jacob
2017-02-28 13:52  0%         ` Bruce Richardson
2017-02-28 17:54  0%           ` Jerin Jacob
2017-03-01  9:47  0%             ` Bruce Richardson
2017-02-23 17:23  3% ` [dpdk-dev] [PATCH v1 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 04/14] ring: remove debug setting Bruce Richardson
2017-02-23 17:23  4% ` [dpdk-dev] [PATCH v1 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
2017-02-23 17:23  2% ` [dpdk-dev] [PATCH v1 06/14] ring: remove watermark support Bruce Richardson
2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
2017-02-23 17:24  2% ` [dpdk-dev] [PATCH v1 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
2017-03-07 11:32     ` [dpdk-dev] [PATCH v2 00/14] refactor and cleanup of rte_ring Bruce Richardson
2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 01/14] ring: remove split cacheline build setting Bruce Richardson
2017-03-07 11:32  3%   ` [dpdk-dev] [PATCH v2 03/14] ring: eliminate duplication of size and mask fields Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 04/14] ring: remove debug setting Bruce Richardson
2017-03-07 11:32  4%   ` [dpdk-dev] [PATCH v2 05/14] ring: remove the yield when waiting for tail update Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 06/14] ring: remove watermark support Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 07/14] ring: make bulk and burst fn return vals consistent Bruce Richardson
2017-03-07 11:32  2%   ` [dpdk-dev] [PATCH v2 09/14] ring: allow dequeue fns to return remaining entry count Bruce Richardson
2017-02-24 16:28     [dpdk-dev] [PATCH v2 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
2017-02-24 16:28  1% ` [dpdk-dev] [PATCH v2 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
2017-02-25  1:22     [dpdk-dev] [PATCH 00/16] Wind River Systems AVP PMD Allain Legacy
2017-02-25  1:23  3% ` [dpdk-dev] [PATCH 04/16] net/avp: add PMD version map file Allain Legacy
2017-02-26 19:08     ` [dpdk-dev] [PATCH v2 00/16] Wind River Systems AVP PMD Allain Legacy
2017-02-26 19:08  3%   ` [dpdk-dev] [PATCH v2 04/15] net/avp: add PMD version map file Allain Legacy
2017-03-02  0:19       ` [dpdk-dev] [PATCH v3 00/16] Wind River Systems AVP PMD Allain Legacy
2017-03-02  0:19  3%     ` [dpdk-dev] [PATCH v3 04/16] net/avp: add PMD version map file Allain Legacy
2017-03-13 19:16         ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD Allain Legacy
2017-03-13 19:16  3%       ` [dpdk-dev] [PATCH v4 04/17] net/avp: add PMD version map file Allain Legacy
2017-03-16 14:52  0%         ` Ferruh Yigit
2017-03-14 17:37           ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? Vincent JARDIN
2017-03-15  4:10             ` O'Driscoll, Tim
2017-03-16 23:17  3%           ` Stephen Hemminger
2017-03-16 23:41  0%             ` [dpdk-dev] [PATCH v4 00/17] Wind River Systems AVP PMD vs virtio? - ivshmem is back Vincent JARDIN
2017-03-17  0:08  0%               ` Wiles, Keith
2017-03-02  4:03  3% [dpdk-dev] [PATCH 0/6] introduce prgdev abstraction library Chen Jing D(Mark)
2017-03-02  4:03  4% ` [dpdk-dev] [PATCH 5/6] prgdev: add ABI control info Chen Jing D(Mark)
2017-03-02 19:29     [dpdk-dev] [PATCH 0/5] librte_cfgfile enhancement Allain Legacy
2017-03-02 19:29     ` [dpdk-dev] [PATCH 1/5] cfgfile: configurable comment character Allain Legacy
2017-03-02 21:10       ` Bruce Richardson
2017-03-03  0:53         ` Yuanhan Liu
2017-03-03 11:17           ` Dumitrescu, Cristian
2017-03-03 11:31             ` Legacy, Allain
2017-03-03 12:10  4%           ` Bruce Richardson
2017-03-03 12:17  0%             ` Legacy, Allain
2017-03-03 13:10  0%               ` Bruce Richardson
2017-03-03  9:31     [dpdk-dev] [PATCH 0/4] support replace filter function Beilei Xing
2017-03-03  9:31     ` [dpdk-dev] [PATCH 1/4] net/i40e: support replace filter type Beilei Xing
2017-03-08 15:50       ` Ferruh Yigit
2017-03-09  5:59  3%     ` Xing, Beilei
2017-03-09 10:01  0%       ` Ferruh Yigit
2017-03-09 10:43  0%         ` Xing, Beilei
2017-03-03  9:31     ` [dpdk-dev] [PATCH 4/4] net/i40e: refine consistent tunnel filter Beilei Xing
2017-03-08 15:50       ` Ferruh Yigit
2017-03-09  6:11  3%     ` Xing, Beilei
2017-03-03  9:51  4% [dpdk-dev] [PATCH 00/17] vhost: generic vhost API Yuanhan Liu
2017-03-03  9:51  3% ` [dpdk-dev] [PATCH 16/17] vhost: rename header file Yuanhan Liu
2017-03-04  1:10     [dpdk-dev] [PATCH v3 0/2] ethdev: abstraction layer for QoS hierarchical scheduler Cristian Dumitrescu
2017-03-04  1:10  1% ` [dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API Cristian Dumitrescu
2017-03-06 16:57     ` [dpdk-dev] [PATCH v3 1/2] ethdev: add capability control API Thomas Monjalon
2017-03-06 18:28       ` Dumitrescu, Cristian
2017-03-06 20:21         ` Thomas Monjalon
2017-03-06 20:41  3%       ` Wiles, Keith
2017-03-07 11:11     [dpdk-dev] Issues with ixgbe and rte_flow Le Scouarnec Nicolas
2017-03-08  3:16     ` Lu, Wenzhuo
2017-03-08  9:24       ` Le Scouarnec Nicolas
2017-03-08 15:41  3%     ` Adrien Mazarguil
2017-03-09 15:37     [dpdk-dev] [PATCH v2] lpm: extend IPv6 next hop field Thomas Monjalon
2017-03-14 17:17  4% ` [dpdk-dev] [PATCH v3] " Vladyslav Buslov
2017-03-09 16:25     [dpdk-dev] [PATCH v11 0/7] Expanded statistics reporting Remy Horton
2017-03-09 16:25  1% ` [dpdk-dev] [PATCH v11 1/7] lib: add information metrics library Remy Horton
2017-03-09 16:25  2% ` [dpdk-dev] [PATCH v11 3/7] lib: add bitrate statistics library Remy Horton
