DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
@ 2020-06-10  6:38 Viacheslav Ovsiienko
  2020-06-10 13:33 ` Harman Kalra
                   ` (10 more replies)
  0 siblings, 11 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-06-10  6:38 UTC (permalink / raw)
  To: dev; +Cc: thomas, matan, rasland, orika, olivier.matz, shahafs

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature from
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. It the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <intra_gap>,<burst_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_get_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..fb5477c 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/*
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, these dynamic flag and field will be used to manage
+ * the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-06-10 13:33 ` Harman Kalra
  2020-06-10 15:16   ` Slava Ovsiienko
  2020-07-01 15:36 ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Harman Kalra @ 2020-06-10 13:33 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, thomas, matan, rasland, orika, olivier.matz, shahafs

On Wed, Jun 10, 2020 at 06:38:05AM +0000, Viacheslav Ovsiienko wrote:

Hi Viacheslav,

   I have some queries below:
   	
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. It the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.

Since its applicaiton responsibility to care of packet reordering and
many other parameters, so why cant application itself take the
responsibility of packet scheduling, i.e. applicaton can hold
for the required time before calling tx-burst? Why are we even
offloading this job to PMD? 

> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.

So what I am understanding here is "rte_dynfield_timestamp" will provide
information about three parameters:
- timestamp at which TX should start
- intra packet gap
- intra burst gap.

If its about "intra packet gap" then PMD can take care, but if it is
about intra burst gap, application can take care of it.

> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <intra_gap>,<burst_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the

I think here you mean "rte_eth_read_clock".


Thanks
Harman

> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..fb5477c 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/*
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, these dynamic flag and field will be used to manage
> + * the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
> +
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
  2020-06-10 13:33 ` Harman Kalra
@ 2020-06-10 15:16   ` Slava Ovsiienko
  2020-06-17 15:57     ` [dpdk-dev] [EXT] " Harman Kalra
  0 siblings, 1 reply; 86+ messages in thread
From: Slava Ovsiienko @ 2020-06-10 15:16 UTC (permalink / raw)
  To: Harman Kalra
  Cc: dev, Thomas Monjalon, Matan Azrad, Raslan Darawsheh, Ori Kam,
	olivier.matz, Shahaf Shuler

Hi, Harman

> -----Original Message-----
> From: Harman Kalra <hkalra@marvell.com>
> Sent: Wednesday, June 10, 2020 16:34
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Matan
> Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>;
> olivier.matz@6wind.com; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
> 
> On Wed, Jun 10, 2020 at 06:38:05AM +0000, Viacheslav Ovsiienko wrote:
> 
> Hi Viacheslav,
> 
>    I have some queries below:
> 
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> >
> > The main objective of this RFC is to specify the way how applications
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well.
> >
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. It the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> >
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.
> 
> Since its applicaiton responsibility to care of packet reordering and many
> other parameters, so why cant application itself take the responsibility of
> packet scheduling, i.e. applicaton can hold for the required time before
> calling tx-burst? Why are we even offloading this job to PMD?
> 
- The scheduling is required to be very precise. Within handred(s) of nanoseconds.
- It saves CPU cycles. Application just should prepare the packets, put the desired timestamps
 and call tx_burst().  "Shut-n-forget" approach. 

SW approach is potentially possible, application can hold the time and schedule packets itself.
But... Can we guarantee the stable delay between tx_burst call and data on the wire?
Should we waste CPU cycles to wait the desired moment of time? Can we guarantee
stable interrupt latency if we choose to schedule on interrupts approach?

This RFC splits the responsibility - application should prepare the data and specify
when it desires to send, the rest is on PMD.
 
> >
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> So what I am understanding here is "rte_dynfield_timestamp" will provide
> information about three parameters:
> - timestamp at which TX should start
> - intra packet gap
> - intra burst gap.
> 
> If its about "intra packet gap" then PMD can take care, but if it is about intra
> burst gap, application can take care of it.

Not sure - the intra-burst gap might be pretty small.
It is supposed to handle intra-burst in the same way - by specifying
the timestamps. Waiting is supposed to be implemented on tx_burst() retry.
Prepare the packets with timestamps, tx_burst - if not all packets are sent -
it means queue is waiting for the schedult, retry with the remaining packets.
As option - we can implement intra-burst wait basing rte_eth_read_clock().

> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <intra_gap>,<burst_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> 
> I think here you mean "rte_eth_read_clock".
Yes, exactly. Thank you for the correction.

With best regards, Slava

> 
> 
> Thanks
> Harman
> 
> > current device clock value and provide the reference for the timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.h |  4 ++++
> > lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
> >  2 files changed, 20 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > +
> > +
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**<
> > Device supports Rx queue setup after device started*/  #define
> > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > index 96c3631..fb5477c 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
> >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> >
> > +/*
> > + * The timestamp dynamic field provides some timing information, the
> > + * units and time references (initial phase) are not explicitly
> > +defined
> > + * but are maintained always the same for a given port. Some devices
> > +allow
> > + * to query rte_eth_read_clock() that will return the current device
> > + * timestamp. The dynamic timestamp flag tells whether the field
> > +contains
> > + * actual timestamp value. For the packets being sent this value can
> > +be
> > + * used by PMD to schedule packet sending.
> > + *
> > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > + * and obsoleting, these dynamic flag and field will be used to
> > +manage
> > + * the timestamps on receiving datapath as well.
> > + */
> > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> > +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME
> "rte_dynflag_timestamp"
> > +
> >  #endif
> > --
> > 1.8.3.1
> >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [EXT] RE: [RFC] mbuf: accurate packet Tx scheduling
  2020-06-10 15:16   ` Slava Ovsiienko
@ 2020-06-17 15:57     ` Harman Kalra
  2020-07-01 15:46       ` Slava Ovsiienko
  0 siblings, 1 reply; 86+ messages in thread
From: Harman Kalra @ 2020-06-17 15:57 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev, Thomas Monjalon, Matan Azrad, Raslan Darawsheh, Ori Kam,
	olivier.matz, Shahaf Shuler

On Wed, Jun 10, 2020 at 03:16:12PM +0000, Slava Ovsiienko wrote:

> External Email
> 
> ----------------------------------------------------------------------
> Hi, Harman
> 
> > -----Original Message-----
> > From: Harman Kalra <hkalra@marvell.com>
> > Sent: Wednesday, June 10, 2020 16:34
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Matan
> > Azrad <matan@mellanox.com>; Raslan Darawsheh
> > <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>;
> > olivier.matz@6wind.com; Shahaf Shuler <shahafs@mellanox.com>
> > Subject: Re: [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
> > 
> > On Wed, Jun 10, 2020 at 06:38:05AM +0000, Viacheslav Ovsiienko wrote:
> > 
> > Hi Viacheslav,
> > 
> >    I have some queries below:
> > 
> > > There is the requirement on some networks for precise traffic timing
> > > management. The ability to send (and, generally speaking, receive) the
> > > packets at the very precisely specified moment of time provides the
> > > opportunity to support the connections with Time Division Multiplexing
> > > using the contemporary general purpose NIC without involving an
> > > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > > interface is one of the promising features for potentially usage of
> > > the precise time management for the egress packets.
> > >
> > > The main objective of this RFC is to specify the way how applications
> > > can provide the moment of time at what the packet transmission must be
> > > started and to describe in preliminary the supporting this feature
> > > from
> > > mlx5 PMD side.
> > >
> > > The new dynamic timestamp field is proposed, it provides some timing
> > > information, the units and time references (initial phase) are not
> > > explicitly defined but are maintained always the same for a given port.
> > > Some devices allow to query rte_eth_read_clock() that will return the
> > > current device timestamp. The dynamic timestamp flag tells whether the
> > > field contains actual timestamp value. For the packets being sent this
> > > value can be used by PMD to schedule packet sending.
> > >
> > > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > > obsoleting, these dynamic flag and field will be used to manage the
> > > timestamps on receiving datapath as well.
> > >
> > > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > > sent it tries to synchronize the time of packet appearing on the wire
> > > with the specified packet timestamp. It the specified one is in the
> > > past it should be ignored, if one is in the distant future it should
> > > be capped with some reasonable value (in range of seconds). These
> > > specific cases ("too late" and "distant future") can be optionally
> > > reported via device xstats to assist applications to detect the
> > > time-related problems.
> > >
> > > There is no any packet reordering according timestamps is supposed,
> > > neither within packet burst, nor between packets, it is an entirely
> > > application responsibility to generate packets and its timestamps in
> > > desired order. The timestamps can be put only in the first packet in
> > > the burst providing the entire burst scheduling.
> > 
> > Since its applicaiton responsibility to care of packet reordering and many
> > other parameters, so why cant application itself take the responsibility of
> > packet scheduling, i.e. applicaton can hold for the required time before
> > calling tx-burst? Why are we even offloading this job to PMD?
> > 
> - The scheduling is required to be very precise. Within handred(s) of nanoseconds.
> - It saves CPU cycles. Application just should prepare the packets, put the desired timestamps
>  and call tx_burst().  "Shut-n-forget" approach. 
> 
> SW approach is potentially possible, application can hold the time and schedule packets itself.
> But... Can we guarantee the stable delay between tx_burst call and data on the wire?
> Should we waste CPU cycles to wait the desired moment of time? Can we guarantee
> stable interrupt latency if we choose to schedule on interrupts approach?
> 
> This RFC splits the responsibility - application should prepare the data and specify
> when it desires to send, the rest is on PMD.

I agree with the fact that we cannot guarantee the delay between tx
burst call and data on wire, hence PMD should take care of it.
Even if PMD is holding, it is wastage of CPU cycles or if we setup
an alarm then also interrupt latency might be a concern to achieve
precise timming. So how are you planning to address both of above
issue in PMD.

>  
> > >
> > > PMD reports the ability to synchronize packet sending on timestamp
> > > with new offload flag:
> > >
> > > This is palliative and is going to be replaced with new eth_dev API
> > > about reporting/managing the supported dynamic flags and its related
> > > features. This API would break ABI compatibility and can't be
> > > introduced at the moment, so is postponed to 20.11.
> > >
> > > For testing purposes it is proposed to update testpmd "txonly"
> > > forwarding mode routine. With this update testpmd application
> > > generates the packets and sets the dynamic timestamps according to
> > > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> > 
> > So what I am understanding here is "rte_dynfield_timestamp" will provide
> > information about three parameters:
> > - timestamp at which TX should start
> > - intra packet gap
> > - intra burst gap.
> > 
> > If its about "intra packet gap" then PMD can take care, but if it is about intra
> > burst gap, application can take care of it.
> 
> Not sure - the intra-burst gap might be pretty small.
> It is supposed to handle intra-burst in the same way - by specifying
> the timestamps. Waiting is supposed to be implemented on tx_burst() retry.
> Prepare the packets with timestamps, tx_burst - if not all packets are sent -
> it means queue is waiting for the schedult, retry with the remaining packets.
> As option - we can implement intra-burst wait basing rte_eth_read_clock().

Yeah, I think app can make use of rte_eth_read_clock() to implement
intra-burst gap.
But my actual doubt was, what all information will app provide as
part of "rte_dynfield_timestamp" - one I understand will be timestamp
at which packets should be sent out. What else? intra-packet gap ?


Thanks
Harman

> 
> > > The new testpmd command is proposed to configure sending pattern:
> > >
> > > set tx_times <intra_gap>,<burst_gap>
> > >
> > > <intra_gap> - the delay between the packets within the burst
> > >               specified in the device clock units. The number
> > >               of packets in the burst is defined by txburst parameter
> > >
> > > <burst_gap> - the delay between the bursts in the device clock units
> > >
> > > As the result the bursts of packet will be transmitted with specific
> > > delays between the packets within the burst and specific delay between
> > > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> > 
> > I think here you mean "rte_eth_read_clock".
> Yes, exactly. Thank you for the correction.
> 
> With best regards, Slava
> 
> > 
> > 
> > Thanks
> > Harman
> > 
> > > current device clock value and provide the reference for the timestamps.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > ---
> > >  lib/librte_ethdev/rte_ethdev.h |  4 ++++
> > > lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
> > >  2 files changed, 20 insertions(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> > >  /** Device supports outer UDP checksum */  #define
> > > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> > >
> > > +/** Device supports send on timestamp */ #define
> > > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > > +
> > > +
> > >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> > /**<
> > > Device supports Rx queue setup after device started*/  #define
> > > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > index 96c3631..fb5477c 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> > "rte_flow_dynfield_metadata"
> > >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> > "rte_flow_dynflag_metadata"
> > >
> > > +/*
> > > + * The timestamp dynamic field provides some timing information, the
> > > + * units and time references (initial phase) are not explicitly
> > > +defined
> > > + * but are maintained always the same for a given port. Some devices
> > > +allow
> > > + * to query rte_eth_read_clock() that will return the current device
> > > + * timestamp. The dynamic timestamp flag tells whether the field
> > > +contains
> > > + * actual timestamp value. For the packets being sent this value can
> > > +be
> > > + * used by PMD to schedule packet sending.
> > > + *
> > > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > > + * and obsoleting, these dynamic flag and field will be used to
> > > +manage
> > > + * the timestamps on receiving datapath as well.
> > > + */
> > > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > "rte_dynfield_timestamp"
> > > +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME
> > "rte_dynflag_timestamp"
> > > +
> > >  #endif
> > > --
> > > 1.8.3.1
> > >

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-06-10 13:33 ` Harman Kalra
@ 2020-07-01 15:36 ` Viacheslav Ovsiienko
  2020-07-01 15:36   ` [dpdk-dev] [PATCH 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
  2020-07-07 11:50   ` [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
  2020-07-07 12:59 ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-01 15:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature from
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. It the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_get_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..fb5477c 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/*
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, these dynamic flag and field will be used to manage
+ * the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH 2/2] app/testpmd: add send scheduling test capability
  2020-07-01 15:36 ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
@ 2020-07-01 15:36   ` Viacheslav Ovsiienko
  2020-07-07 11:50   ` [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
  1 sibling, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-01 15:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

This commit adds testpmd capability to provide timestamps on the packets
being sent in the txonly mode. This includes:

 - SEND_ON_TIMESTAMP support
   new device Tx offload capability support added, example:

     testpmd> port config 0 tx_offload send_on_timestamp on

 - set txtimes, registers field and flag, example:

     testpmd> set txtimes 1000000,0

   This command enables the packet send scheduling on timestamps if
   the first parameter is not zero, generic format:

     testpmd> set txtimes (inter),(intra)

   where:

     inter - is the delay between the bursts in the device clock units.
     intra - is the delay between the packets within the burst specified
	     in the device clock units

     As the result the bursts of packet will be transmitted with
     specific delay between the packets within the burst and specific
     delay between the bursts. The rte_eth_get_clock() is supposed to be
     engaged to get the current device clock value and provide
     the reference for the timestamps.

 - show txtimes, displays the timing settings
 - txonly burst time pattern

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 59 +++++++++++++++++++++--
 app/test-pmd/config.c                       | 43 +++++++++++++++++
 app/test-pmd/testpmd.c                      |  3 ++
 app/test-pmd/testpmd.h                      |  3 ++
 app/test-pmd/txonly.c                       | 75 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++++++++++++-
 6 files changed, 210 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 996a498..57e34a0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3926,6 +3926,52 @@ struct cmd_set_txsplit_result {
 	},
 };
 
+/* *** SET TIMES FOR TXONLY PACKETS SCHEDULING ON TIMESTAMPS *** */
+
+struct cmd_set_txtimes_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t txtimes;
+	cmdline_fixed_string_t tx_times;
+};
+
+static void
+cmd_set_txtimes_parsed(void *parsed_result,
+		       __rte_unused struct cmdline *cl,
+		       __rte_unused void *data)
+{
+	struct cmd_set_txtimes_result *res;
+	unsigned int tx_times[2] = {0, 0};
+	unsigned int n_times;
+
+	res = parsed_result;
+	n_times = parse_item_list(res->tx_times, "tx times",
+				  2, tx_times, 0);
+	if (n_times == 2)
+		set_tx_pkt_times(tx_times);
+}
+
+cmdline_parse_token_string_t cmd_set_txtimes_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_txtimes_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 txtimes, "txtimes");
+cmdline_parse_token_string_t cmd_set_txtimes_value =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 tx_times, NULL);
+
+cmdline_parse_inst_t cmd_set_txtimes = {
+	.f = cmd_set_txtimes_parsed,
+	.data = NULL,
+	.help_str = "set txtimes <inter_burst>,<intra_burst>",
+	.tokens = {
+		(void *)&cmd_set_txtimes_keyword,
+		(void *)&cmd_set_txtimes_name,
+		(void *)&cmd_set_txtimes_value,
+		NULL,
+	},
+};
+
 /* *** ADD/REMOVE ALL VLAN IDENTIFIERS TO/FROM A PORT VLAN RX FILTER *** */
 struct cmd_rx_vlan_filter_all_result {
 	cmdline_fixed_string_t rx_vlan;
@@ -7414,6 +7460,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		pkt_fwd_config_display(&cur_fwd_config);
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
+	else if (!strcmp(res->what, "txtimes"))
+		show_tx_pkt_times();
 }
 
 cmdline_parse_token_string_t cmd_showcfg_show =
@@ -7422,12 +7470,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts");
+				 "rxtx#cores#fwd#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts",
+	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -18662,7 +18710,8 @@ struct cmd_config_per_port_tx_offload_result {
 			  "sctp_cksum#tcp_tso#udp_tso#outer_ipv4_cksum#"
 			  "qinq_insert#vxlan_tnl_tso#gre_tnl_tso#"
 			  "ipip_tnl_tso#geneve_tnl_tso#macsec_insert#"
-			  "mt_lockfree#multi_segs#mbuf_fast_free#security");
+			  "mt_lockfree#multi_segs#mbuf_fast_free#security#"
+			  "send_on_timestamp");
 cmdline_parse_token_string_t cmd_config_per_port_tx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_tx_offload_result,
@@ -18747,7 +18796,8 @@ struct cmd_config_per_port_tx_offload_result {
 		    "sctp_cksum|tcp_tso|udp_tso|outer_ipv4_cksum|"
 		    "qinq_insert|vxlan_tnl_tso|gre_tnl_tso|"
 		    "ipip_tnl_tso|geneve_tnl_tso|macsec_insert|"
-		    "mt_lockfree|multi_segs|mbuf_fast_free|security on|off",
+		    "mt_lockfree|multi_segs|mbuf_fast_free|security|"
+		    "send_on_timestamp on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_tx_offload_result_port,
 		(void *)&cmd_config_per_port_tx_offload_result_config,
@@ -19419,6 +19469,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
+	(cmdline_parse_inst_t *)&cmd_set_txtimes,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_list,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mask,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a7112c9..3c318cc 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1027,6 +1027,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+		printf("Tx scheduling on timestamp:    ");
+		if (ports[port_id].dev_conf.txmode.offloads &
+		    DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 }
 
 int
@@ -2993,6 +3002,40 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_tx_pkt_times(void)
+{
+	printf("Interburst gap: %u\n", tx_pkt_times[0]);
+	printf("Intraburst gap: %u\n", tx_pkt_times[1]);
+}
+
+void
+set_tx_pkt_times(unsigned int *tx_times)
+{
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
+		.size = sizeof(uint64_t),
+		.align = __alignof__(uint64_t),
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = RTE_MBUF_DYNFLAG_TIMESTAMP_NAME,
+	};
+
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp field registration error: %d",
+		       rte_errno);
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp flag registration error: %d",
+		       rte_errno);
+	tx_pkt_times[0] = tx_times[0];
+	tx_pkt_times[1] = tx_times[1];
+}
+
+void
 setup_gro(const char *onoff, portid_t port_id)
 {
 	if (!rte_eth_dev_is_valid_port(port_id)) {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4989d22..4961b8f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -223,6 +223,9 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
+uint32_t tx_pkt_times[2];
+/**< Timings for send sheduling in TXONLY mode. */
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a803750..345ad1d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -442,6 +442,7 @@ struct queue_stats_mappings {
 extern uint16_t tx_pkt_length; /**< Length of TXONLY packet */
 extern uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT]; /**< Seg. lengths */
 extern uint8_t  tx_pkt_nb_segs; /**< Number of segments in TX packets */
+extern uint32_t tx_pkt_times[2];
 
 enum tx_pkt_split {
 	TX_PKT_SPLIT_OFF,
@@ -794,6 +795,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void show_tx_pkt_segments(void);
+void set_tx_pkt_times(unsigned int *tx_times);
+void show_tx_pkt_times(void);
 void set_tx_pkt_split(const char *name);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
 char *list_pkt_forwarding_modes(void);
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 076ccaf..468fc09 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -53,6 +53,11 @@
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
 RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
+RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per queue */
+static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
+static int32_t ts_off; /**< Timestamp dynamic field offset */
+static bool ts_enable; /**< Timestamp enable */
+static uint64_t ts_init[RTE_MAX_ETHPORTS];
 
 static void
 copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
@@ -150,7 +155,8 @@
 static inline bool
 pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
 		struct rte_ether_hdr *eth_hdr, const uint16_t vlan_tci,
-		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags,
+		const uint16_t idx, const struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
 	struct rte_mbuf *pkt_seg;
@@ -213,6 +219,50 @@
 	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
 			sizeof(struct rte_ether_hdr) +
 			sizeof(struct rte_ipv4_hdr));
+	if (unlikely(ts_enable)) {
+		uint64_t skew = RTE_PER_LCORE(ts_qskew);
+		struct {
+			rte_be32_t signature;
+			rte_be16_t pkt_idx;
+			rte_be16_t queue_idx;
+			rte_be64_t ts;
+		} ts_mark;
+
+		if (unlikely(!skew)) {
+			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
+			unsigned int txqs_n = dev->data->nb_tx_queues;
+			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
+					 (txqs_n ? txqs_n : 1);
+			/*
+			 * Initialize the scheduling time phase shift
+			 * depending on queue index.
+			 */
+			skew = ts_init[fs->tx_port] + tx_pkt_times[0] + phase;
+			RTE_PER_LCORE(ts_qskew) = skew;
+		}
+		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
+		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
+		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
+		if (unlikely(!idx)) {
+			skew +=	tx_pkt_times[0];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else if (tx_pkt_times[1]) {
+			skew +=	tx_pkt_times[1];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else {
+			ts_mark.ts = RTE_BE64(0);
+		}
+		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
+			sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(pkt_udp_hdr));
+	}
 	/*
 	 * Complete first mbuf of packet and append it to the
 	 * burst of packets to be transmitted.
@@ -275,7 +325,8 @@
 			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
 							&eth_hdr, vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_mempool_put_bulk(mbp,
 						(void **)&pkts_burst[nb_pkt],
 						nb_pkt_per_burst - nb_pkt);
@@ -290,7 +341,8 @@
 			if (unlikely(!pkt_burst_prepare(pkt, mbp, &eth_hdr,
 							vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_pktmbuf_free(pkt);
 				break;
 			}
@@ -302,6 +354,7 @@
 		return;
 
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+
 	/*
 	 * Retry if necessary
 	 */
@@ -342,15 +395,29 @@
 }
 
 static void
-tx_only_begin(__rte_unused portid_t pi)
+tx_only_begin(portid_t pi)
 {
 	uint16_t pkt_data_len;
+	int dynf;
 
 	pkt_data_len = (uint16_t) (tx_pkt_length - (
 					sizeof(struct rte_ether_hdr) +
 					sizeof(struct rte_ipv4_hdr) +
 					sizeof(struct rte_udp_hdr)));
 	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
+
+	ts_enable = false;
+	ts_mask = 0;
+	ts_off = -1;
+	RTE_PER_LCORE(ts_qskew) = 0;
+	dynf = rte_mbuf_dynflag_lookup(RTE_MBUF_DYNFLAG_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_mask = 1ULL << dynf;
+	dynf = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_off = dynf;
+	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
+		    !rte_eth_read_clock(pi, &ts_init[pi]);
 }
 
 struct fwd_engine tx_only_engine = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a..00413cc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -266,7 +266,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts)
+   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
 
 The available information categories are:
 
@@ -278,6 +278,8 @@ The available information categories are:
 
 * ``txpkts``: Packets to TX configuration.
 
+* ``txtimes``: Burst time pattern for Tx only mode.
+
 For example:
 
 .. code-block:: console
@@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
 
 Where x[,y]* represents a CSV list of values, without white space.
 
+set txtimes
+~~~~~~~~~~
+
+Configure the timing burst pattern for Tx only mode. This command enables
+the packet send scheduling on dynamic timestamp mbuf field and configures
+timing pattern in Tx only mode. In this mode, if scheduling is enabled
+application provides timestamps in the packets being sent. It is possible
+to configure delay (in unspecified device clock units) between bursts
+and between the packets within the burst::
+
+   testpmd> set txtimes (inter),(intra)
+
+where:
+
+* ``inter``  is the delay between the bursts in the device clock units.
+  If ``intra`` is zero, this is the time between the beginnings of the
+  first packets in the neighbour bursts, if ``intra`` is not zero,
+  ``inter`` specifies the time between the beginning of the first packet
+  of the current burst and the beginning of the last packet of the
+  previous burst. If ``inter`` parameter is zero the send scheduling
+  on timestamps is disabled (default).
+
+* ``intra`` is the delay between the packets within the burst specified
+  in the device clock units. The number of packets in the burst is defined
+  by regular burst setting. If ``intra`` parameter is zero no timestamps
+  provided in the packets excepting  the first one in the burst.
+
+As the result the bursts of packet will be transmitted with specific
+delays between the packets within the burst and specific delay between
+the bursts. The rte_eth_get_clock() is supposed to be engaged to get the
+current device clock value and provide the reference for the timestamps.
+
 set txsplit
 ~~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [EXT] RE: [RFC] mbuf: accurate packet Tx scheduling
  2020-06-17 15:57     ` [dpdk-dev] [EXT] " Harman Kalra
@ 2020-07-01 15:46       ` Slava Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-01 15:46 UTC (permalink / raw)
  To: Harman Kalra
  Cc: dev, Thomas Monjalon, Matan Azrad, Raslan Darawsheh, Ori Kam,
	olivier.matz, Shahaf Shuler

> -----Original Message-----
> From: Harman Kalra <hkalra@marvell.com>
> Sent: Wednesday, June 17, 2020 18:58
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Matan
> Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>;
> olivier.matz@6wind.com; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [EXT] RE: [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
> 
> On Wed, Jun 10, 2020 at 03:16:12PM +0000, Slava Ovsiienko wrote:
> 
> > External Email
> >
> > ----------------------------------------------------------------------
Hi, Harman

Sorry for delay - missed your reply.

[..skip..]
> > Should we waste CPU cycles to wait the desired moment of time? Can we
> > guarantee stable interrupt latency if we choose to schedule on interrupts
> approach?
> >
> > This RFC splits the responsibility - application should prepare the
> > data and specify when it desires to send, the rest is on PMD.
> 
> I agree with the fact that we cannot guarantee the delay between tx burst
> call and data on wire, hence PMD should take care of it.
> Even if PMD is holding, it is wastage of CPU cycles or if we setup an alarm
> then also interrupt latency might be a concern to achieve precise timming. So
> how are you planning to address both of above issue in PMD.

It is promoted to HW. The special WAIT descriptor with timestamp
is pushed to the queue and hardware just waits for the appropriate moment.
It is exactly the task for PMD - convert the timestamp in hardware related
entities and perform requested operation on hardware. Thus we should not
wait on CPU in any way - loop/interrupts, etc. Let NIC do it for us.
 
> >
> > > >
> > > > PMD reports the ability to synchronize packet sending on timestamp
> > > > with new offload flag:
> > > >
> > > > This is palliative and is going to be replaced with new eth_dev
> > > > API about reporting/managing the supported dynamic flags and its
> > > > related features. This API would break ABI compatibility and can't
> > > > be introduced at the moment, so is postponed to 20.11.
> > > >
> > > > For testing purposes it is proposed to update testpmd "txonly"
> > > > forwarding mode routine. With this update testpmd application
> > > > generates the packets and sets the dynamic timestamps according to
> > > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> registered.
> > >
> > > So what I am understanding here is "rte_dynfield_timestamp" will
> > > provide information about three parameters:
> > > - timestamp at which TX should start
> > > - intra packet gap
> > > - intra burst gap.
> > >
> > > If its about "intra packet gap" then PMD can take care, but if it is
> > > about intra burst gap, application can take care of it.
> >
> > Not sure - the intra-burst gap might be pretty small.
> > It is supposed to handle intra-burst in the same way - by specifying
> > the timestamps. Waiting is supposed to be implemented on tx_burst() retry.
> > Prepare the packets with timestamps, tx_burst - if not all packets are
> > sent - it means queue is waiting for the schedult, retry with the remaining
> packets.
> > As option - we can implement intra-burst wait basing rte_eth_read_clock().
> 
> Yeah, I think app can make use of rte_eth_read_clock() to implement intra-
> burst gap.
> But my actual doubt was, what all information will app provide as part of
> "rte_dynfield_timestamp" - one I understand will be timestamp at which
> packets should be sent out. What else? intra-packet gap ?
Intra-packet gap is just the parameter of testpmd to provide some
preliminary feature testing. If intra-gap is too small even hardware might
not support. In mlx5 we are going to support the scheduling with minimal
granularity 250 nanoseconds, so minimal gap supported is 250ns,
on 100Gbps line speed it means at least two 1500B packets.

With best regards, Slava

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-01 15:36 ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
  2020-07-01 15:36   ` [dpdk-dev] [PATCH 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
@ 2020-07-07 11:50   ` Olivier Matz
  2020-07-07 12:46     ` Slava Ovsiienko
  1 sibling, 1 reply; 86+ messages in thread
From: Olivier Matz @ 2020-07-07 11:50 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

Hi Slava,

Few question/comments below.

On Wed, Jul 01, 2020 at 03:36:26PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.

Do you mean the same flag will be used for both Rx and Tx?  I wonder if
it's a good idea: if you enable the timestamp on Rx, the packets will be
flagged and it will impact Tx, except if the application explicitly
resets the flag in all mbufs. Wouldn't it be safer to have an Rx flag
and a Tx flag?

> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. It the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.

I think what to do with packets to be send in the "past" could be
configurable through an ethdev API in the future (drop or send).

> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.

This constraint makes sense. At first glance, it looks it is imposed by
a PMD or hw limitation, but thinking more about it, I think it is the
correct behavior to have. Packets are ordered within a PMD queue, and
the ability to set the timestamp for one packet to delay the subsequent
ones looks useful.

Should this behavior be documented somewhere? Maybe in the API comment
documenting the dynamic flag?

> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
>  3 files changed, 21 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..fb5477c 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/*
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, these dynamic flag and field will be used to manage
> + * the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
> +

I realize that's not the case for rte_flow_dynfield_metadata, but
I think it would be good to have a doxygen-like comment.



Regards,
Olivier

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 11:50   ` [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
@ 2020-07-07 12:46     ` Slava Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-07 12:46 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, Matan Azrad, Raslan Darawsheh, bernard.iremonger, thomas

Hi, Olivier

Thanks a lot for the review.

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, July 7, 2020 14:51
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Raslan
> Darawsheh <rasland@mellanox.com>; bernard.iremonger@intel.com;
> thomas@mellanox.net
> Subject: Re: [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
> 
> Hi Slava,
> 
> Few question/comments below.
> 
> On Wed, Jul 01, 2020 at 03:36:26PM +0000, Viacheslav Ovsiienko wrote:
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> >
> > The main objective of this RFC is to specify the way how applications
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well.
> 
> Do you mean the same flag will be used for both Rx and Tx?  I wonder if it's a
> good idea: if you enable the timestamp on Rx, the packets will be flagged
> and it will impact Tx, except if the application explicitly resets the flag in all
> mbufs. Wouldn't it be safer to have an Rx flag and a Tx flag?

A little bit difficult to say ambiguously, I thought about and did not make strong decision.
We have the flag sharing for the Rx/Tx metadata and just follow the same approach.
OK, I will promote ones to:
RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
RTE_MBUF_DYNFIELD_TX_TIMESTAMP_NAME

And, possible, we should reconsider metadata dynamic flags.
> 
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. It the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> 
> I think what to do with packets to be send in the "past" could be configurable
> through an ethdev API in the future (drop or send).
Yes, currently there is no complete understanding how to handle packets out of the time slot.
In 20.11 we are going to introduce the time-based rte flow API to manage out-of-window packets.
 
> 
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.

"can" should be replaced with "might". Current mlx5 implementation
checks each packet in the burst for the timestamp presence.

> 
> This constraint makes sense. At first glance, it looks it is imposed by a PMD or
> hw limitation, but thinking more about it, I think it is the correct behavior to
> have. Packets are ordered within a PMD queue, and the ability to set the
> timestamp for one packet to delay the subsequent ones looks useful.
> 
> Should this behavior be documented somewhere? Maybe in the API
> comment documenting the dynamic flag?

It is documented in mlx5.rst (coming soon), do you think it should be
more common? OK, will update.

> 
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> > current device clock value and provide the reference for the timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.c |  1 +  lib/librte_ethdev/rte_ethdev.h
> > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
> >  3 files changed, 21 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > +
> > +
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**<
> > Device supports Rx queue setup after device started*/  #define
> > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > index 96c3631..fb5477c 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
> >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> >
> > +/*
> > + * The timestamp dynamic field provides some timing information, the
> > + * units and time references (initial phase) are not explicitly
> > +defined
> > + * but are maintained always the same for a given port. Some devices
> > +allow
> > + * to query rte_eth_read_clock() that will return the current device
> > + * timestamp. The dynamic timestamp flag tells whether the field
> > +contains
> > + * actual timestamp value. For the packets being sent this value can
> > +be
> > + * used by PMD to schedule packet sending.
> > + *
> > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > + * and obsoleting, these dynamic flag and field will be used to
> > +manage
> > + * the timestamps on receiving datapath as well.
> > + */
> > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> > +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME
> "rte_dynflag_timestamp"
> > +
> 
> I realize that's not the case for rte_flow_dynfield_metadata, but I think it
> would be good to have a doxygen-like comment.
OK, will extend comment  with expected PMD behavior and replace "/*" with "/**"
> 
> 
> 
> Regards,
> Olivier

With best regards, Slava


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-06-10 13:33 ` Harman Kalra
  2020-07-01 15:36 ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
@ 2020-07-07 12:59 ` Viacheslav Ovsiienko
  2020-07-07 12:59   ` [dpdk-dev] [PATCH v2 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
  2020-07-07 13:08 ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-07 12:59 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature from
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_get_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..834acdd 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared timestamp field will be used to handle the
+ * timestamps on receiving datapath as well. Having the dedicated flags
+ * for Rx/Tx timstamps allows applications not to perform explicit flags
+ * reset on forwaring and not to promote received timestamps to the
+ * transmitting datapath by default.
+ *
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according timestamps is supposed,
+ * neither within packet burst, nor between packets, it is an entirely
+ * application responsibility to generate packets and its timestamps in
+ * desired order. The timestamps might be put only in the first packet in
+ * the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 2/2] app/testpmd: add send scheduling test capability
  2020-07-07 12:59 ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
@ 2020-07-07 12:59   ` Viacheslav Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-07 12:59 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

This commit adds testpmd capability to provide timestamps on the packets
being sent in the txonly mode. This includes:

 - SEND_ON_TIMESTAMP support
   new device Tx offload capability support added, example:

     testpmd> port config 0 tx_offload send_on_timestamp on

 - set txtimes, registers field and flag, example:

     testpmd> set txtimes 1000000,0

   This command enables the packet send scheduling on timestamps if
   the first parameter is not zero, generic format:

     testpmd> set txtimes (inter),(intra)

   where:

     inter - is the delay between the bursts in the device clock units.
     intra - is the delay between the packets within the burst specified
	     in the device clock units

     As the result the bursts of packet will be transmitted with
     specific delay between the packets within the burst and specific
     delay between the bursts. The rte_eth_get_clock() is supposed to be
     engaged to get the current device clock value and provide
     the reference for the timestamps.

 - show txtimes, displays the timing settings
 - txonly burst time pattern

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 59 ++++++++++++++++++++--
 app/test-pmd/config.c                       | 43 ++++++++++++++++
 app/test-pmd/testpmd.c                      |  3 ++
 app/test-pmd/testpmd.h                      |  3 ++
 app/test-pmd/txonly.c                       | 77 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++++++++++++-
 6 files changed, 212 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 996a498..57e34a0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3926,6 +3926,52 @@ struct cmd_set_txsplit_result {
 	},
 };
 
+/* *** SET TIMES FOR TXONLY PACKETS SCHEDULING ON TIMESTAMPS *** */
+
+struct cmd_set_txtimes_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t txtimes;
+	cmdline_fixed_string_t tx_times;
+};
+
+static void
+cmd_set_txtimes_parsed(void *parsed_result,
+		       __rte_unused struct cmdline *cl,
+		       __rte_unused void *data)
+{
+	struct cmd_set_txtimes_result *res;
+	unsigned int tx_times[2] = {0, 0};
+	unsigned int n_times;
+
+	res = parsed_result;
+	n_times = parse_item_list(res->tx_times, "tx times",
+				  2, tx_times, 0);
+	if (n_times == 2)
+		set_tx_pkt_times(tx_times);
+}
+
+cmdline_parse_token_string_t cmd_set_txtimes_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_txtimes_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 txtimes, "txtimes");
+cmdline_parse_token_string_t cmd_set_txtimes_value =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 tx_times, NULL);
+
+cmdline_parse_inst_t cmd_set_txtimes = {
+	.f = cmd_set_txtimes_parsed,
+	.data = NULL,
+	.help_str = "set txtimes <inter_burst>,<intra_burst>",
+	.tokens = {
+		(void *)&cmd_set_txtimes_keyword,
+		(void *)&cmd_set_txtimes_name,
+		(void *)&cmd_set_txtimes_value,
+		NULL,
+	},
+};
+
 /* *** ADD/REMOVE ALL VLAN IDENTIFIERS TO/FROM A PORT VLAN RX FILTER *** */
 struct cmd_rx_vlan_filter_all_result {
 	cmdline_fixed_string_t rx_vlan;
@@ -7414,6 +7460,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		pkt_fwd_config_display(&cur_fwd_config);
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
+	else if (!strcmp(res->what, "txtimes"))
+		show_tx_pkt_times();
 }
 
 cmdline_parse_token_string_t cmd_showcfg_show =
@@ -7422,12 +7470,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts");
+				 "rxtx#cores#fwd#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts",
+	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -18662,7 +18710,8 @@ struct cmd_config_per_port_tx_offload_result {
 			  "sctp_cksum#tcp_tso#udp_tso#outer_ipv4_cksum#"
 			  "qinq_insert#vxlan_tnl_tso#gre_tnl_tso#"
 			  "ipip_tnl_tso#geneve_tnl_tso#macsec_insert#"
-			  "mt_lockfree#multi_segs#mbuf_fast_free#security");
+			  "mt_lockfree#multi_segs#mbuf_fast_free#security#"
+			  "send_on_timestamp");
 cmdline_parse_token_string_t cmd_config_per_port_tx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_tx_offload_result,
@@ -18747,7 +18796,8 @@ struct cmd_config_per_port_tx_offload_result {
 		    "sctp_cksum|tcp_tso|udp_tso|outer_ipv4_cksum|"
 		    "qinq_insert|vxlan_tnl_tso|gre_tnl_tso|"
 		    "ipip_tnl_tso|geneve_tnl_tso|macsec_insert|"
-		    "mt_lockfree|multi_segs|mbuf_fast_free|security on|off",
+		    "mt_lockfree|multi_segs|mbuf_fast_free|security|"
+		    "send_on_timestamp on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_tx_offload_result_port,
 		(void *)&cmd_config_per_port_tx_offload_result_config,
@@ -19419,6 +19469,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
+	(cmdline_parse_inst_t *)&cmd_set_txtimes,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_list,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mask,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a7112c9..1625978 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1027,6 +1027,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+		printf("Tx scheduling on timestamp:    ");
+		if (ports[port_id].dev_conf.txmode.offloads &
+		    DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 }
 
 int
@@ -2993,6 +3002,40 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_tx_pkt_times(void)
+{
+	printf("Interburst gap: %u\n", tx_pkt_times[0]);
+	printf("Intraburst gap: %u\n", tx_pkt_times[1]);
+}
+
+void
+set_tx_pkt_times(unsigned int *tx_times)
+{
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
+		.size = sizeof(uint64_t),
+		.align = __alignof__(uint64_t),
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
+	};
+
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp field registration error: %d",
+		       rte_errno);
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp flag registration error: %d",
+		       rte_errno);
+	tx_pkt_times[0] = tx_times[0];
+	tx_pkt_times[1] = tx_times[1];
+}
+
+void
 setup_gro(const char *onoff, portid_t port_id)
 {
 	if (!rte_eth_dev_is_valid_port(port_id)) {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4989d22..4961b8f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -223,6 +223,9 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
+uint32_t tx_pkt_times[2];
+/**< Timings for send sheduling in TXONLY mode. */
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a803750..345ad1d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -442,6 +442,7 @@ struct queue_stats_mappings {
 extern uint16_t tx_pkt_length; /**< Length of TXONLY packet */
 extern uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT]; /**< Seg. lengths */
 extern uint8_t  tx_pkt_nb_segs; /**< Number of segments in TX packets */
+extern uint32_t tx_pkt_times[2];
 
 enum tx_pkt_split {
 	TX_PKT_SPLIT_OFF,
@@ -794,6 +795,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void show_tx_pkt_segments(void);
+void set_tx_pkt_times(unsigned int *tx_times);
+void show_tx_pkt_times(void);
 void set_tx_pkt_split(const char *name);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
 char *list_pkt_forwarding_modes(void);
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 076ccaf..3d261cf 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -53,6 +53,11 @@
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
 RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
+RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per queue */
+static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
+static int32_t ts_off; /**< Timestamp dynamic field offset */
+static bool ts_enable; /**< Timestamp enable */
+static uint64_t ts_init[RTE_MAX_ETHPORTS];
 
 static void
 copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
@@ -150,7 +155,8 @@
 static inline bool
 pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
 		struct rte_ether_hdr *eth_hdr, const uint16_t vlan_tci,
-		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags,
+		const uint16_t idx, const struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
 	struct rte_mbuf *pkt_seg;
@@ -213,6 +219,50 @@
 	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
 			sizeof(struct rte_ether_hdr) +
 			sizeof(struct rte_ipv4_hdr));
+	if (unlikely(ts_enable)) {
+		uint64_t skew = RTE_PER_LCORE(ts_qskew);
+		struct {
+			rte_be32_t signature;
+			rte_be16_t pkt_idx;
+			rte_be16_t queue_idx;
+			rte_be64_t ts;
+		} ts_mark;
+
+		if (unlikely(!skew)) {
+			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
+			unsigned int txqs_n = dev->data->nb_tx_queues;
+			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
+					 (txqs_n ? txqs_n : 1);
+			/*
+			 * Initialize the scheduling time phase shift
+			 * depending on queue index.
+			 */
+			skew = ts_init[fs->tx_port] + tx_pkt_times[0] + phase;
+			RTE_PER_LCORE(ts_qskew) = skew;
+		}
+		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
+		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
+		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
+		if (unlikely(!idx)) {
+			skew +=	tx_pkt_times[0];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else if (tx_pkt_times[1]) {
+			skew +=	tx_pkt_times[1];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else {
+			ts_mark.ts = RTE_BE64(0);
+		}
+		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
+			sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(pkt_udp_hdr));
+	}
 	/*
 	 * Complete first mbuf of packet and append it to the
 	 * burst of packets to be transmitted.
@@ -275,7 +325,8 @@
 			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
 							&eth_hdr, vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_mempool_put_bulk(mbp,
 						(void **)&pkts_burst[nb_pkt],
 						nb_pkt_per_burst - nb_pkt);
@@ -290,7 +341,8 @@
 			if (unlikely(!pkt_burst_prepare(pkt, mbp, &eth_hdr,
 							vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_pktmbuf_free(pkt);
 				break;
 			}
@@ -302,6 +354,7 @@
 		return;
 
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+
 	/*
 	 * Retry if necessary
 	 */
@@ -342,15 +395,31 @@
 }
 
 static void
-tx_only_begin(__rte_unused portid_t pi)
+tx_only_begin(portid_t pi)
 {
 	uint16_t pkt_data_len;
+	int dynf;
 
 	pkt_data_len = (uint16_t) (tx_pkt_length - (
 					sizeof(struct rte_ether_hdr) +
 					sizeof(struct rte_ipv4_hdr) +
 					sizeof(struct rte_udp_hdr)));
 	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
+
+	ts_enable = false;
+	ts_mask = 0;
+	ts_off = -1;
+	RTE_PER_LCORE(ts_qskew) = 0;
+	dynf = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_mask = 1ULL << dynf;
+	dynf = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_off = dynf;
+	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
+		    !rte_eth_read_clock(pi, &ts_init[pi]);
 }
 
 struct fwd_engine tx_only_engine = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a..00413cc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -266,7 +266,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts)
+   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
 
 The available information categories are:
 
@@ -278,6 +278,8 @@ The available information categories are:
 
 * ``txpkts``: Packets to TX configuration.
 
+* ``txtimes``: Burst time pattern for Tx only mode.
+
 For example:
 
 .. code-block:: console
@@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
 
 Where x[,y]* represents a CSV list of values, without white space.
 
+set txtimes
+~~~~~~~~~~
+
+Configure the timing burst pattern for Tx only mode. This command enables
+the packet send scheduling on dynamic timestamp mbuf field and configures
+timing pattern in Tx only mode. In this mode, if scheduling is enabled
+application provides timestamps in the packets being sent. It is possible
+to configure delay (in unspecified device clock units) between bursts
+and between the packets within the burst::
+
+   testpmd> set txtimes (inter),(intra)
+
+where:
+
+* ``inter``  is the delay between the bursts in the device clock units.
+  If ``intra`` is zero, this is the time between the beginnings of the
+  first packets in the neighbour bursts, if ``intra`` is not zero,
+  ``inter`` specifies the time between the beginning of the first packet
+  of the current burst and the beginning of the last packet of the
+  previous burst. If ``inter`` parameter is zero the send scheduling
+  on timestamps is disabled (default).
+
+* ``intra`` is the delay between the packets within the burst specified
+  in the device clock units. The number of packets in the burst is defined
+  by regular burst setting. If ``intra`` parameter is zero no timestamps
+  provided in the packets excepting  the first one in the burst.
+
+As the result the bursts of packet will be transmitted with specific
+delays between the packets within the burst and specific delay between
+the bursts. The rte_eth_get_clock() is supposed to be engaged to get the
+current device clock value and provide the reference for the timestamps.
+
 set txsplit
 ~~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (2 preceding siblings ...)
  2020-07-07 12:59 ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
@ 2020-07-07 13:08 ` Viacheslav Ovsiienko
  2020-07-07 13:08   ` [dpdk-dev] [PATCH v3 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
  2020-07-07 14:32   ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
  2020-07-07 14:57 ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-07 13:08 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature from
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_get_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..5b2f3da 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared timestamp field will be used to handle the
+ * timestamps on receiving datapath as well. Having the dedicated flags
+ * for Rx/Tx timstamps allows applications not to perform explicit flags
+ * reset on forwarding and not to promote received timestamps to the
+ * transmitting datapath by default.
+ *
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according timestamps is supposed,
+ * neither within packet burst, nor between packets, it is an entirely
+ * application responsibility to generate packets and its timestamps in
+ * desired order. The timestamps might be put only in the first packet in
+ * the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 2/2] app/testpmd: add send scheduling test capability
  2020-07-07 13:08 ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-07 13:08   ` Viacheslav Ovsiienko
  2020-07-07 14:32   ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
  1 sibling, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-07 13:08 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

This commit adds testpmd capability to provide timestamps on the packets
being sent in the txonly mode. This includes:

 - SEND_ON_TIMESTAMP support
   new device Tx offload capability support added, example:

     testpmd> port config 0 tx_offload send_on_timestamp on

 - set txtimes, registers field and flag, example:

     testpmd> set txtimes 1000000,0

   This command enables the packet send scheduling on timestamps if
   the first parameter is not zero, generic format:

     testpmd> set txtimes (inter),(intra)

   where:

     inter - is the delay between the bursts in the device clock units.
     intra - is the delay between the packets within the burst specified
	     in the device clock units

     As the result the bursts of packet will be transmitted with
     specific delay between the packets within the burst and specific
     delay between the bursts. The rte_eth_get_clock() is supposed to be
     engaged to get the current device clock value and provide
     the reference for the timestamps.

 - show txtimes, displays the timing settings
 - txonly burst time pattern

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 59 ++++++++++++++++++++--
 app/test-pmd/config.c                       | 43 ++++++++++++++++
 app/test-pmd/testpmd.c                      |  3 ++
 app/test-pmd/testpmd.h                      |  3 ++
 app/test-pmd/txonly.c                       | 77 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++++++++++++-
 6 files changed, 212 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 996a498..57e34a0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3926,6 +3926,52 @@ struct cmd_set_txsplit_result {
 	},
 };
 
+/* *** SET TIMES FOR TXONLY PACKETS SCHEDULING ON TIMESTAMPS *** */
+
+struct cmd_set_txtimes_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t txtimes;
+	cmdline_fixed_string_t tx_times;
+};
+
+static void
+cmd_set_txtimes_parsed(void *parsed_result,
+		       __rte_unused struct cmdline *cl,
+		       __rte_unused void *data)
+{
+	struct cmd_set_txtimes_result *res;
+	unsigned int tx_times[2] = {0, 0};
+	unsigned int n_times;
+
+	res = parsed_result;
+	n_times = parse_item_list(res->tx_times, "tx times",
+				  2, tx_times, 0);
+	if (n_times == 2)
+		set_tx_pkt_times(tx_times);
+}
+
+cmdline_parse_token_string_t cmd_set_txtimes_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_txtimes_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 txtimes, "txtimes");
+cmdline_parse_token_string_t cmd_set_txtimes_value =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 tx_times, NULL);
+
+cmdline_parse_inst_t cmd_set_txtimes = {
+	.f = cmd_set_txtimes_parsed,
+	.data = NULL,
+	.help_str = "set txtimes <inter_burst>,<intra_burst>",
+	.tokens = {
+		(void *)&cmd_set_txtimes_keyword,
+		(void *)&cmd_set_txtimes_name,
+		(void *)&cmd_set_txtimes_value,
+		NULL,
+	},
+};
+
 /* *** ADD/REMOVE ALL VLAN IDENTIFIERS TO/FROM A PORT VLAN RX FILTER *** */
 struct cmd_rx_vlan_filter_all_result {
 	cmdline_fixed_string_t rx_vlan;
@@ -7414,6 +7460,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		pkt_fwd_config_display(&cur_fwd_config);
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
+	else if (!strcmp(res->what, "txtimes"))
+		show_tx_pkt_times();
 }
 
 cmdline_parse_token_string_t cmd_showcfg_show =
@@ -7422,12 +7470,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts");
+				 "rxtx#cores#fwd#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts",
+	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -18662,7 +18710,8 @@ struct cmd_config_per_port_tx_offload_result {
 			  "sctp_cksum#tcp_tso#udp_tso#outer_ipv4_cksum#"
 			  "qinq_insert#vxlan_tnl_tso#gre_tnl_tso#"
 			  "ipip_tnl_tso#geneve_tnl_tso#macsec_insert#"
-			  "mt_lockfree#multi_segs#mbuf_fast_free#security");
+			  "mt_lockfree#multi_segs#mbuf_fast_free#security#"
+			  "send_on_timestamp");
 cmdline_parse_token_string_t cmd_config_per_port_tx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_tx_offload_result,
@@ -18747,7 +18796,8 @@ struct cmd_config_per_port_tx_offload_result {
 		    "sctp_cksum|tcp_tso|udp_tso|outer_ipv4_cksum|"
 		    "qinq_insert|vxlan_tnl_tso|gre_tnl_tso|"
 		    "ipip_tnl_tso|geneve_tnl_tso|macsec_insert|"
-		    "mt_lockfree|multi_segs|mbuf_fast_free|security on|off",
+		    "mt_lockfree|multi_segs|mbuf_fast_free|security|"
+		    "send_on_timestamp on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_tx_offload_result_port,
 		(void *)&cmd_config_per_port_tx_offload_result_config,
@@ -19419,6 +19469,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
+	(cmdline_parse_inst_t *)&cmd_set_txtimes,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_list,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mask,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a7112c9..1625978 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1027,6 +1027,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+		printf("Tx scheduling on timestamp:    ");
+		if (ports[port_id].dev_conf.txmode.offloads &
+		    DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 }
 
 int
@@ -2993,6 +3002,40 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_tx_pkt_times(void)
+{
+	printf("Interburst gap: %u\n", tx_pkt_times[0]);
+	printf("Intraburst gap: %u\n", tx_pkt_times[1]);
+}
+
+void
+set_tx_pkt_times(unsigned int *tx_times)
+{
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
+		.size = sizeof(uint64_t),
+		.align = __alignof__(uint64_t),
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
+	};
+
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp field registration error: %d",
+		       rte_errno);
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp flag registration error: %d",
+		       rte_errno);
+	tx_pkt_times[0] = tx_times[0];
+	tx_pkt_times[1] = tx_times[1];
+}
+
+void
 setup_gro(const char *onoff, portid_t port_id)
 {
 	if (!rte_eth_dev_is_valid_port(port_id)) {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4989d22..aab9a23 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -223,6 +223,9 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
+uint32_t tx_pkt_times[2];
+/**< Timings for send scheduling in TXONLY mode. */
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a803750..345ad1d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -442,6 +442,7 @@ struct queue_stats_mappings {
 extern uint16_t tx_pkt_length; /**< Length of TXONLY packet */
 extern uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT]; /**< Seg. lengths */
 extern uint8_t  tx_pkt_nb_segs; /**< Number of segments in TX packets */
+extern uint32_t tx_pkt_times[2];
 
 enum tx_pkt_split {
 	TX_PKT_SPLIT_OFF,
@@ -794,6 +795,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void show_tx_pkt_segments(void);
+void set_tx_pkt_times(unsigned int *tx_times);
+void show_tx_pkt_times(void);
 void set_tx_pkt_split(const char *name);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
 char *list_pkt_forwarding_modes(void);
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 076ccaf..3d261cf 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -53,6 +53,11 @@
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
 RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
+RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per queue */
+static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
+static int32_t ts_off; /**< Timestamp dynamic field offset */
+static bool ts_enable; /**< Timestamp enable */
+static uint64_t ts_init[RTE_MAX_ETHPORTS];
 
 static void
 copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
@@ -150,7 +155,8 @@
 static inline bool
 pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
 		struct rte_ether_hdr *eth_hdr, const uint16_t vlan_tci,
-		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags,
+		const uint16_t idx, const struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
 	struct rte_mbuf *pkt_seg;
@@ -213,6 +219,50 @@
 	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
 			sizeof(struct rte_ether_hdr) +
 			sizeof(struct rte_ipv4_hdr));
+	if (unlikely(ts_enable)) {
+		uint64_t skew = RTE_PER_LCORE(ts_qskew);
+		struct {
+			rte_be32_t signature;
+			rte_be16_t pkt_idx;
+			rte_be16_t queue_idx;
+			rte_be64_t ts;
+		} ts_mark;
+
+		if (unlikely(!skew)) {
+			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
+			unsigned int txqs_n = dev->data->nb_tx_queues;
+			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
+					 (txqs_n ? txqs_n : 1);
+			/*
+			 * Initialize the scheduling time phase shift
+			 * depending on queue index.
+			 */
+			skew = ts_init[fs->tx_port] + tx_pkt_times[0] + phase;
+			RTE_PER_LCORE(ts_qskew) = skew;
+		}
+		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
+		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
+		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
+		if (unlikely(!idx)) {
+			skew +=	tx_pkt_times[0];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else if (tx_pkt_times[1]) {
+			skew +=	tx_pkt_times[1];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else {
+			ts_mark.ts = RTE_BE64(0);
+		}
+		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
+			sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(pkt_udp_hdr));
+	}
 	/*
 	 * Complete first mbuf of packet and append it to the
 	 * burst of packets to be transmitted.
@@ -275,7 +325,8 @@
 			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
 							&eth_hdr, vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_mempool_put_bulk(mbp,
 						(void **)&pkts_burst[nb_pkt],
 						nb_pkt_per_burst - nb_pkt);
@@ -290,7 +341,8 @@
 			if (unlikely(!pkt_burst_prepare(pkt, mbp, &eth_hdr,
 							vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_pktmbuf_free(pkt);
 				break;
 			}
@@ -302,6 +354,7 @@
 		return;
 
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+
 	/*
 	 * Retry if necessary
 	 */
@@ -342,15 +395,31 @@
 }
 
 static void
-tx_only_begin(__rte_unused portid_t pi)
+tx_only_begin(portid_t pi)
 {
 	uint16_t pkt_data_len;
+	int dynf;
 
 	pkt_data_len = (uint16_t) (tx_pkt_length - (
 					sizeof(struct rte_ether_hdr) +
 					sizeof(struct rte_ipv4_hdr) +
 					sizeof(struct rte_udp_hdr)));
 	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
+
+	ts_enable = false;
+	ts_mask = 0;
+	ts_off = -1;
+	RTE_PER_LCORE(ts_qskew) = 0;
+	dynf = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_mask = 1ULL << dynf;
+	dynf = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_off = dynf;
+	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
+		    !rte_eth_read_clock(pi, &ts_init[pi]);
 }
 
 struct fwd_engine tx_only_engine = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a..00413cc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -266,7 +266,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts)
+   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
 
 The available information categories are:
 
@@ -278,6 +278,8 @@ The available information categories are:
 
 * ``txpkts``: Packets to TX configuration.
 
+* ``txtimes``: Burst time pattern for Tx only mode.
+
 For example:
 
 .. code-block:: console
@@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
 
 Where x[,y]* represents a CSV list of values, without white space.
 
+set txtimes
+~~~~~~~~~~
+
+Configure the timing burst pattern for Tx only mode. This command enables
+the packet send scheduling on dynamic timestamp mbuf field and configures
+timing pattern in Tx only mode. In this mode, if scheduling is enabled
+application provides timestamps in the packets being sent. It is possible
+to configure delay (in unspecified device clock units) between bursts
+and between the packets within the burst::
+
+   testpmd> set txtimes (inter),(intra)
+
+where:
+
+* ``inter``  is the delay between the bursts in the device clock units.
+  If ``intra`` is zero, this is the time between the beginnings of the
+  first packets in the neighbour bursts, if ``intra`` is not zero,
+  ``inter`` specifies the time between the beginning of the first packet
+  of the current burst and the beginning of the last packet of the
+  previous burst. If ``inter`` parameter is zero the send scheduling
+  on timestamps is disabled (default).
+
+* ``intra`` is the delay between the packets within the burst specified
+  in the device clock units. The number of packets in the burst is defined
+  by regular burst setting. If ``intra`` parameter is zero no timestamps
+  provided in the packets excepting  the first one in the burst.
+
+As the result the bursts of packet will be transmitted with specific
+delays between the packets within the burst and specific delay between
+the bursts. The rte_eth_get_clock() is supposed to be engaged to get the
+current device clock value and provide the reference for the timestamps.
+
 set txsplit
 ~~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 13:08 ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-07-07 13:08   ` [dpdk-dev] [PATCH v3 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
@ 2020-07-07 14:32   ` Olivier Matz
  1 sibling, 0 replies; 86+ messages in thread
From: Olivier Matz @ 2020-07-07 14:32 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

On Tue, Jul 07, 2020 at 01:08:02PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..5b2f3da 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared timestamp field will be used to handle the
> + * timestamps on receiving datapath as well. Having the dedicated flags
> + * for Rx/Tx timstamps allows applications not to perform explicit flags
> + * reset on forwarding and not to promote received timestamps to the
> + * transmitting datapath by default.
> + *
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
> + * packet being sent it tries to synchronize the time of packet appearing
> + * on the wire with the specified packet timestamp. If the specified one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according timestamps is supposed,

I think there is a typo here

> + * neither within packet burst, nor between packets, it is an entirely
> + * application responsibility to generate packets and its timestamps in
> + * desired order. The timestamps might be put only in the first packet in
> + * the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"

Is it possible to split the comment, to document both
RTE_MBUF_DYNFIELD_TIMESTAMP_NAME and RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME ?  I
didn't try to generate the documentation, but I think, like this, that
RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME will be undocumented.

Apart from that, it looks good to me.


> +
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (3 preceding siblings ...)
  2020-07-07 13:08 ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-07 14:57 ` Viacheslav Ovsiienko
  2020-07-07 14:57   ` [dpdk-dev] [PATCH v4 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
                     ` (2 more replies)
  2020-07-08 15:47 ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-07 14:57 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature from
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_get_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 v1->v4:
    - dedicated dynamic Tx timestamp flag instead of shared with Rx
    - Doxygen-style comment
    - comments update

---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..7e9f7d2 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow4
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v4 2/2] app/testpmd: add send scheduling test capability
  2020-07-07 14:57 ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
@ 2020-07-07 14:57   ` Viacheslav Ovsiienko
  2020-07-07 15:23   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
  2020-07-08 14:16   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
  2 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-07 14:57 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

This commit adds testpmd capability to provide timestamps on the packets
being sent in the txonly mode. This includes:

 - SEND_ON_TIMESTAMP support
   new device Tx offload capability support added, example:

     testpmd> port config 0 tx_offload send_on_timestamp on

 - set txtimes, registers field and flag, example:

     testpmd> set txtimes 1000000,0

   This command enables the packet send scheduling on timestamps if
   the first parameter is not zero, generic format:

     testpmd> set txtimes (inter),(intra)

   where:

     inter - is the delay between the bursts in the device clock units.
     intra - is the delay between the packets within the burst specified
	     in the device clock units

     As the result the bursts of packet will be transmitted with
     specific delay between the packets within the burst and specific
     delay between the bursts. The rte_eth_get_clock() is supposed to be
     engaged to get the current device clock value and provide
     the reference for the timestamps.

 - show txtimes, displays the timing settings
 - txonly burst time pattern

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 59 ++++++++++++++++++++--
 app/test-pmd/config.c                       | 43 ++++++++++++++++
 app/test-pmd/testpmd.c                      |  3 ++
 app/test-pmd/testpmd.h                      |  3 ++
 app/test-pmd/txonly.c                       | 77 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++++++++++++-
 6 files changed, 212 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 996a498..57e34a0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3926,6 +3926,52 @@ struct cmd_set_txsplit_result {
 	},
 };
 
+/* *** SET TIMES FOR TXONLY PACKETS SCHEDULING ON TIMESTAMPS *** */
+
+struct cmd_set_txtimes_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t txtimes;
+	cmdline_fixed_string_t tx_times;
+};
+
+static void
+cmd_set_txtimes_parsed(void *parsed_result,
+		       __rte_unused struct cmdline *cl,
+		       __rte_unused void *data)
+{
+	struct cmd_set_txtimes_result *res;
+	unsigned int tx_times[2] = {0, 0};
+	unsigned int n_times;
+
+	res = parsed_result;
+	n_times = parse_item_list(res->tx_times, "tx times",
+				  2, tx_times, 0);
+	if (n_times == 2)
+		set_tx_pkt_times(tx_times);
+}
+
+cmdline_parse_token_string_t cmd_set_txtimes_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_txtimes_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 txtimes, "txtimes");
+cmdline_parse_token_string_t cmd_set_txtimes_value =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 tx_times, NULL);
+
+cmdline_parse_inst_t cmd_set_txtimes = {
+	.f = cmd_set_txtimes_parsed,
+	.data = NULL,
+	.help_str = "set txtimes <inter_burst>,<intra_burst>",
+	.tokens = {
+		(void *)&cmd_set_txtimes_keyword,
+		(void *)&cmd_set_txtimes_name,
+		(void *)&cmd_set_txtimes_value,
+		NULL,
+	},
+};
+
 /* *** ADD/REMOVE ALL VLAN IDENTIFIERS TO/FROM A PORT VLAN RX FILTER *** */
 struct cmd_rx_vlan_filter_all_result {
 	cmdline_fixed_string_t rx_vlan;
@@ -7414,6 +7460,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		pkt_fwd_config_display(&cur_fwd_config);
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
+	else if (!strcmp(res->what, "txtimes"))
+		show_tx_pkt_times();
 }
 
 cmdline_parse_token_string_t cmd_showcfg_show =
@@ -7422,12 +7470,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts");
+				 "rxtx#cores#fwd#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts",
+	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -18662,7 +18710,8 @@ struct cmd_config_per_port_tx_offload_result {
 			  "sctp_cksum#tcp_tso#udp_tso#outer_ipv4_cksum#"
 			  "qinq_insert#vxlan_tnl_tso#gre_tnl_tso#"
 			  "ipip_tnl_tso#geneve_tnl_tso#macsec_insert#"
-			  "mt_lockfree#multi_segs#mbuf_fast_free#security");
+			  "mt_lockfree#multi_segs#mbuf_fast_free#security#"
+			  "send_on_timestamp");
 cmdline_parse_token_string_t cmd_config_per_port_tx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_tx_offload_result,
@@ -18747,7 +18796,8 @@ struct cmd_config_per_port_tx_offload_result {
 		    "sctp_cksum|tcp_tso|udp_tso|outer_ipv4_cksum|"
 		    "qinq_insert|vxlan_tnl_tso|gre_tnl_tso|"
 		    "ipip_tnl_tso|geneve_tnl_tso|macsec_insert|"
-		    "mt_lockfree|multi_segs|mbuf_fast_free|security on|off",
+		    "mt_lockfree|multi_segs|mbuf_fast_free|security|"
+		    "send_on_timestamp on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_tx_offload_result_port,
 		(void *)&cmd_config_per_port_tx_offload_result_config,
@@ -19419,6 +19469,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
+	(cmdline_parse_inst_t *)&cmd_set_txtimes,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_list,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mask,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a7112c9..1625978 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1027,6 +1027,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+		printf("Tx scheduling on timestamp:    ");
+		if (ports[port_id].dev_conf.txmode.offloads &
+		    DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 }
 
 int
@@ -2993,6 +3002,40 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_tx_pkt_times(void)
+{
+	printf("Interburst gap: %u\n", tx_pkt_times[0]);
+	printf("Intraburst gap: %u\n", tx_pkt_times[1]);
+}
+
+void
+set_tx_pkt_times(unsigned int *tx_times)
+{
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
+		.size = sizeof(uint64_t),
+		.align = __alignof__(uint64_t),
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
+	};
+
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp field registration error: %d",
+		       rte_errno);
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp flag registration error: %d",
+		       rte_errno);
+	tx_pkt_times[0] = tx_times[0];
+	tx_pkt_times[1] = tx_times[1];
+}
+
+void
 setup_gro(const char *onoff, portid_t port_id)
 {
 	if (!rte_eth_dev_is_valid_port(port_id)) {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4989d22..aab9a23 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -223,6 +223,9 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
+uint32_t tx_pkt_times[2];
+/**< Timings for send scheduling in TXONLY mode. */
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a803750..345ad1d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -442,6 +442,7 @@ struct queue_stats_mappings {
 extern uint16_t tx_pkt_length; /**< Length of TXONLY packet */
 extern uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT]; /**< Seg. lengths */
 extern uint8_t  tx_pkt_nb_segs; /**< Number of segments in TX packets */
+extern uint32_t tx_pkt_times[2];
 
 enum tx_pkt_split {
 	TX_PKT_SPLIT_OFF,
@@ -794,6 +795,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void show_tx_pkt_segments(void);
+void set_tx_pkt_times(unsigned int *tx_times);
+void show_tx_pkt_times(void);
 void set_tx_pkt_split(const char *name);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
 char *list_pkt_forwarding_modes(void);
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 076ccaf..3d261cf 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -53,6 +53,11 @@
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
 RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
+RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per queue */
+static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
+static int32_t ts_off; /**< Timestamp dynamic field offset */
+static bool ts_enable; /**< Timestamp enable */
+static uint64_t ts_init[RTE_MAX_ETHPORTS];
 
 static void
 copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
@@ -150,7 +155,8 @@
 static inline bool
 pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
 		struct rte_ether_hdr *eth_hdr, const uint16_t vlan_tci,
-		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags,
+		const uint16_t idx, const struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
 	struct rte_mbuf *pkt_seg;
@@ -213,6 +219,50 @@
 	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
 			sizeof(struct rte_ether_hdr) +
 			sizeof(struct rte_ipv4_hdr));
+	if (unlikely(ts_enable)) {
+		uint64_t skew = RTE_PER_LCORE(ts_qskew);
+		struct {
+			rte_be32_t signature;
+			rte_be16_t pkt_idx;
+			rte_be16_t queue_idx;
+			rte_be64_t ts;
+		} ts_mark;
+
+		if (unlikely(!skew)) {
+			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
+			unsigned int txqs_n = dev->data->nb_tx_queues;
+			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
+					 (txqs_n ? txqs_n : 1);
+			/*
+			 * Initialize the scheduling time phase shift
+			 * depending on queue index.
+			 */
+			skew = ts_init[fs->tx_port] + tx_pkt_times[0] + phase;
+			RTE_PER_LCORE(ts_qskew) = skew;
+		}
+		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
+		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
+		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
+		if (unlikely(!idx)) {
+			skew +=	tx_pkt_times[0];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else if (tx_pkt_times[1]) {
+			skew +=	tx_pkt_times[1];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else {
+			ts_mark.ts = RTE_BE64(0);
+		}
+		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
+			sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(pkt_udp_hdr));
+	}
 	/*
 	 * Complete first mbuf of packet and append it to the
 	 * burst of packets to be transmitted.
@@ -275,7 +325,8 @@
 			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
 							&eth_hdr, vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_mempool_put_bulk(mbp,
 						(void **)&pkts_burst[nb_pkt],
 						nb_pkt_per_burst - nb_pkt);
@@ -290,7 +341,8 @@
 			if (unlikely(!pkt_burst_prepare(pkt, mbp, &eth_hdr,
 							vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_pktmbuf_free(pkt);
 				break;
 			}
@@ -302,6 +354,7 @@
 		return;
 
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+
 	/*
 	 * Retry if necessary
 	 */
@@ -342,15 +395,31 @@
 }
 
 static void
-tx_only_begin(__rte_unused portid_t pi)
+tx_only_begin(portid_t pi)
 {
 	uint16_t pkt_data_len;
+	int dynf;
 
 	pkt_data_len = (uint16_t) (tx_pkt_length - (
 					sizeof(struct rte_ether_hdr) +
 					sizeof(struct rte_ipv4_hdr) +
 					sizeof(struct rte_udp_hdr)));
 	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
+
+	ts_enable = false;
+	ts_mask = 0;
+	ts_off = -1;
+	RTE_PER_LCORE(ts_qskew) = 0;
+	dynf = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_mask = 1ULL << dynf;
+	dynf = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_off = dynf;
+	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
+		    !rte_eth_read_clock(pi, &ts_init[pi]);
 }
 
 struct fwd_engine tx_only_engine = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a..00413cc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -266,7 +266,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts)
+   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
 
 The available information categories are:
 
@@ -278,6 +278,8 @@ The available information categories are:
 
 * ``txpkts``: Packets to TX configuration.
 
+* ``txtimes``: Burst time pattern for Tx only mode.
+
 For example:
 
 .. code-block:: console
@@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
 
 Where x[,y]* represents a CSV list of values, without white space.
 
+set txtimes
+~~~~~~~~~~
+
+Configure the timing burst pattern for Tx only mode. This command enables
+the packet send scheduling on dynamic timestamp mbuf field and configures
+timing pattern in Tx only mode. In this mode, if scheduling is enabled
+application provides timestamps in the packets being sent. It is possible
+to configure delay (in unspecified device clock units) between bursts
+and between the packets within the burst::
+
+   testpmd> set txtimes (inter),(intra)
+
+where:
+
+* ``inter``  is the delay between the bursts in the device clock units.
+  If ``intra`` is zero, this is the time between the beginnings of the
+  first packets in the neighbour bursts, if ``intra`` is not zero,
+  ``inter`` specifies the time between the beginning of the first packet
+  of the current burst and the beginning of the last packet of the
+  previous burst. If ``inter`` parameter is zero the send scheduling
+  on timestamps is disabled (default).
+
+* ``intra`` is the delay between the packets within the burst specified
+  in the device clock units. The number of packets in the burst is defined
+  by regular burst setting. If ``intra`` parameter is zero no timestamps
+  provided in the packets excepting  the first one in the burst.
+
+As the result the bursts of packet will be transmitted with specific
+delays between the packets within the burst and specific delay between
+the bursts. The rte_eth_get_clock() is supposed to be engaged to get the
+current device clock value and provide the reference for the timestamps.
+
 set txsplit
 ~~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 14:57 ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
  2020-07-07 14:57   ` [dpdk-dev] [PATCH v4 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
@ 2020-07-07 15:23   ` Olivier Matz
  2020-07-08 14:16   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
  2 siblings, 0 replies; 86+ messages in thread
From: Olivier Matz @ 2020-07-07 15:23 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

On Tue, Jul 07, 2020 at 02:57:11PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  v1->v4:
>     - dedicated dynamic Tx timestamp flag instead of shared with Rx
>     - Doxygen-style comment
>     - comments update
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..7e9f7d2 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow4

allow4 -> allow

> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
> + * packet being sent it tries to synchronize the time of packet appearing
> + * on the wire with the specified packet timestamp. If the specified one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according to timestamps is supposed,
> + * neither for packet within the burst, nor for the whole bursts, it is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
> +

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-07 14:57 ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
  2020-07-07 14:57   ` [dpdk-dev] [PATCH v4 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
  2020-07-07 15:23   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
@ 2020-07-08 14:16   ` Morten Brørup
  2020-07-08 14:54     ` Slava Ovsiienko
  2 siblings, 1 reply; 86+ messages in thread
From: Morten Brørup @ 2020-07-08 14:16 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> Ovsiienko
> Sent: Tuesday, July 7, 2020 4:57 PM
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without
> involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be
> introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the
> timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  v1->v4:
>     - dedicated dynamic Tx timestamp flag instead of shared with Rx

The detailed description above should be updated to reflect that it is now two flags.

>     - Doxygen-style comment
>     - comments update
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c
> b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
> 
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h
> b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> 
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h
> b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..7e9f7d2 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
> 
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly
> defined
> + * but are maintained always the same for a given port. Some devices
> allow4
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field
> contains
> + * actual timestamp value. For the packets being sent this value can
> be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"

The description above should not say anything about the dynamic TX timestamp flag.

Please elaborate "some timing information", e.g. add "... about when the packet was received".

> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on
> the
> + * packet being sent it tries to synchronize the time of packet
> appearing
> + * on the wire with the specified packet timestamp. If the specified
> one
> + * is in the past it should be ignored, if one is in the distant
> future
> + * it should be capped with some reasonable value (in range of
> seconds).
> + *
> + * There is no any packet reordering according to timestamps is
> supposed,
> + * neither for packet within the burst, nor for the whole bursts, it
> is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst
> scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
> +
>  #endif
> --
> 1.8.3.1
> 

It may be worth adding some documentation about how the clocks of the NICs are out of sync with the clock of the CPU, and are all drifting relatively.

And those clocks are also out of sync with the actual time (NTP clock).

Preferably, some sort of cookbook for handling this should be provided. PCAP could be used as an example.



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-08 14:16   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
@ 2020-07-08 14:54     ` Slava Ovsiienko
  2020-07-08 15:27       ` Morten Brørup
  0 siblings, 1 reply; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-08 14:54 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

Hi, Morten

Thank you for the comments. Please, see below.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, July 8, 2020 17:16
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@mellanox.net
> Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet
> Txscheduling
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > Ovsiienko
> > Sent: Tuesday, July 7, 2020 4:57 PM
> >
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> >
> > The main objective of this RFC is to specify the way how applications
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well.
> >
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. If the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> >
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.
> >
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> > current device clock value and provide the reference for the
> > timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  v1->v4:
> >     - dedicated dynamic Tx timestamp flag instead of shared with Rx
> 
> The detailed description above should be updated to reflect that it is now
> two flags.
OK

> 
> >     - Doxygen-style comment
> >     - comments update
> >
> > ---
> >  lib/librte_ethdev/rte_ethdev.c |  1 +  lib/librte_ethdev/rte_ethdev.h
> > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > +++++++++++++++++++++++++++++++
> >  3 files changed, 36 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > +
> > +
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**<
> > Device supports Rx queue setup after device started*/  #define
> > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > index 96c3631..7e9f7d2 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
> >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> >
> > +/**
> > + * The timestamp dynamic field provides some timing information, the
> > + * units and time references (initial phase) are not explicitly
> > defined
> > + * but are maintained always the same for a given port. Some devices
> > allow4
> > + * to query rte_eth_read_clock() that will return the current device
> > + * timestamp. The dynamic Tx timestamp flag tells whether the field
> > contains
> > + * actual timestamp value. For the packets being sent this value can
> > be
> > + * used by PMD to schedule packet sending.
> > + *
> > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> > + * introduced and the shared dynamic timestamp field will be used
> > + * to handle the timestamps on receiving datapath as well.
> > + */
> > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> 
> The description above should not say anything about the dynamic TX
> timestamp flag.
It does not. Or do you mean RX? 
Not sure, field and flag are tightly coupled,
it is nice to mention this relation for better understanding. 
And mentioning the RX explains why it is not like this:
RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME

> 
> Please elaborate "some timing information", e.g. add "... about when the
> packet was received".

Sorry, I do not follow,  currently the dynamic field is not
"about when the packet was received". Now it is introduced for Tx
only and just the opportunity to be shared with Rx one in coming releases
is mentioned. "Some" means - not specified (herein) exactly.
And it is elaborated what Is not specified and how it is supposed
to use Tx timestamp.

> 
> > +
> > +/**
> > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> set on
> > the
> > + * packet being sent it tries to synchronize the time of packet
> > appearing
> > + * on the wire with the specified packet timestamp. If the specified
> > one
> > + * is in the past it should be ignored, if one is in the distant
> > future
> > + * it should be capped with some reasonable value (in range of
> > seconds).
> > + *
> > + * There is no any packet reordering according to timestamps is
> > supposed,
> > + * neither for packet within the burst, nor for the whole bursts, it
> > is
> > + * an entirely application responsibility to generate packets and its
> > + * timestamps in desired order. The timestamps might be put only in
> > + * the first packet in the burst providing the entire burst
> > scheduling.
> > + */
> > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> "rte_dynflag_tx_timestamp"
> > +
> >  #endif
> > --
> > 1.8.3.1
> >
> 
> It may be worth adding some documentation about how the clocks of the
> NICs are out of sync with the clock of the CPU, and are all drifting relatively.
> 
> And those clocks are also out of sync with the actual time (NTP clock).

IMO, It is out of scope of this very generic patch.  As for mlx NICs - the internal device
clock might be (or might be not) synchronized with PTP, it can provide timestamps
in real nanoseconds in various formats or just some free running counter.
On some systems the NIC and CPU might share the same clock source (for their PLL inputs
for example) and there will be no any drifts. As we can see - it is a wide and interesting 
opic to discuss, but, IMO,  the comment in header file might be not the most relevant
place to do. As for mlx5 devices clock specifics - it will be documented in PMD chapter.

OK, will add few generic words, the few ones - in order not to make comment wordy, just
point the direction for further thinking.

> 
> Preferably, some sort of cookbook for handling this should be provided.
> PCAP could be used as an example.
> 
testpmd example is included in series, mlx5 PMD patch is prepared and coming soon.

With best regards, Slava


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-08 14:54     ` Slava Ovsiienko
@ 2020-07-08 15:27       ` Morten Brørup
  2020-07-08 15:51         ` Slava Ovsiienko
  0 siblings, 1 reply; 86+ messages in thread
From: Morten Brørup @ 2020-07-08 15:27 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> Sent: Wednesday, July 8, 2020 4:54 PM
> 
> Hi, Morten
> 
> Thank you for the comments. Please, see below.
> 
> > -----Original Message-----
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Wednesday, July 8, 2020 17:16
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > <rasland@mellanox.com>; olivier.matz@6wind.com;
> > bernard.iremonger@intel.com; thomas@mellanox.net
> > Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate
> packet
> > Txscheduling
> >
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > > Ovsiienko
> > > Sent: Tuesday, July 7, 2020 4:57 PM
> > >
> > > There is the requirement on some networks for precise traffic
> timing
> > > management. The ability to send (and, generally speaking, receive)
> the
> > > packets at the very precisely specified moment of time provides the
> > > opportunity to support the connections with Time Division
> Multiplexing
> > > using the contemporary general purpose NIC without involving an
> > > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > > interface is one of the promising features for potentially usage of
> > > the precise time management for the egress packets.
> > >
> > > The main objective of this RFC is to specify the way how
> applications
> > > can provide the moment of time at what the packet transmission must
> be
> > > started and to describe in preliminary the supporting this feature
> > > from
> > > mlx5 PMD side.
> > >
> > > The new dynamic timestamp field is proposed, it provides some
> timing
> > > information, the units and time references (initial phase) are not
> > > explicitly defined but are maintained always the same for a given
> port.
> > > Some devices allow to query rte_eth_read_clock() that will return
> the
> > > current device timestamp. The dynamic timestamp flag tells whether
> the
> > > field contains actual timestamp value. For the packets being sent
> this
> > > value can be used by PMD to schedule packet sending.
> > >
> > > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and
> > > obsoleting, these dynamic flag and field will be used to manage the
> > > timestamps on receiving datapath as well.
> > >
> > > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > > sent it tries to synchronize the time of packet appearing on the
> wire
> > > with the specified packet timestamp. If the specified one is in the
> > > past it should be ignored, if one is in the distant future it
> should
> > > be capped with some reasonable value (in range of seconds). These
> > > specific cases ("too late" and "distant future") can be optionally
> > > reported via device xstats to assist applications to detect the
> > > time-related problems.
> > >
> > > There is no any packet reordering according timestamps is supposed,
> > > neither within packet burst, nor between packets, it is an entirely
> > > application responsibility to generate packets and its timestamps
> in
> > > desired order. The timestamps can be put only in the first packet
> in
> > > the burst providing the entire burst scheduling.
> > >
> > > PMD reports the ability to synchronize packet sending on timestamp
> > > with new offload flag:
> > >
> > > This is palliative and is going to be replaced with new eth_dev API
> > > about reporting/managing the supported dynamic flags and its
> related
> > > features. This API would break ABI compatibility and can't be
> > > introduced at the moment, so is postponed to 20.11.
> > >
> > > For testing purposes it is proposed to update testpmd "txonly"
> > > forwarding mode routine. With this update testpmd application
> > > generates the packets and sets the dynamic timestamps according to
> > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> registered.
> > >
> > > The new testpmd command is proposed to configure sending pattern:
> > >
> > > set tx_times <burst_gap>,<intra_gap>
> > >
> > > <intra_gap> - the delay between the packets within the burst
> > >               specified in the device clock units. The number
> > >               of packets in the burst is defined by txburst
> parameter
> > >
> > > <burst_gap> - the delay between the bursts in the device clock
> units
> > >
> > > As the result the bursts of packet will be transmitted with
> specific
> > > delays between the packets within the burst and specific delay
> between
> > > the bursts. The rte_eth_get_clock is supposed to be engaged to get
> the
> > > current device clock value and provide the reference for the
> > > timestamps.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > ---
> > >  v1->v4:
> > >     - dedicated dynamic Tx timestamp flag instead of shared with Rx
> >
> > The detailed description above should be updated to reflect that it
> is now
> > two flags.
> OK
> 
> >
> > >     - Doxygen-style comment
> > >     - comments update
> > >
> > > ---
> > >  lib/librte_ethdev/rte_ethdev.c |  1 +
> lib/librte_ethdev/rte_ethdev.h
> > > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > > +++++++++++++++++++++++++++++++
> > >  3 files changed, 36 insertions(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> > >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> > >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> > >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> > >  };
> > >
> > >  #undef RTE_TX_OFFLOAD_BIT2STR
> > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> > >  /** Device supports outer UDP checksum */  #define
> > > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> > >
> > > +/** Device supports send on timestamp */ #define
> > > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > > +
> > > +
> > >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> > /**<
> > > Device supports Rx queue setup after device started*/  #define
> > > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > index 96c3631..7e9f7d2 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> > "rte_flow_dynfield_metadata"
> > >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> > "rte_flow_dynflag_metadata"
> > >
> > > +/**
> > > + * The timestamp dynamic field provides some timing information,
> the
> > > + * units and time references (initial phase) are not explicitly
> > > defined
> > > + * but are maintained always the same for a given port. Some
> devices
> > > allow4
> > > + * to query rte_eth_read_clock() that will return the current
> device
> > > + * timestamp. The dynamic Tx timestamp flag tells whether the
> field
> > > contains
> > > + * actual timestamp value. For the packets being sent this value
> can
> > > be
> > > + * used by PMD to schedule packet sending.
> > > + *
> > > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field
> deprecation
> > > + * and obsoleting, the dedicated Rx timestamp flag is supposed to
> be
> > > + * introduced and the shared dynamic timestamp field will be used
> > > + * to handle the timestamps on receiving datapath as well.
> > > + */
> > > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > "rte_dynfield_timestamp"
> >
> > The description above should not say anything about the dynamic TX
> > timestamp flag.
> It does not. Or do you mean RX?
> Not sure, field and flag are tightly coupled,
> it is nice to mention this relation for better understanding.
> And mentioning the RX explains why it is not like this:
> RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME

Sorry. I misunderstood its purpose!
It's the name of the field, and the field will not only be used for RX, but in the future also for RX.
(I thought it was the name of the RX flag, reserved for future use.)

> 
> >
> > Please elaborate "some timing information", e.g. add "... about when
> the
> > packet was received".
> 
> Sorry, I do not follow,  currently the dynamic field is not
> "about when the packet was received". Now it is introduced for Tx
> only and just the opportunity to be shared with Rx one in coming
> releases
> is mentioned. "Some" means - not specified (herein) exactly.
> And it is elaborated what Is not specified and how it is supposed
> to use Tx timestamp.

It should be described when it is valid, and how it is being used, e.g. by adding a reference to the "rte_dynflag_tx_timestamp" flag.

> >
> > > +
> > > +/**
> > > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> > set on
> > > the
> > > + * packet being sent it tries to synchronize the time of packet
> > > appearing
> > > + * on the wire with the specified packet timestamp. If the
> specified
> > > one
> > > + * is in the past it should be ignored, if one is in the distant
> > > future
> > > + * it should be capped with some reasonable value (in range of
> > > seconds).
> > > + *
> > > + * There is no any packet reordering according to timestamps is
> > > supposed,
> > > + * neither for packet within the burst, nor for the whole bursts,
> it
> > > is
> > > + * an entirely application responsibility to generate packets and
> its
> > > + * timestamps in desired order. The timestamps might be put only
> in
> > > + * the first packet in the burst providing the entire burst
> > > scheduling.
> > > + */
> > > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> > "rte_dynflag_tx_timestamp"
> > > +
> > >  #endif
> > > --
> > > 1.8.3.1
> > >
> >
> > It may be worth adding some documentation about how the clocks of the
> > NICs are out of sync with the clock of the CPU, and are all drifting
> relatively.
> >
> > And those clocks are also out of sync with the actual time (NTP
> clock).
> 
> IMO, It is out of scope of this very generic patch.  As for mlx NICs -
> the internal device
> clock might be (or might be not) synchronized with PTP, it can provide
> timestamps
> in real nanoseconds in various formats or just some free running
> counter.

Cool!

> On some systems the NIC and CPU might share the same clock source (for
> their PLL inputs
> for example) and there will be no any drifts. As we can see - it is a
> wide and interesting
> opic to discuss, but, IMO,  the comment in header file might be not the
> most relevant
> place to do. As for mlx5 devices clock specifics - it will be
> documented in PMD chapter.
> 
> OK, will add few generic words, the few ones - in order not to make
> comment wordy, just
> point the direction for further thinking.

I agree - we don't want cookbooks in the header files. Only enough description to avoid the worst misunderstandings.

> 
> >
> > Preferably, some sort of cookbook for handling this should be
> provided.
> > PCAP could be used as an example.
> >
> testpmd example is included in series, mlx5 PMD patch is prepared and
> coming soon.

Great.

And I suppose that the more detailed cookbook/example - regarding offset and drift of various clocks - is probably more relevant for the RX side (for various PCAP applications), and thus completely unrelated to this patch.

> 
> With best regards, Slava


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (4 preceding siblings ...)
  2020-07-07 14:57 ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
@ 2020-07-08 15:47 ` Viacheslav Ovsiienko
  2020-07-08 15:47   ` [dpdk-dev] [PATCH v5 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
                     ` (2 more replies)
  2020-07-09 12:36 ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
                   ` (4 subsequent siblings)
  10 siblings, 3 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-08 15:47 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas, mb

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature from
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

The device clock is opaque entity, the units and frequency are
vendor specific and might depend on hardware capabilities and
configurations. If might (or not) be synchronized with real time
via PTP, might (or not) be synchronous with CPU clock (for example
if NIC and CPU share the same clock source there might be no
any drift between the NIC and CPU clocks), etc.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well. Having the dedicated
flags for Rx/Tx timestamps allows applications not to perform explicit
flags reset on forwarding and not to promote received timestamps
to the transmitting datapath by default. The static PKT_RX_TIMESTAMP
is considered as candidate to become the dynamic flag.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_get_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added

---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value for the packets being sent, this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v5 2/2] app/testpmd: add send scheduling test capability
  2020-07-08 15:47 ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-08 15:47   ` Viacheslav Ovsiienko
  2020-07-08 16:05   ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Slava Ovsiienko
  2020-07-09 12:26   ` Thomas Monjalon
  2 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-08 15:47 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas, mb

This commit adds testpmd capability to provide timestamps on the packets
being sent in the txonly mode. This includes:

 - SEND_ON_TIMESTAMP support
   new device Tx offload capability support added, example:

     testpmd> port config 0 tx_offload send_on_timestamp on

 - set txtimes, registers field and flag, example:

     testpmd> set txtimes 1000000,0

   This command enables the packet send scheduling on timestamps if
   the first parameter is not zero, generic format:

     testpmd> set txtimes (inter),(intra)

   where:

     inter - is the delay between the bursts in the device clock units.
     intra - is the delay between the packets within the burst specified
	     in the device clock units

     As the result the bursts of packet will be transmitted with
     specific delay between the packets within the burst and specific
     delay between the bursts. The rte_eth_get_clock() is supposed to be
     engaged to get the current device clock value and provide
     the reference for the timestamps.

 - show txtimes, displays the timing settings
 - txonly burst time pattern

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 59 ++++++++++++++++++++--
 app/test-pmd/config.c                       | 43 ++++++++++++++++
 app/test-pmd/testpmd.c                      |  3 ++
 app/test-pmd/testpmd.h                      |  3 ++
 app/test-pmd/txonly.c                       | 77 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++++++++++++-
 6 files changed, 212 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 996a498..57e34a0 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3926,6 +3926,52 @@ struct cmd_set_txsplit_result {
 	},
 };
 
+/* *** SET TIMES FOR TXONLY PACKETS SCHEDULING ON TIMESTAMPS *** */
+
+struct cmd_set_txtimes_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t txtimes;
+	cmdline_fixed_string_t tx_times;
+};
+
+static void
+cmd_set_txtimes_parsed(void *parsed_result,
+		       __rte_unused struct cmdline *cl,
+		       __rte_unused void *data)
+{
+	struct cmd_set_txtimes_result *res;
+	unsigned int tx_times[2] = {0, 0};
+	unsigned int n_times;
+
+	res = parsed_result;
+	n_times = parse_item_list(res->tx_times, "tx times",
+				  2, tx_times, 0);
+	if (n_times == 2)
+		set_tx_pkt_times(tx_times);
+}
+
+cmdline_parse_token_string_t cmd_set_txtimes_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_txtimes_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 txtimes, "txtimes");
+cmdline_parse_token_string_t cmd_set_txtimes_value =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 tx_times, NULL);
+
+cmdline_parse_inst_t cmd_set_txtimes = {
+	.f = cmd_set_txtimes_parsed,
+	.data = NULL,
+	.help_str = "set txtimes <inter_burst>,<intra_burst>",
+	.tokens = {
+		(void *)&cmd_set_txtimes_keyword,
+		(void *)&cmd_set_txtimes_name,
+		(void *)&cmd_set_txtimes_value,
+		NULL,
+	},
+};
+
 /* *** ADD/REMOVE ALL VLAN IDENTIFIERS TO/FROM A PORT VLAN RX FILTER *** */
 struct cmd_rx_vlan_filter_all_result {
 	cmdline_fixed_string_t rx_vlan;
@@ -7414,6 +7460,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		pkt_fwd_config_display(&cur_fwd_config);
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
+	else if (!strcmp(res->what, "txtimes"))
+		show_tx_pkt_times();
 }
 
 cmdline_parse_token_string_t cmd_showcfg_show =
@@ -7422,12 +7470,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts");
+				 "rxtx#cores#fwd#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts",
+	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -18662,7 +18710,8 @@ struct cmd_config_per_port_tx_offload_result {
 			  "sctp_cksum#tcp_tso#udp_tso#outer_ipv4_cksum#"
 			  "qinq_insert#vxlan_tnl_tso#gre_tnl_tso#"
 			  "ipip_tnl_tso#geneve_tnl_tso#macsec_insert#"
-			  "mt_lockfree#multi_segs#mbuf_fast_free#security");
+			  "mt_lockfree#multi_segs#mbuf_fast_free#security#"
+			  "send_on_timestamp");
 cmdline_parse_token_string_t cmd_config_per_port_tx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_tx_offload_result,
@@ -18747,7 +18796,8 @@ struct cmd_config_per_port_tx_offload_result {
 		    "sctp_cksum|tcp_tso|udp_tso|outer_ipv4_cksum|"
 		    "qinq_insert|vxlan_tnl_tso|gre_tnl_tso|"
 		    "ipip_tnl_tso|geneve_tnl_tso|macsec_insert|"
-		    "mt_lockfree|multi_segs|mbuf_fast_free|security on|off",
+		    "mt_lockfree|multi_segs|mbuf_fast_free|security|"
+		    "send_on_timestamp on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_tx_offload_result_port,
 		(void *)&cmd_config_per_port_tx_offload_result_config,
@@ -19419,6 +19469,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
+	(cmdline_parse_inst_t *)&cmd_set_txtimes,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_list,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mask,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index a7112c9..1625978 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1027,6 +1027,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+		printf("Tx scheduling on timestamp:    ");
+		if (ports[port_id].dev_conf.txmode.offloads &
+		    DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 }
 
 int
@@ -2993,6 +3002,40 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_tx_pkt_times(void)
+{
+	printf("Interburst gap: %u\n", tx_pkt_times[0]);
+	printf("Intraburst gap: %u\n", tx_pkt_times[1]);
+}
+
+void
+set_tx_pkt_times(unsigned int *tx_times)
+{
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
+		.size = sizeof(uint64_t),
+		.align = __alignof__(uint64_t),
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
+	};
+
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp field registration error: %d",
+		       rte_errno);
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp flag registration error: %d",
+		       rte_errno);
+	tx_pkt_times[0] = tx_times[0];
+	tx_pkt_times[1] = tx_times[1];
+}
+
+void
 setup_gro(const char *onoff, portid_t port_id)
 {
 	if (!rte_eth_dev_is_valid_port(port_id)) {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4989d22..aab9a23 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -223,6 +223,9 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
+uint32_t tx_pkt_times[2];
+/**< Timings for send scheduling in TXONLY mode. */
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a803750..345ad1d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -442,6 +442,7 @@ struct queue_stats_mappings {
 extern uint16_t tx_pkt_length; /**< Length of TXONLY packet */
 extern uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT]; /**< Seg. lengths */
 extern uint8_t  tx_pkt_nb_segs; /**< Number of segments in TX packets */
+extern uint32_t tx_pkt_times[2];
 
 enum tx_pkt_split {
 	TX_PKT_SPLIT_OFF,
@@ -794,6 +795,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void show_tx_pkt_segments(void);
+void set_tx_pkt_times(unsigned int *tx_times);
+void show_tx_pkt_times(void);
 void set_tx_pkt_split(const char *name);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
 char *list_pkt_forwarding_modes(void);
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 076ccaf..3d261cf 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -53,6 +53,11 @@
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
 RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
+RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per queue */
+static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
+static int32_t ts_off; /**< Timestamp dynamic field offset */
+static bool ts_enable; /**< Timestamp enable */
+static uint64_t ts_init[RTE_MAX_ETHPORTS];
 
 static void
 copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
@@ -150,7 +155,8 @@
 static inline bool
 pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
 		struct rte_ether_hdr *eth_hdr, const uint16_t vlan_tci,
-		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags,
+		const uint16_t idx, const struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
 	struct rte_mbuf *pkt_seg;
@@ -213,6 +219,50 @@
 	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
 			sizeof(struct rte_ether_hdr) +
 			sizeof(struct rte_ipv4_hdr));
+	if (unlikely(ts_enable)) {
+		uint64_t skew = RTE_PER_LCORE(ts_qskew);
+		struct {
+			rte_be32_t signature;
+			rte_be16_t pkt_idx;
+			rte_be16_t queue_idx;
+			rte_be64_t ts;
+		} ts_mark;
+
+		if (unlikely(!skew)) {
+			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
+			unsigned int txqs_n = dev->data->nb_tx_queues;
+			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
+					 (txqs_n ? txqs_n : 1);
+			/*
+			 * Initialize the scheduling time phase shift
+			 * depending on queue index.
+			 */
+			skew = ts_init[fs->tx_port] + tx_pkt_times[0] + phase;
+			RTE_PER_LCORE(ts_qskew) = skew;
+		}
+		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
+		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
+		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
+		if (unlikely(!idx)) {
+			skew +=	tx_pkt_times[0];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else if (tx_pkt_times[1]) {
+			skew +=	tx_pkt_times[1];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else {
+			ts_mark.ts = RTE_BE64(0);
+		}
+		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
+			sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(pkt_udp_hdr));
+	}
 	/*
 	 * Complete first mbuf of packet and append it to the
 	 * burst of packets to be transmitted.
@@ -275,7 +325,8 @@
 			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
 							&eth_hdr, vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_mempool_put_bulk(mbp,
 						(void **)&pkts_burst[nb_pkt],
 						nb_pkt_per_burst - nb_pkt);
@@ -290,7 +341,8 @@
 			if (unlikely(!pkt_burst_prepare(pkt, mbp, &eth_hdr,
 							vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_pktmbuf_free(pkt);
 				break;
 			}
@@ -302,6 +354,7 @@
 		return;
 
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+
 	/*
 	 * Retry if necessary
 	 */
@@ -342,15 +395,31 @@
 }
 
 static void
-tx_only_begin(__rte_unused portid_t pi)
+tx_only_begin(portid_t pi)
 {
 	uint16_t pkt_data_len;
+	int dynf;
 
 	pkt_data_len = (uint16_t) (tx_pkt_length - (
 					sizeof(struct rte_ether_hdr) +
 					sizeof(struct rte_ipv4_hdr) +
 					sizeof(struct rte_udp_hdr)));
 	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
+
+	ts_enable = false;
+	ts_mask = 0;
+	ts_off = -1;
+	RTE_PER_LCORE(ts_qskew) = 0;
+	dynf = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_mask = 1ULL << dynf;
+	dynf = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_off = dynf;
+	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
+		    !rte_eth_read_clock(pi, &ts_init[pi]);
 }
 
 struct fwd_engine tx_only_engine = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a..00413cc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -266,7 +266,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts)
+   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
 
 The available information categories are:
 
@@ -278,6 +278,8 @@ The available information categories are:
 
 * ``txpkts``: Packets to TX configuration.
 
+* ``txtimes``: Burst time pattern for Tx only mode.
+
 For example:
 
 .. code-block:: console
@@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
 
 Where x[,y]* represents a CSV list of values, without white space.
 
+set txtimes
+~~~~~~~~~~
+
+Configure the timing burst pattern for Tx only mode. This command enables
+the packet send scheduling on dynamic timestamp mbuf field and configures
+timing pattern in Tx only mode. In this mode, if scheduling is enabled
+application provides timestamps in the packets being sent. It is possible
+to configure delay (in unspecified device clock units) between bursts
+and between the packets within the burst::
+
+   testpmd> set txtimes (inter),(intra)
+
+where:
+
+* ``inter``  is the delay between the bursts in the device clock units.
+  If ``intra`` is zero, this is the time between the beginnings of the
+  first packets in the neighbour bursts, if ``intra`` is not zero,
+  ``inter`` specifies the time between the beginning of the first packet
+  of the current burst and the beginning of the last packet of the
+  previous burst. If ``inter`` parameter is zero the send scheduling
+  on timestamps is disabled (default).
+
+* ``intra`` is the delay between the packets within the burst specified
+  in the device clock units. The number of packets in the burst is defined
+  by regular burst setting. If ``intra`` parameter is zero no timestamps
+  provided in the packets excepting  the first one in the burst.
+
+As the result the bursts of packet will be transmitted with specific
+delays between the packets within the burst and specific delay between
+the bursts. The rte_eth_get_clock() is supposed to be engaged to get the
+current device clock value and provide the reference for the timestamps.
+
 set txsplit
 ~~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-08 15:27       ` Morten Brørup
@ 2020-07-08 15:51         ` Slava Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-08 15:51 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

Hi, Morten

Addressed most of your comments in the v5 commit message.
Header file comments are close to become too wordy,
and I did not dare to elaborate ones more.

With best regards, Slava

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, July 8, 2020 18:27
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@mellanox.net
> Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet
> Txscheduling
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > Sent: Wednesday, July 8, 2020 4:54 PM
> >
> > Hi, Morten
> >
> > Thank you for the comments. Please, see below.
> >
> > > -----Original Message-----
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, July 8, 2020 17:16
> > > To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > > Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > > <rasland@mellanox.com>; olivier.matz@6wind.com;
> > > bernard.iremonger@intel.com; thomas@mellanox.net
> > > Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate
> > packet
> > > Txscheduling
> > >
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > > > Ovsiienko
> > > > Sent: Tuesday, July 7, 2020 4:57 PM
> > > >
> > > > There is the requirement on some networks for precise traffic
> > timing
> > > > management. The ability to send (and, generally speaking, receive)
> > the
> > > > packets at the very precisely specified moment of time provides
> > > > the opportunity to support the connections with Time Division
> > Multiplexing
> > > > using the contemporary general purpose NIC without involving an
> > > > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > > > interface is one of the promising features for potentially usage
> > > > of the precise time management for the egress packets.
> > > >
> > > > The main objective of this RFC is to specify the way how
> > applications
> > > > can provide the moment of time at what the packet transmission
> > > > must
> > be
> > > > started and to describe in preliminary the supporting this feature
> > > > from
> > > > mlx5 PMD side.
> > > >
> > > > The new dynamic timestamp field is proposed, it provides some
> > timing
> > > > information, the units and time references (initial phase) are not
> > > > explicitly defined but are maintained always the same for a given
> > port.
> > > > Some devices allow to query rte_eth_read_clock() that will return
> > the
> > > > current device timestamp. The dynamic timestamp flag tells whether
> > the
> > > > field contains actual timestamp value. For the packets being sent
> > this
> > > > value can be used by PMD to schedule packet sending.
> > > >
> > > > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > and
> > > > obsoleting, these dynamic flag and field will be used to manage
> > > > the timestamps on receiving datapath as well.
> > > >
> > > > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > > > sent it tries to synchronize the time of packet appearing on the
> > wire
> > > > with the specified packet timestamp. If the specified one is in
> > > > the past it should be ignored, if one is in the distant future it
> > should
> > > > be capped with some reasonable value (in range of seconds). These
> > > > specific cases ("too late" and "distant future") can be optionally
> > > > reported via device xstats to assist applications to detect the
> > > > time-related problems.
> > > >
> > > > There is no any packet reordering according timestamps is
> > > > supposed, neither within packet burst, nor between packets, it is
> > > > an entirely application responsibility to generate packets and its
> > > > timestamps
> > in
> > > > desired order. The timestamps can be put only in the first packet
> > in
> > > > the burst providing the entire burst scheduling.
> > > >
> > > > PMD reports the ability to synchronize packet sending on timestamp
> > > > with new offload flag:
> > > >
> > > > This is palliative and is going to be replaced with new eth_dev
> > > > API about reporting/managing the supported dynamic flags and its
> > related
> > > > features. This API would break ABI compatibility and can't be
> > > > introduced at the moment, so is postponed to 20.11.
> > > >
> > > > For testing purposes it is proposed to update testpmd "txonly"
> > > > forwarding mode routine. With this update testpmd application
> > > > generates the packets and sets the dynamic timestamps according to
> > > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> > registered.
> > > >
> > > > The new testpmd command is proposed to configure sending pattern:
> > > >
> > > > set tx_times <burst_gap>,<intra_gap>
> > > >
> > > > <intra_gap> - the delay between the packets within the burst
> > > >               specified in the device clock units. The number
> > > >               of packets in the burst is defined by txburst
> > parameter
> > > >
> > > > <burst_gap> - the delay between the bursts in the device clock
> > units
> > > >
> > > > As the result the bursts of packet will be transmitted with
> > specific
> > > > delays between the packets within the burst and specific delay
> > between
> > > > the bursts. The rte_eth_get_clock is supposed to be engaged to get
> > the
> > > > current device clock value and provide the reference for the
> > > > timestamps.
> > > >
> > > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > > ---
> > > >  v1->v4:
> > > >     - dedicated dynamic Tx timestamp flag instead of shared with
> > > > Rx
> > >
> > > The detailed description above should be updated to reflect that it
> > is now
> > > two flags.
> > OK
> >
> > >
> > > >     - Doxygen-style comment
> > > >     - comments update
> > > >
> > > > ---
> > > >  lib/librte_ethdev/rte_ethdev.c |  1 +
> > lib/librte_ethdev/rte_ethdev.h
> > > > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > > > +++++++++++++++++++++++++++++++
> > > >  3 files changed, 36 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> > > >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> > > >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> > > >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > > > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> > > >  };
> > > >
> > > >  #undef RTE_TX_OFFLOAD_BIT2STR
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> > > >  /** Device supports outer UDP checksum */  #define
> > > > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> > > >
> > > > +/** Device supports send on timestamp */ #define
> > > > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > > > +
> > > > +
> > > >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP
> 0x00000001
> > > /**<
> > > > Device supports Rx queue setup after device started*/  #define
> > > > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --
> git
> > > > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > index 96c3631..7e9f7d2 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char
> *name,
> > > > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> > > "rte_flow_dynfield_metadata"
> > > >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> > > "rte_flow_dynflag_metadata"
> > > >
> > > > +/**
> > > > + * The timestamp dynamic field provides some timing information,
> > the
> > > > + * units and time references (initial phase) are not explicitly
> > > > defined
> > > > + * but are maintained always the same for a given port. Some
> > devices
> > > > allow4
> > > > + * to query rte_eth_read_clock() that will return the current
> > device
> > > > + * timestamp. The dynamic Tx timestamp flag tells whether the
> > field
> > > > contains
> > > > + * actual timestamp value. For the packets being sent this value
> > can
> > > > be
> > > > + * used by PMD to schedule packet sending.
> > > > + *
> > > > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field
> > deprecation
> > > > + * and obsoleting, the dedicated Rx timestamp flag is supposed to
> > be
> > > > + * introduced and the shared dynamic timestamp field will be used
> > > > + * to handle the timestamps on receiving datapath as well.
> > > > + */
> > > > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > > "rte_dynfield_timestamp"
> > >
> > > The description above should not say anything about the dynamic TX
> > > timestamp flag.
> > It does not. Or do you mean RX?
> > Not sure, field and flag are tightly coupled, it is nice to mention
> > this relation for better understanding.
> > And mentioning the RX explains why it is not like this:
> > RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME
> 
> Sorry. I misunderstood its purpose!
> It's the name of the field, and the field will not only be used for RX, but in the
> future also for RX.
> (I thought it was the name of the RX flag, reserved for future use.)
> 
> >
> > >
> > > Please elaborate "some timing information", e.g. add "... about when
> > the
> > > packet was received".
> >
> > Sorry, I do not follow,  currently the dynamic field is not "about
> > when the packet was received". Now it is introduced for Tx only and
> > just the opportunity to be shared with Rx one in coming releases is
> > mentioned. "Some" means - not specified (herein) exactly.
> > And it is elaborated what Is not specified and how it is supposed to
> > use Tx timestamp.
> 
> It should be described when it is valid, and how it is being used, e.g. by
> adding a reference to the "rte_dynflag_tx_timestamp" flag.
> 
> > >
> > > > +
> > > > +/**
> > > > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> flag
> > > set on
> > > > the
> > > > + * packet being sent it tries to synchronize the time of packet
> > > > appearing
> > > > + * on the wire with the specified packet timestamp. If the
> > specified
> > > > one
> > > > + * is in the past it should be ignored, if one is in the distant
> > > > future
> > > > + * it should be capped with some reasonable value (in range of
> > > > seconds).
> > > > + *
> > > > + * There is no any packet reordering according to timestamps is
> > > > supposed,
> > > > + * neither for packet within the burst, nor for the whole bursts,
> > it
> > > > is
> > > > + * an entirely application responsibility to generate packets and
> > its
> > > > + * timestamps in desired order. The timestamps might be put only
> > in
> > > > + * the first packet in the burst providing the entire burst
> > > > scheduling.
> > > > + */
> > > > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> > > "rte_dynflag_tx_timestamp"
> > > > +
> > > >  #endif
> > > > --
> > > > 1.8.3.1
> > > >
> > >
> > > It may be worth adding some documentation about how the clocks of
> > > the NICs are out of sync with the clock of the CPU, and are all
> > > drifting
> > relatively.
> > >
> > > And those clocks are also out of sync with the actual time (NTP
> > clock).
> >
> > IMO, It is out of scope of this very generic patch.  As for mlx NICs -
> > the internal device clock might be (or might be not) synchronized with
> > PTP, it can provide timestamps in real nanoseconds in various formats
> > or just some free running counter.
> 
> Cool!
> 
> > On some systems the NIC and CPU might share the same clock source (for
> > their PLL inputs for example) and there will be no any drifts. As we
> > can see - it is a wide and interesting opic to discuss, but, IMO,  the
> > comment in header file might be not the most relevant place to do. As
> > for mlx5 devices clock specifics - it will be documented in PMD
> > chapter.
> >
> > OK, will add few generic words, the few ones - in order not to make
> > comment wordy, just point the direction for further thinking.
> 
> I agree - we don't want cookbooks in the header files. Only enough
> description to avoid the worst misunderstandings.
> 
> >
> > >
> > > Preferably, some sort of cookbook for handling this should be
> > provided.
> > > PCAP could be used as an example.
> > >
> > testpmd example is included in series, mlx5 PMD patch is prepared and
> > coming soon.
> 
> Great.
> 
> And I suppose that the more detailed cookbook/example - regarding offset
> and drift of various clocks - is probably more relevant for the RX side (for
> various PCAP applications), and thus completely unrelated to this patch.
> 
> >
> > With best regards, Slava


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-08 15:47 ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-07-08 15:47   ` [dpdk-dev] [PATCH v5 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
@ 2020-07-08 16:05   ` Slava Ovsiienko
  2020-07-09 12:26   ` Thomas Monjalon
  2 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-08 16:05 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger,
	thomas, mb

> promote Acked-bt from previous patch version to maintain patchwork status accordingly

Acked-by: Olivier Matz <olivier.matz@6wind.com>

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Sent: Wednesday, July 8, 2020 18:47
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@monjalon.com;
> mb@smartsharesystems.com
> Subject: [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx
> scheduling
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive) the
> packets at the very precisely specified moment of time provides the
> opportunity to support the connections with Time Division Multiplexing using
> the contemporary general purpose NIC without involving an auxiliary
> hardware. For example, the supporting of O-RAN Fronthaul interface is one
> of the promising features for potentially usage of the precise time
> management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications can
> provide the moment of time at what the packet transmission must be started
> and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not explicitly
> defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return the current
> device timestamp. The dynamic timestamp flag tells whether the field
> contains actual timestamp value. For the packets being sent this value can be
> used by PMD to schedule packet sending.
> 
> The device clock is opaque entity, the units and frequency are vendor specific
> and might depend on hardware capabilities and configurations. If might (or
> not) be synchronized with real time via PTP, might (or not) be synchronous
> with CPU clock (for example if NIC and CPU share the same clock source
> there might be no any drift between the NIC and CPU clocks), etc.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> obsoleting, these dynamic flag and field will be used to manage the
> timestamps on receiving datapath as well. Having the dedicated flags for
> Rx/Tx timestamps allows applications not to perform explicit flags reset on
> forwarding and not to promote received timestamps to the transmitting
> datapath by default. The static PKT_RX_TIMESTAMP is considered as
> candidate to become the dynamic flag.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it
> tries to synchronize the time of packet appearing on the wire with the
> specified packet timestamp. If the specified one is in the past it should be
> ignored, if one is in the distant future it should be capped with some
> reasonable value (in range of seconds). These specific cases ("too late" and
> "distant future") can be optionally reported via device xstats to assist
> applications to detect the time-related problems.
> 
> There is no any packet reordering according timestamps is supposed, neither
> within packet burst, nor between packets, it is an entirely application
> responsibility to generate packets and its timestamps in desired order. The
> timestamps can be put only in the first packet in the burst providing the
> entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp with
> new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API about
> reporting/managing the supported dynamic flags and its related features.
> This API would break ABI compatibility and can't be introduced at the
> moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific delays
> between the packets within the burst and specific delay between the bursts.
> The rte_eth_get_clock is supposed to be engaged to get the current device
> clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h |
> 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
> 
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */  #define
> DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> 
> +/** Device supports send on timestamp */ #define
> +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**< Device supports Rx queue setup after device started*/  #define
> RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h index
> 96c3631..8407230 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> 
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices
> +allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field
> +contains
> + * actual timestamp value for the packets being sent, this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> set on the
> + * packet being sent it tries to synchronize the time of packet
> +appearing
> + * on the wire with the specified packet timestamp. If the specified
> +one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according to timestamps is
> +supposed,
> + * neither for packet within the burst, nor for the whole bursts, it is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> "rte_dynflag_tx_timestamp"
> +
>  #endif
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-08 15:47 ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-07-08 15:47   ` [dpdk-dev] [PATCH v5 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
  2020-07-08 16:05   ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Slava Ovsiienko
@ 2020-07-09 12:26   ` Thomas Monjalon
  2 siblings, 0 replies; 86+ messages in thread
From: Thomas Monjalon @ 2020-07-09 12:26 UTC (permalink / raw)
  To: Viacheslav Ovsiienko
  Cc: dev, matan, rasland, olivier.matz, bernard.iremonger, mb

08/07/2020 17:47, Viacheslav Ovsiienko:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
[...]
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)

I think this feature should be added in the release notes.




^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (5 preceding siblings ...)
  2020-07-08 15:47 ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-09 12:36 ` Viacheslav Ovsiienko
  2020-07-09 12:36   ` [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
  2020-07-09 23:47   ` [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling Ferruh Yigit
  2020-07-10 12:39 ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-09 12:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this RFC is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature from
mlx5 PMD side.

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

The device clock is opaque entity, the units and frequency are
vendor specific and might depend on hardware capabilities and
configurations. If might (or not) be synchronized with real time
via PTP, might (or not) be synchronous with CPU clock (for example
if NIC and CPU share the same clock source there might be no
any drift between the NIC and CPU clocks), etc.

After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
and obsoleting, these dynamic flag and field will be used to manage
the timestamps on receiving datapath as well. Having the dedicated
flags for Rx/Tx timestamps allows applications not to perform explicit
flags reset on forwarding and not to promote received timestamps
to the transmitting datapath by default. The static PKT_RX_TIMESTAMP
is considered as candidate to become the dynamic flag.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and is going to be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_get_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added
  v5->v6:
     - release notes are updated
---
 doc/guides/rel_notes/release_20_08.rst |  6 ++++++
 lib/librte_ethdev/rte_ethdev.c         |  1 +
 lib/librte_ethdev/rte_ethdev.h         |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 988474c..5527bab 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -200,6 +200,12 @@ New Features
   See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
   details of this parameter usage.
 
+* **Introduced send packet scheduling on the timestamps.**
+
+  Added the new mbuf dynamic field and flag to provide timestamp on what packet
+  transmitting can be synchronized. The device Tx offload flag is added to
+  indicate the PMD supports send scheduling.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7022bd7..c48ca2a 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 631b146..97313a0 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value for the packets being sent, this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling test capability
  2020-07-09 12:36 ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
@ 2020-07-09 12:36   ` Viacheslav Ovsiienko
  2020-07-09 23:58     ` Ferruh Yigit
  2020-07-09 23:47   ` [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling Ferruh Yigit
  1 sibling, 1 reply; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-09 12:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

This commit adds testpmd capability to provide timestamps on the packets
being sent in the txonly mode. This includes:

 - SEND_ON_TIMESTAMP support
   new device Tx offload capability support added, example:

     testpmd> port config 0 tx_offload send_on_timestamp on

 - set txtimes, registers field and flag, example:

     testpmd> set txtimes 1000000,0

   This command enables the packet send scheduling on timestamps if
   the first parameter is not zero, generic format:

     testpmd> set txtimes (inter),(intra)

   where:

     inter - is the delay between the bursts in the device clock units.
     intra - is the delay between the packets within the burst specified
	     in the device clock units

     As the result the bursts of packet will be transmitted with
     specific delay between the packets within the burst and specific
     delay between the bursts. The rte_eth_get_clock() is supposed to be
     engaged to get the current device clock value and provide
     the reference for the timestamps.

 - show txtimes, displays the timing settings
 - txonly burst time pattern

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 59 ++++++++++++++++++++--
 app/test-pmd/config.c                       | 43 ++++++++++++++++
 app/test-pmd/testpmd.c                      |  3 ++
 app/test-pmd/testpmd.h                      |  3 ++
 app/test-pmd/txonly.c                       | 77 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++++++++++++-
 6 files changed, 212 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 39ad938..ec5aad6 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3930,6 +3930,52 @@ struct cmd_set_txsplit_result {
 	},
 };
 
+/* *** SET TIMES FOR TXONLY PACKETS SCHEDULING ON TIMESTAMPS *** */
+
+struct cmd_set_txtimes_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t txtimes;
+	cmdline_fixed_string_t tx_times;
+};
+
+static void
+cmd_set_txtimes_parsed(void *parsed_result,
+		       __rte_unused struct cmdline *cl,
+		       __rte_unused void *data)
+{
+	struct cmd_set_txtimes_result *res;
+	unsigned int tx_times[2] = {0, 0};
+	unsigned int n_times;
+
+	res = parsed_result;
+	n_times = parse_item_list(res->tx_times, "tx times",
+				  2, tx_times, 0);
+	if (n_times == 2)
+		set_tx_pkt_times(tx_times);
+}
+
+cmdline_parse_token_string_t cmd_set_txtimes_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_txtimes_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 txtimes, "txtimes");
+cmdline_parse_token_string_t cmd_set_txtimes_value =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 tx_times, NULL);
+
+cmdline_parse_inst_t cmd_set_txtimes = {
+	.f = cmd_set_txtimes_parsed,
+	.data = NULL,
+	.help_str = "set txtimes <inter_burst>,<intra_burst>",
+	.tokens = {
+		(void *)&cmd_set_txtimes_keyword,
+		(void *)&cmd_set_txtimes_name,
+		(void *)&cmd_set_txtimes_value,
+		NULL,
+	},
+};
+
 /* *** ADD/REMOVE ALL VLAN IDENTIFIERS TO/FROM A PORT VLAN RX FILTER *** */
 struct cmd_rx_vlan_filter_all_result {
 	cmdline_fixed_string_t rx_vlan;
@@ -7418,6 +7464,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		pkt_fwd_config_display(&cur_fwd_config);
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
+	else if (!strcmp(res->what, "txtimes"))
+		show_tx_pkt_times();
 }
 
 cmdline_parse_token_string_t cmd_showcfg_show =
@@ -7426,12 +7474,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts");
+				 "rxtx#cores#fwd#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts",
+	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -18670,7 +18718,8 @@ struct cmd_config_per_port_tx_offload_result {
 			  "sctp_cksum#tcp_tso#udp_tso#outer_ipv4_cksum#"
 			  "qinq_insert#vxlan_tnl_tso#gre_tnl_tso#"
 			  "ipip_tnl_tso#geneve_tnl_tso#macsec_insert#"
-			  "mt_lockfree#multi_segs#mbuf_fast_free#security");
+			  "mt_lockfree#multi_segs#mbuf_fast_free#security#"
+			  "send_on_timestamp");
 cmdline_parse_token_string_t cmd_config_per_port_tx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_tx_offload_result,
@@ -18755,7 +18804,8 @@ struct cmd_config_per_port_tx_offload_result {
 		    "sctp_cksum|tcp_tso|udp_tso|outer_ipv4_cksum|"
 		    "qinq_insert|vxlan_tnl_tso|gre_tnl_tso|"
 		    "ipip_tnl_tso|geneve_tnl_tso|macsec_insert|"
-		    "mt_lockfree|multi_segs|mbuf_fast_free|security on|off",
+		    "mt_lockfree|multi_segs|mbuf_fast_free|security|"
+		    "send_on_timestamp on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_tx_offload_result_port,
 		(void *)&cmd_config_per_port_tx_offload_result_config,
@@ -19427,6 +19477,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
+	(cmdline_parse_inst_t *)&cmd_set_txtimes,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_list,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mask,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 53a9b97..f85b88e 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1046,6 +1046,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+		printf("Tx scheduling on timestamp:    ");
+		if (ports[port_id].dev_conf.txmode.offloads &
+		    DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 }
 
 int
@@ -3067,6 +3076,40 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_tx_pkt_times(void)
+{
+	printf("Interburst gap: %u\n", tx_pkt_times[0]);
+	printf("Intraburst gap: %u\n", tx_pkt_times[1]);
+}
+
+void
+set_tx_pkt_times(unsigned int *tx_times)
+{
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
+		.size = sizeof(uint64_t),
+		.align = __alignof__(uint64_t),
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
+	};
+
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp field registration error: %d",
+		       rte_errno);
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp flag registration error: %d",
+		       rte_errno);
+	tx_pkt_times[0] = tx_times[0];
+	tx_pkt_times[1] = tx_times[1];
+}
+
+void
 setup_gro(const char *onoff, portid_t port_id)
 {
 	if (!rte_eth_dev_is_valid_port(port_id)) {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f5feb62..80c27e6 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -223,6 +223,9 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
+uint32_t tx_pkt_times[2];
+/**< Timings for send scheduling in TXONLY mode. */
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a803750..345ad1d 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -442,6 +442,7 @@ struct queue_stats_mappings {
 extern uint16_t tx_pkt_length; /**< Length of TXONLY packet */
 extern uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT]; /**< Seg. lengths */
 extern uint8_t  tx_pkt_nb_segs; /**< Number of segments in TX packets */
+extern uint32_t tx_pkt_times[2];
 
 enum tx_pkt_split {
 	TX_PKT_SPLIT_OFF,
@@ -794,6 +795,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void show_tx_pkt_segments(void);
+void set_tx_pkt_times(unsigned int *tx_times);
+void show_tx_pkt_times(void);
 void set_tx_pkt_split(const char *name);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
 char *list_pkt_forwarding_modes(void);
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 076ccaf..3d261cf 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -53,6 +53,11 @@
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
 RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
+RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per queue */
+static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
+static int32_t ts_off; /**< Timestamp dynamic field offset */
+static bool ts_enable; /**< Timestamp enable */
+static uint64_t ts_init[RTE_MAX_ETHPORTS];
 
 static void
 copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
@@ -150,7 +155,8 @@
 static inline bool
 pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
 		struct rte_ether_hdr *eth_hdr, const uint16_t vlan_tci,
-		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags,
+		const uint16_t idx, const struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
 	struct rte_mbuf *pkt_seg;
@@ -213,6 +219,50 @@
 	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
 			sizeof(struct rte_ether_hdr) +
 			sizeof(struct rte_ipv4_hdr));
+	if (unlikely(ts_enable)) {
+		uint64_t skew = RTE_PER_LCORE(ts_qskew);
+		struct {
+			rte_be32_t signature;
+			rte_be16_t pkt_idx;
+			rte_be16_t queue_idx;
+			rte_be64_t ts;
+		} ts_mark;
+
+		if (unlikely(!skew)) {
+			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
+			unsigned int txqs_n = dev->data->nb_tx_queues;
+			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
+					 (txqs_n ? txqs_n : 1);
+			/*
+			 * Initialize the scheduling time phase shift
+			 * depending on queue index.
+			 */
+			skew = ts_init[fs->tx_port] + tx_pkt_times[0] + phase;
+			RTE_PER_LCORE(ts_qskew) = skew;
+		}
+		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
+		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
+		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
+		if (unlikely(!idx)) {
+			skew +=	tx_pkt_times[0];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else if (tx_pkt_times[1]) {
+			skew +=	tx_pkt_times[1];
+			pkt->ol_flags |= ts_mask;
+			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
+			RTE_PER_LCORE(ts_qskew) = skew;
+			ts_mark.ts = rte_cpu_to_be_64(skew);
+		} else {
+			ts_mark.ts = RTE_BE64(0);
+		}
+		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
+			sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(pkt_udp_hdr));
+	}
 	/*
 	 * Complete first mbuf of packet and append it to the
 	 * burst of packets to be transmitted.
@@ -275,7 +325,8 @@
 			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
 							&eth_hdr, vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_mempool_put_bulk(mbp,
 						(void **)&pkts_burst[nb_pkt],
 						nb_pkt_per_burst - nb_pkt);
@@ -290,7 +341,8 @@
 			if (unlikely(!pkt_burst_prepare(pkt, mbp, &eth_hdr,
 							vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_pktmbuf_free(pkt);
 				break;
 			}
@@ -302,6 +354,7 @@
 		return;
 
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+
 	/*
 	 * Retry if necessary
 	 */
@@ -342,15 +395,31 @@
 }
 
 static void
-tx_only_begin(__rte_unused portid_t pi)
+tx_only_begin(portid_t pi)
 {
 	uint16_t pkt_data_len;
+	int dynf;
 
 	pkt_data_len = (uint16_t) (tx_pkt_length - (
 					sizeof(struct rte_ether_hdr) +
 					sizeof(struct rte_ipv4_hdr) +
 					sizeof(struct rte_udp_hdr)));
 	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
+
+	ts_enable = false;
+	ts_mask = 0;
+	ts_off = -1;
+	RTE_PER_LCORE(ts_qskew) = 0;
+	dynf = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_mask = 1ULL << dynf;
+	dynf = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		ts_off = dynf;
+	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
+		    !rte_eth_read_clock(pi, &ts_init[pi]);
 }
 
 struct fwd_engine tx_only_engine = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a..00413cc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -266,7 +266,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts)
+   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
 
 The available information categories are:
 
@@ -278,6 +278,8 @@ The available information categories are:
 
 * ``txpkts``: Packets to TX configuration.
 
+* ``txtimes``: Burst time pattern for Tx only mode.
+
 For example:
 
 .. code-block:: console
@@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
 
 Where x[,y]* represents a CSV list of values, without white space.
 
+set txtimes
+~~~~~~~~~~
+
+Configure the timing burst pattern for Tx only mode. This command enables
+the packet send scheduling on dynamic timestamp mbuf field and configures
+timing pattern in Tx only mode. In this mode, if scheduling is enabled
+application provides timestamps in the packets being sent. It is possible
+to configure delay (in unspecified device clock units) between bursts
+and between the packets within the burst::
+
+   testpmd> set txtimes (inter),(intra)
+
+where:
+
+* ``inter``  is the delay between the bursts in the device clock units.
+  If ``intra`` is zero, this is the time between the beginnings of the
+  first packets in the neighbour bursts, if ``intra`` is not zero,
+  ``inter`` specifies the time between the beginning of the first packet
+  of the current burst and the beginning of the last packet of the
+  previous burst. If ``inter`` parameter is zero the send scheduling
+  on timestamps is disabled (default).
+
+* ``intra`` is the delay between the packets within the burst specified
+  in the device clock units. The number of packets in the burst is defined
+  by regular burst setting. If ``intra`` parameter is zero no timestamps
+  provided in the packets excepting  the first one in the burst.
+
+As the result the bursts of packet will be transmitted with specific
+delays between the packets within the burst and specific delay between
+the bursts. The rte_eth_get_clock() is supposed to be engaged to get the
+current device clock value and provide the reference for the timestamps.
+
 set txsplit
 ~~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-09 12:36 ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
  2020-07-09 12:36   ` [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
@ 2020-07-09 23:47   ` Ferruh Yigit
  2020-07-10 12:32     ` Slava Ovsiienko
  1 sibling, 1 reply; 86+ messages in thread
From: Ferruh Yigit @ 2020-07-09 23:47 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas,
	Andrew Rybchenko

On 7/9/2020 1:36 PM, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.

Is this a HW support, or is the scheduling planned to be done in the driver?

> 
> The main objective of this RFC is to specify the way how applications

It is no more RFC.

> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.

I was about the ask this, will there be a PMD counterpart implementation of the
feature? It would be better to have it as part of this set.
What is the plan for the PMD implementation?

> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> The device clock is opaque entity, the units and frequency are
> vendor specific and might depend on hardware capabilities and
> configurations. If might (or not) be synchronized with real time
> via PTP, might (or not) be synchronous with CPU clock (for example
> if NIC and CPU share the same clock source there might be no
> any drift between the NIC and CPU clocks), etc.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well. Having the dedicated
> flags for Rx/Tx timestamps allows applications not to perform explicit
> flags reset on forwarding and not to promote received timestamps
> to the transmitting datapath by default. The static PKT_RX_TIMESTAMP
> is considered as candidate to become the dynamic flag.

Is there a deprecation notice for 'PKT_RX_TIMESTAMP'? Is this decided?

> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.

Good to hear that there will be a generic API to get supported dynamic flags. I
was concerned about adding 'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP' flag, since not
sure if there will be any other PMD that will want to use it.
The trouble is it is hard to remove a public macro after it is introduced, in
this release I think only single PMD (mlx) will support this feature, and in
next release the plan is to remove the macro. In this case what do you think to
not introduce the flag at all?

> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the

'rte_eth_read_clock()'?

> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
>   v5->v6:
>      - release notes are updated
> ---
>  doc/guides/rel_notes/release_20_08.rst |  6 ++++++
>  lib/librte_ethdev/rte_ethdev.c         |  1 +
>  lib/librte_ethdev/rte_ethdev.h         |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
>  4 files changed, 42 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> index 988474c..5527bab 100644
> --- a/doc/guides/rel_notes/release_20_08.rst
> +++ b/doc/guides/rel_notes/release_20_08.rst
> @@ -200,6 +200,12 @@ New Features
>    See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
>    details of this parameter usage.
>  
> +* **Introduced send packet scheduling on the timestamps.**
> +
> +  Added the new mbuf dynamic field and flag to provide timestamp on what packet
> +  transmitting can be synchronized. The device Tx offload flag is added to
> +  indicate the PMD supports send scheduling.
> +

This is a core library change, can go up in the section, please check the
section comment for the ordering details.

>  
>  Removed Items
>  -------------
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 7022bd7..c48ca2a 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 631b146..97313a0 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000

Please cc the ethdev maintainers.

As mentioned above my concern is if this is generic enough or are we adding a
flag to a specific PMD? And since commit log says this is temporary solution for
just this release, I repeat my question if we can remove the flag completely?


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling test capability
  2020-07-09 12:36   ` [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
@ 2020-07-09 23:58     ` Ferruh Yigit
  2020-07-10 12:41       ` Slava Ovsiienko
  0 siblings, 1 reply; 86+ messages in thread
From: Ferruh Yigit @ 2020-07-09 23:58 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

On 7/9/2020 1:36 PM, Viacheslav Ovsiienko wrote:
> This commit adds testpmd capability to provide timestamps on the packets
> being sent in the txonly mode. This includes:
> 
>  - SEND_ON_TIMESTAMP support
>    new device Tx offload capability support added, example:
> 
>      testpmd> port config 0 tx_offload send_on_timestamp on
> 
>  - set txtimes, registers field and flag, example:
> 
>      testpmd> set txtimes 1000000,0
> 
>    This command enables the packet send scheduling on timestamps if
>    the first parameter is not zero, generic format:
> 
>      testpmd> set txtimes (inter),(intra)
> 
>    where:
> 
>      inter - is the delay between the bursts in the device clock units.
>      intra - is the delay between the packets within the burst specified
> 	     in the device clock units
> 
>      As the result the bursts of packet will be transmitted with
>      specific delay between the packets within the burst and specific
>      delay between the bursts. The rte_eth_get_clock() is supposed to be
>      engaged to get the current device clock value and provide
>      the reference for the timestamps.
> 
>  - show txtimes, displays the timing settings
>  - txonly burst time pattern
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

<...>

> +cmdline_parse_inst_t cmd_set_txtimes = {
> +	.f = cmd_set_txtimes_parsed,
> +	.data = NULL,
> +	.help_str = "set txtimes <inter_burst>,<intra_burst>",
> +	.tokens = {
> +		(void *)&cmd_set_txtimes_keyword,
> +		(void *)&cmd_set_txtimes_name,
> +		(void *)&cmd_set_txtimes_value,
> +		NULL,
> +	},
> +};

Can you please update 'cmd_help_long_parsed()' with command updates?

<...>

>  void
> +show_tx_pkt_times(void)
> +{
> +	printf("Interburst gap: %u\n", tx_pkt_times[0]);
> +	printf("Intraburst gap: %u\n", tx_pkt_times[1]);
> +}
> +
> +void
> +set_tx_pkt_times(unsigned int *tx_times)
> +{
> +	int offset;
> +	int flag;
> +
> +	static const struct rte_mbuf_dynfield desc_offs = {
> +		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
> +		.size = sizeof(uint64_t),
> +		.align = __alignof__(uint64_t),
> +	};
> +	static const struct rte_mbuf_dynflag desc_flag = {
> +		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
> +	};
> +
> +	offset = rte_mbuf_dynfield_register(&desc_offs);
> +	if (offset < 0 && rte_errno != EEXIST)
> +		printf("Dynamic timestamp field registration error: %d",
> +		       rte_errno);
> +	flag = rte_mbuf_dynflag_register(&desc_flag);
> +	if (flag < 0 && rte_errno != EEXIST)
> +		printf("Dynamic timestamp flag registration error: %d",
> +		       rte_errno);

You are not checking at all if the device supports
'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP' offload or if it is configured or not, but
blindly registering the dynamic fields.
'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP' seems not really used, as mentioned in prev
patch I would be OK to drop the flag.

> +	tx_pkt_times[0] = tx_times[0];
> +	tx_pkt_times[1] = tx_times[1];

I think it is better to rename 'tx_pkt_times[0]' -> 'tx_pkt_times_inter',
'tx_pkt_times[1]' --> 'tx_pkt_times_intra' to increase code readability.

<...>

> --- a/app/test-pmd/txonly.c
> +++ b/app/test-pmd/txonly.c
> @@ -53,6 +53,11 @@
>  static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
>  RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
>  static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
> +RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per queue */
> +static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
> +static int32_t ts_off; /**< Timestamp dynamic field offset */
> +static bool ts_enable; /**< Timestamp enable */
> +static uint64_t ts_init[RTE_MAX_ETHPORTS];

What do you think renaming the 'ts_' prefix to long 'timestamp_' prefix, will
variable names be too long? When you are out of this patch context and reading
code 'ts_init' is not that expressive.

<...>

> @@ -213,6 +219,50 @@
>  	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
>  			sizeof(struct rte_ether_hdr) +
>  			sizeof(struct rte_ipv4_hdr));
> +	if (unlikely(ts_enable)) {
> +		uint64_t skew = RTE_PER_LCORE(ts_qskew);
> +		struct {
> +			rte_be32_t signature;
> +			rte_be16_t pkt_idx;
> +			rte_be16_t queue_idx;
> +			rte_be64_t ts;
> +		} ts_mark;
> +
> +		if (unlikely(!skew)) {
> +			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
> +			unsigned int txqs_n = dev->data->nb_tx_queues;
> +			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
> +					 (txqs_n ? txqs_n : 1);
> +			/*
> +			 * Initialize the scheduling time phase shift
> +			 * depending on queue index.
> +			 */
> +			skew = ts_init[fs->tx_port] + tx_pkt_times[0] + phase;
> +			RTE_PER_LCORE(ts_qskew) = skew;
> +		}
> +		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
> +		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
> +		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
> +		if (unlikely(!idx)) {
> +			skew +=	tx_pkt_times[0];
> +			pkt->ol_flags |= ts_mask;
> +			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
> +			RTE_PER_LCORE(ts_qskew) = skew;
> +			ts_mark.ts = rte_cpu_to_be_64(skew);
> +		} else if (tx_pkt_times[1]) {
> +			skew +=	tx_pkt_times[1];
> +			pkt->ol_flags |= ts_mask;
> +			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) = skew;
> +			RTE_PER_LCORE(ts_qskew) = skew;
> +			ts_mark.ts = rte_cpu_to_be_64(skew);
> +		} else {
> +			ts_mark.ts = RTE_BE64(0);
> +		}
> +		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
> +			sizeof(struct rte_ether_hdr) +
> +			sizeof(struct rte_ipv4_hdr) +
> +			sizeof(pkt_udp_hdr));

timestamp data seems put into packet payload, I assume this is for debug, is
there any intendent target app for the packets?

<...>

>  static void
> -tx_only_begin(__rte_unused portid_t pi)
> +tx_only_begin(portid_t pi)
>  {
>  	uint16_t pkt_data_len;
> +	int dynf;
>  
>  	pkt_data_len = (uint16_t) (tx_pkt_length - (
>  					sizeof(struct rte_ether_hdr) +
>  					sizeof(struct rte_ipv4_hdr) +
>  					sizeof(struct rte_udp_hdr)));
>  	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
> +
> +	ts_enable = false;
> +	ts_mask = 0;
> +	ts_off = -1;
> +	RTE_PER_LCORE(ts_qskew) = 0;
> +	dynf = rte_mbuf_dynflag_lookup
> +				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
> +	if (dynf >= 0)
> +		ts_mask = 1ULL << dynf;
> +	dynf = rte_mbuf_dynfield_lookup
> +				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
> +	if (dynf >= 0)
> +		ts_off = dynf;
> +	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
> +		    !rte_eth_read_clock(pi, &ts_init[pi]);

'rte_eth_read_clock()' support is a 'must' to have this feature. Can you please
clarify this in the document/commit_log?

<...>

>  }
>  
>  struct fwd_engine tx_only_engine = {
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index a808b6a..00413cc 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -266,7 +266,7 @@ show config
>  Displays the configuration of the application.
>  The configuration comes from the command-line, the runtime or the application defaults::
>  
> -   testpmd> show config (rxtx|cores|fwd|txpkts)
> +   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
>  
>  The available information categories are:
>  
> @@ -278,6 +278,8 @@ The available information categories are:
>  
>  * ``txpkts``: Packets to TX configuration.
>  
> +* ``txtimes``: Burst time pattern for Tx only mode.

The description is not clear, can you pleaes give a little more detail?

> +
>  For example:
>  
>  .. code-block:: console
> @@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
>  
>  Where x[,y]* represents a CSV list of values, without white space.
>  
> +set txtimes
> +~~~~~~~~~~

WARNING: Title underline too short.

> +
> +Configure the timing burst pattern for Tx only mode. This command enables
> +the packet send scheduling on dynamic timestamp mbuf field and configures
> +timing pattern in Tx only mode. In this mode, if scheduling is enabled
> +application provides timestamps in the packets being sent. It is possible
> +to configure delay (in unspecified device clock units) between bursts
> +and between the packets within the burst::
> +
> +   testpmd> set txtimes (inter),(intra)
> +
> +where:
> +
> +* ``inter``  is the delay between the bursts in the device clock units.
> +  If ``intra`` is zero, this is the time between the beginnings of the
> +  first packets in the neighbour bursts, if ``intra`` is not zero,
> +  ``inter`` specifies the time between the beginning of the first packet
> +  of the current burst and the beginning of the last packet of the
> +  previous burst. If ``inter`` parameter is zero the send scheduling
> +  on timestamps is disabled (default).
> +
> +* ``intra`` is the delay between the packets within the burst specified
> +  in the device clock units. The number of packets in the burst is defined
> +  by regular burst setting. If ``intra`` parameter is zero no timestamps
> +  provided in the packets excepting  the first one in the burst.
> +
> +As the result the bursts of packet will be transmitted with specific
> +delays between the packets within the burst and specific delay between
> +the bursts. The rte_eth_get_clock() is supposed to be engaged to get the

'rte_eth_read_clock()'?


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-09 23:47   ` [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling Ferruh Yigit
@ 2020-07-10 12:32     ` Slava Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-10 12:32 UTC (permalink / raw)
  To: Ferruh Yigit, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger,
	thomas, Andrew Rybchenko

Hi, Ferruh


Thanks a lot for the review.

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Friday, July 10, 2020 2:47
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@monjalon.com; Andrew Rybchenko
> <arybchenko@solarflare.com>
> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx
> scheduling
> 
> On 7/9/2020 1:36 PM, Viacheslav Ovsiienko wrote:
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> 
> Is this a HW support, or is the scheduling planned to be done in the driver?
Yes, mlx5 PMD feature v1 is sent: http://patches.dpdk.org/patch/73714/

> 
> >
> > The main objective of this RFC is to specify the way how applications
> 
> It is no more RFC.
Oops, miscopy. Thanks.

> 
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> 
> I was about the ask this, will there be a PMD counterpart implementation of
> the feature? It would be better to have it as part of this set.
> What is the plan for the PMD implementation?
Please, see above.
> 
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > The device clock is opaque entity, the units and frequency are vendor
> > specific and might depend on hardware capabilities and configurations.
> > If might (or not) be synchronized with real time via PTP, might (or
> > not) be synchronous with CPU clock (for example if NIC and CPU share
> > the same clock source there might be no any drift between the NIC and
> > CPU clocks), etc.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well. Having the dedicated flags
> > for Rx/Tx timestamps allows applications not to perform explicit flags
> > reset on forwarding and not to promote received timestamps to the
> > transmitting datapath by default. The static PKT_RX_TIMESTAMP is
> > considered as candidate to become the dynamic flag.
> 
> Is there a deprecation notice for 'PKT_RX_TIMESTAMP'? Is this decided?
No, we are going to discuss that, the Rx timestamp is a good candidate to be
moved out from the first mbuf cacheline to the dynamic field.
There are good chances we will deprecate fixed Rx timestamp flag/field,
that's why we'd prefer not to rely on ones anymore.

> 
> >
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. If the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> >
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.
> >
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> 
> Good to hear that there will be a generic API to get supported dynamic flags.
> I was concerned about adding 'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP'
> flag, since not sure if there will be any other PMD that will want to use it.
> The trouble is it is hard to remove a public macro after it is introduced, in this
> release I think only single PMD (mlx) will support this feature, and in next
> release the plan is to remove the macro. In this case what do you think to
> not introduce the flag at all?

Currently no other way to report/control the port caps/cfg, but these xx_OFFLOAD_xx flags.
If the new side-channel API to report/control very specific PMD caps is introduced
this should be consistent with OFFLOAD flags, ie., if cap is disabled via the new API
it will be reflected in OFFLOAD flags either. The new API is questionable, the OFFLOAD
flags is not scarce resource, the offload field can be extended and we are still far
from exhausting the existing one. So, I replaced the "will" with "might" in commit
message. Not sure we should remove this flag, we can keep this consistent.

> 
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> 
> 'rte_eth_read_clock()'?
Sure, my bad.

> 
> > current device clock value and provide the reference for the timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > Acked-by: Olivier Matz <olivier.matz@6wind.com>
> >
> > ---
> >   v1->v4:
> >      - dedicated dynamic Tx timestamp flag instead of shared with Rx
> >   v4->v5:
> >      - elaborated commit message
> >      - more words about device clocks added,
> >      - note about dedicated Rx/Tx timestamp flags added
> >   v5->v6:
> >      - release notes are updated
> > ---
> >  doc/guides/rel_notes/release_20_08.rst |  6 ++++++
> >  lib/librte_ethdev/rte_ethdev.c         |  1 +
> >  lib/librte_ethdev/rte_ethdev.h         |  4 ++++
> >  lib/librte_mbuf/rte_mbuf_dyn.h         | 31
> +++++++++++++++++++++++++++++++
> >  4 files changed, 42 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_20_08.rst
> > b/doc/guides/rel_notes/release_20_08.rst
> > index 988474c..5527bab 100644
> > --- a/doc/guides/rel_notes/release_20_08.rst
> > +++ b/doc/guides/rel_notes/release_20_08.rst
> > @@ -200,6 +200,12 @@ New Features
> >    See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
> >    details of this parameter usage.
> >
> > +* **Introduced send packet scheduling on the timestamps.**
> > +
> > +  Added the new mbuf dynamic field and flag to provide timestamp on
> > + what packet  transmitting can be synchronized. The device Tx offload
> > + flag is added to  indicate the PMD supports send scheduling.
> > +
> 
> This is a core library change, can go up in the section, please check the
> section comment for the ordering details.
> 
Done.

> >
> >  Removed Items
> >  -------------
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 7022bd7..c48ca2a 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index 631b146..97313a0 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> 
> Please cc the ethdev maintainers.
> 
> As mentioned above my concern is if this is generic enough or are we adding
> a flag to a specific PMD? And since commit log says this is temporary
> solution for just this release, I repeat my question if we can remove the flag
> completely?
Will remove "temporary", replace with might. And now I do not think this flag
will be actually removed. As this feature development proved - it is on right place
and is easy to use by PMD and application in  standardized (for offloads)way.

With best regards, Slava


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (6 preceding siblings ...)
  2020-07-09 12:36 ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
@ 2020-07-10 12:39 ` Viacheslav Ovsiienko
  2020-07-10 12:39   ` [dpdk-dev] [PATCH v7 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
  2020-07-10 15:46   ` [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling Slava Ovsiienko
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-10 12:39 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, arybchenko, thomas, ferruh.yigit

There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this patchset is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature
from mlx5 PMD side [1].

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

The device clock is opaque entity, the units and frequency are
vendor specific and might depend on hardware capabilities and
configurations. If might (or not) be synchronized with real time
via PTP, might (or not) be synchronous with CPU clock (for example
if NIC and CPU share the same clock source there might be no
any drift between the NIC and CPU clocks), etc.

After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed
deprecation and obsoleting, these dynamic flag and field might be
used to manage the timestamps on receiving datapath as well. Having
the dedicated flags for Rx/Tx timestamps allows applications not
to perform explicit flags reset on forwarding and not to promote
received timestamps to the transmitting datapath by default.
The static PKT_RX_TIMESTAMP is considered as candidate to become
the dynamic flag and this move should be discussed.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and might be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_read_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

[1] http://patches.dpdk.org/patch/73714/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added
  v5->v6:
     - release notes are updated
  v6->v7:
     - commit message is updated
     - testpmd checks the supported offloads before registering
       dynamic timestamp flag/field
---
 doc/guides/rel_notes/release_20_08.rst |  7 +++++++
 lib/librte_ethdev/rte_ethdev.c         |  1 +
 lib/librte_ethdev/rte_ethdev.h         |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 43 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 988474c..bdea389 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -81,6 +81,13 @@ New Features
   Added the RegEx library which provides an API for offload of regular
   expressions search operations to hardware or software accelerator devices.
 
+
+* **Introduced send packet scheduling on the timestamps.**
+
+   Added the new mbuf dynamic field and flag to provide timestamp on what packet
+   transmitting can be synchronized. The device Tx offload flag is added to
+   indicate the PMD supports send scheduling.
+
 * **Updated PCAP driver.**
 
   Updated PCAP driver with new features and improvements, including:
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7022bd7..c48ca2a 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 631b146..97313a0 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value for the packets being sent, this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v7 2/2] app/testpmd: add send scheduling test capability
  2020-07-10 12:39 ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
@ 2020-07-10 12:39   ` Viacheslav Ovsiienko
  2020-07-10 15:46   ` [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling Slava Ovsiienko
  1 sibling, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-10 12:39 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, arybchenko, thomas, ferruh.yigit

This commit adds testpmd capability to provide timestamps on the packets
being sent in the txonly mode. This includes:

 - SEND_ON_TIMESTAMP support
   new device Tx offload capability support added, example:

     testpmd> port config 0 tx_offload send_on_timestamp on

 - set txtimes, registers field and flag, example:

     testpmd> set txtimes 1000000,0

   This command enables the packet send scheduling on timestamps if
   the first parameter is not zero, generic format:

     testpmd> set txtimes (inter),(intra)

   where:

     inter - is the delay between the bursts in the device clock units.
     If "intra" (next parameter) is zero, this is the time between the
     beginnings of the first packets in the neighbour bursts, if "intra"
     is not zero, "inter" specifies the time between the beginning of the
     first packet of the current burst and the beginning of the last packet
     of the previous burst. If "inter"parameter is zero the send scheduling
     on timestamps is disabled (default).

     intra - is the delay between the packets within the burst specified
     in the device clock units. The number of packets in the burst is
     defined by regular burst setting. If "intra" parameter is zero no
     timestamps provided in the packets excepting  the first one in
     the burst.

     As the result the bursts of packet will be transmitted with
     specific delay between the packets within the burst and specific
     delay between the bursts. The rte_eth_read_clock() is supposed to be
     engaged to get the current device clock value and provide
     the reference for the timestamps. If there is no supported
     rte_eth_read_clock() there will be no provided send scheduling
     on the device.

 - show txtimes, displays the timing settings
 - txonly burst time pattern

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 app/test-pmd/cmdline.c                      | 63 ++++++++++++++++++++--
 app/test-pmd/config.c                       | 61 +++++++++++++++++++++
 app/test-pmd/testpmd.c                      |  6 +++
 app/test-pmd/testpmd.h                      |  4 ++
 app/test-pmd/txonly.c                       | 83 +++++++++++++++++++++++++++--
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 38 ++++++++++++-
 6 files changed, 246 insertions(+), 9 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 39ad938..def0709 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -294,6 +294,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 			" Right now only applicable for CSUM and TXONLY"
 			" modes\n\n"
 
+			"set txtimes (x, y)\n"
+			"    Set the scheduling on timestamps"
+			" timings for the TXOMLY mode\n\n"
+
 			"set corelist (x[,y]*)\n"
 			"    Set the list of forwarding cores.\n\n"
 
@@ -3930,6 +3934,52 @@ struct cmd_set_txsplit_result {
 	},
 };
 
+/* *** SET TIMES FOR TXONLY PACKETS SCHEDULING ON TIMESTAMPS *** */
+
+struct cmd_set_txtimes_result {
+	cmdline_fixed_string_t cmd_keyword;
+	cmdline_fixed_string_t txtimes;
+	cmdline_fixed_string_t tx_times;
+};
+
+static void
+cmd_set_txtimes_parsed(void *parsed_result,
+		       __rte_unused struct cmdline *cl,
+		       __rte_unused void *data)
+{
+	struct cmd_set_txtimes_result *res;
+	unsigned int tx_times[2] = {0, 0};
+	unsigned int n_times;
+
+	res = parsed_result;
+	n_times = parse_item_list(res->tx_times, "tx times",
+				  2, tx_times, 0);
+	if (n_times == 2)
+		set_tx_pkt_times(tx_times);
+}
+
+cmdline_parse_token_string_t cmd_set_txtimes_keyword =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 cmd_keyword, "set");
+cmdline_parse_token_string_t cmd_set_txtimes_name =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 txtimes, "txtimes");
+cmdline_parse_token_string_t cmd_set_txtimes_value =
+	TOKEN_STRING_INITIALIZER(struct cmd_set_txtimes_result,
+				 tx_times, NULL);
+
+cmdline_parse_inst_t cmd_set_txtimes = {
+	.f = cmd_set_txtimes_parsed,
+	.data = NULL,
+	.help_str = "set txtimes <inter_burst>,<intra_burst>",
+	.tokens = {
+		(void *)&cmd_set_txtimes_keyword,
+		(void *)&cmd_set_txtimes_name,
+		(void *)&cmd_set_txtimes_value,
+		NULL,
+	},
+};
+
 /* *** ADD/REMOVE ALL VLAN IDENTIFIERS TO/FROM A PORT VLAN RX FILTER *** */
 struct cmd_rx_vlan_filter_all_result {
 	cmdline_fixed_string_t rx_vlan;
@@ -7418,6 +7468,8 @@ static void cmd_showcfg_parsed(void *parsed_result,
 		pkt_fwd_config_display(&cur_fwd_config);
 	else if (!strcmp(res->what, "txpkts"))
 		show_tx_pkt_segments();
+	else if (!strcmp(res->what, "txtimes"))
+		show_tx_pkt_times();
 }
 
 cmdline_parse_token_string_t cmd_showcfg_show =
@@ -7426,12 +7478,12 @@ static void cmd_showcfg_parsed(void *parsed_result,
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, cfg, "config");
 cmdline_parse_token_string_t cmd_showcfg_what =
 	TOKEN_STRING_INITIALIZER(struct cmd_showcfg_result, what,
-				 "rxtx#cores#fwd#txpkts");
+				 "rxtx#cores#fwd#txpkts#txtimes");
 
 cmdline_parse_inst_t cmd_showcfg = {
 	.f = cmd_showcfg_parsed,
 	.data = NULL,
-	.help_str = "show config rxtx|cores|fwd|txpkts",
+	.help_str = "show config rxtx|cores|fwd|txpkts|txtimes",
 	.tokens = {
 		(void *)&cmd_showcfg_show,
 		(void *)&cmd_showcfg_port,
@@ -18670,7 +18722,8 @@ struct cmd_config_per_port_tx_offload_result {
 			  "sctp_cksum#tcp_tso#udp_tso#outer_ipv4_cksum#"
 			  "qinq_insert#vxlan_tnl_tso#gre_tnl_tso#"
 			  "ipip_tnl_tso#geneve_tnl_tso#macsec_insert#"
-			  "mt_lockfree#multi_segs#mbuf_fast_free#security");
+			  "mt_lockfree#multi_segs#mbuf_fast_free#security#"
+			  "send_on_timestamp");
 cmdline_parse_token_string_t cmd_config_per_port_tx_offload_result_on_off =
 	TOKEN_STRING_INITIALIZER
 		(struct cmd_config_per_port_tx_offload_result,
@@ -18755,7 +18808,8 @@ struct cmd_config_per_port_tx_offload_result {
 		    "sctp_cksum|tcp_tso|udp_tso|outer_ipv4_cksum|"
 		    "qinq_insert|vxlan_tnl_tso|gre_tnl_tso|"
 		    "ipip_tnl_tso|geneve_tnl_tso|macsec_insert|"
-		    "mt_lockfree|multi_segs|mbuf_fast_free|security on|off",
+		    "mt_lockfree|multi_segs|mbuf_fast_free|security|"
+		    "send_on_timestamp on|off",
 	.tokens = {
 		(void *)&cmd_config_per_port_tx_offload_result_port,
 		(void *)&cmd_config_per_port_tx_offload_result_config,
@@ -19427,6 +19481,7 @@ struct cmd_showport_macs_result {
 	(cmdline_parse_inst_t *)&cmd_set_log,
 	(cmdline_parse_inst_t *)&cmd_set_txpkts,
 	(cmdline_parse_inst_t *)&cmd_set_txsplit,
+	(cmdline_parse_inst_t *)&cmd_set_txtimes,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_list,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mask,
 	(cmdline_parse_inst_t *)&cmd_set_fwd_mode,
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 53a9b97..48a8790 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1046,6 +1046,15 @@ static int bus_match_all(const struct rte_bus *bus, const void *data)
 			printf("off\n");
 	}
 
+	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+		printf("Tx scheduling on timestamp:    ");
+		if (ports[port_id].dev_conf.txmode.offloads &
+		    DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP)
+			printf("on\n");
+		else
+			printf("off\n");
+	}
+
 }
 
 int
@@ -3067,6 +3076,58 @@ struct igb_ring_desc_16_bytes {
 }
 
 void
+show_tx_pkt_times(void)
+{
+	printf("Interburst gap: %u\n", tx_pkt_times_inter);
+	printf("Intraburst gap: %u\n", tx_pkt_times_intra);
+}
+
+void
+set_tx_pkt_times(unsigned int *tx_times)
+{
+	uint16_t port_id;
+	int offload_found = 0;
+	int offset;
+	int flag;
+
+	static const struct rte_mbuf_dynfield desc_offs = {
+		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
+		.size = sizeof(uint64_t),
+		.align = __alignof__(uint64_t),
+	};
+	static const struct rte_mbuf_dynflag desc_flag = {
+		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
+	};
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		struct rte_eth_dev_info dev_info = { 0 };
+		int ret;
+
+		ret = rte_eth_dev_info_get(port_id, &dev_info);
+		if (ret == 0 && dev_info.tx_offload_capa &
+				DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP) {
+			offload_found = 1;
+			break;
+		}
+	}
+	if (!offload_found) {
+		printf("No device supporting Tx timestamp scheduling found, "
+		       "dynamic flag and field not registered\n");
+		return;
+	}
+	offset = rte_mbuf_dynfield_register(&desc_offs);
+	if (offset < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp field registration error: %d",
+		       rte_errno);
+	flag = rte_mbuf_dynflag_register(&desc_flag);
+	if (flag < 0 && rte_errno != EEXIST)
+		printf("Dynamic timestamp flag registration error: %d",
+		       rte_errno);
+	tx_pkt_times_inter = tx_times[0];
+	tx_pkt_times_intra = tx_times[1];
+}
+
+void
 setup_gro(const char *onoff, portid_t port_id)
 {
 	if (!rte_eth_dev_is_valid_port(port_id)) {
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f5feb62..d1216e0 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -223,6 +223,12 @@ struct fwd_engine * fwd_engines[] = {
 uint8_t txonly_multi_flow;
 /**< Whether multiple flows are generated in TXONLY mode. */
 
+uint32_t tx_pkt_times_inter;
+/**< Timings for send scheduling in TXONLY mode, time between bursts. */
+
+uint32_t tx_pkt_times_intra;
+/**< Timings for send scheduling in TXONLY mode, time between packets. */
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index a803750..9825649 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -442,6 +442,8 @@ struct queue_stats_mappings {
 extern uint16_t tx_pkt_length; /**< Length of TXONLY packet */
 extern uint16_t tx_pkt_seg_lengths[RTE_MAX_SEGS_PER_PKT]; /**< Seg. lengths */
 extern uint8_t  tx_pkt_nb_segs; /**< Number of segments in TX packets */
+extern uint32_t tx_pkt_times_intra;
+extern uint32_t tx_pkt_times_inter;
 
 enum tx_pkt_split {
 	TX_PKT_SPLIT_OFF,
@@ -794,6 +796,8 @@ void vlan_tpid_set(portid_t port_id, enum rte_vlan_type vlan_type,
 void set_verbose_level(uint16_t vb_level);
 void set_tx_pkt_segments(unsigned *seg_lengths, unsigned nb_segs);
 void show_tx_pkt_segments(void);
+void set_tx_pkt_times(unsigned int *tx_times);
+void show_tx_pkt_times(void);
 void set_tx_pkt_split(const char *name);
 void set_nb_pkt_per_burst(uint16_t pkt_burst);
 char *list_pkt_forwarding_modes(void);
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 076ccaf..97f4a45 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -53,6 +53,12 @@
 static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted packets. */
 RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address variation */
 static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header of tx packets. */
+RTE_DEFINE_PER_LCORE(uint64_t, timestamp_qskew);
+					/**< Timestamp offset per queue */
+static uint64_t timestamp_mask; /**< Timestamp dynamic flag mask */
+static int32_t timestamp_off; /**< Timestamp dynamic field offset */
+static bool timestamp_enable; /**< Timestamp enable */
+static uint64_t timestamp_initial[RTE_MAX_ETHPORTS];
 
 static void
 copy_buf_to_pkt_segs(void* buf, unsigned len, struct rte_mbuf *pkt,
@@ -150,7 +156,8 @@
 static inline bool
 pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
 		struct rte_ether_hdr *eth_hdr, const uint16_t vlan_tci,
-		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags,
+		const uint16_t idx, const struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
 	struct rte_mbuf *pkt_seg;
@@ -213,6 +220,53 @@
 	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
 			sizeof(struct rte_ether_hdr) +
 			sizeof(struct rte_ipv4_hdr));
+	if (unlikely(timestamp_enable)) {
+		uint64_t skew = RTE_PER_LCORE(timestamp_qskew);
+		struct {
+			rte_be32_t signature;
+			rte_be16_t pkt_idx;
+			rte_be16_t queue_idx;
+			rte_be64_t ts;
+		} timestamp_mark;
+
+		if (unlikely(!skew)) {
+			struct rte_eth_dev *dev = &rte_eth_devices[fs->tx_port];
+			unsigned int txqs_n = dev->data->nb_tx_queues;
+			uint64_t phase = tx_pkt_times_inter * fs->tx_queue /
+					 (txqs_n ? txqs_n : 1);
+			/*
+			 * Initialize the scheduling time phase shift
+			 * depending on queue index.
+			 */
+			skew = timestamp_initial[fs->tx_port] +
+			       tx_pkt_times_inter + phase;
+			RTE_PER_LCORE(timestamp_qskew) = skew;
+		}
+		timestamp_mark.pkt_idx = rte_cpu_to_be_16(idx);
+		timestamp_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
+		timestamp_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
+		if (unlikely(!idx)) {
+			skew +=	tx_pkt_times_inter;
+			pkt->ol_flags |= timestamp_mask;
+			*RTE_MBUF_DYNFIELD
+				(pkt, timestamp_off, uint64_t *) = skew;
+			RTE_PER_LCORE(timestamp_qskew) = skew;
+			timestamp_mark.ts = rte_cpu_to_be_64(skew);
+		} else if (tx_pkt_times_intra) {
+			skew +=	tx_pkt_times_intra;
+			pkt->ol_flags |= timestamp_mask;
+			*RTE_MBUF_DYNFIELD
+				(pkt, timestamp_off, uint64_t *) = skew;
+			RTE_PER_LCORE(timestamp_qskew) = skew;
+			timestamp_mark.ts = rte_cpu_to_be_64(skew);
+		} else {
+			timestamp_mark.ts = RTE_BE64(0);
+		}
+		copy_buf_to_pkt(&timestamp_mark, sizeof(timestamp_mark), pkt,
+			sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(pkt_udp_hdr));
+	}
 	/*
 	 * Complete first mbuf of packet and append it to the
 	 * burst of packets to be transmitted.
@@ -275,7 +329,8 @@
 			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
 							&eth_hdr, vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_mempool_put_bulk(mbp,
 						(void **)&pkts_burst[nb_pkt],
 						nb_pkt_per_burst - nb_pkt);
@@ -290,7 +345,8 @@
 			if (unlikely(!pkt_burst_prepare(pkt, mbp, &eth_hdr,
 							vlan_tci,
 							vlan_tci_outer,
-							ol_flags))) {
+							ol_flags,
+							nb_pkt, fs))) {
 				rte_pktmbuf_free(pkt);
 				break;
 			}
@@ -302,6 +358,7 @@
 		return;
 
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
+
 	/*
 	 * Retry if necessary
 	 */
@@ -342,15 +399,33 @@
 }
 
 static void
-tx_only_begin(__rte_unused portid_t pi)
+tx_only_begin(portid_t pi)
 {
 	uint16_t pkt_data_len;
+	int dynf;
 
 	pkt_data_len = (uint16_t) (tx_pkt_length - (
 					sizeof(struct rte_ether_hdr) +
 					sizeof(struct rte_ipv4_hdr) +
 					sizeof(struct rte_udp_hdr)));
 	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr, pkt_data_len);
+
+	timestamp_enable = false;
+	timestamp_mask = 0;
+	timestamp_off = -1;
+	RTE_PER_LCORE(timestamp_qskew) = 0;
+	dynf = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		timestamp_mask = 1ULL << dynf;
+	dynf = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (dynf >= 0)
+		timestamp_off = dynf;
+	timestamp_enable = tx_pkt_times_inter &&
+			   timestamp_mask &&
+			   timestamp_off >= 0 &&
+			   !rte_eth_read_clock(pi, &timestamp_initial[pi]);
 }
 
 struct fwd_engine tx_only_engine = {
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a..3f54787 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -266,7 +266,7 @@ show config
 Displays the configuration of the application.
 The configuration comes from the command-line, the runtime or the application defaults::
 
-   testpmd> show config (rxtx|cores|fwd|txpkts)
+   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
 
 The available information categories are:
 
@@ -278,6 +278,8 @@ The available information categories are:
 
 * ``txpkts``: Packets to TX configuration.
 
+* ``txtimes``: Burst time pattern for Tx only mode.
+
 For example:
 
 .. code-block:: console
@@ -725,6 +727,40 @@ Set the length of each segment of the TX-ONLY packets or length of packet for FL
 
 Where x[,y]* represents a CSV list of values, without white space.
 
+set txtimes
+~~~~~~~~~~~
+
+Configure the timing burst pattern for Tx only mode. This command enables
+the packet send scheduling on dynamic timestamp mbuf field and configures
+timing pattern in Tx only mode. In this mode, if scheduling is enabled
+application provides timestamps in the packets being sent. It is possible
+to configure delay (in unspecified device clock units) between bursts
+and between the packets within the burst::
+
+   testpmd> set txtimes (inter),(intra)
+
+where:
+
+* ``inter``  is the delay between the bursts in the device clock units.
+  If ``intra`` is zero, this is the time between the beginnings of the
+  first packets in the neighbour bursts, if ``intra`` is not zero,
+  ``inter`` specifies the time between the beginning of the first packet
+  of the current burst and the beginning of the last packet of the
+  previous burst. If ``inter`` parameter is zero the send scheduling
+  on timestamps is disabled (default).
+
+* ``intra`` is the delay between the packets within the burst specified
+  in the device clock units. The number of packets in the burst is defined
+  by regular burst setting. If ``intra`` parameter is zero no timestamps
+  provided in the packets excepting  the first one in the burst.
+
+As the result the bursts of packet will be transmitted with specific
+delays between the packets within the burst and specific delay between
+the bursts. The rte_eth_read_clock() must be supported by the device(s)
+and is supposed to be engaged to get the current device clock value
+and provide the reference for the timestamps. If there is no supported
+rte_eth_read_clock() there will be no send scheduling provided on the port.
+
 set txsplit
 ~~~~~~~~~~~
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling test capability
  2020-07-09 23:58     ` Ferruh Yigit
@ 2020-07-10 12:41       ` Slava Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-10 12:41 UTC (permalink / raw)
  To: Ferruh Yigit, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

Hi Ferruh,

Thanks a lot for the comments, addressed all of them.

With best regards, Slava

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Friday, July 10, 2020 2:58
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@monjalon.com
> Subject: Re: [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling
> test capability
> 
> On 7/9/2020 1:36 PM, Viacheslav Ovsiienko wrote:
> > This commit adds testpmd capability to provide timestamps on the
> > packets being sent in the txonly mode. This includes:
> >
> >  - SEND_ON_TIMESTAMP support
> >    new device Tx offload capability support added, example:
> >
> >      testpmd> port config 0 tx_offload send_on_timestamp on
> >
> >  - set txtimes, registers field and flag, example:
> >
> >      testpmd> set txtimes 1000000,0
> >
> >    This command enables the packet send scheduling on timestamps if
> >    the first parameter is not zero, generic format:
> >
> >      testpmd> set txtimes (inter),(intra)
> >
> >    where:
> >
> >      inter - is the delay between the bursts in the device clock units.
> >      intra - is the delay between the packets within the burst specified
> > 	     in the device clock units
> >
> >      As the result the bursts of packet will be transmitted with
> >      specific delay between the packets within the burst and specific
> >      delay between the bursts. The rte_eth_get_clock() is supposed to be
> >      engaged to get the current device clock value and provide
> >      the reference for the timestamps.
> >
> >  - show txtimes, displays the timing settings
> >  - txonly burst time pattern
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> <...>
> 
> > +cmdline_parse_inst_t cmd_set_txtimes = {
> > +	.f = cmd_set_txtimes_parsed,
> > +	.data = NULL,
> > +	.help_str = "set txtimes <inter_burst>,<intra_burst>",
> > +	.tokens = {
> > +		(void *)&cmd_set_txtimes_keyword,
> > +		(void *)&cmd_set_txtimes_name,
> > +		(void *)&cmd_set_txtimes_value,
> > +		NULL,
> > +	},
> > +};
> 
> Can you please update 'cmd_help_long_parsed()' with command updates?
> 
> <...>
> 
> >  void
> > +show_tx_pkt_times(void)
> > +{
> > +	printf("Interburst gap: %u\n", tx_pkt_times[0]);
> > +	printf("Intraburst gap: %u\n", tx_pkt_times[1]); }
> > +
> > +void
> > +set_tx_pkt_times(unsigned int *tx_times) {
> > +	int offset;
> > +	int flag;
> > +
> > +	static const struct rte_mbuf_dynfield desc_offs = {
> > +		.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
> > +		.size = sizeof(uint64_t),
> > +		.align = __alignof__(uint64_t),
> > +	};
> > +	static const struct rte_mbuf_dynflag desc_flag = {
> > +		.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
> > +	};
> > +
> > +	offset = rte_mbuf_dynfield_register(&desc_offs);
> > +	if (offset < 0 && rte_errno != EEXIST)
> > +		printf("Dynamic timestamp field registration error: %d",
> > +		       rte_errno);
> > +	flag = rte_mbuf_dynflag_register(&desc_flag);
> > +	if (flag < 0 && rte_errno != EEXIST)
> > +		printf("Dynamic timestamp flag registration error: %d",
> > +		       rte_errno);
> 
> You are not checking at all if the device supports
> 'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP' offload or if it is configured or
> not, but blindly registering the dynamic fields.
> 'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP' seems not really used, as
> mentioned in prev patch I would be OK to drop the flag.
> 
> > +	tx_pkt_times[0] = tx_times[0];
> > +	tx_pkt_times[1] = tx_times[1];
> 
> I think it is better to rename 'tx_pkt_times[0]' -> 'tx_pkt_times_inter',
> 'tx_pkt_times[1]' --> 'tx_pkt_times_intra' to increase code readability.
> 
> <...>
> 
> > --- a/app/test-pmd/txonly.c
> > +++ b/app/test-pmd/txonly.c
> > @@ -53,6 +53,11 @@
> >  static struct rte_ipv4_hdr pkt_ip_hdr; /**< IP header of transmitted
> > packets. */  RTE_DEFINE_PER_LCORE(uint8_t, _ip_var); /**< IP address
> > variation */  static struct rte_udp_hdr pkt_udp_hdr; /**< UDP header
> > of tx packets. */
> > +RTE_DEFINE_PER_LCORE(uint64_t, ts_qskew); /**< Timestamp offset per
> > +queue */ static uint64_t ts_mask; /**< Timestamp dynamic flag mask */
> > +static int32_t ts_off; /**< Timestamp dynamic field offset */ static
> > +bool ts_enable; /**< Timestamp enable */ static uint64_t
> > +ts_init[RTE_MAX_ETHPORTS];
> 
> What do you think renaming the 'ts_' prefix to long 'timestamp_' prefix, will
> variable names be too long? When you are out of this patch context and
> reading code 'ts_init' is not that expressive.
> 
> <...>
> 
> > @@ -213,6 +219,50 @@
> >  	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
> >  			sizeof(struct rte_ether_hdr) +
> >  			sizeof(struct rte_ipv4_hdr));
> > +	if (unlikely(ts_enable)) {
> > +		uint64_t skew = RTE_PER_LCORE(ts_qskew);
> > +		struct {
> > +			rte_be32_t signature;
> > +			rte_be16_t pkt_idx;
> > +			rte_be16_t queue_idx;
> > +			rte_be64_t ts;
> > +		} ts_mark;
> > +
> > +		if (unlikely(!skew)) {
> > +			struct rte_eth_dev *dev = &rte_eth_devices[fs-
> >tx_port];
> > +			unsigned int txqs_n = dev->data->nb_tx_queues;
> > +			uint64_t phase = tx_pkt_times[0] * fs->tx_queue /
> > +					 (txqs_n ? txqs_n : 1);
> > +			/*
> > +			 * Initialize the scheduling time phase shift
> > +			 * depending on queue index.
> > +			 */
> > +			skew = ts_init[fs->tx_port] + tx_pkt_times[0] +
> phase;
> > +			RTE_PER_LCORE(ts_qskew) = skew;
> > +		}
> > +		ts_mark.pkt_idx = rte_cpu_to_be_16(idx);
> > +		ts_mark.queue_idx = rte_cpu_to_be_16(fs->tx_queue);
> > +		ts_mark.signature = rte_cpu_to_be_32(0xBEEFC0DE);
> > +		if (unlikely(!idx)) {
> > +			skew +=	tx_pkt_times[0];
> > +			pkt->ol_flags |= ts_mask;
> > +			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) =
> skew;
> > +			RTE_PER_LCORE(ts_qskew) = skew;
> > +			ts_mark.ts = rte_cpu_to_be_64(skew);
> > +		} else if (tx_pkt_times[1]) {
> > +			skew +=	tx_pkt_times[1];
> > +			pkt->ol_flags |= ts_mask;
> > +			*RTE_MBUF_DYNFIELD(pkt, ts_off, uint64_t *) =
> skew;
> > +			RTE_PER_LCORE(ts_qskew) = skew;
> > +			ts_mark.ts = rte_cpu_to_be_64(skew);
> > +		} else {
> > +			ts_mark.ts = RTE_BE64(0);
> > +		}
> > +		copy_buf_to_pkt(&ts_mark, sizeof(ts_mark), pkt,
> > +			sizeof(struct rte_ether_hdr) +
> > +			sizeof(struct rte_ipv4_hdr) +
> > +			sizeof(pkt_udp_hdr));
> 
> timestamp data seems put into packet payload, I assume this is for debug, is
> there any intendent target app for the packets?
> 
> <...>
> 
> >  static void
> > -tx_only_begin(__rte_unused portid_t pi)
> > +tx_only_begin(portid_t pi)
> >  {
> >  	uint16_t pkt_data_len;
> > +	int dynf;
> >
> >  	pkt_data_len = (uint16_t) (tx_pkt_length - (
> >  					sizeof(struct rte_ether_hdr) +
> >  					sizeof(struct rte_ipv4_hdr) +
> >  					sizeof(struct rte_udp_hdr)));
> >  	setup_pkt_udp_ip_headers(&pkt_ip_hdr, &pkt_udp_hdr,
> pkt_data_len);
> > +
> > +	ts_enable = false;
> > +	ts_mask = 0;
> > +	ts_off = -1;
> > +	RTE_PER_LCORE(ts_qskew) = 0;
> > +	dynf = rte_mbuf_dynflag_lookup
> > +
> 	(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
> > +	if (dynf >= 0)
> > +		ts_mask = 1ULL << dynf;
> > +	dynf = rte_mbuf_dynfield_lookup
> > +				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
> NULL);
> > +	if (dynf >= 0)
> > +		ts_off = dynf;
> > +	ts_enable = tx_pkt_times[0] && ts_mask && ts_off >= 0 &&
> > +		    !rte_eth_read_clock(pi, &ts_init[pi]);
> 
> 'rte_eth_read_clock()' support is a 'must' to have this feature. Can you please
> clarify this in the document/commit_log?
> 
> <...>
> 
> >  }
> >
> >  struct fwd_engine tx_only_engine = {
> > diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > index a808b6a..00413cc 100644
> > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > @@ -266,7 +266,7 @@ show config
> >  Displays the configuration of the application.
> >  The configuration comes from the command-line, the runtime or the
> application defaults::
> >
> > -   testpmd> show config (rxtx|cores|fwd|txpkts)
> > +   testpmd> show config (rxtx|cores|fwd|txpkts|txtimes)
> >
> >  The available information categories are:
> >
> > @@ -278,6 +278,8 @@ The available information categories are:
> >
> >  * ``txpkts``: Packets to TX configuration.
> >
> > +* ``txtimes``: Burst time pattern for Tx only mode.
> 
> The description is not clear, can you pleaes give a little more detail?
> 
> > +
> >  For example:
> >
> >  .. code-block:: console
> > @@ -725,6 +727,38 @@ Set the length of each segment of the TX-ONLY
> > packets or length of packet for FL
> >
> >  Where x[,y]* represents a CSV list of values, without white space.
> >
> > +set txtimes
> > +~~~~~~~~~~
> 
> WARNING: Title underline too short.
> 
> > +
> > +Configure the timing burst pattern for Tx only mode. This command
> > +enables the packet send scheduling on dynamic timestamp mbuf field
> > +and configures timing pattern in Tx only mode. In this mode, if
> > +scheduling is enabled application provides timestamps in the packets
> > +being sent. It is possible to configure delay (in unspecified device
> > +clock units) between bursts and between the packets within the burst::
> > +
> > +   testpmd> set txtimes (inter),(intra)
> > +
> > +where:
> > +
> > +* ``inter``  is the delay between the bursts in the device clock units.
> > +  If ``intra`` is zero, this is the time between the beginnings of
> > +the
> > +  first packets in the neighbour bursts, if ``intra`` is not zero,
> > +  ``inter`` specifies the time between the beginning of the first
> > +packet
> > +  of the current burst and the beginning of the last packet of the
> > +  previous burst. If ``inter`` parameter is zero the send scheduling
> > +  on timestamps is disabled (default).
> > +
> > +* ``intra`` is the delay between the packets within the burst
> > +specified
> > +  in the device clock units. The number of packets in the burst is
> > +defined
> > +  by regular burst setting. If ``intra`` parameter is zero no
> > +timestamps
> > +  provided in the packets excepting  the first one in the burst.
> > +
> > +As the result the bursts of packet will be transmitted with specific
> > +delays between the packets within the burst and specific delay
> > +between the bursts. The rte_eth_get_clock() is supposed to be engaged
> > +to get the
> 
> 'rte_eth_read_clock()'?


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-10 12:39 ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
  2020-07-10 12:39   ` [dpdk-dev] [PATCH v7 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
@ 2020-07-10 15:46   ` Slava Ovsiienko
  2020-07-10 22:07     ` Ferruh Yigit
  1 sibling, 1 reply; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-10 15:46 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, arybchenko,
	Thomas Monjalon, ferruh.yigit



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Sent: Friday, July 10, 2020 15:40
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> arybchenko@solarflare.com; Thomas Monjalon <thomas@monjalon.net>;
> ferruh.yigit@intel.com
> Subject: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx
> scheduling
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive) the
> packets at the very precisely specified moment of time provides the
> opportunity to support the connections with Time Division Multiplexing using
> the contemporary general purpose NIC without involving an auxiliary
> hardware. For example, the supporting of O-RAN Fronthaul interface is one
> of the promising features for potentially usage of the precise time
> management for the egress packets.
> 
> The main objective of this patchset is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from mlx5
> PMD side [1].
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not explicitly
> defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return the current
> device timestamp. The dynamic timestamp flag tells whether the field
> contains actual timestamp value. For the packets being sent this value can be
> used by PMD to schedule packet sending.
> 
> The device clock is opaque entity, the units and frequency are vendor specific
> and might depend on hardware capabilities and configurations. If might (or
> not) be synchronized with real time via PTP, might (or not) be synchronous
> with CPU clock (for example if NIC and CPU share the same clock source
> there might be no any drift between the NIC and CPU clocks), etc.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed
> deprecation and obsoleting, these dynamic flag and field might be used to
> manage the timestamps on receiving datapath as well. Having the dedicated
> flags for Rx/Tx timestamps allows applications not to perform explicit flags
> reset on forwarding and not to promote received timestamps to the
> transmitting datapath by default.
> The static PKT_RX_TIMESTAMP is considered as candidate to become the
> dynamic flag and this move should be discussed.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it
> tries to synchronize the time of packet appearing on the wire with the
> specified packet timestamp. If the specified one is in the past it should be
> ignored, if one is in the distant future it should be capped with some
> reasonable value (in range of seconds). These specific cases ("too late" and
> "distant future") can be optionally reported via device xstats to assist
> applications to detect the time-related problems.
> 
> There is no any packet reordering according timestamps is supposed, neither
> within packet burst, nor between packets, it is an entirely application
> responsibility to generate packets and its timestamps in desired order. The
> timestamps can be put only in the first packet in the burst providing the
> entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp with
> new offload flag:
> 
> This is palliative and might be replaced with new eth_dev API about
> reporting/managing the supported dynamic flags and its related features.
> This API would break ABI compatibility and can't be introduced at the
> moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific delays
> between the packets within the burst and specific delay between the bursts.
> The rte_eth_read_clock is supposed to be engaged to get the current device
> clock value and provide the reference for the timestamps.
> 
> [1]
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatche
> s.dpdk.org%2Fpatch%2F73714%2F&amp;data=02%7C01%7Cviacheslavo%40
> mellanox.com%7C810609c61c3b466e8f5a08d824ce57f8%7Ca652971c7d2e4
> d9ba6a4d149256f461b%7C0%7C0%7C637299815958194092&amp;sdata=H
> D7efBGOLuYxHd5KLJYJj7RSbiLRVBNm5jdq%2FJv%2FXfk%3D&amp;reserved=
> 0
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

 promote Acked-bt from previous patch version to maintain patchwork 
 status accordingly

Acked-by: Olivier Matz <olivier.matz@6wind.com>

> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
>   v5->v6:
>      - release notes are updated
>   v6->v7:
>      - commit message is updated
>      - testpmd checks the supported offloads before registering
>        dynamic timestamp flag/field
> ---
>  doc/guides/rel_notes/release_20_08.rst |  7 +++++++
>  lib/librte_ethdev/rte_ethdev.c         |  1 +
>  lib/librte_ethdev/rte_ethdev.h         |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h         | 31
> +++++++++++++++++++++++++++++++
>  4 files changed, 43 insertions(+)
> 

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-10 15:46   ` [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling Slava Ovsiienko
@ 2020-07-10 22:07     ` Ferruh Yigit
  0 siblings, 0 replies; 86+ messages in thread
From: Ferruh Yigit @ 2020-07-10 22:07 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, arybchenko, Thomas Monjalon

On 7/10/2020 4:46 PM, Slava Ovsiienko wrote:
> 
> 
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
>> Sent: Friday, July 10, 2020 15:40
>> To: dev@dpdk.org
>> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
>> <rasland@mellanox.com>; olivier.matz@6wind.com;
>> arybchenko@solarflare.com; Thomas Monjalon <thomas@monjalon.net>;
>> ferruh.yigit@intel.com
>> Subject: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx
>> scheduling
>>
>> There is the requirement on some networks for precise traffic timing
>> management. The ability to send (and, generally speaking, receive) the
>> packets at the very precisely specified moment of time provides the
>> opportunity to support the connections with Time Division Multiplexing using
>> the contemporary general purpose NIC without involving an auxiliary
>> hardware. For example, the supporting of O-RAN Fronthaul interface is one
>> of the promising features for potentially usage of the precise time
>> management for the egress packets.
>>
>> The main objective of this patchset is to specify the way how applications
>> can provide the moment of time at what the packet transmission must be
>> started and to describe in preliminary the supporting this feature from mlx5
>> PMD side [1].
>>
>> The new dynamic timestamp field is proposed, it provides some timing
>> information, the units and time references (initial phase) are not explicitly
>> defined but are maintained always the same for a given port.
>> Some devices allow to query rte_eth_read_clock() that will return the current
>> device timestamp. The dynamic timestamp flag tells whether the field
>> contains actual timestamp value. For the packets being sent this value can be
>> used by PMD to schedule packet sending.
>>
>> The device clock is opaque entity, the units and frequency are vendor specific
>> and might depend on hardware capabilities and configurations. If might (or
>> not) be synchronized with real time via PTP, might (or not) be synchronous
>> with CPU clock (for example if NIC and CPU share the same clock source
>> there might be no any drift between the NIC and CPU clocks), etc.
>>
>> After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed
>> deprecation and obsoleting, these dynamic flag and field might be used to
>> manage the timestamps on receiving datapath as well. Having the dedicated
>> flags for Rx/Tx timestamps allows applications not to perform explicit flags
>> reset on forwarding and not to promote received timestamps to the
>> transmitting datapath by default.
>> The static PKT_RX_TIMESTAMP is considered as candidate to become the
>> dynamic flag and this move should be discussed.
>>
>> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it
>> tries to synchronize the time of packet appearing on the wire with the
>> specified packet timestamp. If the specified one is in the past it should be
>> ignored, if one is in the distant future it should be capped with some
>> reasonable value (in range of seconds). These specific cases ("too late" and
>> "distant future") can be optionally reported via device xstats to assist
>> applications to detect the time-related problems.
>>
>> There is no any packet reordering according timestamps is supposed, neither
>> within packet burst, nor between packets, it is an entirely application
>> responsibility to generate packets and its timestamps in desired order. The
>> timestamps can be put only in the first packet in the burst providing the
>> entire burst scheduling.
>>
>> PMD reports the ability to synchronize packet sending on timestamp with
>> new offload flag:
>>
>> This is palliative and might be replaced with new eth_dev API about
>> reporting/managing the supported dynamic flags and its related features.
>> This API would break ABI compatibility and can't be introduced at the
>> moment, so is postponed to 20.11.
>>
>> For testing purposes it is proposed to update testpmd "txonly"
>> forwarding mode routine. With this update testpmd application generates
>> the packets and sets the dynamic timestamps according to specified time
>> pattern if it sees the "rte_dynfield_timestamp" is registered.
>>
>> The new testpmd command is proposed to configure sending pattern:
>>
>> set tx_times <burst_gap>,<intra_gap>
>>
>> <intra_gap> - the delay between the packets within the burst
>>               specified in the device clock units. The number
>>               of packets in the burst is defined by txburst parameter
>>
>> <burst_gap> - the delay between the bursts in the device clock units
>>
>> As the result the bursts of packet will be transmitted with specific delays
>> between the packets within the burst and specific delay between the bursts.
>> The rte_eth_read_clock is supposed to be engaged to get the current device
>> clock value and provide the reference for the timestamps.
>>
>> [1]
>> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fpatche
>> s.dpdk.org%2Fpatch%2F73714%2F&amp;data=02%7C01%7Cviacheslavo%40
>> mellanox.com%7C810609c61c3b466e8f5a08d824ce57f8%7Ca652971c7d2e4
>> d9ba6a4d149256f461b%7C0%7C0%7C637299815958194092&amp;sdata=H
>> D7efBGOLuYxHd5KLJYJj7RSbiLRVBNm5jdq%2FJv%2FXfk%3D&amp;reserved=
>> 0
>>
>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
>  promote Acked-bt from previous patch version to maintain patchwork 
>  status accordingly
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 

For series,
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 00/17] net/mlx5: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (7 preceding siblings ...)
  2020-07-10 12:39 ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
@ 2020-07-15  6:21 ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
                     ` (16 more replies)
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-07-17 14:28 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Viacheslav Ovsiienko
  10 siblings, 17 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The accurate packet Tx scheduling on timestamps was introduced [1].
This patchset provides the feature implementation on mlx5 PMD side.

[1] http://patches.dpdk.org/patch/73742/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>

Viacheslav Ovsiienko (17):
  common/mlx5: update common part to support packet pacing
  net/mlx5: introduce send scheduling devargs
  net/mlx5: fix UAR lock sharing for multiport devices
  net/mlx5: introduce shared UAR resource
  net/mlx5: create clock queue for packet pacing
  net/mlx5: create rearm queue for packet pacing
  net/mlx5: create Tx queues with DevX
  net/mlx5: allocate packet pacing context
  net/mlx5: introduce clock queue service routine
  net/mlx5: prepare Tx queue structures to support timestamp
  net/mlx5: convert timestamp to completion index
  net/mlx5: prepare Tx datapath to support sheduling
  net/mlx5: add scheduling support to send routine template
  net/mlx5: add read device clock support
  net/mlx5: provide the send scheduling error statistics
  common/mlx5: add register access DevX routine
  net/mlx5: convert Rx timestamps in realtime format

 doc/guides/nics/mlx5.rst                        |   37 +
 drivers/common/mlx5/Makefile                    |   25 +
 drivers/common/mlx5/linux/meson.build           |   10 +
 drivers/common/mlx5/linux/mlx5_glue.c           |   31 +-
 drivers/common/mlx5/linux/mlx5_glue.h           |    5 +
 drivers/common/mlx5/mlx5_devx_cmds.c            |   79 +-
 drivers/common/mlx5/mlx5_devx_cmds.h            |   13 +
 drivers/common/mlx5/mlx5_prm.h                  |  121 +-
 drivers/common/mlx5/rte_common_mlx5_version.map |    1 +
 drivers/net/mlx5/Makefile                       |    1 +
 drivers/net/mlx5/linux/mlx5_os.c                |   97 +-
 drivers/net/mlx5/meson.build                    |    1 +
 drivers/net/mlx5/mlx5.c                         |   61 +-
 drivers/net/mlx5/mlx5.h                         |   89 +-
 drivers/net/mlx5/mlx5_defs.h                    |   11 +
 drivers/net/mlx5/mlx5_rxq.c                     |   26 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |  373 ++++++-
 drivers/net/mlx5/mlx5_rxtx.h                    |   97 ++
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |   34 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |   54 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |   34 +-
 drivers/net/mlx5/mlx5_stats.c                   |    7 +-
 drivers/net/mlx5/mlx5_trigger.c                 |   26 +-
 drivers/net/mlx5/mlx5_txpp.c                    | 1333 +++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c                     |  342 +++++-
 25 files changed, 2799 insertions(+), 109 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_txpp.c

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 01/17] common/mlx5: update common part to support packet pacing
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 02/17] net/mlx5: introduce send scheduling devargs Viacheslav Ovsiienko
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch prepares the common part of the mlx5 PMDs to
support packet send scheduling on mbuf timestamps:

  - the DevX routine to query the packet pacing HCA capabilities
  - packet pacing Send Queue attributes support
  - the hardware related definitions

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/Makefile          | 25 +++++++++++++
 drivers/common/mlx5/linux/meson.build | 10 +++++
 drivers/common/mlx5/linux/mlx5_glue.c | 31 +++++++++++++++-
 drivers/common/mlx5/linux/mlx5_glue.h |  5 +++
 drivers/common/mlx5/mlx5_devx_cmds.c  | 19 +++++++++-
 drivers/common/mlx5/mlx5_devx_cmds.h  | 10 +++++
 drivers/common/mlx5/mlx5_prm.h        | 69 ++++++++++++++++++++++++++++++++---
 7 files changed, 161 insertions(+), 8 deletions(-)

diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index f6c762b..f9dc376 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -172,6 +172,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_devx_qp_query \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DEVX_UAR_OFFSET \
+		infiniband/mlx5dv.h \
+		field "struct mlx5dv_devx_uar.mmap_off" \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_PP_ALLOC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_pp_alloc \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
 		infiniband/mlx5dv.h \
 		func mlx5dv_dr_action_create_dest_devx_tir \
@@ -207,6 +217,21 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_dr_domain_set_reclaim_device_memory \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_OPCODE_ENHANCED_MPSW \
+		infiniband/mlx5dv.h \
+		enum MLX5_OPCODE_ENHANCED_MPSW \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_OPCODE_SEND_EN \
+		infiniband/mlx5dv.h \
+		enum MLX5_OPCODE_SEND_EN \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_OPCODE_WAIT \
+		infiniband/mlx5dv.h \
+		enum MLX5_OPCODE_WAIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 2294213..48e8ad6 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -64,6 +64,8 @@ has_member_args = [
 	'struct ibv_counter_set_init_attr', 'counter_set_id' ],
 	[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
 	'struct ibv_counters_init_attr', 'comp_mask' ],
+	[ 'HAVE_MLX5DV_DEVX_UAR_OFFSET', 'infiniband/mlx5dv.h',
+	'struct mlx5dv_devx_uar', 'mmap_off' ],
 ]
 # input array for meson symbol search:
 # [ "MACRO to define if found", "header for the search",
@@ -101,6 +103,8 @@ has_sym_args = [
 	'mlx5dv_devx_obj_query_async' ],
 	[ 'HAVE_IBV_DEVX_QP', 'infiniband/mlx5dv.h',
 	'mlx5dv_devx_qp_query' ],
+	[ 'HAVE_MLX5DV_PP_ALLOC', 'infiniband/mlx5dv.h',
+	'mlx5dv_pp_alloc' ],
 	[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
 	'mlx5dv_dr_action_create_dest_devx_tir' ],
 	[ 'HAVE_IBV_DEVX_EVENT', 'infiniband/mlx5dv.h',
@@ -116,6 +120,12 @@ has_sym_args = [
 	[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
 	'mlx5dv_dr_action_create_push_vlan' ],
 	[ 'HAVE_IBV_VAR', 'infiniband/mlx5dv.h', 'mlx5dv_alloc_var' ],
+	[ 'HAVE_MLX5_OPCODE_ENHANCED_MPSW', 'infiniband/mlx5dv.h',
+	'MLX5_OPCODE_ENHANCED_MPSW' ],
+	[ 'HAVE_MLX5_OPCODE_SEND_EN', 'infiniband/mlx5dv.h',
+	'MLX5_OPCODE_SEND_EN' ],
+	[ 'HAVE_MLX5_OPCODE_WAIT', 'infiniband/mlx5dv.h',
+	'MLX5_OPCODE_WAIT' ],
 	[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
 	'SUPPORTED_40000baseKR4_Full' ],
 	[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index 395519d..4d3875f 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -1195,7 +1195,6 @@
 #endif
 }
 
-
 static void
 mlx5_glue_dr_reclaim_domain_memory(void *domain, uint32_t enable)
 {
@@ -1207,6 +1206,34 @@
 #endif
 }
 
+static struct mlx5dv_pp *
+mlx5_glue_dv_alloc_pp(struct ibv_context *context,
+		      size_t pp_context_sz,
+		      const void *pp_context,
+		      uint32_t flags)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	return mlx5dv_pp_alloc(context, pp_context_sz, pp_context, flags);
+#else
+	RTE_SET_USED(context);
+	RTE_SET_USED(pp_context_sz);
+	RTE_SET_USED(pp_context);
+	RTE_SET_USED(flags);
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_dv_free_pp(struct mlx5dv_pp *pp)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	mlx5dv_pp_free(pp);
+#else
+	RTE_SET_USED(pp);
+#endif
+}
+
 __rte_cache_aligned
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue) {
 	.version = MLX5_GLUE_VERSION,
@@ -1319,4 +1346,6 @@
 	.devx_free_uar = mlx5_glue_devx_free_uar,
 	.dv_alloc_var = mlx5_glue_dv_alloc_var,
 	.dv_free_var = mlx5_glue_dv_free_var,
+	.dv_alloc_pp = mlx5_glue_dv_alloc_pp,
+	.dv_free_pp = mlx5_glue_dv_free_pp,
 };
diff --git a/drivers/common/mlx5/linux/mlx5_glue.h b/drivers/common/mlx5/linux/mlx5_glue.h
index 069d854..c4f9b00 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -304,6 +304,11 @@ struct mlx5_glue {
 			 struct mlx5dv_devx_async_event_hdr *event_data,
 			 size_t event_resp_len);
 	void (*dr_reclaim_domain_memory)(void *domain, uint32_t enable);
+	struct mlx5dv_pp *(*dv_alloc_pp)(struct ibv_context *context,
+					 size_t pp_context_sz,
+					 const void *pp_context,
+					 uint32_t flags);
+	void (*dv_free_pp)(struct mlx5dv_pp *pp);
 };
 
 extern const struct mlx5_glue *mlx5_glue;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2179a83..093636c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -467,6 +467,14 @@ struct mlx5_devx_obj *
 	attr->vdpa.queue_counters_valid = !!(MLX5_GET64(cmd_hca_cap, hcattr,
 							general_obj_types) &
 				  MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_Q_COUNTERS);
+	attr->wqe_index_ignore = MLX5_GET(cmd_hca_cap, hcattr,
+					  wqe_index_ignore_cap);
+	attr->cross_channel = MLX5_GET(cmd_hca_cap, hcattr, cd);
+	attr->non_wire_sq = MLX5_GET(cmd_hca_cap, hcattr, non_wire_sq);
+	attr->log_max_static_sq_wq = MLX5_GET(cmd_hca_cap, hcattr,
+					      log_max_static_sq_wq);
+	attr->dev_freq_khz = MLX5_GET(cmd_hca_cap, hcattr,
+				      device_frequency_khz);
 	if (attr->qos.sup) {
 		MLX5_SET(query_hca_cap_in, in, op_mod,
 			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
@@ -487,9 +495,13 @@ struct mlx5_devx_obj *
 		attr->qos.log_max_flow_meter =
 				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
 		attr->qos.flow_meter_reg_c_ids =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
+				MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
 		attr->qos.flow_meter_reg_share =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
+				MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
+		attr->qos.packet_pacing =
+				MLX5_GET(qos_cap, hcattr, packet_pacing);
+		attr->qos.wqe_rate_pp =
+				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
 	}
 	if (attr->vdpa.valid)
 		mlx5_devx_cmd_query_hca_vdpa_attr(ctx, &attr->vdpa);
@@ -971,6 +983,8 @@ struct mlx5_devx_obj *
 	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
 	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
 	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, non_wire, sq_attr->non_wire);
+	MLX5_SET(sqc, sq_ctx, static_sq_wq, sq_attr->static_sq_wq);
 	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
 	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
 	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
@@ -1185,6 +1199,7 @@ struct mlx5_devx_obj *
 	} else {
 		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_addr);
 	}
+	MLX5_SET(cqc, cqctx, cqe_sz, attr->cqe_size);
 	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
 	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
 	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 25704ef..c79b349 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -29,6 +29,8 @@ struct mlx5_devx_mkey_attr {
 struct mlx5_hca_qos_attr {
 	uint32_t sup:1;	/* Whether QOS is supported. */
 	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t packet_pacing:1; /* Packet pacing is supported. */
+	uint32_t wqe_rate_pp:1; /* Packet pacing WQE rate mode. */
 	uint32_t flow_meter_reg_share:1;
 	/* Whether reg_c share is supported. */
 	uint8_t log_max_flow_meter;
@@ -90,6 +92,11 @@ struct mlx5_hca_attr {
 	uint32_t vhca_id:16;
 	uint32_t relaxed_ordering_write:1;
 	uint32_t relaxed_ordering_read:1;
+	uint32_t wqe_index_ignore:1;
+	uint32_t cross_channel:1;
+	uint32_t non_wire_sq:1; /* SQ with non-wire ops is supported. */
+	uint32_t log_max_static_sq_wq:5; /* Static WQE size SQ. */
+	uint32_t dev_freq_khz; /* Timestamp counter frequency, kHz. */
 	struct mlx5_hca_qos_attr qos;
 	struct mlx5_hca_vdpa_attr vdpa;
 };
@@ -207,6 +214,8 @@ struct mlx5_devx_create_sq_attr {
 	uint32_t reg_umr:1;
 	uint32_t allow_swp:1;
 	uint32_t hairpin:1;
+	uint32_t non_wire:1;
+	uint32_t static_sq_wq:1;
 	uint32_t user_index:24;
 	uint32_t cqn:24;
 	uint32_t packet_pacing_rate_limit_index:16;
@@ -230,6 +239,7 @@ struct mlx5_devx_cq_attr {
 	uint32_t db_umem_valid:1;
 	uint32_t use_first_only:1;
 	uint32_t overrun_ignore:1;
+	uint32_t cqe_size:3;
 	uint32_t log_cq_size:5;
 	uint32_t log_page_size:5;
 	uint32_t uar_page_id;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index c63795f..cf47103 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -41,6 +41,10 @@
 /* Invalidate a CQE. */
 #define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
 
+/* Hardware index widths. */
+#define MLX5_CQ_INDEX_WIDTH 24
+#define MLX5_WQ_INDEX_WIDTH 16
+
 /* WQE Segment sizes in bytes. */
 #define MLX5_WSEG_SIZE 16u
 #define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
@@ -126,7 +130,17 @@
 				  MLX5_ESEG_MIN_INLINE_SIZE)
 
 /* Missed in mlv5dv.h, should define here. */
+#ifndef HAVE_MLX5_OPCODE_ENHANCED_MPSW
 #define MLX5_OPCODE_ENHANCED_MPSW 0x29u
+#endif
+
+#ifndef HAVE_MLX5_OPCODE_SEND_EN
+#define MLX5_OPCODE_SEND_EN 0x17u
+#endif
+
+#ifndef HAVE_MLX5_OPCODE_WAIT
+#define MLX5_OPCODE_WAIT 0x0fu
+#endif
 
 /* CQE value to inform that VLAN is stripped. */
 #define MLX5_CQE_VLAN_STRIPPED (1u << 0)
@@ -255,6 +269,9 @@
 /* The alignment needed for WQ buffer. */
 #define MLX5_WQE_BUF_ALIGNMENT sysconf(_SC_PAGESIZE)
 
+/* The alignment needed for CQ buffer. */
+#define MLX5_CQE_BUF_ALIGNMENT sysconf(_SC_PAGESIZE)
+
 /* Completion mode. */
 enum mlx5_completion_mode {
 	MLX5_COMP_ONLY_ERR = 0x0,
@@ -314,6 +331,13 @@ struct mlx5_wqe_eseg {
 	};
 } __rte_packed;
 
+struct mlx5_wqe_qseg {
+	uint32_t reserved0;
+	uint32_t reserved1;
+	uint32_t max_index;
+	uint32_t qpn_cqn;
+} __rte_packed;
+
 /* The title WQEBB, header of WQE. */
 struct mlx5_wqe {
 	union {
@@ -373,6 +397,14 @@ struct mlx5_cqe {
 	uint8_t op_own;
 };
 
+struct mlx5_cqe_ts {
+	uint64_t timestamp;
+	uint32_t sop_drop_qpn;
+	uint16_t wqe_counter;
+	uint8_t rsvd5;
+	uint8_t op_own;
+};
+
 /* Adding direct verbs to data-path. */
 
 /* CQ sequence number mask. */
@@ -992,7 +1024,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8 reserved_at_40[0x40];
 	u8 log_max_srq_sz[0x8];
 	u8 log_max_qp_sz[0x8];
-	u8 reserved_at_90[0xb];
+	u8 reserved_at_90[0x9];
+	u8 wqe_index_ignore_cap[0x1];
+	u8 dynamic_qp_allocation[0x1];
 	u8 log_max_qp[0x5];
 	u8 reserved_at_a0[0xb];
 	u8 log_max_srq[0x5];
@@ -1018,9 +1052,12 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8 umr_extended_translation_offset[0x1];
 	u8 null_mkey[0x1];
 	u8 log_max_klm_list_size[0x6];
-	u8 reserved_at_120[0xa];
+	u8 non_wire_sq[0x1];
+	u8 reserved_at_121[0x9];
 	u8 log_max_ra_req_dc[0x6];
-	u8 reserved_at_130[0xa];
+	u8 reserved_at_130[0x3];
+	u8 log_max_static_sq_wq[0x5];
+	u8 reserved_at_138[0x2];
 	u8 log_max_ra_res_dc[0x6];
 	u8 reserved_at_140[0xa];
 	u8 log_max_ra_req_qp[0x6];
@@ -1271,7 +1308,8 @@ struct mlx5_ifc_qos_cap_bits {
 	u8 reserved_at_8[0x8];
 	u8 log_max_flow_meter[0x8];
 	u8 flow_meter_reg_id[0x8];
-	u8 reserved_at_25[0x8];
+	u8 wqe_rate_pp[0x1];
+	u8 reserved_at_25[0x7];
 	u8 flow_meter_reg_share[0x1];
 	u8 reserved_at_2e[0x17];
 	u8 packet_pacing_max_rate[0x20];
@@ -1835,7 +1873,9 @@ struct mlx5_ifc_sqc_bits {
 	u8 reg_umr[0x1];
 	u8 allow_swp[0x1];
 	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
+	u8 non_wire[0x1];
+	u8 static_sq_wq[0x1];
+	u8 reserved_at_11[0xf];
 	u8 reserved_at_20[0x8];
 	u8 user_index[0x18];
 	u8 reserved_at_40[0x8];
@@ -1935,6 +1975,11 @@ struct mlx5_ifc_flow_meter_parameters_bits {
 	u8         reserved_at_8[0x60];		// 14h-1Ch
 };
 
+enum {
+	MLX5_CQE_SIZE_64B = 0x0,
+	MLX5_CQE_SIZE_128B = 0x1,
+};
+
 struct mlx5_ifc_cqc_bits {
 	u8 status[0x4];
 	u8 as_notify[0x1];
@@ -2486,6 +2531,20 @@ struct mlx5_ifc_query_qp_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+enum {
+	MLX5_DATA_RATE = 0x0,
+	MLX5_WQE_RATE = 0x1,
+};
+
+struct mlx5_ifc_set_pp_rate_limit_context_bits {
+	u8 rate_limit[0x20];
+	u8 burst_upper_bound[0x20];
+	u8 reserved_at_40[0xC];
+	u8 rate_mode[0x4];
+	u8 typical_packet_size[0x10];
+	u8 reserved_at_60[0x120];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 02/17] net/mlx5: introduce send scheduling devargs
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 03/17] net/mlx5: fix UAR lock sharing for multiport devices Viacheslav Ovsiienko
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch introduces the new devargs:

tx_pp - enables accurate packet send scheduling on mbuf timestamps
  in the PMD. On the device start if "rte_dynflag_timestamp"
  dynamic flag is registered and this devarg non-zero value is
  specified, the driver initializes all necessary internal
  infrastructure to provide packet scheduling. The parameter
  value specifies scheduling granularity in nanoseconds.

tx_skew - the parameter adjusts the send packet scheduling on
  timestamps and represents the average delay between beginning
  of the transmitting descriptor processing by the hardware and
  appearance of actual packet data on the wire. The value should
  be provided in nanoseconds and is valid only if tx_pp parameter
  is specified. The default value is zero.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 doc/guides/nics/mlx5.rst         | 37 +++++++++++++++++++++++
 drivers/net/mlx5/linux/mlx5_os.c | 63 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.c          | 39 +++++++++++++++++++++++--
 drivers/net/mlx5/mlx5.h          |  2 ++
 4 files changed, 138 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 4b6d8fb..9a57768 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -241,6 +241,24 @@ Limitations
   reduce the requested Tx size or adjust data inline settings with
   ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys.
 
+- To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
+  parameter should be specified, RTE_MBUF_DYNFIELD_TIMESTAMP_NAME and
+  RTE_MBUF_DYNFLAG_TIMESTAMP_NAME should be registered by application.
+  When PMD sees the RTE_MBUF_DYNFLAG_TIMESTAMP_NAME set on the packet
+  being sent it tries to synchronize the time of packet appearing on
+  the wire with the specified packet timestamp. It the specified one
+  is in the past it should be ignored, if one is in the distant future
+  it should be capped with some reasonable value (in range of seconds).
+  These specific cases ("too late" and "distant future") can be optionally
+  reported via device xstats to assist applications to detect the
+  time-related problems.
+
+  There is no any packet reordering according timestamps is supposed,
+  neither within packet burst, nor between packets, it is an entirely
+  application responsibility to generate packets and its timestamps
+  in desired order. The timestamps can be put only in the first packet
+  in the burst providing the entire burst scheduling.
+
 - E-Switch decapsulation Flow:
 
   - can be applied to PF port only.
@@ -700,6 +718,25 @@ Driver options
   variable "MLX5_SHUT_UP_BF" value is used. If there is no "MLX5_SHUT_UP_BF",
   the default ``tx_db_nc`` value is zero for ARM64 hosts and one for others.
 
+- ``tx_pp`` parameter [int]
+
+  If a nonzero value is specified the driver creates all necessary internal
+  objects to provide accurate packet send scheduling on mbuf timestamps.
+  The positive value specifies the scheduling granularity in nanoseconds,
+  the packet send will be accurate up to specified digits. The allowed range is
+  from 500 to 1 million of nanoseconds. The negative value specifies the module
+  of granularity and engages the special test mode the check the schedule rate.
+  By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
+  feature is disabled.
+
+- ``tx_skew`` parameter [int]
+
+  The parameter adjusts the send packet scheduling on timestamps and represents
+  the average delay between beginning of the transmitting descriptor processing
+  by the hardware and appearance of actual packet data on the wire. The value
+  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
+  specified. The default value is zero.
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector on ConnectX-5, ConnectX-6, ConnectX-6 Dx
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 2dc57b2..14af468 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -879,6 +879,69 @@
 		}
 #endif
 	}
+	if (config.tx_pp) {
+		DRV_LOG(DEBUG, "Timestamp counter frequency %u kHz",
+			config.hca_attr.dev_freq_khz);
+		DRV_LOG(DEBUG, "Packet pacing is %ssupported",
+			config.hca_attr.qos.packet_pacing ? "" : "not ");
+		DRV_LOG(DEBUG, "Cross channel ops are %ssupported",
+			config.hca_attr.cross_channel ? "" : "not ");
+		DRV_LOG(DEBUG, "WQE index ignore is %ssupported",
+			config.hca_attr.wqe_index_ignore ? "" : "not ");
+		DRV_LOG(DEBUG, "Non-wire SQ feature is %ssupported",
+			config.hca_attr.non_wire_sq ? "" : "not ");
+		DRV_LOG(DEBUG, "Static WQE SQ feature is %ssupported (%d)",
+			config.hca_attr.log_max_static_sq_wq ? "" : "not ",
+			config.hca_attr.log_max_static_sq_wq);
+		DRV_LOG(DEBUG, "WQE rate PP mode is %ssupported",
+			config.hca_attr.qos.wqe_rate_pp ? "" : "not ");
+		if (!config.devx) {
+			DRV_LOG(ERR, "DevX is required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.qos.packet_pacing) {
+			DRV_LOG(ERR, "Packet pacing is not supported");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.cross_channel) {
+			DRV_LOG(ERR, "Cross channel operations are"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.wqe_index_ignore) {
+			DRV_LOG(ERR, "WQE index ignore feature is"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.non_wire_sq) {
+			DRV_LOG(ERR, "Non-wire SQ feature is"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.log_max_static_sq_wq) {
+			DRV_LOG(ERR, "Static WQE SQ feature is"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.qos.wqe_rate_pp) {
+			DRV_LOG(ERR, "WQE rate mode is required"
+				     " for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+#ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
+		DRV_LOG(ERR, "DevX does not provide UAR offset,"
+			     " can't create queues for packet pacing");
+		err = ENODEV;
+		goto error;
+#endif
+	}
 	if (config.mprq.enabled && mprq) {
 		if (config.mprq.stride_num_n &&
 		    (config.mprq.stride_num_n > mprq_max_stride_num_n ||
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0c654ed..72e0870 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -120,6 +120,19 @@
 #define MLX5_TXQ_MAX_INLINE_LEN "txq_max_inline_len"
 
 /*
+ * Device parameter to enable Tx scheduling on timestamps
+ * and specify the packet pacing granularity in nanoseconds.
+ */
+#define MLX5_TX_PP "tx_pp"
+
+/*
+ * Device parameter to specify skew in nanoseconds on Tx datapath,
+ * it represents the time between SQ start WQE processing and
+ * appearing actual packet data on the wire.
+ */
+#define MLX5_TX_SKEW "tx_skew"
+
+/*
  * Device parameter to enable hardware Tx vector.
  * Deprecated, ignored (no vectorized Tx routines anymore).
  */
@@ -1271,18 +1284,26 @@ struct mlx5_dev_ctx_shared *
 mlx5_args_check(const char *key, const char *val, void *opaque)
 {
 	struct mlx5_dev_config *config = opaque;
-	unsigned long tmp;
+	unsigned long mod;
+	signed long tmp;
 
 	/* No-op, port representors are processed in mlx5_dev_spawn(). */
 	if (!strcmp(MLX5_REPRESENTOR, key))
 		return 0;
 	errno = 0;
-	tmp = strtoul(val, NULL, 0);
+	tmp = strtol(val, NULL, 0);
 	if (errno) {
 		rte_errno = errno;
 		DRV_LOG(WARNING, "%s: \"%s\" is not a valid integer", key, val);
 		return -rte_errno;
 	}
+	if (tmp < 0 && strcmp(MLX5_TX_PP, key) && strcmp(MLX5_TX_SKEW, key)) {
+		/* Negative values are acceptable for some keys only. */
+		rte_errno = EINVAL;
+		DRV_LOG(WARNING, "%s: invalid negative value \"%s\"", key, val);
+		return -rte_errno;
+	}
+	mod = tmp >= 0 ? tmp : -tmp;
 	if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0) {
 		config->cqe_comp = !!tmp;
 	} else if (strcmp(MLX5_RXQ_CQE_PAD_EN, key) == 0) {
@@ -1333,6 +1354,15 @@ struct mlx5_dev_ctx_shared *
 		config->txq_inline_mpw = tmp;
 	} else if (strcmp(MLX5_TX_VEC_EN, key) == 0) {
 		DRV_LOG(WARNING, "%s: deprecated parameter, ignored", key);
+	} else if (strcmp(MLX5_TX_PP, key) == 0) {
+		if (!mod) {
+			DRV_LOG(ERR, "Zero Tx packet pacing parameter");
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		config->tx_pp = tmp;
+	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
+		config->tx_skew = tmp;
 	} else if (strcmp(MLX5_RX_VEC_EN, key) == 0) {
 		config->rx_vec_en = !!tmp;
 	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
@@ -1415,6 +1445,8 @@ struct mlx5_dev_ctx_shared *
 		MLX5_TXQ_MPW_HDR_DSEG_EN,
 		MLX5_TXQ_MAX_INLINE_LEN,
 		MLX5_TX_DB_NC,
+		MLX5_TX_PP,
+		MLX5_TX_SKEW,
 		MLX5_TX_VEC_EN,
 		MLX5_RX_VEC_EN,
 		MLX5_L3_VXLAN_EN,
@@ -1693,7 +1725,8 @@ struct mlx5_dev_ctx_shared *
 {
 	static const char *const dynf_names[] = {
 		RTE_PMD_MLX5_FINE_GRANULARITY_INLINE,
-		RTE_MBUF_DYNFLAG_METADATA_NAME
+		RTE_MBUF_DYNFLAG_METADATA_NAME,
+		RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
 	};
 	unsigned int i;
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 46e66eb..84cd3e1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -240,6 +240,8 @@ struct mlx5_dev_config {
 	int txq_inline_min; /* Minimal amount of data bytes to inline. */
 	int txq_inline_max; /* Max packet size for inlining with SEND. */
 	int txq_inline_mpw; /* Max packet size for inlining with eMPW. */
+	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
+	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
 	struct mlx5_hca_attr hca_attr; /* HCA attributes. */
 	struct mlx5_lro_config lro; /* LRO configuration. */
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 03/17] net/mlx5: fix UAR lock sharing for multiport devices
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 02/17] net/mlx5: introduce send scheduling devargs Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 04/17] net/mlx5: introduce shared UAR resource Viacheslav Ovsiienko
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit, stable

The master and representors might be created over the multiport
Infiniband devices and the UAR resource allocated for sibling
ports might belong to the same underlying Infiniband device.
Hardware requires the write access to the UAR must be performed
as atomic 64-bit write, on 32-bit systems this is two sequential
writes, protected by lock. Due to possibility to share the same
UAR between sibling devices the locks must be moved to shared
context.

Fixes: f048f3d479a6 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  6 ------
 drivers/net/mlx5/mlx5.c          |  6 ++++++
 drivers/net/mlx5/mlx5.h          | 10 +++++-----
 drivers/net/mlx5/mlx5_rxq.c      |  2 +-
 drivers/net/mlx5/mlx5_txq.c      |  2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 14af468..63e9350 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -630,12 +630,6 @@
 	priv->mtu = RTE_ETHER_MTU;
 	priv->mp_id.port_id = port_id;
 	strlcpy(priv->mp_id.name, MLX5_MP_NAME, RTE_MP_MAX_NAME_LEN);
-#ifndef RTE_ARCH_64
-	/* Initialize UAR access locks for 32bit implementations. */
-	rte_spinlock_init(&priv->uar_lock_cq);
-	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
-		rte_spinlock_init(&priv->uar_lock[i]);
-#endif
 	/* Some internal functions rely on Netlink sockets, open them now. */
 	priv->nl_socket_rdma = mlx5_nl_init(NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 72e0870..0786945 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -717,6 +717,12 @@ struct mlx5_dev_ctx_shared *
 		err = ENOMEM;
 		goto error;
 	}
+#ifndef RTE_ARCH_64
+	/* Initialize UAR access locks for 32bit implementations. */
+	rte_spinlock_init(&sh->uar_lock_cq);
+	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
+		rte_spinlock_init(&sh->uar_lock[i]);
+#endif
 	/*
 	 * Once the device is added to the list of memory event
 	 * callback, its global MR cache table cannot be expanded
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 84cd3e1..d01d7f3 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -559,6 +559,11 @@ struct mlx5_dev_ctx_shared {
 	void *fdb_domain; /* FDB Direct Rules name space handle. */
 	void *rx_domain; /* RX Direct Rules name space handle. */
 	void *tx_domain; /* TX Direct Rules name space handle. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
+	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
+	/* UAR same-page access control required in 32bit implementations. */
+#endif
 	struct mlx5_hlist *flow_tbls;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
@@ -673,11 +678,6 @@ struct mlx5_priv {
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_mtr_profiles flow_meter_profiles; /* MTR profile list. */
 	struct mlx5_flow_meters flow_meters; /* MTR list. */
-#ifndef RTE_ARCH_64
-	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
-	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
-	/* UAR same-page access control required in 32bit implementations. */
-#endif
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b436f06..2681322 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1997,7 +1997,7 @@ struct mlx5_rxq_ctrl *
 	tmpl->rxq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
 #ifndef RTE_ARCH_64
-	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
+	tmpl->rxq.uar_lock_cq = &priv->sh->uar_lock_cq;
 #endif
 	tmpl->rxq.idx = idx;
 	rte_atomic32_inc(&tmpl->refcnt);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 35b3ade..e1fa24e 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -355,7 +355,7 @@
 	/* Assign an UAR lock according to UAR page number */
 	lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
 		   MLX5_UAR_PAGE_NUM_MASK;
-	txq_ctrl->txq.uar_lock = &priv->uar_lock[lock_idx];
+	txq_ctrl->txq.uar_lock = &priv->sh->uar_lock[lock_idx];
 #endif
 }
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 04/17] net/mlx5: introduce shared UAR resource
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 03/17] net/mlx5: fix UAR lock sharing for multiport devices Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 05/17] net/mlx5: create clock queue for packet pacing Viacheslav Ovsiienko
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This is preparation step before moving the Tx queue creation
to the DevX approach. Some features require the shared UAR
for Tx queues and scheduling completion queues, the patch
manages the shared UAR.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 14 ++++++++++++++
 drivers/net/mlx5/mlx5.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0786945..71e59ac 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -709,6 +709,12 @@ struct mlx5_dev_ctx_shared *
 			err = ENOMEM;
 			goto error;
 		}
+		sh->tx_uar = mlx5_glue->devx_alloc_uar(sh->ctx, 0);
+		if (!sh->tx_uar) {
+			DRV_LOG(ERR, "Failed to allocate DevX UAR.");
+			err = ENOMEM;
+			goto error;
+		}
 	}
 	sh->flow_id_pool = mlx5_flow_id_pool_alloc
 					((1 << HAIRPIN_FLOW_ID_BITS) - 1);
@@ -767,6 +773,10 @@ struct mlx5_dev_ctx_shared *
 		mlx5_l3t_destroy(sh->cnt_id_tbl);
 		sh->cnt_id_tbl = NULL;
 	}
+	if (sh->tx_uar) {
+		mlx5_glue->devx_free_uar(sh->tx_uar);
+		sh->tx_uar = NULL;
+	}
 	if (sh->tis)
 		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
 	if (sh->td)
@@ -832,6 +842,10 @@ struct mlx5_dev_ctx_shared *
 		mlx5_l3t_destroy(sh->cnt_id_tbl);
 		sh->cnt_id_tbl = NULL;
 	}
+	if (sh->tx_uar) {
+		mlx5_glue->devx_free_uar(sh->tx_uar);
+		sh->tx_uar = NULL;
+	}
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->tis)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d01d7f3..c760aff 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -586,6 +586,7 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
+	struct mlx5dv_devx_uar *tx_uar; /* Tx/packer pacing shared UAR. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 05/17] net/mlx5: create clock queue for packet pacing
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 04/17] net/mlx5: introduce shared UAR resource Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 06/17] net/mlx5: create rearm " Viacheslav Ovsiienko
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch creates the special completion queue providing
reference completions to schedule packet send from
other transmitting queues.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/Makefile        |   1 +
 drivers/net/mlx5/linux/mlx5_os.c |   3 +
 drivers/net/mlx5/meson.build     |   1 +
 drivers/net/mlx5/mlx5.c          |   2 +
 drivers/net/mlx5/mlx5.h          |  47 ++++
 drivers/net/mlx5/mlx5_defs.h     |   7 +
 drivers/net/mlx5/mlx5_trigger.c  |  16 +-
 drivers/net/mlx5/mlx5_txpp.c     | 449 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 521 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_txpp.c

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a458402..9eaac6b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -11,6 +11,7 @@ LIB = librte_pmd_mlx5.a
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txpp.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
 ifneq ($(filter y,$(CONFIG_RTE_ARCH_X86_64) \
 			$(CONFIG_RTE_ARCH_PPC_64) \
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 63e9350..ea36309 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1869,6 +1869,9 @@
 {
 	int dbmap_env;
 	int err = 0;
+
+	sh->numa_node = spawn->pci_dev->device.numa_node;
+	pthread_mutex_init(&sh->txpp.mutex, NULL);
 	/*
 	 * Configure environment variable "MLX5_BF_SHUT_UP"
 	 * before the device creation. The rdma_core library
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index e95ce02..c06b153 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -26,6 +26,7 @@ sources = files(
 	'mlx5_stats.c',
 	'mlx5_trigger.c',
 	'mlx5_txq.c',
+	'mlx5_txpp.c',
 	'mlx5_vlan.c',
 	'mlx5_utils.c',
 )
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 71e59ac..10196ac 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -767,6 +767,7 @@ struct mlx5_dev_ctx_shared *
 	pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
 	return sh;
 error:
+	pthread_mutex_destroy(&sh->txpp.mutex);
 	pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
 	MLX5_ASSERT(sh);
 	if (sh->cnt_id_tbl) {
@@ -856,6 +857,7 @@ struct mlx5_dev_ctx_shared *
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	if (sh->flow_id_pool)
 		mlx5_flow_id_pool_release(sh->flow_id_pool);
+	pthread_mutex_destroy(&sh->txpp.mutex);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c760aff..0b73b2a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -531,6 +531,44 @@ struct mlx5_flow_id_pool {
 	uint32_t max_id; /**< Maximum id can be allocated from the pool. */
 };
 
+/* Tx pacing queue structure - for Clock and Rearm queues. */
+struct mlx5_txpp_wq {
+	/* Completion Queue related data.*/
+	struct mlx5_devx_obj *cq;
+	struct mlx5dv_devx_umem *cq_umem;
+	union {
+		volatile void *cq_buf;
+		volatile struct mlx5_cqe *cqes;
+	};
+	volatile uint32_t *cq_dbrec;
+	uint32_t cq_ci:24;
+	uint32_t arm_sn:2;
+	/* Send Queue related data.*/
+	struct mlx5_devx_obj *sq;
+	struct mlx5dv_devx_umem *sq_umem;
+	union {
+		volatile void *sq_buf;
+		volatile struct mlx5_wqe *wqes;
+	};
+	uint16_t sq_size; /* Number of WQEs in the queue. */
+	uint16_t sq_ci; /* Next WQE to execute. */
+	volatile uint32_t *sq_dbrec;
+};
+
+/* Tx packet pacing structure. */
+struct mlx5_dev_txpp {
+	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
+	uint32_t refcnt; /* Pacing reference counter. */
+	uint32_t freq; /* Timestamp frequency, Hz. */
+	uint32_t tick; /* Completion tick duration in nanoseconds. */
+	uint32_t test; /* Packet pacing test mode. */
+	int32_t skew; /* Scheduling skew. */
+	uint32_t eqn; /* Event Queue number. */
+	struct rte_intr_handle intr_handle; /* Periodic interrupt. */
+	struct mlx5dv_devx_event_channel *echan; /* Event Channel. */
+	struct mlx5_txpp_wq clock_queue; /* Clock Queue. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -547,9 +585,12 @@ struct mlx5_dev_ctx_shared {
 	char ibdev_name[DEV_SYSFS_NAME_MAX]; /* SYSFS dev name. */
 	char ibdev_path[DEV_SYSFS_PATH_MAX]; /* SYSFS dev path for secondary */
 	struct mlx5_dev_attr device_attr; /* Device properties. */
+	int numa_node; /* Numa node of backing physical device. */
 	LIST_ENTRY(mlx5_dev_ctx_shared) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct mlx5_mr_share_cache share_cache;
+	/* Packet pacing related structure. */
+	struct mlx5_dev_txpp txpp;
 	/* Shared DV/DR flow data section. */
 	pthread_mutex_t dv_mutex; /* DV context mutex. */
 	uint32_t dv_meta_mask; /* flow META metadata supported mask. */
@@ -622,6 +663,7 @@ struct mlx5_priv {
 	unsigned int representor:1; /* Device is a port representor. */
 	unsigned int master:1; /* Device is a E-Switch master. */
 	unsigned int dr_shared:1; /* DV/DR data is shared. */
+	unsigned int txpp_en:1; /* Tx packet pacing enabled. */
 	unsigned int counter_fallback:1; /* Use counter fallback management. */
 	unsigned int mtr_en:1; /* Whether support meter. */
 	unsigned int mtr_reg_share:1; /* Whether support meter REG_C share. */
@@ -944,4 +986,9 @@ int mlx5_os_read_dev_stat(struct mlx5_priv *priv,
 void mlx5_os_stats_init(struct rte_eth_dev *dev);
 void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
 			   mlx5_dereg_mr_t *dereg_mr_cb);
+/* mlx5_txpp.c */
+
+int mlx5_txpp_start(struct rte_eth_dev *dev);
+void mlx5_txpp_stop(struct rte_eth_dev *dev);
+
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 260f584..07a2b59 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -171,6 +171,13 @@
 #define MLX5_TXDB_NCACHED 1
 #define MLX5_TXDB_HEURISTIC 2
 
+/* Tx accurate scheduling on timestamps parameters. */
+#define MLX5_TXPP_CLKQ_SIZE 1
+/* The minimal size test packet to put into one WQE, padded by HW. */
+#define MLX5_TXPP_TEST_PKT_SIZE (sizeof(struct rte_ether_hdr) +	\
+				 sizeof(struct rte_ipv4_hdr))
+
+
 /* Size of the simple hash table for metadata register table. */
 #define MLX5_FLOW_MREG_HTABLE_SZ 4096
 #define MLX5_FLOW_MREG_HNAME "MARK_COPY_TABLE"
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ef74609..ca25ad9 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -288,25 +288,29 @@
 			return -rte_errno;
 		}
 	}
+	ret = mlx5_txpp_start(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u Tx packet pacing init failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		goto error;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		return -rte_errno;
+		goto error;
 	}
 	ret = mlx5_rxq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		mlx5_txq_stop(dev);
-		return -rte_errno;
+		goto error;
 	}
 	ret = mlx5_hairpin_bind(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		mlx5_txq_stop(dev);
-		return -rte_errno;
+		goto error;
 	}
 	/* Set started flag here for the following steps like control flow. */
 	dev->data->dev_started = 1;
@@ -362,6 +366,7 @@
 	mlx5_traffic_disable(dev);
 	mlx5_txq_stop(dev);
 	mlx5_rxq_stop(dev);
+	mlx5_txpp_stop(dev); /* Stop last. */
 	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
@@ -398,6 +403,7 @@
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
 	mlx5_txq_stop(dev);
 	mlx5_rxq_stop(dev);
+	mlx5_txpp_stop(dev);
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
new file mode 100644
index 0000000..382bd20
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -0,0 +1,449 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_interrupts.h>
+#include <rte_alarm.h>
+#include <rte_malloc.h>
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+
+/* Destroy Event Queue Notification Channel. */
+static void
+mlx5_txpp_destroy_eqn(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->txpp.echan) {
+		mlx5_glue->devx_destroy_event_channel(sh->txpp.echan);
+		sh->txpp.echan = NULL;
+	}
+	sh->txpp.eqn = 0;
+}
+
+/* Create Event Queue Notification Channel. */
+static int
+mlx5_txpp_create_eqn(struct mlx5_dev_ctx_shared *sh)
+{
+	uint32_t lcore;
+
+	MLX5_ASSERT(!sh->txpp.echan);
+	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
+	if (mlx5_glue->devx_query_eqn(sh->ctx, lcore, &sh->txpp.eqn)) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
+		sh->txpp.eqn = 0;
+		return -rte_errno;
+	}
+	sh->txpp.echan = mlx5_glue->devx_create_event_channel(sh->ctx,
+			MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
+	if (!sh->txpp.echan) {
+		sh->txpp.eqn = 0;
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create event channel %d.",
+			rte_errno);
+		return -rte_errno;
+	}
+	return 0;
+}
+
+static void
+mlx5_txpp_destroy_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+
+	if (wq->sq)
+		claim_zero(mlx5_devx_cmd_destroy(wq->sq));
+	if (wq->sq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(wq->sq_umem));
+	if (wq->sq_buf)
+		rte_free((void *)(uintptr_t)wq->sq_buf);
+	if (wq->cq)
+		claim_zero(mlx5_devx_cmd_destroy(wq->cq));
+	if (wq->cq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(wq->cq_umem));
+	if (wq->cq_buf)
+		rte_free((void *)(uintptr_t)wq->cq_buf);
+	memset(wq, 0, sizeof(*wq));
+}
+
+static void
+mlx5_txpp_fill_wqe_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	struct mlx5_wqe *wqe = (struct mlx5_wqe *)(uintptr_t)wq->wqes;
+	struct mlx5_wqe_cseg *cs = &wqe->cseg;
+	uint32_t wqe_size, opcode, i;
+	uint8_t *dst;
+
+	/* For test purposes fill the WQ with SEND inline packet. */
+	if (sh->txpp.test) {
+		wqe_size = RTE_ALIGN(MLX5_TXPP_TEST_PKT_SIZE +
+				     MLX5_WQE_CSEG_SIZE +
+				     2 * MLX5_WQE_ESEG_SIZE -
+				     MLX5_ESEG_MIN_INLINE_SIZE,
+				     MLX5_WSEG_SIZE);
+		opcode = MLX5_OPCODE_SEND;
+	} else {
+		wqe_size = MLX5_WSEG_SIZE;
+		opcode = MLX5_OPCODE_NOP;
+	}
+	cs->opcode = rte_cpu_to_be_32(opcode | 0); /* Index is ignored. */
+	cs->sq_ds = rte_cpu_to_be_32((wq->sq->id << 8) |
+				     (wqe_size / MLX5_WSEG_SIZE));
+	cs->flags = RTE_BE32(MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET);
+	cs->misc = RTE_BE32(0);
+	wqe_size = RTE_ALIGN(wqe_size, MLX5_WQE_SIZE);
+	if (sh->txpp.test) {
+		struct mlx5_wqe_eseg *es = &wqe->eseg;
+		struct rte_ether_hdr *eth_hdr;
+		struct rte_ipv4_hdr *ip_hdr;
+		struct rte_udp_hdr *udp_hdr;
+
+		/* Build the inline test packet pattern. */
+		MLX5_ASSERT(wqe_size <= MLX5_WQE_SIZE_MAX);
+		MLX5_ASSERT(MLX5_TXPP_TEST_PKT_SIZE >=
+				(sizeof(struct rte_ether_hdr) +
+				 sizeof(struct rte_ipv4_hdr)));
+		es->flags = 0;
+		es->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
+		es->swp_offs = 0;
+		es->metadata = 0;
+		es->swp_flags = 0;
+		es->mss = 0;
+		es->inline_hdr_sz = RTE_BE16(MLX5_TXPP_TEST_PKT_SIZE);
+		/* Build test packet L2 header (Ethernet). */
+		dst = (uint8_t *)&es->inline_data;
+		eth_hdr = (struct rte_ether_hdr *)dst;
+		rte_eth_random_addr(&eth_hdr->d_addr.addr_bytes[0]);
+		rte_eth_random_addr(&eth_hdr->s_addr.addr_bytes[0]);
+		eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
+		/* Build test packet L3 header (IP v4). */
+		dst += sizeof(struct rte_ether_hdr);
+		ip_hdr = (struct rte_ipv4_hdr *)dst;
+		ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
+		ip_hdr->type_of_service = 0;
+		ip_hdr->fragment_offset = 0;
+		ip_hdr->time_to_live = 64;
+		ip_hdr->next_proto_id = IPPROTO_UDP;
+		ip_hdr->packet_id = 0;
+		ip_hdr->total_length = RTE_BE16(MLX5_TXPP_TEST_PKT_SIZE -
+						sizeof(struct rte_ether_hdr));
+		/* use RFC5735 / RFC2544 reserved network test addresses */
+		ip_hdr->src_addr = RTE_BE32((198U << 24) | (18 << 16) |
+					    (0 << 8) | 1);
+		ip_hdr->dst_addr = RTE_BE32((198U << 24) | (18 << 16) |
+					    (0 << 8) | 2);
+		if (MLX5_TXPP_TEST_PKT_SIZE <
+					(sizeof(struct rte_ether_hdr) +
+					 sizeof(struct rte_ipv4_hdr) +
+					 sizeof(struct rte_udp_hdr)))
+			goto wcopy;
+		/* Build test packet L4 header (UDP). */
+		dst += sizeof(struct rte_ipv4_hdr);
+		udp_hdr = (struct rte_udp_hdr *)dst;
+		udp_hdr->src_port = RTE_BE16(9); /* RFC863 Discard. */
+		udp_hdr->dst_port = RTE_BE16(9);
+		udp_hdr->dgram_len = RTE_BE16(MLX5_TXPP_TEST_PKT_SIZE -
+					      sizeof(struct rte_ether_hdr) -
+					      sizeof(struct rte_ipv4_hdr));
+		udp_hdr->dgram_cksum = 0;
+		/* Fill the test packet data. */
+		dst += sizeof(struct rte_udp_hdr);
+		for (i = sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(struct rte_udp_hdr);
+				i < MLX5_TXPP_TEST_PKT_SIZE; i++)
+			*dst++ = (uint8_t)(i & 0xFF);
+	}
+wcopy:
+	/* Duplicate the pattern to the next WQEs. */
+	dst = (uint8_t *)(uintptr_t)wq->sq_buf;
+	for (i = 1; i < MLX5_TXPP_CLKQ_SIZE; i++) {
+		dst += wqe_size;
+		rte_memcpy(dst, (void *)(uintptr_t)wq->sq_buf, wqe_size);
+	}
+}
+
+/* Creates the Clock Queue for packet pacing, returns zero on success. */
+static int
+mlx5_txpp_create_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_devx_create_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+	struct mlx5_devx_cq_attr cq_attr = { 0 };
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	size_t page_size = sysconf(_SC_PAGESIZE);
+	uint32_t umem_size, umem_dbrec;
+	int ret;
+
+	/* Allocate memory buffer for CQEs and doorbell record. */
+	umem_size = sizeof(struct mlx5_cqe) * MLX5_TXPP_CLKQ_SIZE;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->cq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->cq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Clock Queue.");
+		return -ENOMEM;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->cq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->cq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->cq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Clock Queue.");
+		goto error;
+	}
+	/* Create completion queue object for Clock Queue. */
+	cq_attr.cqe_size = (sizeof(struct mlx5_cqe) == 128) ?
+			    MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B;
+	cq_attr.use_first_only = 1;
+	cq_attr.overrun_ignore = 1;
+	cq_attr.uar_page_id = sh->tx_uar->page_id;
+	cq_attr.eqn = sh->txpp.eqn;
+	cq_attr.q_umem_valid = 1;
+	cq_attr.q_umem_offset = 0;
+	cq_attr.q_umem_id = wq->cq_umem->umem_id;
+	cq_attr.db_umem_valid = 1;
+	cq_attr.db_umem_offset = umem_dbrec;
+	cq_attr.db_umem_id = wq->cq_umem->umem_id;
+	cq_attr.log_cq_size = rte_log2_u32(MLX5_TXPP_CLKQ_SIZE);
+	cq_attr.log_page_size = rte_log2_u32(page_size);
+	wq->cq = mlx5_devx_cmd_create_cq(sh->ctx, &cq_attr);
+	if (!wq->cq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ for Clock Queue.");
+		goto error;
+	}
+	wq->cq_dbrec = RTE_PTR_ADD(wq->cq_buf, umem_dbrec);
+	wq->cq_ci = 0;
+	/* Allocate memory buffer for Send Queue WQEs. */
+	if (sh->txpp.test) {
+		wq->sq_size = RTE_ALIGN(MLX5_TXPP_TEST_PKT_SIZE +
+					MLX5_WQE_CSEG_SIZE +
+					2 * MLX5_WQE_ESEG_SIZE -
+					MLX5_ESEG_MIN_INLINE_SIZE,
+					MLX5_WQE_SIZE) / MLX5_WQE_SIZE;
+		wq->sq_size *= MLX5_TXPP_CLKQ_SIZE;
+	} else {
+		wq->sq_size = MLX5_TXPP_CLKQ_SIZE;
+	}
+	/* There should not be WQE leftovers in the cyclic queue. */
+	MLX5_ASSERT(wq->sq_size == (1 << log2above(wq->sq_size)));
+	umem_size =  MLX5_WQE_SIZE * wq->sq_size;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->sq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->sq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Clock Queue.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->sq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->sq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->sq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Clock Queue.");
+		goto error;
+	}
+	/* Create send queue object for Clock Queue. */
+	if (sh->txpp.test) {
+		sq_attr.tis_lst_sz = 1;
+		sq_attr.tis_num = sh->tis->id;
+		sq_attr.non_wire = 0;
+		sq_attr.static_sq_wq = 1;
+	} else {
+		sq_attr.non_wire = 1;
+		sq_attr.static_sq_wq = 1;
+	}
+	sq_attr.state = MLX5_SQC_STATE_RST;
+	sq_attr.cqn = wq->cq->id;
+	sq_attr.wq_attr.cd_slave = 1;
+	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
+	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+	sq_attr.wq_attr.pd = sh->pdn;
+	sq_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+	sq_attr.wq_attr.log_wq_sz = rte_log2_u32(wq->sq_size);
+	sq_attr.wq_attr.dbr_umem_valid = 1;
+	sq_attr.wq_attr.dbr_addr = umem_dbrec;
+	sq_attr.wq_attr.dbr_umem_id = wq->sq_umem->umem_id;
+	sq_attr.wq_attr.wq_umem_valid = 1;
+	sq_attr.wq_attr.wq_umem_id = wq->sq_umem->umem_id;
+	/* umem_offset must be zero for static_sq_wq queue. */
+	sq_attr.wq_attr.wq_umem_offset = 0;
+	wq->sq = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);
+	if (!wq->sq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create SQ for Clock Queue.");
+		goto error;
+	}
+	wq->sq_dbrec = RTE_PTR_ADD(wq->sq_buf, umem_dbrec +
+				   MLX5_SND_DBR * sizeof(uint32_t));
+	/* Build the WQEs in the Send Queue before goto Ready state. */
+	mlx5_txpp_fill_wqe_clock_queue(sh);
+	/* Change queue state to ready. */
+	msq_attr.sq_state = MLX5_SQC_STATE_RST;
+	msq_attr.state = MLX5_SQC_STATE_RDY;
+	wq->sq_ci = 0;
+	ret = mlx5_devx_cmd_modify_sq(wq->sq, &msq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set SQ ready state Clock Queue.");
+		goto error;
+	}
+	return 0;
+error:
+	ret = -rte_errno;
+	mlx5_txpp_destroy_clock_queue(sh);
+	rte_errno = -ret;
+	return ret;
+}
+
+/*
+ * The routine initializes the packet pacing infrastructure:
+ * - allocates PP context
+ * - Clock CQ/SQ
+ * - Rearm CQ/SQ
+ * - attaches rearm interrupt handler
+ *
+ * Returns 0 on success, negative otherwise
+ */
+static int
+mlx5_txpp_create(struct mlx5_dev_ctx_shared *sh, struct mlx5_priv *priv)
+{
+	int tx_pp = priv->config.tx_pp;
+	int ret;
+
+	/* Store the requested pacing parameters. */
+	sh->txpp.tick = tx_pp >= 0 ? tx_pp : -tx_pp;
+	sh->txpp.test = !!(tx_pp < 0);
+	sh->txpp.skew = priv->config.tx_skew;
+	sh->txpp.freq = priv->config.hca_attr.dev_freq_khz;
+	ret = mlx5_txpp_create_eqn(sh);
+	if (ret)
+		goto exit;
+	ret = mlx5_txpp_create_clock_queue(sh);
+	if (ret)
+		goto exit;
+exit:
+	if (ret) {
+		mlx5_txpp_destroy_clock_queue(sh);
+		mlx5_txpp_destroy_eqn(sh);
+		sh->txpp.tick = 0;
+		sh->txpp.test = 0;
+		sh->txpp.skew = 0;
+	}
+	return ret;
+}
+
+/*
+ * The routine destroys the packet pacing infrastructure:
+ * - detaches rearm interrupt handler
+ * - Rearm CQ/SQ
+ * - Clock CQ/SQ
+ * - PP context
+ */
+static void
+mlx5_txpp_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	mlx5_txpp_destroy_clock_queue(sh);
+	mlx5_txpp_destroy_eqn(sh);
+	sh->txpp.tick = 0;
+	sh->txpp.test = 0;
+	sh->txpp.skew = 0;
+}
+
+/**
+ * Creates and starts packet pacing infrastructure on specified device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_txpp_start(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	int err = 0;
+	int ret;
+
+	if (!priv->config.tx_pp) {
+		/* Packet pacing is not requested for the device. */
+		MLX5_ASSERT(priv->txpp_en == 0);
+		return 0;
+	}
+	if (priv->txpp_en) {
+		/* Packet pacing is already enabled for the device. */
+		MLX5_ASSERT(sh->txpp.refcnt);
+		return 0;
+	}
+	if (priv->config.tx_pp > 0) {
+		ret = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+		if (ret < 0)
+			return 0;
+	}
+	ret = pthread_mutex_lock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+	if (sh->txpp.refcnt) {
+		priv->txpp_en = 1;
+		++sh->txpp.refcnt;
+	} else {
+		err = mlx5_txpp_create(sh, priv);
+		if (!err) {
+			MLX5_ASSERT(sh->txpp.tick);
+			priv->txpp_en = 1;
+			sh->txpp.refcnt = 1;
+		} else {
+			rte_errno = -err;
+		}
+	}
+	ret = pthread_mutex_unlock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+	return err;
+}
+
+/**
+ * Stops and destroys packet pacing infrastructure on specified device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+void
+mlx5_txpp_stop(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	int ret;
+
+	if (!priv->txpp_en) {
+		/* Packet pacing is already disabled for the device. */
+		return;
+	}
+	priv->txpp_en = 0;
+	ret = pthread_mutex_lock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+	MLX5_ASSERT(sh->txpp.refcnt);
+	if (!sh->txpp.refcnt || --sh->txpp.refcnt)
+		return;
+	/* No references any more, do actual destroy. */
+	mlx5_txpp_destroy(sh);
+	ret = pthread_mutex_unlock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 06/17] net/mlx5: create rearm queue for packet pacing
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 05/17] net/mlx5: create clock queue for packet pacing Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 07/17] net/mlx5: create Tx queues with DevX Viacheslav Ovsiienko
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in realtime. The Clock Queue should never stop,
otherwise the clock synchronization mignt be broken and packet
send scheduling would fail. The Rearm Queue uses cross channel
SEND_EN/WAIT operations to provides the requests to the
CLock Queue in robust way.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |   1 +
 drivers/net/mlx5/mlx5_defs.h |   5 +-
 drivers/net/mlx5/mlx5_txpp.c | 203 ++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 205 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0b73b2a..61a93f9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -567,6 +567,7 @@ struct mlx5_dev_txpp {
 	struct rte_intr_handle intr_handle; /* Periodic interrupt. */
 	struct mlx5dv_devx_event_channel *echan; /* Event Channel. */
 	struct mlx5_txpp_wq clock_queue; /* Clock Queue. */
+	struct mlx5_txpp_wq rearm_queue; /* Clock Queue. */
 };
 
 /*
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 07a2b59..a8626a4 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -173,11 +173,14 @@
 
 /* Tx accurate scheduling on timestamps parameters. */
 #define MLX5_TXPP_CLKQ_SIZE 1
+#define MLX5_TXPP_REARM	((1UL << MLX5_WQ_INDEX_WIDTH) / 4)
+#define MLX5_TXPP_REARM_SQ_SIZE (((1UL << MLX5_CQ_INDEX_WIDTH) / \
+				  MLX5_TXPP_REARM) * 2)
+#define MLX5_TXPP_REARM_CQ_SIZE (MLX5_TXPP_REARM_SQ_SIZE / 2)
 /* The minimal size test packet to put into one WQE, padded by HW. */
 #define MLX5_TXPP_TEST_PKT_SIZE (sizeof(struct rte_ether_hdr) +	\
 				 sizeof(struct rte_ipv4_hdr))
 
-
 /* Size of the simple hash table for metadata register table. */
 #define MLX5_FLOW_MREG_HTABLE_SZ 4096
 #define MLX5_FLOW_MREG_HNAME "MARK_COPY_TABLE"
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 382bd20..f600fc5 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -9,6 +9,7 @@
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
+#include "mlx5_common_os.h"
 
 /* Destroy Event Queue Notification Channel. */
 static void
@@ -48,10 +49,8 @@
 }
 
 static void
-mlx5_txpp_destroy_clock_queue(struct mlx5_dev_ctx_shared *sh)
+mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
-	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
-
 	if (wq->sq)
 		claim_zero(mlx5_devx_cmd_destroy(wq->sq));
 	if (wq->sq_umem)
@@ -68,6 +67,199 @@
 }
 
 static void
+mlx5_txpp_destroy_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+
+	mlx5_txpp_destroy_send_queue(wq);
+}
+
+static void
+mlx5_txpp_destroy_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+
+	mlx5_txpp_destroy_send_queue(wq);
+}
+
+static void
+mlx5_txpp_fill_cqe_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	struct mlx5_cqe *cqe = (struct mlx5_cqe *)(uintptr_t)wq->cqes;
+	uint32_t i;
+
+	for (i = 0; i < MLX5_TXPP_REARM_CQ_SIZE; i++) {
+		cqe->op_own = (MLX5_CQE_INVALID << 4) | MLX5_CQE_OWNER_MASK;
+		++cqe;
+	}
+}
+
+static void
+mlx5_txpp_fill_wqe_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	struct mlx5_wqe *wqe = (struct mlx5_wqe *)(uintptr_t)wq->wqes;
+	uint32_t i;
+
+	for (i = 0; i < wq->sq_size; i += 2) {
+		struct mlx5_wqe_cseg *cs;
+		struct mlx5_wqe_qseg *qs;
+		uint32_t index;
+
+		/* Build SEND_EN request with slave WQE index. */
+		cs = &wqe[i + 0].cseg;
+		cs->opcode = RTE_BE32(MLX5_OPCODE_SEND_EN | 0);
+		cs->sq_ds = rte_cpu_to_be_32((wq->sq->id << 8) | 2);
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+		cs->misc = RTE_BE32(0);
+		qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+		index = (i * MLX5_TXPP_REARM / 2 + MLX5_TXPP_REARM) &
+			((1 << MLX5_WQ_INDEX_WIDTH) - 1);
+		qs->max_index = rte_cpu_to_be_32(index);
+		qs->qpn_cqn = rte_cpu_to_be_32(sh->txpp.clock_queue.sq->id);
+		/* Build WAIT request with slave CQE index. */
+		cs = &wqe[i + 1].cseg;
+		cs->opcode = RTE_BE32(MLX5_OPCODE_WAIT | 0);
+		cs->sq_ds = rte_cpu_to_be_32((wq->sq->id << 8) | 2);
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
+		cs->misc = RTE_BE32(0);
+		qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+		index = (i * MLX5_TXPP_REARM / 2 + MLX5_TXPP_REARM / 2) &
+			((1 << MLX5_CQ_INDEX_WIDTH) - 1);
+		qs->max_index = rte_cpu_to_be_32(index);
+		qs->qpn_cqn = rte_cpu_to_be_32(sh->txpp.clock_queue.cq->id);
+	}
+}
+
+/* Creates the Rearm Queue to fire the requests to Clock Queue in realtime. */
+static int
+mlx5_txpp_create_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_devx_create_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+	struct mlx5_devx_cq_attr cq_attr = { 0 };
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	size_t page_size = sysconf(_SC_PAGESIZE);
+	uint32_t umem_size, umem_dbrec;
+	int ret;
+
+	/* Allocate memory buffer for CQEs and doorbell record. */
+	umem_size = sizeof(struct mlx5_cqe) * MLX5_TXPP_REARM_CQ_SIZE;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->cq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->cq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Rearm Queue.");
+		return -ENOMEM;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->cq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->cq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->cq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Rearm Queue.");
+		goto error;
+	}
+	/* Create completion queue object for Rearm Queue. */
+	cq_attr.cqe_size = (sizeof(struct mlx5_cqe) == 128) ?
+			    MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B;
+	cq_attr.uar_page_id = sh->tx_uar->page_id;
+	cq_attr.eqn = sh->txpp.eqn;
+	cq_attr.q_umem_valid = 1;
+	cq_attr.q_umem_offset = 0;
+	cq_attr.q_umem_id = mlx5_os_get_umem_id(wq->cq_umem);
+	cq_attr.db_umem_valid = 1;
+	cq_attr.db_umem_offset = umem_dbrec;
+	cq_attr.db_umem_id = mlx5_os_get_umem_id(wq->cq_umem);
+	cq_attr.log_cq_size = rte_log2_u32(MLX5_TXPP_REARM_CQ_SIZE);
+	cq_attr.log_page_size = rte_log2_u32(page_size);
+	wq->cq = mlx5_devx_cmd_create_cq(sh->ctx, &cq_attr);
+	if (!wq->cq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ for Rearm Queue.");
+		goto error;
+	}
+	wq->cq_dbrec = RTE_PTR_ADD(wq->cq_buf, umem_dbrec);
+	wq->cq_ci = 0;
+	wq->arm_sn = 0;
+	/* Mark all CQEs initially as invalid. */
+	mlx5_txpp_fill_cqe_rearm_queue(sh);
+	/*
+	 * Allocate memory buffer for Send Queue WQEs.
+	 * There should be no WQE leftovers in the cyclic queue.
+	 */
+	wq->sq_size = MLX5_TXPP_REARM_SQ_SIZE;
+	MLX5_ASSERT(wq->sq_size == (1 << log2above(wq->sq_size)));
+	umem_size =  MLX5_WQE_SIZE * wq->sq_size;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->sq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->sq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Rearm Queue.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->sq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->sq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->sq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Rearm Queue.");
+		goto error;
+	}
+	/* Create send queue object for Rearm Queue. */
+	sq_attr.state = MLX5_SQC_STATE_RST;
+	sq_attr.tis_lst_sz = 1;
+	sq_attr.tis_num = sh->tis->id;
+	sq_attr.cqn = wq->cq->id;
+	sq_attr.cd_master = 1;
+	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
+	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+	sq_attr.wq_attr.pd = sh->pdn;
+	sq_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+	sq_attr.wq_attr.log_wq_sz = rte_log2_u32(wq->sq_size);
+	sq_attr.wq_attr.dbr_umem_valid = 1;
+	sq_attr.wq_attr.dbr_addr = umem_dbrec;
+	sq_attr.wq_attr.dbr_umem_id = mlx5_os_get_umem_id(wq->sq_umem);
+	sq_attr.wq_attr.wq_umem_valid = 1;
+	sq_attr.wq_attr.wq_umem_id = mlx5_os_get_umem_id(wq->sq_umem);
+	sq_attr.wq_attr.wq_umem_offset = 0;
+	wq->sq = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);
+	if (!wq->sq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create SQ for Rearm Queue.");
+		goto error;
+	}
+	wq->sq_dbrec = RTE_PTR_ADD(wq->sq_buf, umem_dbrec +
+				   MLX5_SND_DBR * sizeof(uint32_t));
+	/* Build the WQEs in the Send Queue before goto Ready state. */
+	mlx5_txpp_fill_wqe_rearm_queue(sh);
+	/* Change queue state to ready. */
+	msq_attr.sq_state = MLX5_SQC_STATE_RST;
+	msq_attr.state = MLX5_SQC_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_sq(wq->sq, &msq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set SQ ready state Rearm Queue.");
+		goto error;
+	}
+	return 0;
+error:
+	ret = -rte_errno;
+	mlx5_txpp_destroy_rearm_queue(sh);
+	rte_errno = -ret;
+	return ret;
+}
+
+static void
 mlx5_txpp_fill_wqe_clock_queue(struct mlx5_dev_ctx_shared *sh)
 {
 	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
@@ -331,8 +523,12 @@
 	ret = mlx5_txpp_create_clock_queue(sh);
 	if (ret)
 		goto exit;
+	ret = mlx5_txpp_create_rearm_queue(sh);
+	if (ret)
+		goto exit;
 exit:
 	if (ret) {
+		mlx5_txpp_destroy_rearm_queue(sh);
 		mlx5_txpp_destroy_clock_queue(sh);
 		mlx5_txpp_destroy_eqn(sh);
 		sh->txpp.tick = 0;
@@ -352,6 +548,7 @@
 static void
 mlx5_txpp_destroy(struct mlx5_dev_ctx_shared *sh)
 {
+	mlx5_txpp_destroy_rearm_queue(sh);
 	mlx5_txpp_destroy_clock_queue(sh);
 	mlx5_txpp_destroy_eqn(sh);
 	sh->txpp.tick = 0;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 07/17] net/mlx5: create Tx queues with DevX
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (5 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 06/17] net/mlx5: create rearm " Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 08/17] net/mlx5: allocate packet pacing context Viacheslav Ovsiienko
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

To provide the packet send schedule on mbuf timestamp the Tx
queue must be attached to the same UAR as Clock Queue is.
UAR is special hardware related resource mapped to the host
memory and provides doorbell registers, the assigning UAR
to the queue being created is provided via DevX API only.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c    | 108 +++++++++-----
 drivers/net/mlx5/mlx5_rxtx.h    |  14 ++
 drivers/net/mlx5/mlx5_trigger.c |   4 +-
 drivers/net/mlx5/mlx5_txq.c     | 306 +++++++++++++++++++++++++++++++++++++++-
 4 files changed, 392 insertions(+), 40 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 894f441..227289e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -944,43 +944,79 @@ enum mlx5_txcmp_code {
 		struct mlx5_txq_data *txq = (*priv->txqs)[sm->queue_id];
 		struct mlx5_txq_ctrl *txq_ctrl =
 			container_of(txq, struct mlx5_txq_ctrl, txq);
-		struct ibv_qp_attr mod = {
-			.qp_state = IBV_QPS_RESET,
-			.port_num = (uint8_t)priv->dev_port,
-		};
-		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
-		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change the Tx QP state to RESET "
-				"%s", strerror(errno));
-			rte_errno = errno;
-			return ret;
-		}
-		mod.qp_state = IBV_QPS_INIT;
-		ret = mlx5_glue->modify_qp(qp, &mod,
-					   (IBV_QP_STATE | IBV_QP_PORT));
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change Tx QP state to INIT %s",
-				strerror(errno));
-			rte_errno = errno;
-			return ret;
-		}
-		mod.qp_state = IBV_QPS_RTR;
-		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change Tx QP state to RTR %s",
-				strerror(errno));
-			rte_errno = errno;
-			return ret;
-		}
-		mod.qp_state = IBV_QPS_RTS;
-		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change Tx QP state to RTS %s",
-				strerror(errno));
-			rte_errno = errno;
-			return ret;
+		if (txq_ctrl->obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ) {
+			struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+
+			/* Change queue state to reset. */
+			msq_attr.sq_state = MLX5_SQC_STATE_ERR;
+			msq_attr.state = MLX5_SQC_STATE_RST;
+			ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq_devx,
+						      &msq_attr);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RESET %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			/* Change queue state to ready. */
+			msq_attr.sq_state = MLX5_SQC_STATE_RST;
+			msq_attr.state = MLX5_SQC_STATE_RDY;
+			ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq_devx,
+						      &msq_attr);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to READY %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+		} else {
+			struct ibv_qp_attr mod = {
+				.qp_state = IBV_QPS_RESET,
+				.port_num = (uint8_t)priv->dev_port,
+			};
+			struct ibv_qp *qp = txq_ctrl->obj->qp;
+
+			MLX5_ASSERT
+				(txq_ctrl->obj->type == MLX5_TXQ_OBJ_TYPE_IBV);
+
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RESET %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			mod.qp_state = IBV_QPS_INIT;
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to INIT %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			mod.qp_state = IBV_QPS_RTR;
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RTR %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			mod.qp_state = IBV_QPS_RTS;
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RTS %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
 		}
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 26621ff..1b797da 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -323,6 +323,7 @@ struct mlx5_txq_data {
 
 enum mlx5_txq_obj_type {
 	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_SQ,	/* mlx5_txq_obj with mlx5_devx_sq. */
 	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
 	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
 };
@@ -349,6 +350,19 @@ struct mlx5_txq_obj {
 			/* DevX object for Sx queue. */
 			struct mlx5_devx_obj *tis; /* The TIS object. */
 		};
+		struct {
+			struct rte_eth_dev *dev;
+			struct mlx5_devx_obj *cq_devx;
+			struct mlx5dv_devx_umem *cq_umem;
+			void *cq_buf;
+			int64_t cq_dbrec_offset;
+			struct mlx5_devx_dbr_page *cq_dbrec_page;
+			struct mlx5_devx_obj *sq_devx;
+			struct mlx5dv_devx_umem *sq_umem;
+			void *sq_buf;
+			int64_t sq_dbrec_offset;
+			struct mlx5_devx_dbr_page *sq_dbrec_page;
+		};
 	};
 };
 
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ca25ad9..e324319 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -59,7 +59,9 @@
 		} else {
 			txq_alloc_elts(txq_ctrl);
 			txq_ctrl->obj = mlx5_txq_obj_new
-				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+				(dev, i, priv->txpp_en ?
+				MLX5_TXQ_OBJ_TYPE_DEVX_SQ :
+				MLX5_TXQ_OBJ_TYPE_IBV);
 		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index e1fa24e..f9ed504 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -31,6 +31,7 @@
 #include <mlx5_devx_cmds.h>
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
+#include <mlx5_common_os.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
@@ -573,6 +574,297 @@
 }
 
 /**
+ * Destroy the Tx queue DevX object.
+ *
+ * @param txq_obj
+ *   Txq object to destroy
+ */
+static void
+txq_release_sq_resources(struct mlx5_txq_obj *txq_obj)
+{
+	MLX5_ASSERT(txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ);
+
+	if (txq_obj->sq_devx)
+		claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq_devx));
+	if (txq_obj->sq_dbrec_page)
+		claim_zero(mlx5_release_dbr
+				(&txq_obj->txq_ctrl->priv->dbrpgs,
+				mlx5_os_get_umem_id
+					(txq_obj->sq_dbrec_page->umem),
+				txq_obj->sq_dbrec_offset));
+	if (txq_obj->sq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(txq_obj->sq_umem));
+	if (txq_obj->sq_buf)
+		rte_free(txq_obj->sq_buf);
+	if (txq_obj->cq_devx)
+		claim_zero(mlx5_devx_cmd_destroy(txq_obj->cq_devx));
+	if (txq_obj->cq_dbrec_page)
+		claim_zero(mlx5_release_dbr
+				(&txq_obj->txq_ctrl->priv->dbrpgs,
+				mlx5_os_get_umem_id
+					(txq_obj->cq_dbrec_page->umem),
+				txq_obj->cq_dbrec_offset));
+	if (txq_obj->cq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(txq_obj->cq_umem));
+	if (txq_obj->cq_buf)
+		rte_free(txq_obj->cq_buf);
+}
+
+/**
+ * Create the Tx queue DevX object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array
+ *
+ * @return
+ *   The DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_devx_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+#ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
+	DRV_LOG(ERR, "port %u Tx queue %u cannot create with DevX, no UAR",
+		     dev->data->port_id, idx);
+	rte_errno = ENOMEM;
+	return NULL;
+#else
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+	struct mlx5_devx_cq_attr cq_attr = { 0 };
+	struct mlx5_txq_obj *txq_obj = NULL;
+	size_t page_size = sysconf(_SC_PAGESIZE);
+	struct mlx5_cqe *cqe;
+	uint32_t i, nqe;
+	int ret = 0;
+
+	MLX5_ASSERT(txq_data);
+	MLX5_ASSERT(!txq_ctrl->obj);
+	txq_obj = rte_calloc_socket(__func__, 1,
+				    sizeof(struct mlx5_txq_obj), 0,
+				    txq_ctrl->socket);
+	if (!txq_obj) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	txq_obj->type = MLX5_TXQ_OBJ_TYPE_DEVX_SQ;
+	txq_obj->txq_ctrl = txq_ctrl;
+	txq_obj->dev = dev;
+	/* Create the Completion Queue. */
+	nqe = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+	       1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	nqe = 1UL << log2above(nqe);
+	if (nqe > UINT16_MAX) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u requests to many CQEs %u",
+			dev->data->port_id, txq_data->idx, nqe);
+		rte_errno = EINVAL;
+		goto error;
+	}
+	/* Allocate memory buffer for CQEs. */
+	txq_obj->cq_buf = rte_zmalloc_socket(__func__,
+					     nqe * sizeof(struct mlx5_cqe),
+					     MLX5_CQE_BUF_ALIGNMENT,
+					     sh->numa_node);
+	if (!txq_obj->cq_buf) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory (CQ)",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	txq_data->cqe_n = log2above(nqe);
+	txq_data->cqe_s = 1 << txq_data->cqe_n;
+	txq_data->cqe_m = txq_data->cqe_s - 1;
+	txq_data->cqes = (volatile struct mlx5_cqe *)txq_obj->cq_buf;
+	txq_data->cq_ci = 0;
+	txq_data->cq_pi = 0;
+	/* Register allocated buffer in user space with DevX. */
+	txq_obj->cq_umem = mlx5_glue->devx_umem_reg
+					(sh->ctx,
+					 (void *)txq_obj->cq_buf,
+					 nqe * sizeof(struct mlx5_cqe),
+					 IBV_ACCESS_LOCAL_WRITE);
+	if (!txq_obj->cq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot register memory (CQ)",
+			dev->data->port_id, txq_data->idx);
+		goto error;
+	}
+	/* Allocate doorbell record for completion queue. */
+	txq_obj->cq_dbrec_offset = mlx5_get_dbr(sh->ctx,
+						&priv->dbrpgs,
+						&txq_obj->cq_dbrec_page);
+	if (txq_obj->cq_dbrec_offset < 0)
+		goto error;
+	txq_data->cq_db = (volatile uint32_t *)(txq_obj->cq_dbrec_page->dbrs +
+						txq_obj->cq_dbrec_offset);
+	*txq_data->cq_db = 0;
+	/* Create completion queue object with DevX. */
+	cq_attr.cqe_size = (sizeof(struct mlx5_cqe) == 128) ?
+			    MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B;
+	cq_attr.uar_page_id = sh->tx_uar->page_id;
+	cq_attr.eqn = sh->txpp.eqn;
+	cq_attr.q_umem_valid = 1;
+	cq_attr.q_umem_offset = (uintptr_t)txq_obj->cq_buf % page_size;
+	cq_attr.q_umem_id = txq_obj->cq_umem->umem_id;
+	cq_attr.db_umem_valid = 1;
+	cq_attr.db_umem_offset = txq_obj->cq_dbrec_offset;
+	cq_attr.db_umem_id = mlx5_os_get_umem_id(txq_obj->cq_dbrec_page->umem);
+	cq_attr.log_cq_size = rte_log2_u32(nqe);
+	cq_attr.log_page_size = rte_log2_u32(page_size);
+	txq_obj->cq_devx = mlx5_devx_cmd_create_cq(sh->ctx, &cq_attr);
+	if (!txq_obj->cq_devx) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
+			dev->data->port_id, idx);
+		goto error;
+	}
+	/* Initial fill CQ buffer with invalid CQE opcode. */
+	cqe = (struct mlx5_cqe *)txq_obj->cq_buf;
+	for (i = 0; i < txq_data->cqe_s; i++) {
+		cqe->op_own = (MLX5_CQE_INVALID << 4) | MLX5_CQE_OWNER_MASK;
+		++cqe;
+	}
+	/* Create the Work Queue. */
+	nqe = RTE_MIN(1UL << txq_data->elts_n,
+		      (uint32_t)sh->device_attr.max_qp_wr);
+	txq_obj->sq_buf = rte_zmalloc_socket(__func__,
+					     nqe * sizeof(struct mlx5_wqe),
+					     page_size,
+					     sh->numa_node);
+	if (!txq_obj->sq_buf) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory (SQ)",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	txq_data->wqe_n = log2above(nqe);
+	txq_data->wqe_s = 1 << txq_data->wqe_n;
+	txq_data->wqe_m = txq_data->wqe_s - 1;
+	txq_data->wqes = (struct mlx5_wqe *)txq_obj->sq_buf;
+	txq_data->wqes_end = txq_data->wqes + txq_data->wqe_s;
+	txq_data->wqe_ci = 0;
+	txq_data->wqe_pi = 0;
+	txq_data->wqe_comp = 0;
+	txq_data->wqe_thres = txq_data->wqe_s / MLX5_TX_COMP_THRESH_INLINE_DIV;
+	/* Register allocated buffer in user space with DevX. */
+	txq_obj->sq_umem = mlx5_glue->devx_umem_reg
+					(sh->ctx,
+					 (void *)txq_obj->sq_buf,
+					 nqe * sizeof(struct mlx5_wqe),
+					 IBV_ACCESS_LOCAL_WRITE);
+	if (!txq_obj->sq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot register memory (SQ)",
+			dev->data->port_id, txq_data->idx);
+		goto error;
+	}
+	/* Allocate doorbell record for completion queue. */
+	txq_obj->cq_dbrec_offset = mlx5_get_dbr(sh->ctx,
+						&priv->dbrpgs,
+						&txq_obj->sq_dbrec_page);
+	if (txq_obj->sq_dbrec_offset < 0)
+		goto error;
+	txq_data->qp_db = (volatile uint32_t *)
+					(txq_obj->sq_dbrec_page->dbrs +
+					 txq_obj->sq_dbrec_offset +
+					 MLX5_SND_DBR * sizeof(uint32_t));
+	*txq_data->qp_db = 0;
+	/* Create Send Queue object with DevX. */
+	sq_attr.tis_lst_sz = 1;
+	sq_attr.tis_num = sh->tis->id;
+	sq_attr.state = MLX5_SQC_STATE_RST;
+	sq_attr.cqn = txq_obj->cq_devx->id;
+	sq_attr.flush_in_error_en = 1;
+	sq_attr.allow_multi_pkt_send_wqe = !!priv->config.mps;
+	sq_attr.allow_swp = !!priv->config.swp;
+	sq_attr.min_wqe_inline_mode = priv->config.hca_attr.vport_inline_mode;
+	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
+	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+	sq_attr.wq_attr.pd = sh->pdn;
+	sq_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+	sq_attr.wq_attr.log_wq_sz = txq_data->wqe_n;
+	sq_attr.wq_attr.dbr_umem_valid = 1;
+	sq_attr.wq_attr.dbr_addr = txq_obj->cq_dbrec_offset;
+	sq_attr.wq_attr.dbr_umem_id =
+			mlx5_os_get_umem_id(txq_obj->cq_dbrec_page->umem);
+	sq_attr.wq_attr.wq_umem_valid = 1;
+	sq_attr.wq_attr.wq_umem_id = txq_obj->sq_umem->umem_id;
+	sq_attr.wq_attr.wq_umem_offset = (uintptr_t)txq_obj->sq_buf % page_size;
+	txq_obj->sq_devx = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);
+	if (!txq_obj->sq_devx) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "port %u Tx queue %u SQ creation failure",
+			dev->data->port_id, idx);
+		goto error;
+	}
+	txq_data->qp_num_8s = txq_obj->sq_devx->id << 8;
+	/* Change Send Queue state to Ready-to-Send. */
+	msq_attr.sq_state = MLX5_SQC_STATE_RST;
+	msq_attr.state = MLX5_SQC_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_sq(txq_obj->sq_devx, &msq_attr);
+	if (ret) {
+		rte_errno = errno;
+		DRV_LOG(ERR,
+			"port %u Tx queue %u SP state to SQC_STATE_RDY failed",
+			dev->data->port_id, idx);
+		goto error;
+	}
+	txq_data->fcqs = rte_calloc_socket(__func__,
+					   txq_data->cqe_s,
+					   sizeof(*txq_data->fcqs),
+					   RTE_CACHE_LINE_SIZE,
+					   txq_ctrl->socket);
+	if (!txq_data->fcqs) {
+		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory (FCQ)",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	/*
+	 * If using DevX need to query and store TIS transport domain value.
+	 * This is done once per port.
+	 * Will use this value on Rx, when creating matching TIR.
+	 */
+	if (priv->config.devx && !priv->sh->tdn)
+		priv->sh->tdn = priv->sh->td->id;
+#endif
+	MLX5_ASSERT(sh->tx_uar);
+	MLX5_ASSERT(sh->tx_uar->reg_addr);
+	txq_ctrl->bf_reg = sh->tx_uar->reg_addr;
+	txq_ctrl->uar_mmap_offset = sh->tx_uar->mmap_off;
+	rte_atomic32_set(&txq_obj->refcnt, 1);
+	txq_uar_init(txq_ctrl);
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	return txq_obj;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	txq_release_sq_resources(txq_obj);
+	if (txq_data->fcqs) {
+		rte_free(txq_data->fcqs);
+		txq_data->fcqs = NULL;
+	}
+	rte_free(txq_obj);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+#endif
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
@@ -609,6 +901,8 @@ struct mlx5_txq_obj *
 
 	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
 		return mlx5_txq_obj_hairpin_new(dev, idx);
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ)
+		return mlx5_txq_obj_devx_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -817,8 +1111,10 @@ struct mlx5_txq_obj *
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_data->fcqs)
+	if (txq_data && txq_data->fcqs) {
 		rte_free(txq_data->fcqs);
+		txq_data->fcqs = NULL;
+	}
 	if (txq_obj)
 		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
@@ -870,11 +1166,15 @@ struct mlx5_txq_obj *
 		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
 			if (txq_obj->tis)
 				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ) {
+			txq_release_sq_resources(txq_obj);
 		} else {
 			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
 			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
-				if (txq_obj->txq_ctrl->txq.fcqs)
-					rte_free(txq_obj->txq_ctrl->txq.fcqs);
+		}
+		if (txq_obj->txq_ctrl->txq.fcqs) {
+			rte_free(txq_obj->txq_ctrl->txq.fcqs);
+			txq_obj->txq_ctrl->txq.fcqs = NULL;
 		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 08/17] net/mlx5: allocate packet pacing context
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (6 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 07/17] net/mlx5: create Tx queues with DevX Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 09/17] net/mlx5: introduce clock queue service routine Viacheslav Ovsiienko
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch allocates the Packet Pacing context from the kernel,
configures one according to requested pace send scheduling
granularuty and assigns to Clock Queue.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_txpp.c | 71 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 61a93f9..e8a7b10 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -568,6 +568,8 @@ struct mlx5_dev_txpp {
 	struct mlx5dv_devx_event_channel *echan; /* Event Channel. */
 	struct mlx5_txpp_wq clock_queue; /* Clock Queue. */
 	struct mlx5_txpp_wq rearm_queue; /* Clock Queue. */
+	struct mlx5dv_pp *pp; /* Packet pacing context. */
+	uint16_t pp_id; /* Packet pacing context index. */
 };
 
 /*
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index f600fc5..a0ee872 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -6,6 +6,7 @@
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
 #include <rte_malloc.h>
+#include <rte_cycles.h>
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
@@ -49,6 +50,69 @@
 }
 
 static void
+mlx5_txpp_free_pp_index(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->txpp.pp) {
+		mlx5_glue->dv_free_pp(sh->txpp.pp);
+		sh->txpp.pp = NULL;
+		sh->txpp.pp_id = 0;
+	}
+}
+
+/* Allocate Packet Pacing index from kernel via mlx5dv call. */
+static int
+mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
+	uint64_t rate;
+
+	MLX5_ASSERT(!sh->txpp.pp);
+	memset(&pp, 0, sizeof(pp));
+	rate = NS_PER_S / sh->txpp.tick;
+	if (rate * sh->txpp.tick != NS_PER_S)
+		DRV_LOG(WARNING, "Packet pacing frequency is not precise.");
+	if (sh->txpp.test) {
+		uint32_t len;
+
+		len = RTE_MAX(MLX5_TXPP_TEST_PKT_SIZE,
+			      (size_t)RTE_ETHER_MIN_LEN);
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, len);
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, len);
+		/* Convert packets per second into kilobits. */
+		rate = (rate * len) / (1000ul / CHAR_BIT);
+		DRV_LOG(INFO, "Packet pacing rate set to %" PRIu64, rate);
+	}
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, rate);
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode,
+		 sh->txpp.test ? MLX5_DATA_RATE : MLX5_WQE_RATE);
+	sh->txpp.pp = mlx5_glue->dv_alloc_pp
+				(sh->ctx, sizeof(pp), &pp,
+				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
+	if (sh->txpp.pp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate packet pacing index.");
+		rte_errno = errno;
+		return -errno;
+	}
+	if (!sh->txpp.pp->index) {
+		DRV_LOG(ERR, "Zero packet pacing index allocated.");
+		mlx5_txpp_free_pp_index(sh);
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	sh->txpp.pp_id = sh->txpp.pp->index;
+	return 0;
+#else
+	RTE_SET_USED(sh);
+	DRV_LOG(ERR, "Allocating pacing index is not supported.");
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+#endif
+}
+
+static void
 mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
 	if (wq->sq)
@@ -457,6 +521,7 @@
 	}
 	sq_attr.state = MLX5_SQC_STATE_RST;
 	sq_attr.cqn = wq->cq->id;
+	sq_attr.packet_pacing_rate_limit_index = sh->txpp.pp_id;
 	sq_attr.wq_attr.cd_slave = 1;
 	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
 	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
@@ -503,6 +568,7 @@
  * - Clock CQ/SQ
  * - Rearm CQ/SQ
  * - attaches rearm interrupt handler
+ * - starts Clock Queue
  *
  * Returns 0 on success, negative otherwise
  */
@@ -520,6 +586,9 @@
 	ret = mlx5_txpp_create_eqn(sh);
 	if (ret)
 		goto exit;
+	ret = mlx5_txpp_alloc_pp_index(sh);
+	if (ret)
+		goto exit;
 	ret = mlx5_txpp_create_clock_queue(sh);
 	if (ret)
 		goto exit;
@@ -530,6 +599,7 @@
 	if (ret) {
 		mlx5_txpp_destroy_rearm_queue(sh);
 		mlx5_txpp_destroy_clock_queue(sh);
+		mlx5_txpp_free_pp_index(sh);
 		mlx5_txpp_destroy_eqn(sh);
 		sh->txpp.tick = 0;
 		sh->txpp.test = 0;
@@ -550,6 +620,7 @@
 {
 	mlx5_txpp_destroy_rearm_queue(sh);
 	mlx5_txpp_destroy_clock_queue(sh);
+	mlx5_txpp_free_pp_index(sh);
 	mlx5_txpp_destroy_eqn(sh);
 	sh->txpp.tick = 0;
 	sh->txpp.test = 0;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 09/17] net/mlx5: introduce clock queue service routine
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (7 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 08/17] net/mlx5: allocate packet pacing context Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 10/17] net/mlx5: prepare Tx queue structures to support timestamp Viacheslav Ovsiienko
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

Service routine is invoked periodically on Rearm Queue
completion interrupts, typically once per some milliseconds
(1-16) to track clock jitter and wander in robust fashion.
It performs the following:

- fetches the completed CQEs for Rearm Queue
- restarts Rearm Queue on errors
- pushes new requests to Rearm Queue to make it
  continuously running and pushing cross-channel requests
  to Clock Queue
- reads and caches the Clock Queue CQE to be used in datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue errors statistics

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  15 ++
 drivers/net/mlx5/mlx5_defs.h |   1 +
 drivers/net/mlx5/mlx5_rxtx.h |  20 +++
 drivers/net/mlx5/mlx5_txpp.c | 336 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 372 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e8a7b10..e599bbb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -555,6 +555,12 @@ struct mlx5_txpp_wq {
 	volatile uint32_t *sq_dbrec;
 };
 
+/* Tx packet pacing internal timestamp. */
+struct mlx5_txpp_ts {
+	rte_atomic64_t ci_ts;
+	rte_atomic64_t ts;
+};
+
 /* Tx packet pacing structure. */
 struct mlx5_dev_txpp {
 	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
@@ -570,6 +576,15 @@ struct mlx5_dev_txpp {
 	struct mlx5_txpp_wq rearm_queue; /* Clock Queue. */
 	struct mlx5dv_pp *pp; /* Packet pacing context. */
 	uint16_t pp_id; /* Packet pacing context index. */
+	uint16_t ts_n; /* Number of captured timestamps. */
+	uint16_t ts_p; /* Pointer to statisticks timestamp. */
+	struct mlx5_txpp_ts *tsa; /* Timestamps sliding window stats. */
+	struct mlx5_txpp_ts ts; /* Cached completion id/timestamp. */
+	uint32_t sync_lost:1; /* ci/timestamp synchronization lost. */
+	/* Statistics counters. */
+	rte_atomic32_t err_miss_int; /* Missed service interrupt. */
+	rte_atomic32_t err_rearm_queue; /* Rearm Queue errors. */
+	rte_atomic32_t err_clock_queue; /* Clock Queue errors. */
 };
 
 /*
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index a8626a4..7ed3e88 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -172,6 +172,7 @@
 #define MLX5_TXDB_HEURISTIC 2
 
 /* Tx accurate scheduling on timestamps parameters. */
+#define MLX5_TXPP_WAIT_INIT_TS 1000ul /* How long to wait timestamp. */
 #define MLX5_TXPP_CLKQ_SIZE 1
 #define MLX5_TXPP_REARM	((1UL << MLX5_WQ_INDEX_WIDTH) / 4)
 #define MLX5_TXPP_REARM_SQ_SIZE (((1UL << MLX5_CQ_INDEX_WIDTH) / \
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 1b797da..8a8d2b5 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -30,6 +30,7 @@
 #include <rte_io.h>
 #include <rte_bus_pci.h>
 #include <rte_malloc.h>
+#include <rte_cycles.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_prm.h>
@@ -695,4 +696,23 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 	mlx5_tx_dbrec_cond_wmb(txq, wqe, 1);
 }
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index a0ee872..17d64ff 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -1,6 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2020 Mellanox Technologies, Ltd
  */
+#include <fcntl.h>
+#include <stdint.h>
+
 #include <rte_ether.h>
 #include <rte_ethdev_driver.h>
 #include <rte_interrupts.h>
@@ -144,6 +147,33 @@
 	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
 
 	mlx5_txpp_destroy_send_queue(wq);
+	if (sh->txpp.tsa) {
+		rte_free(sh->txpp.tsa);
+		sh->txpp.tsa = NULL;
+	}
+}
+
+static void
+mlx5_txpp_doorbell_rearm_queue(struct mlx5_dev_ctx_shared *sh, uint16_t ci)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	union {
+		uint32_t w32[2];
+		uint64_t w64;
+	} cs;
+
+	wq->sq_ci = ci + 1;
+	cs.w32[0] = rte_cpu_to_be_32(rte_be_to_cpu_32
+		   (wq->wqes[ci & (wq->sq_size - 1)].ctrl[0]) | (ci - 1) << 8);
+	cs.w32[1] = wq->wqes[ci & (wq->sq_size - 1)].ctrl[1];
+	/* Update SQ doorbell record with new SQ ci. */
+	rte_compiler_barrier();
+	*wq->sq_dbrec = rte_cpu_to_be_32(wq->sq_ci);
+	/* Make sure the doorbell record is updated. */
+	rte_wmb();
+	/* Write to doorbel register to start processing. */
+	__mlx5_uar_write64_relaxed(cs.w64, sh->tx_uar->reg_addr, NULL);
+	rte_wmb();
 }
 
 static void
@@ -433,6 +463,16 @@
 	uint32_t umem_size, umem_dbrec;
 	int ret;
 
+	sh->txpp.tsa = rte_zmalloc_socket(__func__,
+					   MLX5_TXPP_REARM_SQ_SIZE *
+					   sizeof(struct mlx5_txpp_ts),
+					   0, sh->numa_node);
+	if (!sh->txpp.tsa) {
+		DRV_LOG(ERR, "Failed to allocate memory for CQ stats.");
+		return -ENOMEM;
+	}
+	sh->txpp.ts_p = 0;
+	sh->txpp.ts_n = 0;
 	/* Allocate memory buffer for CQEs and doorbell record. */
 	umem_size = sizeof(struct mlx5_cqe) * MLX5_TXPP_CLKQ_SIZE;
 	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
@@ -562,6 +602,297 @@
 	return ret;
 }
 
+/* Enable notification from the Rearm Queue CQ. */
+static inline void
+mlx5_txpp_cq_arm(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *aq = &sh->txpp.rearm_queue;
+	uint32_t arm_sn = aq->arm_sn << MLX5_CQ_SQN_OFFSET;
+	uint32_t db_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | aq->cq_ci;
+	uint64_t db_be = rte_cpu_to_be_64(((uint64_t)db_hi << 32) | aq->cq->id);
+	uint32_t *addr = RTE_PTR_ADD(sh->tx_uar->base_addr, MLX5_CQ_DOORBELL);
+
+	rte_compiler_barrier();
+	aq->cq_dbrec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(db_hi);
+	rte_wmb();
+#ifdef RTE_ARCH_64
+	*(uint64_t *)addr = db_be;
+#else
+	*(uint32_t *)addr = db_be;
+	rte_io_wmb();
+	*((uint32_t *)addr + 1) = db_be >> 32;
+#endif
+	aq->arm_sn++;
+}
+
+static inline void
+mlx5_atomic_read_cqe(rte_int128_t *from, rte_int128_t *ts)
+{
+	/*
+	 * The only CQE of Clock Queue is being continuously
+	 * update by hardware with soecified rate. We have to
+	 * read timestump and WQE completion index atomically.
+	 */
+#ifdef RTE_ARCH_PPC_64
+	rte_atomic64_t *cqe = (rte_atomic64_t *)from;
+
+	/* Power architecture does not support 16B compare-and-swap. */
+	for (;;) {
+		int64_t tm, op;
+		int64_t *ps;
+
+		rte_compiler_barrier();
+		tm = rte_atomic64_read(cqe + 0);
+		op = rte_atomic64_read(cqe + 1);
+		rte_compiler_barrier();
+		if (tm != rte_atomic64_read(cqe + 0))
+			continue;
+		if (op != rte_atomic64_read(cqe + 1))
+			continue;
+		ps = (int64_t *)ts;
+		ps[0] = tm;
+		ps[1] = op;
+		return;
+	}
+#else
+	rte_int128_t src;
+
+	memset(&src, 0, sizeof(src));
+	*ts = src;
+	/* if (*from == *ts) *from = *src else *ts = *from; */
+	rte_atomic128_cmp_exchange(from, ts, &src, 0,
+				   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+#endif
+}
+
+/* Stores timestamp in the cache structure to share data with datapath. */
+static inline void
+mlx5_txpp_cache_timestamp(struct mlx5_dev_ctx_shared *sh,
+			   uint64_t ts, uint64_t ci)
+{
+	ci = ci << (64 - MLX5_CQ_INDEX_WIDTH);
+	ci |= (ts << MLX5_CQ_INDEX_WIDTH) >> MLX5_CQ_INDEX_WIDTH;
+	rte_compiler_barrier();
+	rte_atomic64_set(&sh->txpp.ts.ts, ts);
+	rte_atomic64_set(&sh->txpp.ts.ci_ts, ci);
+	rte_wmb();
+}
+
+/* Reads timestamp from Clock Queue CQE and stores in the cache. */
+static inline void
+mlx5_txpp_update_timestamp(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	struct mlx5_cqe *cqe = (struct mlx5_cqe *)(uintptr_t)wq->cqes;
+	union {
+		rte_int128_t u128;
+		struct mlx5_cqe_ts cts;
+	} to;
+	uint64_t ts;
+	uint16_t ci;
+
+	static_assert(sizeof(struct mlx5_cqe_ts) == sizeof(rte_int128_t),
+		      "Wrong timestamp CQE part size");
+	mlx5_atomic_read_cqe((rte_int128_t *)&cqe->timestamp, &to.u128);
+	if (to.cts.op_own >> 4) {
+		DRV_LOG(DEBUG, "Clock Queue error sync lost.");
+		rte_atomic32_inc(&sh->txpp.err_clock_queue);
+		sh->txpp.sync_lost = 1;
+		return;
+	}
+	ci = rte_be_to_cpu_16(to.cts.wqe_counter);
+	ts = rte_be_to_cpu_64(to.cts.timestamp);
+	ts = mlx5_txpp_convert_rx_ts(sh, ts);
+	wq->cq_ci += (ci - wq->sq_ci) & UINT16_MAX;
+	wq->sq_ci = ci;
+	mlx5_txpp_cache_timestamp(sh, ts, wq->cq_ci);
+}
+
+/* Gather statistics for timestamp from Clock Queue CQE. */
+static inline void
+mlx5_txpp_gather_timestamp(struct mlx5_dev_ctx_shared *sh)
+{
+	/* Check whether we have a valid timestamp. */
+	if (!sh->txpp.clock_queue.sq_ci && !sh->txpp.ts_n)
+		return;
+	MLX5_ASSERT(sh->txpp.ts_p < MLX5_TXPP_REARM_SQ_SIZE);
+	sh->txpp.tsa[sh->txpp.ts_p] = sh->txpp.ts;
+	if (++sh->txpp.ts_p >= MLX5_TXPP_REARM_SQ_SIZE)
+		sh->txpp.ts_p = 0;
+	if (sh->txpp.ts_n < MLX5_TXPP_REARM_SQ_SIZE)
+		++sh->txpp.ts_n;
+}
+
+/* Waits for the first completion on Clock Queue to init timestamp. */
+static inline void
+mlx5_txpp_init_timestamp(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	uint32_t wait;
+
+	sh->txpp.ts_p = 0;
+	sh->txpp.ts_n = 0;
+	for (wait = 0; wait < MLX5_TXPP_WAIT_INIT_TS; wait++) {
+		struct timespec onems;
+
+		mlx5_txpp_update_timestamp(sh);
+		if (wq->sq_ci)
+			return;
+		/* Wait one millisecond and try again. */
+		onems.tv_sec = 0;
+		onems.tv_nsec = NS_PER_S / MS_PER_S;
+		nanosleep(&onems, 0);
+	}
+	DRV_LOG(ERR, "Unable to initialize timestamp.");
+	sh->txpp.sync_lost = 1;
+}
+
+/* Handles Rearm Queue completions in periodic service. */
+static __rte_always_inline void
+mlx5_txpp_handle_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	uint32_t cq_ci = wq->cq_ci;
+	bool error = false;
+	int ret;
+
+	do {
+		volatile struct mlx5_cqe *cqe;
+
+		cqe = &wq->cqes[cq_ci & (MLX5_TXPP_REARM_CQ_SIZE - 1)];
+		ret = check_cqe(cqe, MLX5_TXPP_REARM_CQ_SIZE, cq_ci);
+		switch (ret) {
+		case MLX5_CQE_STATUS_ERR:
+			error = true;
+			++cq_ci;
+			break;
+		case MLX5_CQE_STATUS_SW_OWN:
+			wq->sq_ci += 2;
+			++cq_ci;
+			break;
+		case MLX5_CQE_STATUS_HW_OWN:
+			break;
+		default:
+			MLX5_ASSERT(false);
+			break;
+		}
+	} while (ret != MLX5_CQE_STATUS_HW_OWN);
+	if (likely(cq_ci != wq->cq_ci)) {
+		/* Check whether we have missed interrupts. */
+		if (cq_ci - wq->cq_ci != 1) {
+			DRV_LOG(DEBUG, "Rearm Queue missed interrupt.");
+			rte_atomic32_inc(&sh->txpp.err_miss_int);
+			/* Check sync lost on wqe index. */
+			if (cq_ci - wq->cq_ci >=
+				(((1UL << MLX5_WQ_INDEX_WIDTH) /
+				  MLX5_TXPP_REARM) - 1))
+				error = 1;
+		}
+		/* Update doorbell record to notify hardware. */
+		rte_compiler_barrier();
+		*wq->cq_dbrec = rte_cpu_to_be_32(cq_ci);
+		rte_wmb();
+		wq->cq_ci = cq_ci;
+		/* Fire new requests to Rearm Queue. */
+		if (error) {
+			DRV_LOG(DEBUG, "Rearm Queue error sync lost.");
+			rte_atomic32_inc(&sh->txpp.err_rearm_queue);
+			sh->txpp.sync_lost = 1;
+		}
+	}
+}
+
+/* Handles Clock Queue completions in periodic service. */
+static __rte_always_inline void
+mlx5_txpp_handle_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	mlx5_txpp_update_timestamp(sh);
+	mlx5_txpp_gather_timestamp(sh);
+}
+
+/* Invoked periodically on Rearm Queue completions. */
+static void
+mlx5_txpp_interrupt_handler(void *cb_arg)
+{
+#ifndef HAVE_IBV_DEVX_EVENT
+	RTE_SET_USED(cb_arg);
+	return;
+#else
+	struct mlx5_dev_ctx_shared *sh = cb_arg;
+	union {
+		struct mlx5dv_devx_async_event_hdr event_resp;
+		uint8_t buf[sizeof(struct mlx5dv_devx_async_event_hdr) + 128];
+	} out;
+
+	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	/* Process events in the loop. Only rearm completions are expected. */
+	while (mlx5_glue->devx_get_event
+				(sh->txpp.echan,
+				 &out.event_resp,
+				 sizeof(out.buf)) >=
+				 (ssize_t)sizeof(out.event_resp.cookie)) {
+		mlx5_txpp_handle_rearm_queue(sh);
+		mlx5_txpp_handle_clock_queue(sh);
+		mlx5_txpp_cq_arm(sh);
+		mlx5_txpp_doorbell_rearm_queue
+					(sh, sh->txpp.rearm_queue.sq_ci - 1);
+	}
+#endif /* HAVE_IBV_DEVX_ASYNC */
+}
+
+static void
+mlx5_txpp_stop_service(struct mlx5_dev_ctx_shared *sh)
+{
+	if (!sh->txpp.intr_handle.fd)
+		return;
+	mlx5_intr_callback_unregister(&sh->txpp.intr_handle,
+				      mlx5_txpp_interrupt_handler, sh);
+	sh->txpp.intr_handle.fd = 0;
+}
+
+/* Attach interrupt handler and fires first request to Rearm Queue. */
+static int
+mlx5_txpp_start_service(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t event_nums[1] = {0};
+	int flags;
+	int ret;
+
+	/* Attach interrupt handler to process Rearm Queue completions. */
+	flags = fcntl(sh->txpp.echan->fd, F_GETFL);
+	ret = fcntl(sh->txpp.echan->fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to change event channel FD.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	memset(&sh->txpp.intr_handle, 0, sizeof(sh->txpp.intr_handle));
+	sh->txpp.intr_handle.fd = sh->txpp.echan->fd;
+	sh->txpp.intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&sh->txpp.intr_handle,
+				       mlx5_txpp_interrupt_handler, sh)) {
+		sh->txpp.intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register CQE interrupt %d.", rte_errno);
+		return -rte_errno;
+	}
+	/* Subscribe CQ event to the event channel controlled by the driver. */
+	ret = mlx5_glue->devx_subscribe_devx_event(sh->txpp.echan,
+						   sh->txpp.rearm_queue.cq->obj,
+						   sizeof(event_nums),
+						   event_nums, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to subscribe CQE event.");
+		rte_errno = errno;
+		return -errno;
+	}
+	/* Enable interrupts in the CQ. */
+	mlx5_txpp_cq_arm(sh);
+	/* Fire the first request on Rearm Queue. */
+	mlx5_txpp_doorbell_rearm_queue(sh, sh->txpp.rearm_queue.sq_size - 1);
+	mlx5_txpp_init_timestamp(sh);
+	return 0;
+}
+
 /*
  * The routine initializes the packet pacing infrastructure:
  * - allocates PP context
@@ -595,8 +926,12 @@
 	ret = mlx5_txpp_create_rearm_queue(sh);
 	if (ret)
 		goto exit;
+	ret = mlx5_txpp_start_service(sh);
+	if (ret)
+		goto exit;
 exit:
 	if (ret) {
+		mlx5_txpp_stop_service(sh);
 		mlx5_txpp_destroy_rearm_queue(sh);
 		mlx5_txpp_destroy_clock_queue(sh);
 		mlx5_txpp_free_pp_index(sh);
@@ -618,6 +953,7 @@
 static void
 mlx5_txpp_destroy(struct mlx5_dev_ctx_shared *sh)
 {
+	mlx5_txpp_stop_service(sh);
 	mlx5_txpp_destroy_rearm_queue(sh);
 	mlx5_txpp_destroy_clock_queue(sh);
 	mlx5_txpp_free_pp_index(sh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 10/17] net/mlx5: prepare Tx queue structures to support timestamp
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (8 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 09/17] net/mlx5: introduce clock queue service routine Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 11/17] net/mlx5: convert timestamp to completion index Viacheslav Ovsiienko
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The fields to support send scheduling on dynamic timestamp
field are introduced and initialized on device start.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.h    |  4 ++++
 drivers/net/mlx5/mlx5_trigger.c |  2 ++
 drivers/net/mlx5/mlx5_txq.c     | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 8a8d2b5..974a847 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -313,6 +313,9 @@ struct mlx5_txq_data {
 	volatile uint32_t *cq_db; /* Completion queue doorbell. */
 	uint16_t port_id; /* Port ID of device. */
 	uint16_t idx; /* Queue index. */
+	uint64_t ts_mask; /* Timestamp flag dynamic mask. */
+	int32_t ts_offset; /* Timestamp field dynamic offset. */
+	struct mlx5_dev_ctx_shared *sh; /* Shared context. */
 	struct mlx5_txq_stats stats; /* TX queue counters. */
 #ifndef RTE_ARCH_64
 	rte_spinlock_t *uar_lock;
@@ -468,6 +471,7 @@ struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
 void txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
 uint64_t mlx5_get_tx_port_offloads(struct rte_eth_dev *dev);
+void mlx5_txq_dynf_timestamp_set(struct rte_eth_dev *dev);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e324319..29aef54 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -331,6 +331,8 @@
 	}
 	/* Set a mask and offset of dynamic metadata flows into Rx queues*/
 	mlx5_flow_rxq_dynf_metadata_set(dev);
+	/* Set a mask and offset of scheduling on timestamp into Tx queues*/
+	mlx5_txq_dynf_timestamp_set(dev);
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index f9ed504..7f6a40a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1785,3 +1785,35 @@ struct mlx5_txq_ctrl *
 	}
 	return ret;
 }
+
+/**
+ * Set the Tx queue dynamic timestamp (mask and offset)
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ */
+void
+mlx5_txq_dynf_timestamp_set(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_data *data;
+	int off, nbit;
+	unsigned int i;
+	uint64_t mask = 0;
+
+	nbit = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	off = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (nbit > 0 && off >= 0 && sh->txpp.refcnt)
+		mask = 1ULL << nbit;
+	for (i = 0; i != priv->txqs_n; ++i) {
+		data = (*priv->txqs)[i];
+		if (!data)
+			continue;
+		data->sh = sh;
+		data->ts_mask = mask;
+		data->ts_offset = off;
+	}
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 11/17] net/mlx5: convert timestamp to completion index
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (9 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 10/17] net/mlx5: prepare Tx queue structures to support timestamp Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 12/17] net/mlx5: prepare Tx datapath to support sheduling Viacheslav Ovsiienko
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The application provides timestamps in Tx mbuf as clocks,
the hardware performs scheduling on Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index
inline routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_rxtx.h | 55 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txpp.c |  5 ++++
 3 files changed, 62 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e599bbb..a61394c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -585,6 +585,8 @@ struct mlx5_dev_txpp {
 	rte_atomic32_t err_miss_int; /* Missed service interrupt. */
 	rte_atomic32_t err_rearm_queue; /* Rearm Queue errors. */
 	rte_atomic32_t err_clock_queue; /* Clock Queue errors. */
+	rte_atomic32_t err_ts_past; /* Timestamp in the past. */
+	rte_atomic32_t err_ts_future; /* Timestamp in the distant future. */
 };
 
 /*
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 974a847..d082cd7 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -719,4 +719,59 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
 }
 
+/**
+ * Convert timestamp from mbuf format to linear counter
+ * of Clock Queue completions (24 bits)
+ *
+ * @param sh
+ *   Pointer to the device shared context to fetch Tx
+ *   packet pacing timestamp and parameters.
+ * @param ts
+ *   Timestamp from mbuf to convert.
+ * @return
+ *   positive or zero value - completion ID to wait
+ *   negative value - conversion error
+ */
+static __rte_always_inline int32_t
+mlx5_txpp_convert_tx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t mts)
+{
+	uint64_t ts, ci;
+	uint32_t tick;
+
+	do {
+		/*
+		 * Read atomically two uint64_t fields and compare lsb bits.
+		 * It there is no match - the timestamp was updated in
+		 * the service thread, data should be re-read.
+		 */
+		rte_compiler_barrier();
+		ci = rte_atomic64_read(&sh->txpp.ts.ci_ts);
+		ts = rte_atomic64_read(&sh->txpp.ts.ts);
+		rte_compiler_barrier();
+		if (!((ts ^ ci) << (64 - MLX5_CQ_INDEX_WIDTH)))
+			break;
+	} while (true);
+	/* Perform the skew correction, positive value to send earlier. */
+	mts -= sh->txpp.skew;
+	mts -= ts;
+	if (unlikely(mts >= UINT64_MAX / 2)) {
+		/* We have negative integer, mts is in the past. */
+		rte_atomic32_inc(&sh->txpp.err_ts_past);
+		return -1;
+	}
+	tick = sh->txpp.tick;
+	MLX5_ASSERT(tick);
+	/* Convert delta to completions, round up. */
+	mts = (mts + tick - 1) / tick;
+	if (unlikely(mts >= (1 << MLX5_CQ_INDEX_WIDTH) / 2 - 1)) {
+		/* We have mts is too distant future. */
+		rte_atomic32_inc(&sh->txpp.err_ts_future);
+		return -1;
+	}
+	mts <<= 64 - MLX5_CQ_INDEX_WIDTH;
+	ci += mts;
+	ci >>= 64 - MLX5_CQ_INDEX_WIDTH;
+	return ci;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 17d64ff..799b20a 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -858,6 +858,11 @@
 	int flags;
 	int ret;
 
+	rte_atomic32_set(&sh->txpp.err_miss_int, 0);
+	rte_atomic32_set(&sh->txpp.err_rearm_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_clock_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_past, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_future, 0);
 	/* Attach interrupt handler to process Rearm Queue completions. */
 	flags = fcntl(sh->txpp.echan->fd, F_GETFL);
 	ret = fcntl(sh->txpp.echan->fd, F_SETFL, flags | O_NONBLOCK);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 12/17] net/mlx5: prepare Tx datapath to support sheduling
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (10 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 11/17] net/mlx5: convert timestamp to completion index Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 13/17] net/mlx5: add scheduling support to send routine template Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The new static control flag is introduced to control
routine generating from template, enabling the scheduling
on timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 97 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_txq.c  |  2 +
 2 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 227289e..76fe12b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -66,6 +66,7 @@ enum mlx5_txcmp_code {
 #define MLX5_TXOFF_CONFIG_METADATA (1u << 6) /* Flow metadata. */
 #define MLX5_TXOFF_CONFIG_EMPW (1u << 8) /* Enhanced MPW supported.*/
 #define MLX5_TXOFF_CONFIG_MPW (1u << 9) /* Legacy MPW supported.*/
+#define MLX5_TXOFF_CONFIG_TXPP (1u << 10) /* Scheduling on timestamp.*/
 
 /* The most common offloads groups. */
 #define MLX5_TXOFF_CONFIG_NONE 0
@@ -5268,6 +5269,45 @@ enum mlx5_txcmp_code {
 		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
 		MLX5_TXOFF_CONFIG_METADATA)
 
+/* Generate routines with timestamp scheduling. */
+MLX5_TXOFF_DECL(full_ts_nompw,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_DECL(full_ts_nompwi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_DECL(full_ts,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(full_ts_noi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(none_ts,
+		MLX5_TXOFF_CONFIG_NONE | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(mdi_ts,
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(mti_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(mtiv_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
+		MLX5_TXOFF_CONFIG_METADATA | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
 /*
  * Generate routines with Legacy Multi-Packet Write support.
  * This mode is supported by ConnectX-4 Lx only and imposes
@@ -5372,6 +5412,44 @@ enum mlx5_txcmp_code {
 		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
 		MLX5_TXOFF_CONFIG_METADATA | MLX5_TXOFF_CONFIG_EMPW)
 
+MLX5_TXOFF_INFO(full_ts_nompw,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_INFO(full_ts_nompwi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_INFO(full_ts,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(full_ts_noi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(none_ts,
+		MLX5_TXOFF_CONFIG_NONE | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(mdi_ts,
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(mti_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(mtiv_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
+		MLX5_TXOFF_CONFIG_METADATA | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
 MLX5_TXOFF_INFO(full,
 		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
 		MLX5_TXOFF_CONFIG_SWP |	MLX5_TXOFF_CONFIG_CSUM |
@@ -5518,6 +5596,14 @@ enum mlx5_txcmp_code {
 		/* We should support VLAN insertion. */
 		olx |= MLX5_TXOFF_CONFIG_VLAN;
 	}
+	if (tx_offloads & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP &&
+	    rte_mbuf_dynflag_lookup
+			(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL) > 0 &&
+	    rte_mbuf_dynfield_lookup
+			(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL) > 0) {
+		/* Offload configured, dynamic entities registered. */
+		olx |= MLX5_TXOFF_CONFIG_TXPP;
+	}
 	if (priv->txqs_n && (*priv->txqs)[0]) {
 		struct mlx5_txq_data *txd = (*priv->txqs)[0];
 
@@ -5587,6 +5673,9 @@ enum mlx5_txcmp_code {
 		if ((olx ^ tmp) & MLX5_TXOFF_CONFIG_INLINE)
 			/* Do not enable inlining if not configured. */
 			continue;
+		if ((olx ^ tmp) & MLX5_TXOFF_CONFIG_TXPP)
+			/* Do not enable scheduling if not configured. */
+			continue;
 		/*
 		 * Some routine meets the requirements.
 		 * Check whether it has minimal amount
@@ -5631,6 +5720,8 @@ enum mlx5_txcmp_code {
 		DRV_LOG(DEBUG, "\tVLANI (VLAN insertion)");
 	if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_METADATA)
 		DRV_LOG(DEBUG, "\tMETAD (tx Flow metadata)");
+	if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_TXPP)
+		DRV_LOG(DEBUG, "\tMETAD (tx Scheduling)");
 	if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_EMPW) {
 		if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_MPW)
 			DRV_LOG(DEBUG, "\tMPW   (Legacy MPW)");
@@ -5705,7 +5796,7 @@ enum mlx5_txcmp_code {
 		if (pkt_burst == txoff_func[i].func) {
 			olx = txoff_func[i].olx;
 			snprintf(mode->info, sizeof(mode->info),
-				 "%s%s%s%s%s%s%s%s",
+				 "%s%s%s%s%s%s%s%s%s",
 				 (olx & MLX5_TXOFF_CONFIG_EMPW) ?
 				 ((olx & MLX5_TXOFF_CONFIG_MPW) ?
 				 "Legacy MPW" : "Enhanced MPW") : "No MPW",
@@ -5722,7 +5813,9 @@ enum mlx5_txcmp_code {
 				 (olx & MLX5_TXOFF_CONFIG_VLAN) ?
 				 " + VLAN" : "",
 				 (olx & MLX5_TXOFF_CONFIG_METADATA) ?
-				 " + METADATA" : "");
+				 " + METADATA" : "",
+				 (olx & MLX5_TXOFF_CONFIG_TXPP) ?
+				 " + TXPP" : "");
 			return 0;
 		}
 	}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 7f6a40a..4ab6ac1 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -119,6 +119,8 @@
 			     DEV_TX_OFFLOAD_TCP_CKSUM);
 	if (config->tso)
 		offloads |= DEV_TX_OFFLOAD_TCP_TSO;
+	if (config->tx_pp)
+		offloads |= DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP;
 	if (config->swp) {
 		if (config->hw_csum)
 			offloads |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 13/17] net/mlx5: add scheduling support to send routine template
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (11 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 12/17] net/mlx5: prepare Tx datapath to support sheduling Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 14/17] net/mlx5: add read device clock support Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch adds send scheduling on timestamps into tx_burst
routine template. The feature is controlled by static configuration
flag, the actual routines supporting the new feature are generated
over this updated template.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 162 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 161 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 76fe12b..e511142 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -2404,6 +2404,37 @@ enum mlx5_txcmp_code {
 }
 
 /**
+ * Build the Synchronize Queue Segment with specified completion index.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param wqe
+ *   Pointer to WQE to fill with built Control Segment.
+ * @param wci
+ *   Completion index in Clock Queue to wait.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_wseg_init(struct mlx5_txq_data *restrict txq,
+		  struct mlx5_txq_local *restrict loc __rte_unused,
+		  struct mlx5_wqe *restrict wqe,
+		  unsigned int wci,
+		  unsigned int olx __rte_unused)
+{
+	struct mlx5_wqe_qseg *qs;
+
+	qs = RTE_PTR_ADD(wqe, MLX5_WSEG_SIZE);
+	qs->max_index = rte_cpu_to_be_32(wci);
+	qs->qpn_cqn = rte_cpu_to_be_32(txq->sh->txpp.clock_queue.cq->id);
+	qs->reserved0 = RTE_BE32(0);
+	qs->reserved1 = RTE_BE32(0);
+}
+
+/**
  * Build the Ethernet Segment without inlined data.
  * Supports Software Parser, Checksums and VLAN
  * insertion Tx offload features.
@@ -3241,6 +3272,59 @@ enum mlx5_txcmp_code {
 }
 
 /**
+ * The routine checks timestamp flag in the current packet,
+ * and push WAIT WQE into the queue if scheduling is required.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ *
+ * @return
+ *   MLX5_TXCMP_CODE_EXIT - sending is done or impossible.
+ *   MLX5_TXCMP_CODE_SINGLE - continue processing with the packet.
+ *   MLX5_TXCMP_CODE_MULTI - the WAIT inserted, continue processing.
+ * Local context variables partially updated.
+ */
+static __rte_always_inline enum mlx5_txcmp_code
+mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
+		      struct mlx5_txq_local *restrict loc,
+		      unsigned int olx)
+{
+	if (MLX5_TXOFF_CONFIG(TXPP) &&
+	    loc->mbuf->ol_flags & txq->ts_mask) {
+		struct mlx5_wqe *wqe;
+		uint64_t ts;
+		int32_t wci;
+
+		/*
+		 * Estimate the required space quickly and roughly.
+		 * We would like to ensure the packet can be pushed
+		 * to the queue and we won't get the orphan WAIT WQE.
+		 */
+		if (loc->wqe_free <= MLX5_WQE_SIZE_MAX / MLX5_WQE_SIZE ||
+		    loc->elts_free < NB_SEGS(loc->mbuf))
+			return MLX5_TXCMP_CODE_EXIT;
+		/* Convert the timestamp into completion to wait. */
+		ts = *RTE_MBUF_DYNFIELD(loc->mbuf, txq->ts_offset, uint64_t *);
+		wci = mlx5_txpp_convert_tx_ts(txq->sh, ts);
+		if (unlikely(wci < 0))
+			return MLX5_TXCMP_CODE_SINGLE;
+		/* Build the WAIT WQE with specified completion. */
+		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
+		mlx5_tx_cseg_init(txq, loc, wqe, 2, MLX5_OPCODE_WAIT, olx);
+		mlx5_tx_wseg_init(txq, loc, wqe, wci, olx);
+		++txq->wqe_ci;
+		--loc->wqe_free;
+		return MLX5_TXCMP_CODE_MULTI;
+	}
+	return MLX5_TXCMP_CODE_SINGLE;
+}
+
+/**
  * Tx one packet function for multi-segment TSO. Supports all
  * types of Tx offloads, uses MLX5_OPCODE_TSO to build WQEs,
  * sends one packet per WQE.
@@ -3269,6 +3353,16 @@ enum mlx5_txcmp_code {
 	struct mlx5_wqe *__rte_restrict wqe;
 	unsigned int ds, dlen, inlen, ntcp, vlan = 0;
 
+	if (MLX5_TXOFF_CONFIG(TXPP)) {
+		enum mlx5_txcmp_code wret;
+
+		/* Generate WAIT for scheduling if requested. */
+		wret = mlx5_tx_schedule_send(txq, loc, olx);
+		if (wret == MLX5_TXCMP_CODE_EXIT)
+			return MLX5_TXCMP_CODE_EXIT;
+		if (wret == MLX5_TXCMP_CODE_ERROR)
+			return MLX5_TXCMP_CODE_ERROR;
+	}
 	/*
 	 * Calculate data length to be inlined to estimate
 	 * the required space in WQE ring buffer.
@@ -3360,6 +3454,16 @@ enum mlx5_txcmp_code {
 	unsigned int ds, nseg;
 
 	MLX5_ASSERT(NB_SEGS(loc->mbuf) > 1);
+	if (MLX5_TXOFF_CONFIG(TXPP)) {
+		enum mlx5_txcmp_code wret;
+
+		/* Generate WAIT for scheduling if requested. */
+		wret = mlx5_tx_schedule_send(txq, loc, olx);
+		if (wret == MLX5_TXCMP_CODE_EXIT)
+			return MLX5_TXCMP_CODE_EXIT;
+		if (wret == MLX5_TXCMP_CODE_ERROR)
+			return MLX5_TXCMP_CODE_ERROR;
+	}
 	/*
 	 * No inline at all, it means the CPU cycles saving
 	 * is prioritized at configuration, we should not
@@ -3468,6 +3572,16 @@ enum mlx5_txcmp_code {
 
 	MLX5_ASSERT(MLX5_TXOFF_CONFIG(INLINE));
 	MLX5_ASSERT(NB_SEGS(loc->mbuf) > 1);
+	if (MLX5_TXOFF_CONFIG(TXPP)) {
+		enum mlx5_txcmp_code wret;
+
+		/* Generate WAIT for scheduling if requested. */
+		wret = mlx5_tx_schedule_send(txq, loc, olx);
+		if (wret == MLX5_TXCMP_CODE_EXIT)
+			return MLX5_TXCMP_CODE_EXIT;
+		if (wret == MLX5_TXCMP_CODE_ERROR)
+			return MLX5_TXCMP_CODE_ERROR;
+	}
 	/*
 	 * First calculate data length to be inlined
 	 * to estimate the required space for WQE.
@@ -3730,6 +3844,16 @@ enum mlx5_txcmp_code {
 		uint8_t *dptr;
 
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		dlen = rte_pktmbuf_data_len(loc->mbuf);
 		if (MLX5_TXOFF_CONFIG(VLAN) &&
 		    loc->mbuf->ol_flags & PKT_TX_VLAN_PKT) {
@@ -3892,7 +4016,7 @@ enum mlx5_txcmp_code {
  *  false - no match, eMPW should be restarted.
  */
 static __rte_always_inline bool
-mlx5_tx_match_empw(struct mlx5_txq_data *__rte_restrict txq __rte_unused,
+mlx5_tx_match_empw(struct mlx5_txq_data *__rte_restrict txq,
 		   struct mlx5_wqe_eseg *__rte_restrict es,
 		   struct mlx5_txq_local *__rte_restrict loc,
 		   uint32_t dlen,
@@ -3921,6 +4045,10 @@ enum mlx5_txcmp_code {
 	/* There must be no VLAN packets in eMPW loop. */
 	if (MLX5_TXOFF_CONFIG(VLAN))
 		MLX5_ASSERT(!(loc->mbuf->ol_flags & PKT_TX_VLAN_PKT));
+	/* Check if the scheduling is requested. */
+	if (MLX5_TXOFF_CONFIG(TXPP) &&
+	    loc->mbuf->ol_flags & txq->ts_mask)
+		return false;
 	return true;
 }
 
@@ -4106,6 +4234,16 @@ enum mlx5_txcmp_code {
 
 next_empw:
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		part = RTE_MIN(pkts_n, MLX5_TXOFF_CONFIG(MPW) ?
 				       MLX5_MPW_MAX_PACKETS :
 				       MLX5_EMPW_MAX_PACKETS);
@@ -4201,6 +4339,7 @@ enum mlx5_txcmp_code {
 			 * - metadata value
 			 * - software parser settings
 			 * - packets length (legacy MPW only)
+			 * - scheduling is not required
 			 */
 			if (!mlx5_tx_match_empw(txq, eseg, loc, dlen, olx)) {
 				MLX5_ASSERT(loop);
@@ -4271,6 +4410,16 @@ enum mlx5_txcmp_code {
 		unsigned int slen = 0;
 
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		/*
 		 * Limits the amount of packets in one WQE
 		 * to improve CQE latency generation.
@@ -4496,6 +4645,7 @@ enum mlx5_txcmp_code {
 			 * - metadata value
 			 * - software parser settings
 			 * - packets length (legacy MPW only)
+			 * - scheduling is not required
 			 */
 			if (!mlx5_tx_match_empw(txq, &wqem->eseg,
 						loc, dlen, olx))
@@ -4545,6 +4695,16 @@ enum mlx5_txcmp_code {
 		enum mlx5_txcmp_code ret;
 
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		if (MLX5_TXOFF_CONFIG(INLINE)) {
 			unsigned int inlen, vlan = 0;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 14/17] net/mlx5: add read device clock support
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (12 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 13/17] net/mlx5: add scheduling support to send routine template Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 15/17] net/mlx5: provide the send scheduling error statistics Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

If send schedule feature is engaged there is the Clock Queue
created, that reports reliable the current device clock counter
value. The device clock counter can be read directly from the
Clock Queue CQE.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  4 ++-
 drivers/net/mlx5/mlx5.h          |  1 +
 drivers/net/mlx5/mlx5_txpp.c     | 55 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index ea36309..e7241d8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2348,7 +2348,7 @@
 	.xstats_get_names = mlx5_xstats_get_names,
 	.fw_version_get = mlx5_fw_version_get,
 	.dev_infos_get = mlx5_dev_infos_get,
-	.read_clock = mlx5_read_clock,
+	.read_clock = mlx5_txpp_read_clock,
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
@@ -2397,6 +2397,7 @@
 	.xstats_get_names = mlx5_xstats_get_names,
 	.fw_version_get = mlx5_fw_version_get,
 	.dev_infos_get = mlx5_dev_infos_get,
+	.read_clock = mlx5_txpp_read_clock,
 	.rx_descriptor_status = mlx5_rx_descriptor_status,
 	.tx_descriptor_status = mlx5_tx_descriptor_status,
 	.rxq_info_get = mlx5_rxq_info_get,
@@ -2427,6 +2428,7 @@
 	.xstats_get_names = mlx5_xstats_get_names,
 	.fw_version_get = mlx5_fw_version_get,
 	.dev_infos_get = mlx5_dev_infos_get,
+	.read_clock = mlx5_txpp_read_clock,
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index a61394c..f1246b8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1010,5 +1010,6 @@ void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
 
 int mlx5_txpp_start(struct rte_eth_dev *dev);
 void mlx5_txpp_stop(struct rte_eth_dev *dev);
+int mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 799b20a..d0916de 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -1056,3 +1056,58 @@
 	MLX5_ASSERT(!ret);
 	RTE_SET_USED(ret);
 }
+
+/*
+ * Read the current clock counter of an Ethernet device
+ *
+ * This returns the current raw clock value of an Ethernet device. It is
+ * a raw amount of ticks, with no given time reference.
+ * The value returned here is from the same clock than the one
+ * filling timestamp field of Rx/Tx packets when using hardware timestamp
+ * offload. Therefore it can be used to compute a precise conversion of
+ * the device clock to the real time.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param clock
+ *   Pointer to the uint64_t that holds the raw clock value.
+ *
+ * @return
+ *   - 0: Success.
+ *   - -ENOTSUP: The function is not supported in this mode. Requires
+ *     packet pacing module configured and started (tx_pp devarg)
+ */
+int
+mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	int ret;
+
+	if (sh->txpp.refcnt) {
+		struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+		struct mlx5_cqe *cqe = (struct mlx5_cqe *)(uintptr_t)wq->cqes;
+		union {
+			rte_int128_t u128;
+			struct mlx5_cqe_ts cts;
+		} to;
+		uint64_t ts;
+
+		mlx5_atomic_read_cqe((rte_int128_t *)&cqe->timestamp, &to.u128);
+		if (to.cts.op_own >> 4) {
+			DRV_LOG(DEBUG, "Clock Queue error sync lost.");
+			rte_atomic32_inc(&sh->txpp.err_clock_queue);
+			sh->txpp.sync_lost = 1;
+			return -EIO;
+		}
+		ts = rte_be_to_cpu_64(to.cts.timestamp);
+		ts = mlx5_txpp_convert_rx_ts(sh, ts);
+		*timestamp = ts;
+		return 0;
+	}
+	/* Not supported in isolated mode - kernel does not see the CQEs. */
+	if (priv->isolated || rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return -ENOTSUP;
+	ret = mlx5_read_clock(dev, timestamp);
+	return ret;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 15/17] net/mlx5: provide the send scheduling error statistics
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (13 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 14/17] net/mlx5: add read device clock support Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 16/17] common/mlx5: add register access DevX routine Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 17/17] net/mlx5: convert Rx timestamps in realtime format Viacheslav Ovsiienko
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The mlx5 PMD exposes the following new introduced
extended statistics counter to report the errors
of packet send scheduling on timestamps:

  - txpp_err_miss_int - rearm queue interrupt was not handled
    was not handled in time and service routine might miss
    the completions

  - txpp_err_rearm_queue - reports errors in rearm queue
  - txpp_err_clock_queue - reports errors in clock queue

  - txpp_err_ts_past - timestamps in the packet being sent
    were found in the past, timestamps were ignored

  - txpp_err_ts_future - timestamps in the packet being sent
    were found in the too distant future (beyond HW/clock queue
    capabilities to schedule, typically it is about 16M of
    tx_pp devarg periods)

  - txpp_jitter - estimated jitter in device clocks between
    8K completions of Clock Queue.

  - txpp_wander - estimated wander in device clocks between
    16M completions of Clock Queue.

  - txpp_sync_lost - error flag, the Clock Queue completions
    synchronization is lost, accurate packet scheduling can
    not be handled, timestamps are being ignored, the restart
    of all ports using scheduling must be performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h       |   7 ++
 drivers/net/mlx5/mlx5_stats.c |   7 +-
 drivers/net/mlx5/mlx5_txpp.c  | 220 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index f1246b8..97a14fb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1011,5 +1011,12 @@ void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
 int mlx5_txpp_start(struct rte_eth_dev *dev);
 void mlx5_txpp_stop(struct rte_eth_dev *dev);
 int mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp);
+int mlx5_txpp_xstats_get(struct rte_eth_dev *dev,
+			 struct rte_eth_xstat *stats,
+			 unsigned int n, unsigned int n_used);
+int mlx5_txpp_xstats_reset(struct rte_eth_dev *dev);
+int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
+			       struct rte_eth_xstat_name *xstats_names,
+			       unsigned int n, unsigned int n_used);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index a9b33ee..e30542e 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -75,6 +75,7 @@
 			}
 		}
 	}
+	mlx5_stats_n = mlx5_txpp_xstats_get(dev, stats, n, mlx5_stats_n);
 	return mlx5_stats_n;
 }
 
@@ -237,7 +238,7 @@
 		xstats_ctrl->base[i] = counters[i];
 		xstats_ctrl->hw_stats[i] = 0;
 	}
-
+	mlx5_txpp_xstats_reset(dev);
 	return 0;
 }
 
@@ -255,7 +256,7 @@
  *   Number of xstats names.
  */
 int
-mlx5_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
+mlx5_xstats_get_names(struct rte_eth_dev *dev,
 		      struct rte_eth_xstat_name *xstats_names, unsigned int n)
 {
 	unsigned int i;
@@ -271,5 +272,7 @@
 			xstats_names[i].name[RTE_ETH_XSTATS_NAME_SIZE - 1] = 0;
 		}
 	}
+	mlx5_xstats_n = mlx5_txpp_xstats_get_names(dev, xstats_names,
+						   n, mlx5_xstats_n);
 	return mlx5_xstats_n;
 }
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index d0916de..5000e47 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -15,6 +15,17 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_common_os.h"
 
+static const char * const mlx5_txpp_stat_names[] = {
+	"txpp_err_miss_int", /* Missed service interrupt. */
+	"txpp_err_rearm_queue",	/* Rearm Queue errors. */
+	"txpp_err_clock_queue", /* Clock Queue errors. */
+	"txpp_err_ts_past", /* Timestamp in the past. */
+	"txpp_err_ts_future", /* Timestamp in the distant future. */
+	"txpp_jitter", /* Timestamp jitter (one Clock Queue completion). */
+	"txpp_wander", /* Timestamp jitter (half of Clock Queue completions). */
+	"txpp_sync_lost", /* Scheduling synchronization lost. */
+};
+
 /* Destroy Event Queue Notification Channel. */
 static void
 mlx5_txpp_destroy_eqn(struct mlx5_dev_ctx_shared *sh)
@@ -1111,3 +1122,212 @@
 	ret = mlx5_read_clock(dev, timestamp);
 	return ret;
 }
+
+/**
+ * DPDK callback to clear device extended statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success and stats is reset, negative errno value otherwise and
+ *   rte_errno is set.
+ */
+int mlx5_txpp_xstats_reset(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+
+	rte_atomic32_set(&sh->txpp.err_miss_int, 0);
+	rte_atomic32_set(&sh->txpp.err_rearm_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_clock_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_past, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_future, 0);
+	return 0;
+}
+
+/**
+ * Routine to retrieve names of extended device statistics
+ * for packet send scheduling. It appends the specific stats names
+ * after the parts filled by preceding modules (eth stats, etc.)
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] xstats_names
+ *   Buffer to insert names into.
+ * @param n
+ *   Number of names.
+ * @param n_used
+ *   Number of names filled by preceding statistics modules.
+ *
+ * @return
+ *   Number of xstats names.
+ */
+int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
+			       struct rte_eth_xstat_name *xstats_names,
+			       unsigned int n, unsigned int n_used)
+{
+	unsigned int n_txpp = RTE_DIM(mlx5_txpp_stat_names);
+	unsigned int i;
+
+	if (n >= n_used + n_txpp && xstats_names) {
+		for (i = 0; i < n_txpp; ++i) {
+			strncpy(xstats_names[i + n_used].name,
+				mlx5_txpp_stat_names[i],
+				RTE_ETH_XSTATS_NAME_SIZE);
+			xstats_names[i + n_used].name
+					[RTE_ETH_XSTATS_NAME_SIZE - 1] = 0;
+		}
+	}
+	return n_used + n_txpp;
+}
+
+static inline void
+mlx5_txpp_read_tsa(struct mlx5_dev_txpp *txpp,
+		   struct mlx5_txpp_ts *tsa, uint16_t idx)
+{
+	do {
+		int64_t ts, ci;
+
+		ts = rte_atomic64_read(&txpp->tsa[idx].ts);
+		ci = rte_atomic64_read(&txpp->tsa[idx].ci_ts);
+		rte_compiler_barrier();
+		if ((ci ^ ts) << MLX5_CQ_INDEX_WIDTH != 0)
+			continue;
+		if (rte_atomic64_read(&txpp->tsa[idx].ts) != ts)
+			continue;
+		if (rte_atomic64_read(&txpp->tsa[idx].ci_ts) != ci)
+			continue;
+		rte_atomic64_set(&tsa->ts, ts);
+		rte_atomic64_set(&tsa->ci_ts, ci);
+		return;
+	} while (true);
+}
+
+/*
+ * Jitter reflects the clock change between
+ * neighbours Clock Queue completions.
+ */
+static uint64_t
+mlx5_txpp_xstats_jitter(struct mlx5_dev_txpp *txpp)
+{
+	struct mlx5_txpp_ts tsa0, tsa1;
+	int64_t dts, dci;
+	uint16_t ts_p;
+
+	if (txpp->ts_n < 2) {
+		/* No gathered enough reports yet. */
+		return 0;
+	}
+	do {
+		int ts_0, ts_1;
+
+		ts_p = txpp->ts_p;
+		rte_compiler_barrier();
+		ts_0 = ts_p - 2;
+		if (ts_0 < 0)
+			ts_0 += MLX5_TXPP_REARM_SQ_SIZE;
+		ts_1 = ts_p - 1;
+		if (ts_1 < 0)
+			ts_1 += MLX5_TXPP_REARM_SQ_SIZE;
+		mlx5_txpp_read_tsa(txpp, &tsa0, ts_0);
+		mlx5_txpp_read_tsa(txpp, &tsa1, ts_1);
+		rte_compiler_barrier();
+	} while (ts_p != txpp->ts_p);
+	/* We have two neighbor reports, calculate the jitter. */
+	dts = rte_atomic64_read(&tsa1.ts) - rte_atomic64_read(&tsa0.ts);
+	dci = (rte_atomic64_read(&tsa1.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH)) -
+	      (rte_atomic64_read(&tsa0.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH));
+	if (dci < 0)
+		dci += 1 << MLX5_CQ_INDEX_WIDTH;
+	dci *= txpp->tick;
+	return (dts > dci) ? dts - dci : dci - dts;
+}
+
+/*
+ * Wander reflects the long-term clock change
+ * over the entire length of all Clock Queue completions.
+ */
+static uint64_t
+mlx5_txpp_xstats_wander(struct mlx5_dev_txpp *txpp)
+{
+	struct mlx5_txpp_ts tsa0, tsa1;
+	int64_t dts, dci;
+	uint16_t ts_p;
+
+	if (txpp->ts_n < MLX5_TXPP_REARM_SQ_SIZE) {
+		/* No gathered enough reports yet. */
+		return 0;
+	}
+	do {
+		int ts_0, ts_1;
+
+		ts_p = txpp->ts_p;
+		rte_compiler_barrier();
+		ts_0 = ts_p - MLX5_TXPP_REARM_SQ_SIZE / 2 - 1;
+		if (ts_0 < 0)
+			ts_0 += MLX5_TXPP_REARM_SQ_SIZE;
+		ts_1 = ts_p - 1;
+		if (ts_1 < 0)
+			ts_1 += MLX5_TXPP_REARM_SQ_SIZE;
+		mlx5_txpp_read_tsa(txpp, &tsa0, ts_0);
+		mlx5_txpp_read_tsa(txpp, &tsa1, ts_1);
+		rte_compiler_barrier();
+	} while (ts_p != txpp->ts_p);
+	/* We have two neighbor reports, calculate the jitter. */
+	dts = rte_atomic64_read(&tsa1.ts) - rte_atomic64_read(&tsa0.ts);
+	dci = (rte_atomic64_read(&tsa1.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH)) -
+	      (rte_atomic64_read(&tsa0.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH));
+	dci += 1 << MLX5_CQ_INDEX_WIDTH;
+	dci *= txpp->tick;
+	return (dts > dci) ? dts - dci : dci - dts;
+}
+
+/**
+ * Routine to retrieve extended device statistics
+ * for packet send scheduling. It appends the specific statistics
+ * after the parts filled by preceding modules (eth stats, etc.)
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] stats
+ *   Pointer to rte extended stats table.
+ * @param n
+ *   The size of the stats table.
+ * @param n_used
+ *   Number of stats filled by preceding statistics modules.
+ *
+ * @return
+ *   Number of extended stats on success and stats is filled,
+ *   negative on error and rte_errno is set.
+ */
+int
+mlx5_txpp_xstats_get(struct rte_eth_dev *dev,
+		     struct rte_eth_xstat *stats,
+		     unsigned int n, unsigned int n_used)
+{
+	unsigned int n_txpp = RTE_DIM(mlx5_txpp_stat_names);
+
+	if (n >= n_used + n_txpp && stats) {
+		struct mlx5_priv *priv = dev->data->dev_private;
+		struct mlx5_dev_ctx_shared *sh = priv->sh;
+		unsigned int i;
+
+		for (i = 0; i < n_txpp; ++i)
+			stats[n_used + i].id = n_used + i;
+		stats[n_used + 0].value =
+				rte_atomic32_read(&sh->txpp.err_miss_int);
+		stats[n_used + 1].value =
+				rte_atomic32_read(&sh->txpp.err_rearm_queue);
+		stats[n_used + 2].value =
+				rte_atomic32_read(&sh->txpp.err_clock_queue);
+		stats[n_used + 3].value =
+				rte_atomic32_read(&sh->txpp.err_ts_past);
+		stats[n_used + 4].value =
+				rte_atomic32_read(&sh->txpp.err_ts_future);
+		stats[n_used + 5].value = mlx5_txpp_xstats_jitter(&sh->txpp);
+		stats[n_used + 6].value = mlx5_txpp_xstats_wander(&sh->txpp);
+		stats[n_used + 7].value = sh->txpp.sync_lost;
+	}
+	return n_used + n_txpp;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 16/17] common/mlx5: add register access DevX routine
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (14 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 15/17] net/mlx5: provide the send scheduling error statistics Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 17/17] net/mlx5: convert Rx timestamps in realtime format Viacheslav Ovsiienko
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The DevX routine to read/write NIC registers via DevX API is added.
This is the preparation step to check timestamp modes and units
and gather the extended statistics.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 60 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  3 ++
 drivers/common/mlx5/mlx5_prm.h                  | 52 +++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 116 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 093636c..13cd76a 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -12,6 +12,66 @@
 
 
 /**
+ * Perform read access to the registers. Reads data from register
+ * and writes ones to the specified buffer.
+ *
+ * @param[in] ctx
+ *   Context returned from mlx5 open_device() glue function.
+ * @param[in] reg_id
+ *   Register identifier according to the PRM.
+ * @param[in] arg
+ *   Register access auxiliary parameter according to the PRM.
+ * @param[out] data
+ *   Pointer to the buffer to store read data.
+ * @param[in] dw_cnt
+ *   Buffer size in double words.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_register_read(void *ctx, uint16_t reg_id, uint32_t arg,
+			    uint32_t *data, uint32_t dw_cnt)
+{
+	uint32_t in[MLX5_ST_SZ_DW(access_register_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(access_register_out) +
+		     MLX5_ACCESS_REGISTER_DATA_DWORD_MAX] = {0};
+	int status, rc;
+
+	MLX5_ASSERT(data && dw_cnt);
+	MLX5_ASSERT(dw_cnt <= MLX5_ACCESS_REGISTER_DATA_DWORD_MAX);
+	if (dw_cnt  > MLX5_ACCESS_REGISTER_DATA_DWORD_MAX) {
+		DRV_LOG(ERR, "Not enough  buffer for register read data");
+		return -1;
+	}
+	MLX5_SET(access_register_in, in, opcode, MLX5_CMD_OP_ACCESS_REGISTER);
+	MLX5_SET(access_register_in, in, op_mod,
+					MLX5_ACCESS_REGISTER_IN_OP_MOD_READ);
+	MLX5_SET(access_register_in, in, register_id, reg_id);
+	MLX5_SET(access_register_in, in, argument, arg);
+	rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out,
+					 MLX5_ST_SZ_DW(access_register_out) *
+					 sizeof(uint32_t) + dw_cnt);
+	if (rc)
+		goto error;
+	status = MLX5_GET(access_register_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(access_register_out, out, syndrome);
+
+		DRV_LOG(DEBUG, "Failed to access NIC register 0x%X, "
+			       "status %x, syndrome = %x",
+			       reg_id, status, syndrome);
+		return -1;
+	}
+	memcpy(data, &out[MLX5_ST_SZ_DW(access_register_out)],
+	       dw_cnt * sizeof(uint32_t));
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
  * Allocate flow counters via devx interface.
  *
  * @param[in] ctx
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c79b349..34482e1 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -383,6 +383,9 @@ int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
 int mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
 			     struct mlx5_devx_rqt_attr *rqt_attr);
 
+__rte_internal
+int mlx5_devx_cmd_register_read(void *ctx, uint16_t reg_id,
+				uint32_t arg, uint32_t *data, uint32_t dw_cnt);
 /**
  * Create virtio queue counters object DevX API.
  *
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index cf47103..b37be30 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -776,6 +776,7 @@ enum {
 	MLX5_CMD_OP_SUSPEND_QP = 0x50F,
 	MLX5_CMD_OP_RESUME_QP = 0x510,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ACCESS_REGISTER = 0x805,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
 	MLX5_CMD_OP_CREATE_SQ = 0X904,
@@ -2545,6 +2546,57 @@ struct mlx5_ifc_set_pp_rate_limit_context_bits {
 	u8 reserved_at_60[0x120];
 };
 
+#define MLX5_ACCESS_REGISTER_DATA_DWORD_MAX 8u
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_access_register_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	u8 register_data[0][0x20];
+};
+
+struct mlx5_ifc_access_register_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x10];
+	u8 register_id[0x10];
+	u8 argument[0x20];
+	u8 register_data[0][0x20];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+enum {
+	MLX5_ACCESS_REGISTER_IN_OP_MOD_WRITE  = 0x0,
+	MLX5_ACCESS_REGISTER_IN_OP_MOD_READ   = 0x1,
+};
+
+enum {
+	MLX5_REGISTER_ID_MTUTC  = 0x9055,
+};
+
+struct mlx5_ifc_register_mtutc_bits {
+	u8 time_stamp_mode[0x2];
+	u8 time_stamp_state[0x2];
+	u8 reserved_at_4[0x18];
+	u8 operation[0x4];
+	u8 freq_adjustment[0x20];
+	u8 reserved_at_40[0x40];
+	u8 utc_sec[0x20];
+	u8 utc_nsec[0x20];
+	u8 time_adjustment[0x20];
+};
+
+#define MLX5_MTUTC_TIMESTAMP_MODE_INTERNAL_TIMER 0
+#define MLX5_MTUTC_TIMESTAMP_MODE_REAL_TIME 1
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index ae57ebd..68007ef 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -34,6 +34,7 @@ INTERNAL {
 	mlx5_devx_cmd_query_hca_attr;
 	mlx5_devx_cmd_query_virtio_q_counters;
 	mlx5_devx_cmd_query_virtq;
+	mlx5_devx_cmd_register_read;
 	mlx5_devx_get_out_command_status;
 
 	mlx5_get_ifname_sysfs;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v2 17/17] net/mlx5: convert Rx timestamps in realtime format
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
                     ` (15 preceding siblings ...)
  2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 16/17] common/mlx5: add register access DevX routine Viacheslav Ovsiienko
@ 2020-07-15  6:21   ` Viacheslav Ovsiienko
  16 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-15  6:21 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The ConnectX-6DX supports the timestamps in various formats,
the new realtime format is introduced - the upper 32-bit word
of timestamp contains the UTC seconds and the lower 32-bit word
contains the nanoseconds. This patch detects what format is
configured in the NIC and performs the conversion accordingly.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/linux/mlx5_os.c         | 21 +++++++++++++
 drivers/net/mlx5/mlx5.h                  |  1 +
 drivers/net/mlx5/mlx5_rxq.c              | 24 ++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.c             |  6 +++-
 drivers/net/mlx5/mlx5_rxtx.h             |  4 +++
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 34 +++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h    | 54 ++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h     | 34 +++++++++++++++-----
 drivers/net/mlx5/mlx5_trigger.c          |  6 ++--
 9 files changed, 149 insertions(+), 35 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e7241d8..f228bab 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -936,6 +936,27 @@
 		goto error;
 #endif
 	}
+	if (config.devx) {
+		uint32_t reg[MLX5_ST_SZ_DW(register_mtutc)];
+
+		err = mlx5_devx_cmd_register_read
+			(sh->ctx, MLX5_REGISTER_ID_MTUTC, 0,
+			reg, MLX5_ST_SZ_DW(register_mtutc));
+		if (!err) {
+			uint32_t ts_mode;
+
+			/* MTUTC register is read successfully. */
+			ts_mode = MLX5_GET(register_mtutc, reg,
+					   time_stamp_mode);
+			if (ts_mode == MLX5_MTUTC_TIMESTAMP_MODE_REAL_TIME)
+				config.rt_timestamp = 1;
+		} else {
+			/* Kernel does not support register reading. */
+			if (config.hca_attr.dev_freq_khz ==
+						 (NS_PER_S / MS_PER_S))
+				config.rt_timestamp = 1;
+		}
+	}
 	if (config.mprq.enabled && mprq) {
 		if (config.mprq.stride_num_n &&
 		    (config.mprq.stride_num_n > mprq_max_stride_num_n ||
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 97a14fb..34e2bc1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -216,6 +216,7 @@ struct mlx5_dev_config {
 	unsigned int devx:1; /* Whether devx interface is available or not. */
 	unsigned int dest_tir:1; /* Whether advanced DR API is available. */
 	unsigned int reclaim_mode:2; /* Memory reclaim mode. */
+	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2681322..7dd06e8 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2972,3 +2972,27 @@ struct mlx5_hrxq *
 		priv->drop_queue.hrxq = NULL;
 	}
 }
+
+
+/**
+ * Set the Rx queue timestamp conversion parameters
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ */
+void
+mlx5_rxq_timestamp_set(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_rxq_data *data;
+	unsigned int i;
+
+	for (i = 0; i != priv->rxqs_n; ++i) {
+		if (!(*priv->rxqs)[i])
+			continue;
+		data = (*priv->rxqs)[i];
+		data->sh = sh;
+		data->rt_timestamp = priv->config.rt_timestamp;
+	}
+}
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index e511142..65239f9 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1389,7 +1389,11 @@ enum mlx5_txcmp_code {
 		pkt->vlan_tci = rte_be_to_cpu_16(cqe->vlan_info);
 	}
 	if (rxq->hw_timestamp) {
-		pkt->timestamp = rte_be_to_cpu_64(cqe->timestamp);
+		uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+
+		if (rxq->rt_timestamp)
+			ts = mlx5_txpp_convert_rx_ts(rxq->sh, ts);
+		pkt->timestamp = ts;
 		pkt->ol_flags |= PKT_RX_TIMESTAMP;
 	}
 }
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d082cd7..5116a15 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -109,6 +109,7 @@ enum mlx5_rxq_err_state {
 struct mlx5_rxq_data {
 	unsigned int csum:1; /* Enable checksum offloading. */
 	unsigned int hw_timestamp:1; /* Enable HW timestamp. */
+	unsigned int rt_timestamp:1; /* Realtime timestamp format. */
 	unsigned int vlan_strip:1; /* Enable VLAN stripping. */
 	unsigned int crc_present:1; /* CRC must be subtracted. */
 	unsigned int sges_n:3; /* Log 2 of SGEs (max buffers per packet). */
@@ -148,6 +149,7 @@ struct mlx5_rxq_data {
 	struct rte_mempool *mp;
 	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */
 	struct mlx5_mprq_buf *mprq_repl; /* Stashed mbuf for replenish. */
+	struct mlx5_dev_ctx_shared *sh; /* Shared context. */
 	uint16_t idx; /* Queue index. */
 	struct mlx5_rxq_stats stats;
 	rte_xmm_t mbuf_initializer; /* Default rearm/flags for vectorized Rx. */
@@ -442,6 +444,8 @@ uint32_t mlx5_hrxq_get(struct rte_eth_dev *dev,
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
 uint64_t mlx5_get_rx_queue_offloads(struct rte_eth_dev *dev);
+void mlx5_rxq_timestamp_set(struct rte_eth_dev *dev);
+
 
 /* mlx5_txq.c */
 
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index b55138a..f5414ee 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -1024,14 +1024,32 @@
 		/* D.5 fill in mbuf - rearm_data and packet_type. */
 		rxq_cq_to_ptype_oflags_v(rxq, cqes, opcode, &pkts[pos]);
 		if (rxq->hw_timestamp) {
-			pkts[pos]->timestamp =
-				rte_be_to_cpu_64(cq[pos].timestamp);
-			pkts[pos + 1]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p1].timestamp);
-			pkts[pos + 2]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p2].timestamp);
-			pkts[pos + 3]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p3].timestamp);
+			if (rxq->rt_timestamp) {
+				struct mlx5_dev_ctx_shared *sh = rxq->sh;
+				uint64_t ts;
+
+				ts = rte_be_to_cpu_64(cq[pos].timestamp);
+				pkts[pos]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p1].timestamp);
+				pkts[pos + 1]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p2].timestamp);
+				pkts[pos + 2]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p3].timestamp);
+				pkts[pos + 3]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+			} else {
+				pkts[pos]->timestamp = rte_be_to_cpu_64
+						(cq[pos].timestamp);
+				pkts[pos + 1]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p1].timestamp);
+				pkts[pos + 2]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p2].timestamp);
+				pkts[pos + 3]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p3].timestamp);
+			}
 		}
 		if (rxq->dynf_meta) {
 			uint64_t flag = rxq->flow_meta_mask;
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 3007c03..555c342 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -694,22 +694,44 @@
 		rxq_cq_to_ptype_oflags_v(rxq, ptype_info, flow_tag,
 					 opcode, &elts[pos]);
 		if (rxq->hw_timestamp) {
-			elts[pos]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p0, struct mlx5_cqe,
-						     pkt_info)->timestamp);
-			elts[pos + 1]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p1, struct mlx5_cqe,
-						     pkt_info)->timestamp);
-			elts[pos + 2]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p2, struct mlx5_cqe,
-						     pkt_info)->timestamp);
-			elts[pos + 3]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p3, struct mlx5_cqe,
-						     pkt_info)->timestamp);
+			if (rxq->rt_timestamp) {
+				struct mlx5_dev_ctx_shared *sh = rxq->sh;
+				uint64_t ts;
+
+				ts = rte_be_to_cpu_64
+					(container_of(p0, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64
+					(container_of(p1, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 1]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64
+					(container_of(p2, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 2]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64
+					(container_of(p3, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 3]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+			} else {
+				elts[pos]->timestamp = rte_be_to_cpu_64
+					(container_of(p0, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 1]->timestamp = rte_be_to_cpu_64
+					(container_of(p1, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 2]->timestamp = rte_be_to_cpu_64
+					(container_of(p2, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 3]->timestamp = rte_be_to_cpu_64
+					(container_of(p3, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+			}
 		}
 		if (!!rxq->flow_meta_mask) {
 			/* This code is subject for futher optimization. */
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index da5960a..34e3397 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -655,14 +655,32 @@
 		/* D.5 fill in mbuf - rearm_data and packet_type. */
 		rxq_cq_to_ptype_oflags_v(rxq, cqes, opcode, &pkts[pos]);
 		if (rxq->hw_timestamp) {
-			pkts[pos]->timestamp =
-				rte_be_to_cpu_64(cq[pos].timestamp);
-			pkts[pos + 1]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p1].timestamp);
-			pkts[pos + 2]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p2].timestamp);
-			pkts[pos + 3]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p3].timestamp);
+			if (rxq->rt_timestamp) {
+				struct mlx5_dev_ctx_shared *sh = rxq->sh;
+				uint64_t ts;
+
+				ts = rte_be_to_cpu_64(cq[pos].timestamp);
+				pkts[pos]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p1].timestamp);
+				pkts[pos + 1]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p2].timestamp);
+				pkts[pos + 2]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p3].timestamp);
+				pkts[pos + 3]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+			} else {
+				pkts[pos]->timestamp = rte_be_to_cpu_64
+						(cq[pos].timestamp);
+				pkts[pos + 1]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p1].timestamp);
+				pkts[pos + 2]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p2].timestamp);
+				pkts[pos + 3]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p3].timestamp);
+			}
 		}
 		if (rxq->dynf_meta) {
 			/* This code is subject for futher optimization. */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 29aef54..6e5a730 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -329,9 +329,11 @@
 			dev->data->port_id);
 		goto error;
 	}
-	/* Set a mask and offset of dynamic metadata flows into Rx queues*/
+	/* Set a mask and offset of dynamic metadata flows into Rx queues. */
 	mlx5_flow_rxq_dynf_metadata_set(dev);
-	/* Set a mask and offset of scheduling on timestamp into Tx queues*/
+	/* Set flags and context to convert Rx timestamps. */
+	mlx5_rxq_timestamp_set(dev);
+	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (8 preceding siblings ...)
  2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
@ 2020-07-16  8:23 ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
                     ` (17 more replies)
  2020-07-17 14:28 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Viacheslav Ovsiienko
  10 siblings, 18 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The accurate packet Tx scheduling on timestamps was introduced [1].
This patchset provides the feature implementation on mlx5 PMD side.

[1] http://patches.dpdk.org/patch/73742/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>

---
v3: - 32-bit systems rte_atomic128_cmp_exchange() fix
    - strncpy coverity issue (bug)fix
    - codestyle compilation issues
    - negative tx_pp is always enabled

Viacheslav Ovsiienko (17):
  common/mlx5: update common part to support packet pacing
  net/mlx5: introduce send scheduling devargs
  net/mlx5: fix UAR lock sharing for multiport devices
  net/mlx5: introduce shared UAR resource
  net/mlx5: create clock queue for packet pacing
  net/mlx5: create rearm queue for packet pacing
  net/mlx5: create Tx queues with DevX
  net/mlx5: allocate packet pacing context
  net/mlx5: introduce clock queue service routine
  net/mlx5: prepare Tx queue structures to support timestamp
  net/mlx5: convert timestamp to completion index
  net/mlx5: prepare Tx datapath to support sheduling
  net/mlx5: add scheduling support to send routine template
  net/mlx5: add read device clock support
  net/mlx5: provide the send scheduling error statistics
  common/mlx5: add register access DevX routine
  net/mlx5: convert Rx timestamps in realtime format

 doc/guides/nics/mlx5.rst                        |   37 +
 drivers/common/mlx5/Makefile                    |   25 +
 drivers/common/mlx5/linux/meson.build           |   10 +
 drivers/common/mlx5/linux/mlx5_glue.c           |   31 +-
 drivers/common/mlx5/linux/mlx5_glue.h           |    5 +
 drivers/common/mlx5/mlx5_devx_cmds.c            |   79 +-
 drivers/common/mlx5/mlx5_devx_cmds.h            |   13 +
 drivers/common/mlx5/mlx5_prm.h                  |  121 +-
 drivers/common/mlx5/rte_common_mlx5_version.map |    1 +
 drivers/net/mlx5/Makefile                       |    1 +
 drivers/net/mlx5/linux/mlx5_os.c                |   97 +-
 drivers/net/mlx5/meson.build                    |    1 +
 drivers/net/mlx5/mlx5.c                         |   61 +-
 drivers/net/mlx5/mlx5.h                         |   90 +-
 drivers/net/mlx5/mlx5_defs.h                    |   11 +
 drivers/net/mlx5/mlx5_rxq.c                     |   26 +-
 drivers/net/mlx5/mlx5_rxtx.c                    |  373 ++++++-
 drivers/net/mlx5/mlx5_rxtx.h                    |   97 ++
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |   34 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |   54 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |   34 +-
 drivers/net/mlx5/mlx5_stats.c                   |    7 +-
 drivers/net/mlx5/mlx5_trigger.c                 |   26 +-
 drivers/net/mlx5/mlx5_txpp.c                    | 1335 +++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c                     |  342 +++++-
 25 files changed, 2802 insertions(+), 109 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_txpp.c

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 01/17] common/mlx5: update common part to support packet pacing
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 02/17] net/mlx5: introduce send scheduling devargs Viacheslav Ovsiienko
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch prepares the common part of the mlx5 PMDs to
support packet send scheduling on mbuf timestamps:

  - the DevX routine to query the packet pacing HCA capabilities
  - packet pacing Send Queue attributes support
  - the hardware related definitions

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/Makefile          | 25 +++++++++++++
 drivers/common/mlx5/linux/meson.build | 10 +++++
 drivers/common/mlx5/linux/mlx5_glue.c | 31 +++++++++++++++-
 drivers/common/mlx5/linux/mlx5_glue.h |  5 +++
 drivers/common/mlx5/mlx5_devx_cmds.c  | 19 +++++++++-
 drivers/common/mlx5/mlx5_devx_cmds.h  | 10 +++++
 drivers/common/mlx5/mlx5_prm.h        | 69 ++++++++++++++++++++++++++++++++---
 7 files changed, 161 insertions(+), 8 deletions(-)

diff --git a/drivers/common/mlx5/Makefile b/drivers/common/mlx5/Makefile
index f6c762b..f9dc376 100644
--- a/drivers/common/mlx5/Makefile
+++ b/drivers/common/mlx5/Makefile
@@ -172,6 +172,16 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_devx_qp_query \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_DEVX_UAR_OFFSET \
+		infiniband/mlx5dv.h \
+		field "struct mlx5dv_devx_uar.mmap_off" \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5DV_PP_ALLOC \
+		infiniband/mlx5dv.h \
+		func mlx5dv_pp_alloc \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR \
 		infiniband/mlx5dv.h \
 		func mlx5dv_dr_action_create_dest_devx_tir \
@@ -207,6 +217,21 @@ mlx5_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
 		func mlx5dv_dr_domain_set_reclaim_device_memory \
 		$(AUTOCONF_OUTPUT)
 	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_OPCODE_ENHANCED_MPSW \
+		infiniband/mlx5dv.h \
+		enum MLX5_OPCODE_ENHANCED_MPSW \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_OPCODE_SEND_EN \
+		infiniband/mlx5dv.h \
+		enum MLX5_OPCODE_SEND_EN \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
+		HAVE_MLX5_OPCODE_WAIT \
+		infiniband/mlx5dv.h \
+		enum MLX5_OPCODE_WAIT \
+		$(AUTOCONF_OUTPUT)
+	$Q sh -- '$<' '$@' \
 		HAVE_ETHTOOL_LINK_MODE_25G \
 		/usr/include/linux/ethtool.h \
 		enum ETHTOOL_LINK_MODE_25000baseCR_Full_BIT \
diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 2294213..48e8ad6 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -64,6 +64,8 @@ has_member_args = [
 	'struct ibv_counter_set_init_attr', 'counter_set_id' ],
 	[ 'HAVE_IBV_DEVICE_COUNTERS_SET_V45', 'infiniband/verbs.h',
 	'struct ibv_counters_init_attr', 'comp_mask' ],
+	[ 'HAVE_MLX5DV_DEVX_UAR_OFFSET', 'infiniband/mlx5dv.h',
+	'struct mlx5dv_devx_uar', 'mmap_off' ],
 ]
 # input array for meson symbol search:
 # [ "MACRO to define if found", "header for the search",
@@ -101,6 +103,8 @@ has_sym_args = [
 	'mlx5dv_devx_obj_query_async' ],
 	[ 'HAVE_IBV_DEVX_QP', 'infiniband/mlx5dv.h',
 	'mlx5dv_devx_qp_query' ],
+	[ 'HAVE_MLX5DV_PP_ALLOC', 'infiniband/mlx5dv.h',
+	'mlx5dv_pp_alloc' ],
 	[ 'HAVE_MLX5DV_DR_ACTION_DEST_DEVX_TIR', 'infiniband/mlx5dv.h',
 	'mlx5dv_dr_action_create_dest_devx_tir' ],
 	[ 'HAVE_IBV_DEVX_EVENT', 'infiniband/mlx5dv.h',
@@ -116,6 +120,12 @@ has_sym_args = [
 	[ 'HAVE_MLX5DV_DR_VLAN', 'infiniband/mlx5dv.h',
 	'mlx5dv_dr_action_create_push_vlan' ],
 	[ 'HAVE_IBV_VAR', 'infiniband/mlx5dv.h', 'mlx5dv_alloc_var' ],
+	[ 'HAVE_MLX5_OPCODE_ENHANCED_MPSW', 'infiniband/mlx5dv.h',
+	'MLX5_OPCODE_ENHANCED_MPSW' ],
+	[ 'HAVE_MLX5_OPCODE_SEND_EN', 'infiniband/mlx5dv.h',
+	'MLX5_OPCODE_SEND_EN' ],
+	[ 'HAVE_MLX5_OPCODE_WAIT', 'infiniband/mlx5dv.h',
+	'MLX5_OPCODE_WAIT' ],
 	[ 'HAVE_SUPPORTED_40000baseKR4_Full', 'linux/ethtool.h',
 	'SUPPORTED_40000baseKR4_Full' ],
 	[ 'HAVE_SUPPORTED_40000baseCR4_Full', 'linux/ethtool.h',
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index 395519d..4d3875f 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -1195,7 +1195,6 @@
 #endif
 }
 
-
 static void
 mlx5_glue_dr_reclaim_domain_memory(void *domain, uint32_t enable)
 {
@@ -1207,6 +1206,34 @@
 #endif
 }
 
+static struct mlx5dv_pp *
+mlx5_glue_dv_alloc_pp(struct ibv_context *context,
+		      size_t pp_context_sz,
+		      const void *pp_context,
+		      uint32_t flags)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	return mlx5dv_pp_alloc(context, pp_context_sz, pp_context, flags);
+#else
+	RTE_SET_USED(context);
+	RTE_SET_USED(pp_context_sz);
+	RTE_SET_USED(pp_context);
+	RTE_SET_USED(flags);
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
+static void
+mlx5_glue_dv_free_pp(struct mlx5dv_pp *pp)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	mlx5dv_pp_free(pp);
+#else
+	RTE_SET_USED(pp);
+#endif
+}
+
 __rte_cache_aligned
 const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue) {
 	.version = MLX5_GLUE_VERSION,
@@ -1319,4 +1346,6 @@
 	.devx_free_uar = mlx5_glue_devx_free_uar,
 	.dv_alloc_var = mlx5_glue_dv_alloc_var,
 	.dv_free_var = mlx5_glue_dv_free_var,
+	.dv_alloc_pp = mlx5_glue_dv_alloc_pp,
+	.dv_free_pp = mlx5_glue_dv_free_pp,
 };
diff --git a/drivers/common/mlx5/linux/mlx5_glue.h b/drivers/common/mlx5/linux/mlx5_glue.h
index 069d854..c4f9b00 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -304,6 +304,11 @@ struct mlx5_glue {
 			 struct mlx5dv_devx_async_event_hdr *event_data,
 			 size_t event_resp_len);
 	void (*dr_reclaim_domain_memory)(void *domain, uint32_t enable);
+	struct mlx5dv_pp *(*dv_alloc_pp)(struct ibv_context *context,
+					 size_t pp_context_sz,
+					 const void *pp_context,
+					 uint32_t flags);
+	void (*dv_free_pp)(struct mlx5dv_pp *pp);
 };
 
 extern const struct mlx5_glue *mlx5_glue;
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 2179a83..093636c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -467,6 +467,14 @@ struct mlx5_devx_obj *
 	attr->vdpa.queue_counters_valid = !!(MLX5_GET64(cmd_hca_cap, hcattr,
 							general_obj_types) &
 				  MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_Q_COUNTERS);
+	attr->wqe_index_ignore = MLX5_GET(cmd_hca_cap, hcattr,
+					  wqe_index_ignore_cap);
+	attr->cross_channel = MLX5_GET(cmd_hca_cap, hcattr, cd);
+	attr->non_wire_sq = MLX5_GET(cmd_hca_cap, hcattr, non_wire_sq);
+	attr->log_max_static_sq_wq = MLX5_GET(cmd_hca_cap, hcattr,
+					      log_max_static_sq_wq);
+	attr->dev_freq_khz = MLX5_GET(cmd_hca_cap, hcattr,
+				      device_frequency_khz);
 	if (attr->qos.sup) {
 		MLX5_SET(query_hca_cap_in, in, op_mod,
 			 MLX5_GET_HCA_CAP_OP_MOD_QOS_CAP |
@@ -487,9 +495,13 @@ struct mlx5_devx_obj *
 		attr->qos.log_max_flow_meter =
 				MLX5_GET(qos_cap, hcattr, log_max_flow_meter);
 		attr->qos.flow_meter_reg_c_ids =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
+				MLX5_GET(qos_cap, hcattr, flow_meter_reg_id);
 		attr->qos.flow_meter_reg_share =
-			MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
+				MLX5_GET(qos_cap, hcattr, flow_meter_reg_share);
+		attr->qos.packet_pacing =
+				MLX5_GET(qos_cap, hcattr, packet_pacing);
+		attr->qos.wqe_rate_pp =
+				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
 	}
 	if (attr->vdpa.valid)
 		mlx5_devx_cmd_query_hca_vdpa_attr(ctx, &attr->vdpa);
@@ -971,6 +983,8 @@ struct mlx5_devx_obj *
 	MLX5_SET(sqc, sq_ctx, reg_umr, sq_attr->reg_umr);
 	MLX5_SET(sqc, sq_ctx, allow_swp, sq_attr->allow_swp);
 	MLX5_SET(sqc, sq_ctx, hairpin, sq_attr->hairpin);
+	MLX5_SET(sqc, sq_ctx, non_wire, sq_attr->non_wire);
+	MLX5_SET(sqc, sq_ctx, static_sq_wq, sq_attr->static_sq_wq);
 	MLX5_SET(sqc, sq_ctx, user_index, sq_attr->user_index);
 	MLX5_SET(sqc, sq_ctx, cqn, sq_attr->cqn);
 	MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
@@ -1185,6 +1199,7 @@ struct mlx5_devx_obj *
 	} else {
 		MLX5_SET64(cqc, cqctx, dbr_addr, attr->db_addr);
 	}
+	MLX5_SET(cqc, cqctx, cqe_sz, attr->cqe_size);
 	MLX5_SET(cqc, cqctx, cc, attr->use_first_only);
 	MLX5_SET(cqc, cqctx, oi, attr->overrun_ignore);
 	MLX5_SET(cqc, cqctx, log_cq_size, attr->log_cq_size);
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 25704ef..c79b349 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -29,6 +29,8 @@ struct mlx5_devx_mkey_attr {
 struct mlx5_hca_qos_attr {
 	uint32_t sup:1;	/* Whether QOS is supported. */
 	uint32_t srtcm_sup:1; /* Whether srTCM mode is supported. */
+	uint32_t packet_pacing:1; /* Packet pacing is supported. */
+	uint32_t wqe_rate_pp:1; /* Packet pacing WQE rate mode. */
 	uint32_t flow_meter_reg_share:1;
 	/* Whether reg_c share is supported. */
 	uint8_t log_max_flow_meter;
@@ -90,6 +92,11 @@ struct mlx5_hca_attr {
 	uint32_t vhca_id:16;
 	uint32_t relaxed_ordering_write:1;
 	uint32_t relaxed_ordering_read:1;
+	uint32_t wqe_index_ignore:1;
+	uint32_t cross_channel:1;
+	uint32_t non_wire_sq:1; /* SQ with non-wire ops is supported. */
+	uint32_t log_max_static_sq_wq:5; /* Static WQE size SQ. */
+	uint32_t dev_freq_khz; /* Timestamp counter frequency, kHz. */
 	struct mlx5_hca_qos_attr qos;
 	struct mlx5_hca_vdpa_attr vdpa;
 };
@@ -207,6 +214,8 @@ struct mlx5_devx_create_sq_attr {
 	uint32_t reg_umr:1;
 	uint32_t allow_swp:1;
 	uint32_t hairpin:1;
+	uint32_t non_wire:1;
+	uint32_t static_sq_wq:1;
 	uint32_t user_index:24;
 	uint32_t cqn:24;
 	uint32_t packet_pacing_rate_limit_index:16;
@@ -230,6 +239,7 @@ struct mlx5_devx_cq_attr {
 	uint32_t db_umem_valid:1;
 	uint32_t use_first_only:1;
 	uint32_t overrun_ignore:1;
+	uint32_t cqe_size:3;
 	uint32_t log_cq_size:5;
 	uint32_t log_page_size:5;
 	uint32_t uar_page_id;
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index c63795f..cf47103 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -41,6 +41,10 @@
 /* Invalidate a CQE. */
 #define MLX5_CQE_INVALIDATE (MLX5_CQE_INVALID << 4)
 
+/* Hardware index widths. */
+#define MLX5_CQ_INDEX_WIDTH 24
+#define MLX5_WQ_INDEX_WIDTH 16
+
 /* WQE Segment sizes in bytes. */
 #define MLX5_WSEG_SIZE 16u
 #define MLX5_WQE_CSEG_SIZE sizeof(struct mlx5_wqe_cseg)
@@ -126,7 +130,17 @@
 				  MLX5_ESEG_MIN_INLINE_SIZE)
 
 /* Missed in mlv5dv.h, should define here. */
+#ifndef HAVE_MLX5_OPCODE_ENHANCED_MPSW
 #define MLX5_OPCODE_ENHANCED_MPSW 0x29u
+#endif
+
+#ifndef HAVE_MLX5_OPCODE_SEND_EN
+#define MLX5_OPCODE_SEND_EN 0x17u
+#endif
+
+#ifndef HAVE_MLX5_OPCODE_WAIT
+#define MLX5_OPCODE_WAIT 0x0fu
+#endif
 
 /* CQE value to inform that VLAN is stripped. */
 #define MLX5_CQE_VLAN_STRIPPED (1u << 0)
@@ -255,6 +269,9 @@
 /* The alignment needed for WQ buffer. */
 #define MLX5_WQE_BUF_ALIGNMENT sysconf(_SC_PAGESIZE)
 
+/* The alignment needed for CQ buffer. */
+#define MLX5_CQE_BUF_ALIGNMENT sysconf(_SC_PAGESIZE)
+
 /* Completion mode. */
 enum mlx5_completion_mode {
 	MLX5_COMP_ONLY_ERR = 0x0,
@@ -314,6 +331,13 @@ struct mlx5_wqe_eseg {
 	};
 } __rte_packed;
 
+struct mlx5_wqe_qseg {
+	uint32_t reserved0;
+	uint32_t reserved1;
+	uint32_t max_index;
+	uint32_t qpn_cqn;
+} __rte_packed;
+
 /* The title WQEBB, header of WQE. */
 struct mlx5_wqe {
 	union {
@@ -373,6 +397,14 @@ struct mlx5_cqe {
 	uint8_t op_own;
 };
 
+struct mlx5_cqe_ts {
+	uint64_t timestamp;
+	uint32_t sop_drop_qpn;
+	uint16_t wqe_counter;
+	uint8_t rsvd5;
+	uint8_t op_own;
+};
+
 /* Adding direct verbs to data-path. */
 
 /* CQ sequence number mask. */
@@ -992,7 +1024,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8 reserved_at_40[0x40];
 	u8 log_max_srq_sz[0x8];
 	u8 log_max_qp_sz[0x8];
-	u8 reserved_at_90[0xb];
+	u8 reserved_at_90[0x9];
+	u8 wqe_index_ignore_cap[0x1];
+	u8 dynamic_qp_allocation[0x1];
 	u8 log_max_qp[0x5];
 	u8 reserved_at_a0[0xb];
 	u8 log_max_srq[0x5];
@@ -1018,9 +1052,12 @@ struct mlx5_ifc_cmd_hca_cap_bits {
 	u8 umr_extended_translation_offset[0x1];
 	u8 null_mkey[0x1];
 	u8 log_max_klm_list_size[0x6];
-	u8 reserved_at_120[0xa];
+	u8 non_wire_sq[0x1];
+	u8 reserved_at_121[0x9];
 	u8 log_max_ra_req_dc[0x6];
-	u8 reserved_at_130[0xa];
+	u8 reserved_at_130[0x3];
+	u8 log_max_static_sq_wq[0x5];
+	u8 reserved_at_138[0x2];
 	u8 log_max_ra_res_dc[0x6];
 	u8 reserved_at_140[0xa];
 	u8 log_max_ra_req_qp[0x6];
@@ -1271,7 +1308,8 @@ struct mlx5_ifc_qos_cap_bits {
 	u8 reserved_at_8[0x8];
 	u8 log_max_flow_meter[0x8];
 	u8 flow_meter_reg_id[0x8];
-	u8 reserved_at_25[0x8];
+	u8 wqe_rate_pp[0x1];
+	u8 reserved_at_25[0x7];
 	u8 flow_meter_reg_share[0x1];
 	u8 reserved_at_2e[0x17];
 	u8 packet_pacing_max_rate[0x20];
@@ -1835,7 +1873,9 @@ struct mlx5_ifc_sqc_bits {
 	u8 reg_umr[0x1];
 	u8 allow_swp[0x1];
 	u8 hairpin[0x1];
-	u8 reserved_at_f[0x11];
+	u8 non_wire[0x1];
+	u8 static_sq_wq[0x1];
+	u8 reserved_at_11[0xf];
 	u8 reserved_at_20[0x8];
 	u8 user_index[0x18];
 	u8 reserved_at_40[0x8];
@@ -1935,6 +1975,11 @@ struct mlx5_ifc_flow_meter_parameters_bits {
 	u8         reserved_at_8[0x60];		// 14h-1Ch
 };
 
+enum {
+	MLX5_CQE_SIZE_64B = 0x0,
+	MLX5_CQE_SIZE_128B = 0x1,
+};
+
 struct mlx5_ifc_cqc_bits {
 	u8 status[0x4];
 	u8 as_notify[0x1];
@@ -2486,6 +2531,20 @@ struct mlx5_ifc_query_qp_in_bits {
 	u8 reserved_at_60[0x20];
 };
 
+enum {
+	MLX5_DATA_RATE = 0x0,
+	MLX5_WQE_RATE = 0x1,
+};
+
+struct mlx5_ifc_set_pp_rate_limit_context_bits {
+	u8 rate_limit[0x20];
+	u8 burst_upper_bound[0x20];
+	u8 reserved_at_40[0xC];
+	u8 rate_mode[0x4];
+	u8 typical_packet_size[0x10];
+	u8 reserved_at_60[0x120];
+};
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 02/17] net/mlx5: introduce send scheduling devargs
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 03/17] net/mlx5: fix UAR lock sharing for multiport devices Viacheslav Ovsiienko
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch introduces the new devargs:

tx_pp - enables accurate packet send scheduling on mbuf timestamps
  in the PMD. On the device start if "rte_dynflag_timestamp"
  dynamic flag is registered and this devarg non-zero value is
  specified, the driver initializes all necessary internal
  infrastructure to provide packet scheduling. The parameter
  value specifies scheduling granularity in nanoseconds.

tx_skew - the parameter adjusts the send packet scheduling on
  timestamps and represents the average delay between beginning
  of the transmitting descriptor processing by the hardware and
  appearance of actual packet data on the wire. The value should
  be provided in nanoseconds and is valid only if tx_pp parameter
  is specified. The default value is zero.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 doc/guides/nics/mlx5.rst         | 37 +++++++++++++++++++++++
 drivers/net/mlx5/linux/mlx5_os.c | 63 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.c          | 39 +++++++++++++++++++++++--
 drivers/net/mlx5/mlx5.h          |  2 ++
 4 files changed, 138 insertions(+), 3 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 4b6d8fb..9a57768 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -241,6 +241,24 @@ Limitations
   reduce the requested Tx size or adjust data inline settings with
   ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys.
 
+- To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
+  parameter should be specified, RTE_MBUF_DYNFIELD_TIMESTAMP_NAME and
+  RTE_MBUF_DYNFLAG_TIMESTAMP_NAME should be registered by application.
+  When PMD sees the RTE_MBUF_DYNFLAG_TIMESTAMP_NAME set on the packet
+  being sent it tries to synchronize the time of packet appearing on
+  the wire with the specified packet timestamp. It the specified one
+  is in the past it should be ignored, if one is in the distant future
+  it should be capped with some reasonable value (in range of seconds).
+  These specific cases ("too late" and "distant future") can be optionally
+  reported via device xstats to assist applications to detect the
+  time-related problems.
+
+  There is no any packet reordering according timestamps is supposed,
+  neither within packet burst, nor between packets, it is an entirely
+  application responsibility to generate packets and its timestamps
+  in desired order. The timestamps can be put only in the first packet
+  in the burst providing the entire burst scheduling.
+
 - E-Switch decapsulation Flow:
 
   - can be applied to PF port only.
@@ -700,6 +718,25 @@ Driver options
   variable "MLX5_SHUT_UP_BF" value is used. If there is no "MLX5_SHUT_UP_BF",
   the default ``tx_db_nc`` value is zero for ARM64 hosts and one for others.
 
+- ``tx_pp`` parameter [int]
+
+  If a nonzero value is specified the driver creates all necessary internal
+  objects to provide accurate packet send scheduling on mbuf timestamps.
+  The positive value specifies the scheduling granularity in nanoseconds,
+  the packet send will be accurate up to specified digits. The allowed range is
+  from 500 to 1 million of nanoseconds. The negative value specifies the module
+  of granularity and engages the special test mode the check the schedule rate.
+  By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
+  feature is disabled.
+
+- ``tx_skew`` parameter [int]
+
+  The parameter adjusts the send packet scheduling on timestamps and represents
+  the average delay between beginning of the transmitting descriptor processing
+  by the hardware and appearance of actual packet data on the wire. The value
+  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
+  specified. The default value is zero.
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector on ConnectX-5, ConnectX-6, ConnectX-6 Dx
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 2dc57b2..14af468 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -879,6 +879,69 @@
 		}
 #endif
 	}
+	if (config.tx_pp) {
+		DRV_LOG(DEBUG, "Timestamp counter frequency %u kHz",
+			config.hca_attr.dev_freq_khz);
+		DRV_LOG(DEBUG, "Packet pacing is %ssupported",
+			config.hca_attr.qos.packet_pacing ? "" : "not ");
+		DRV_LOG(DEBUG, "Cross channel ops are %ssupported",
+			config.hca_attr.cross_channel ? "" : "not ");
+		DRV_LOG(DEBUG, "WQE index ignore is %ssupported",
+			config.hca_attr.wqe_index_ignore ? "" : "not ");
+		DRV_LOG(DEBUG, "Non-wire SQ feature is %ssupported",
+			config.hca_attr.non_wire_sq ? "" : "not ");
+		DRV_LOG(DEBUG, "Static WQE SQ feature is %ssupported (%d)",
+			config.hca_attr.log_max_static_sq_wq ? "" : "not ",
+			config.hca_attr.log_max_static_sq_wq);
+		DRV_LOG(DEBUG, "WQE rate PP mode is %ssupported",
+			config.hca_attr.qos.wqe_rate_pp ? "" : "not ");
+		if (!config.devx) {
+			DRV_LOG(ERR, "DevX is required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.qos.packet_pacing) {
+			DRV_LOG(ERR, "Packet pacing is not supported");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.cross_channel) {
+			DRV_LOG(ERR, "Cross channel operations are"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.wqe_index_ignore) {
+			DRV_LOG(ERR, "WQE index ignore feature is"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.non_wire_sq) {
+			DRV_LOG(ERR, "Non-wire SQ feature is"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.log_max_static_sq_wq) {
+			DRV_LOG(ERR, "Static WQE SQ feature is"
+				     " required for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+		if (!config.hca_attr.qos.wqe_rate_pp) {
+			DRV_LOG(ERR, "WQE rate mode is required"
+				     " for packet pacing");
+			err = ENODEV;
+			goto error;
+		}
+#ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
+		DRV_LOG(ERR, "DevX does not provide UAR offset,"
+			     " can't create queues for packet pacing");
+		err = ENODEV;
+		goto error;
+#endif
+	}
 	if (config.mprq.enabled && mprq) {
 		if (config.mprq.stride_num_n &&
 		    (config.mprq.stride_num_n > mprq_max_stride_num_n ||
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0c654ed..72e0870 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -120,6 +120,19 @@
 #define MLX5_TXQ_MAX_INLINE_LEN "txq_max_inline_len"
 
 /*
+ * Device parameter to enable Tx scheduling on timestamps
+ * and specify the packet pacing granularity in nanoseconds.
+ */
+#define MLX5_TX_PP "tx_pp"
+
+/*
+ * Device parameter to specify skew in nanoseconds on Tx datapath,
+ * it represents the time between SQ start WQE processing and
+ * appearing actual packet data on the wire.
+ */
+#define MLX5_TX_SKEW "tx_skew"
+
+/*
  * Device parameter to enable hardware Tx vector.
  * Deprecated, ignored (no vectorized Tx routines anymore).
  */
@@ -1271,18 +1284,26 @@ struct mlx5_dev_ctx_shared *
 mlx5_args_check(const char *key, const char *val, void *opaque)
 {
 	struct mlx5_dev_config *config = opaque;
-	unsigned long tmp;
+	unsigned long mod;
+	signed long tmp;
 
 	/* No-op, port representors are processed in mlx5_dev_spawn(). */
 	if (!strcmp(MLX5_REPRESENTOR, key))
 		return 0;
 	errno = 0;
-	tmp = strtoul(val, NULL, 0);
+	tmp = strtol(val, NULL, 0);
 	if (errno) {
 		rte_errno = errno;
 		DRV_LOG(WARNING, "%s: \"%s\" is not a valid integer", key, val);
 		return -rte_errno;
 	}
+	if (tmp < 0 && strcmp(MLX5_TX_PP, key) && strcmp(MLX5_TX_SKEW, key)) {
+		/* Negative values are acceptable for some keys only. */
+		rte_errno = EINVAL;
+		DRV_LOG(WARNING, "%s: invalid negative value \"%s\"", key, val);
+		return -rte_errno;
+	}
+	mod = tmp >= 0 ? tmp : -tmp;
 	if (strcmp(MLX5_RXQ_CQE_COMP_EN, key) == 0) {
 		config->cqe_comp = !!tmp;
 	} else if (strcmp(MLX5_RXQ_CQE_PAD_EN, key) == 0) {
@@ -1333,6 +1354,15 @@ struct mlx5_dev_ctx_shared *
 		config->txq_inline_mpw = tmp;
 	} else if (strcmp(MLX5_TX_VEC_EN, key) == 0) {
 		DRV_LOG(WARNING, "%s: deprecated parameter, ignored", key);
+	} else if (strcmp(MLX5_TX_PP, key) == 0) {
+		if (!mod) {
+			DRV_LOG(ERR, "Zero Tx packet pacing parameter");
+			rte_errno = EINVAL;
+			return -rte_errno;
+		}
+		config->tx_pp = tmp;
+	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
+		config->tx_skew = tmp;
 	} else if (strcmp(MLX5_RX_VEC_EN, key) == 0) {
 		config->rx_vec_en = !!tmp;
 	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
@@ -1415,6 +1445,8 @@ struct mlx5_dev_ctx_shared *
 		MLX5_TXQ_MPW_HDR_DSEG_EN,
 		MLX5_TXQ_MAX_INLINE_LEN,
 		MLX5_TX_DB_NC,
+		MLX5_TX_PP,
+		MLX5_TX_SKEW,
 		MLX5_TX_VEC_EN,
 		MLX5_RX_VEC_EN,
 		MLX5_L3_VXLAN_EN,
@@ -1693,7 +1725,8 @@ struct mlx5_dev_ctx_shared *
 {
 	static const char *const dynf_names[] = {
 		RTE_PMD_MLX5_FINE_GRANULARITY_INLINE,
-		RTE_MBUF_DYNFLAG_METADATA_NAME
+		RTE_MBUF_DYNFLAG_METADATA_NAME,
+		RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
 	};
 	unsigned int i;
 
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 46e66eb..84cd3e1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -240,6 +240,8 @@ struct mlx5_dev_config {
 	int txq_inline_min; /* Minimal amount of data bytes to inline. */
 	int txq_inline_max; /* Max packet size for inlining with SEND. */
 	int txq_inline_mpw; /* Max packet size for inlining with eMPW. */
+	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
+	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
 	struct mlx5_hca_attr hca_attr; /* HCA attributes. */
 	struct mlx5_lro_config lro; /* LRO configuration. */
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 03/17] net/mlx5: fix UAR lock sharing for multiport devices
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 02/17] net/mlx5: introduce send scheduling devargs Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 04/17] net/mlx5: introduce shared UAR resource Viacheslav Ovsiienko
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit, stable

The master and representors might be created over the multiport
Infiniband devices and the UAR resource allocated for sibling
ports might belong to the same underlying Infiniband device.
Hardware requires the write access to the UAR must be performed
as atomic 64-bit write, on 32-bit systems this is two sequential
writes, protected by lock. Due to possibility to share the same
UAR between sibling devices the locks must be moved to shared
context.

Fixes: f048f3d479a6 ("net/mlx5: switch to the shared IB device context")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  6 ------
 drivers/net/mlx5/mlx5.c          |  6 ++++++
 drivers/net/mlx5/mlx5.h          | 10 +++++-----
 drivers/net/mlx5/mlx5_rxq.c      |  2 +-
 drivers/net/mlx5/mlx5_txq.c      |  2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 14af468..63e9350 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -630,12 +630,6 @@
 	priv->mtu = RTE_ETHER_MTU;
 	priv->mp_id.port_id = port_id;
 	strlcpy(priv->mp_id.name, MLX5_MP_NAME, RTE_MP_MAX_NAME_LEN);
-#ifndef RTE_ARCH_64
-	/* Initialize UAR access locks for 32bit implementations. */
-	rte_spinlock_init(&priv->uar_lock_cq);
-	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
-		rte_spinlock_init(&priv->uar_lock[i]);
-#endif
 	/* Some internal functions rely on Netlink sockets, open them now. */
 	priv->nl_socket_rdma = mlx5_nl_init(NETLINK_RDMA);
 	priv->nl_socket_route =	mlx5_nl_init(NETLINK_ROUTE);
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 72e0870..0786945 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -717,6 +717,12 @@ struct mlx5_dev_ctx_shared *
 		err = ENOMEM;
 		goto error;
 	}
+#ifndef RTE_ARCH_64
+	/* Initialize UAR access locks for 32bit implementations. */
+	rte_spinlock_init(&sh->uar_lock_cq);
+	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
+		rte_spinlock_init(&sh->uar_lock[i]);
+#endif
 	/*
 	 * Once the device is added to the list of memory event
 	 * callback, its global MR cache table cannot be expanded
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 84cd3e1..d01d7f3 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -559,6 +559,11 @@ struct mlx5_dev_ctx_shared {
 	void *fdb_domain; /* FDB Direct Rules name space handle. */
 	void *rx_domain; /* RX Direct Rules name space handle. */
 	void *tx_domain; /* TX Direct Rules name space handle. */
+#ifndef RTE_ARCH_64
+	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
+	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
+	/* UAR same-page access control required in 32bit implementations. */
+#endif
 	struct mlx5_hlist *flow_tbls;
 	/* Direct Rules tables for FDB, NIC TX+RX */
 	void *esw_drop_action; /* Pointer to DR E-Switch drop action. */
@@ -673,11 +678,6 @@ struct mlx5_priv {
 	uint8_t mtr_color_reg; /* Meter color match REG_C. */
 	struct mlx5_mtr_profiles flow_meter_profiles; /* MTR profile list. */
 	struct mlx5_flow_meters flow_meters; /* MTR list. */
-#ifndef RTE_ARCH_64
-	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
-	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
-	/* UAR same-page access control required in 32bit implementations. */
-#endif
 	uint8_t skip_default_rss_reta; /* Skip configuration of default reta. */
 	uint8_t fdb_def_rule; /* Whether fdb jump to table 1 is configured. */
 	struct mlx5_mp_id mp_id; /* ID of a multi-process process */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b436f06..2681322 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1997,7 +1997,7 @@ struct mlx5_rxq_ctrl *
 	tmpl->rxq.elts =
 		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
 #ifndef RTE_ARCH_64
-	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
+	tmpl->rxq.uar_lock_cq = &priv->sh->uar_lock_cq;
 #endif
 	tmpl->rxq.idx = idx;
 	rte_atomic32_inc(&tmpl->refcnt);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 35b3ade..e1fa24e 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -355,7 +355,7 @@
 	/* Assign an UAR lock according to UAR page number */
 	lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
 		   MLX5_UAR_PAGE_NUM_MASK;
-	txq_ctrl->txq.uar_lock = &priv->uar_lock[lock_idx];
+	txq_ctrl->txq.uar_lock = &priv->sh->uar_lock[lock_idx];
 #endif
 }
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 04/17] net/mlx5: introduce shared UAR resource
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (2 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 03/17] net/mlx5: fix UAR lock sharing for multiport devices Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 05/17] net/mlx5: create clock queue for packet pacing Viacheslav Ovsiienko
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This is preparation step before moving the Tx queue creation
to the DevX approach. Some features require the shared UAR
for Tx queues and scheduling completion queues, the patch
manages the shared UAR.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 14 ++++++++++++++
 drivers/net/mlx5/mlx5.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 0786945..71e59ac 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -709,6 +709,12 @@ struct mlx5_dev_ctx_shared *
 			err = ENOMEM;
 			goto error;
 		}
+		sh->tx_uar = mlx5_glue->devx_alloc_uar(sh->ctx, 0);
+		if (!sh->tx_uar) {
+			DRV_LOG(ERR, "Failed to allocate DevX UAR.");
+			err = ENOMEM;
+			goto error;
+		}
 	}
 	sh->flow_id_pool = mlx5_flow_id_pool_alloc
 					((1 << HAIRPIN_FLOW_ID_BITS) - 1);
@@ -767,6 +773,10 @@ struct mlx5_dev_ctx_shared *
 		mlx5_l3t_destroy(sh->cnt_id_tbl);
 		sh->cnt_id_tbl = NULL;
 	}
+	if (sh->tx_uar) {
+		mlx5_glue->devx_free_uar(sh->tx_uar);
+		sh->tx_uar = NULL;
+	}
 	if (sh->tis)
 		claim_zero(mlx5_devx_cmd_destroy(sh->tis));
 	if (sh->td)
@@ -832,6 +842,10 @@ struct mlx5_dev_ctx_shared *
 		mlx5_l3t_destroy(sh->cnt_id_tbl);
 		sh->cnt_id_tbl = NULL;
 	}
+	if (sh->tx_uar) {
+		mlx5_glue->devx_free_uar(sh->tx_uar);
+		sh->tx_uar = NULL;
+	}
 	if (sh->pd)
 		claim_zero(mlx5_glue->dealloc_pd(sh->pd));
 	if (sh->tis)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index d01d7f3..c760aff 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -586,6 +586,7 @@ struct mlx5_dev_ctx_shared {
 	struct mlx5_devx_obj *tis; /* TIS object. */
 	struct mlx5_devx_obj *td; /* Transport domain. */
 	struct mlx5_flow_id_pool *flow_id_pool; /* Flow ID pool. */
+	struct mlx5dv_devx_uar *tx_uar; /* Tx/packer pacing shared UAR. */
 	struct mlx5_dev_shared_port port[]; /* per device port data array. */
 };
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 05/17] net/mlx5: create clock queue for packet pacing
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (3 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 04/17] net/mlx5: introduce shared UAR resource Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 06/17] net/mlx5: create rearm " Viacheslav Ovsiienko
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch creates the special completion queue providing
reference completions to schedule packet send from
other transmitting queues.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/Makefile        |   1 +
 drivers/net/mlx5/linux/mlx5_os.c |   3 +
 drivers/net/mlx5/meson.build     |   1 +
 drivers/net/mlx5/mlx5.c          |   2 +
 drivers/net/mlx5/mlx5.h          |  47 ++++
 drivers/net/mlx5/mlx5_defs.h     |   7 +
 drivers/net/mlx5/mlx5_trigger.c  |  16 +-
 drivers/net/mlx5/mlx5_txpp.c     | 449 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 521 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/mlx5/mlx5_txpp.c

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index a458402..9eaac6b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -11,6 +11,7 @@ LIB = librte_pmd_mlx5.a
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txpp.c
 SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
 ifneq ($(filter y,$(CONFIG_RTE_ARCH_X86_64) \
 			$(CONFIG_RTE_ARCH_PPC_64) \
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 63e9350..ea36309 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1869,6 +1869,9 @@
 {
 	int dbmap_env;
 	int err = 0;
+
+	sh->numa_node = spawn->pci_dev->device.numa_node;
+	pthread_mutex_init(&sh->txpp.mutex, NULL);
 	/*
 	 * Configure environment variable "MLX5_BF_SHUT_UP"
 	 * before the device creation. The rdma_core library
diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index e95ce02..c06b153 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -26,6 +26,7 @@ sources = files(
 	'mlx5_stats.c',
 	'mlx5_trigger.c',
 	'mlx5_txq.c',
+	'mlx5_txpp.c',
 	'mlx5_vlan.c',
 	'mlx5_utils.c',
 )
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 71e59ac..10196ac 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -767,6 +767,7 @@ struct mlx5_dev_ctx_shared *
 	pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
 	return sh;
 error:
+	pthread_mutex_destroy(&sh->txpp.mutex);
 	pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
 	MLX5_ASSERT(sh);
 	if (sh->cnt_id_tbl) {
@@ -856,6 +857,7 @@ struct mlx5_dev_ctx_shared *
 		claim_zero(mlx5_glue->close_device(sh->ctx));
 	if (sh->flow_id_pool)
 		mlx5_flow_id_pool_release(sh->flow_id_pool);
+	pthread_mutex_destroy(&sh->txpp.mutex);
 	rte_free(sh);
 exit:
 	pthread_mutex_unlock(&mlx5_dev_ctx_list_mutex);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c760aff..0b73b2a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -531,6 +531,44 @@ struct mlx5_flow_id_pool {
 	uint32_t max_id; /**< Maximum id can be allocated from the pool. */
 };
 
+/* Tx pacing queue structure - for Clock and Rearm queues. */
+struct mlx5_txpp_wq {
+	/* Completion Queue related data.*/
+	struct mlx5_devx_obj *cq;
+	struct mlx5dv_devx_umem *cq_umem;
+	union {
+		volatile void *cq_buf;
+		volatile struct mlx5_cqe *cqes;
+	};
+	volatile uint32_t *cq_dbrec;
+	uint32_t cq_ci:24;
+	uint32_t arm_sn:2;
+	/* Send Queue related data.*/
+	struct mlx5_devx_obj *sq;
+	struct mlx5dv_devx_umem *sq_umem;
+	union {
+		volatile void *sq_buf;
+		volatile struct mlx5_wqe *wqes;
+	};
+	uint16_t sq_size; /* Number of WQEs in the queue. */
+	uint16_t sq_ci; /* Next WQE to execute. */
+	volatile uint32_t *sq_dbrec;
+};
+
+/* Tx packet pacing structure. */
+struct mlx5_dev_txpp {
+	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
+	uint32_t refcnt; /* Pacing reference counter. */
+	uint32_t freq; /* Timestamp frequency, Hz. */
+	uint32_t tick; /* Completion tick duration in nanoseconds. */
+	uint32_t test; /* Packet pacing test mode. */
+	int32_t skew; /* Scheduling skew. */
+	uint32_t eqn; /* Event Queue number. */
+	struct rte_intr_handle intr_handle; /* Periodic interrupt. */
+	struct mlx5dv_devx_event_channel *echan; /* Event Channel. */
+	struct mlx5_txpp_wq clock_queue; /* Clock Queue. */
+};
+
 /*
  * Shared Infiniband device context for Master/Representors
  * which belong to same IB device with multiple IB ports.
@@ -547,9 +585,12 @@ struct mlx5_dev_ctx_shared {
 	char ibdev_name[DEV_SYSFS_NAME_MAX]; /* SYSFS dev name. */
 	char ibdev_path[DEV_SYSFS_PATH_MAX]; /* SYSFS dev path for secondary */
 	struct mlx5_dev_attr device_attr; /* Device properties. */
+	int numa_node; /* Numa node of backing physical device. */
 	LIST_ENTRY(mlx5_dev_ctx_shared) mem_event_cb;
 	/**< Called by memory event callback. */
 	struct mlx5_mr_share_cache share_cache;
+	/* Packet pacing related structure. */
+	struct mlx5_dev_txpp txpp;
 	/* Shared DV/DR flow data section. */
 	pthread_mutex_t dv_mutex; /* DV context mutex. */
 	uint32_t dv_meta_mask; /* flow META metadata supported mask. */
@@ -622,6 +663,7 @@ struct mlx5_priv {
 	unsigned int representor:1; /* Device is a port representor. */
 	unsigned int master:1; /* Device is a E-Switch master. */
 	unsigned int dr_shared:1; /* DV/DR data is shared. */
+	unsigned int txpp_en:1; /* Tx packet pacing enabled. */
 	unsigned int counter_fallback:1; /* Use counter fallback management. */
 	unsigned int mtr_en:1; /* Whether support meter. */
 	unsigned int mtr_reg_share:1; /* Whether support meter REG_C share. */
@@ -944,4 +986,9 @@ int mlx5_os_read_dev_stat(struct mlx5_priv *priv,
 void mlx5_os_stats_init(struct rte_eth_dev *dev);
 void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
 			   mlx5_dereg_mr_t *dereg_mr_cb);
+/* mlx5_txpp.c */
+
+int mlx5_txpp_start(struct rte_eth_dev *dev);
+void mlx5_txpp_stop(struct rte_eth_dev *dev);
+
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 260f584..07a2b59 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -171,6 +171,13 @@
 #define MLX5_TXDB_NCACHED 1
 #define MLX5_TXDB_HEURISTIC 2
 
+/* Tx accurate scheduling on timestamps parameters. */
+#define MLX5_TXPP_CLKQ_SIZE 1
+/* The minimal size test packet to put into one WQE, padded by HW. */
+#define MLX5_TXPP_TEST_PKT_SIZE (sizeof(struct rte_ether_hdr) +	\
+				 sizeof(struct rte_ipv4_hdr))
+
+
 /* Size of the simple hash table for metadata register table. */
 #define MLX5_FLOW_MREG_HTABLE_SZ 4096
 #define MLX5_FLOW_MREG_HNAME "MARK_COPY_TABLE"
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ef74609..ca25ad9 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -288,25 +288,29 @@
 			return -rte_errno;
 		}
 	}
+	ret = mlx5_txpp_start(dev);
+	if (ret) {
+		DRV_LOG(ERR, "port %u Tx packet pacing init failed: %s",
+			dev->data->port_id, strerror(rte_errno));
+		goto error;
+	}
 	ret = mlx5_txq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Tx queue allocation failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		return -rte_errno;
+		goto error;
 	}
 	ret = mlx5_rxq_start(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		mlx5_txq_stop(dev);
-		return -rte_errno;
+		goto error;
 	}
 	ret = mlx5_hairpin_bind(dev);
 	if (ret) {
 		DRV_LOG(ERR, "port %u hairpin binding failed: %s",
 			dev->data->port_id, strerror(rte_errno));
-		mlx5_txq_stop(dev);
-		return -rte_errno;
+		goto error;
 	}
 	/* Set started flag here for the following steps like control flow. */
 	dev->data->dev_started = 1;
@@ -362,6 +366,7 @@
 	mlx5_traffic_disable(dev);
 	mlx5_txq_stop(dev);
 	mlx5_rxq_stop(dev);
+	mlx5_txpp_stop(dev); /* Stop last. */
 	rte_errno = ret; /* Restore rte_errno. */
 	return -rte_errno;
 }
@@ -398,6 +403,7 @@
 	priv->sh->port[priv->dev_port - 1].devx_ih_port_id = RTE_MAX_ETHPORTS;
 	mlx5_txq_stop(dev);
 	mlx5_rxq_stop(dev);
+	mlx5_txpp_stop(dev);
 }
 
 /**
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
new file mode 100644
index 0000000..382bd20
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -0,0 +1,449 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2020 Mellanox Technologies, Ltd
+ */
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_interrupts.h>
+#include <rte_alarm.h>
+#include <rte_malloc.h>
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+
+/* Destroy Event Queue Notification Channel. */
+static void
+mlx5_txpp_destroy_eqn(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->txpp.echan) {
+		mlx5_glue->devx_destroy_event_channel(sh->txpp.echan);
+		sh->txpp.echan = NULL;
+	}
+	sh->txpp.eqn = 0;
+}
+
+/* Create Event Queue Notification Channel. */
+static int
+mlx5_txpp_create_eqn(struct mlx5_dev_ctx_shared *sh)
+{
+	uint32_t lcore;
+
+	MLX5_ASSERT(!sh->txpp.echan);
+	lcore = (uint32_t)rte_lcore_to_cpu_id(-1);
+	if (mlx5_glue->devx_query_eqn(sh->ctx, lcore, &sh->txpp.eqn)) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to query EQ number %d.", rte_errno);
+		sh->txpp.eqn = 0;
+		return -rte_errno;
+	}
+	sh->txpp.echan = mlx5_glue->devx_create_event_channel(sh->ctx,
+			MLX5DV_DEVX_CREATE_EVENT_CHANNEL_FLAGS_OMIT_EV_DATA);
+	if (!sh->txpp.echan) {
+		sh->txpp.eqn = 0;
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create event channel %d.",
+			rte_errno);
+		return -rte_errno;
+	}
+	return 0;
+}
+
+static void
+mlx5_txpp_destroy_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+
+	if (wq->sq)
+		claim_zero(mlx5_devx_cmd_destroy(wq->sq));
+	if (wq->sq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(wq->sq_umem));
+	if (wq->sq_buf)
+		rte_free((void *)(uintptr_t)wq->sq_buf);
+	if (wq->cq)
+		claim_zero(mlx5_devx_cmd_destroy(wq->cq));
+	if (wq->cq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(wq->cq_umem));
+	if (wq->cq_buf)
+		rte_free((void *)(uintptr_t)wq->cq_buf);
+	memset(wq, 0, sizeof(*wq));
+}
+
+static void
+mlx5_txpp_fill_wqe_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	struct mlx5_wqe *wqe = (struct mlx5_wqe *)(uintptr_t)wq->wqes;
+	struct mlx5_wqe_cseg *cs = &wqe->cseg;
+	uint32_t wqe_size, opcode, i;
+	uint8_t *dst;
+
+	/* For test purposes fill the WQ with SEND inline packet. */
+	if (sh->txpp.test) {
+		wqe_size = RTE_ALIGN(MLX5_TXPP_TEST_PKT_SIZE +
+				     MLX5_WQE_CSEG_SIZE +
+				     2 * MLX5_WQE_ESEG_SIZE -
+				     MLX5_ESEG_MIN_INLINE_SIZE,
+				     MLX5_WSEG_SIZE);
+		opcode = MLX5_OPCODE_SEND;
+	} else {
+		wqe_size = MLX5_WSEG_SIZE;
+		opcode = MLX5_OPCODE_NOP;
+	}
+	cs->opcode = rte_cpu_to_be_32(opcode | 0); /* Index is ignored. */
+	cs->sq_ds = rte_cpu_to_be_32((wq->sq->id << 8) |
+				     (wqe_size / MLX5_WSEG_SIZE));
+	cs->flags = RTE_BE32(MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET);
+	cs->misc = RTE_BE32(0);
+	wqe_size = RTE_ALIGN(wqe_size, MLX5_WQE_SIZE);
+	if (sh->txpp.test) {
+		struct mlx5_wqe_eseg *es = &wqe->eseg;
+		struct rte_ether_hdr *eth_hdr;
+		struct rte_ipv4_hdr *ip_hdr;
+		struct rte_udp_hdr *udp_hdr;
+
+		/* Build the inline test packet pattern. */
+		MLX5_ASSERT(wqe_size <= MLX5_WQE_SIZE_MAX);
+		MLX5_ASSERT(MLX5_TXPP_TEST_PKT_SIZE >=
+				(sizeof(struct rte_ether_hdr) +
+				 sizeof(struct rte_ipv4_hdr)));
+		es->flags = 0;
+		es->cs_flags = MLX5_ETH_WQE_L3_CSUM | MLX5_ETH_WQE_L4_CSUM;
+		es->swp_offs = 0;
+		es->metadata = 0;
+		es->swp_flags = 0;
+		es->mss = 0;
+		es->inline_hdr_sz = RTE_BE16(MLX5_TXPP_TEST_PKT_SIZE);
+		/* Build test packet L2 header (Ethernet). */
+		dst = (uint8_t *)&es->inline_data;
+		eth_hdr = (struct rte_ether_hdr *)dst;
+		rte_eth_random_addr(&eth_hdr->d_addr.addr_bytes[0]);
+		rte_eth_random_addr(&eth_hdr->s_addr.addr_bytes[0]);
+		eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
+		/* Build test packet L3 header (IP v4). */
+		dst += sizeof(struct rte_ether_hdr);
+		ip_hdr = (struct rte_ipv4_hdr *)dst;
+		ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
+		ip_hdr->type_of_service = 0;
+		ip_hdr->fragment_offset = 0;
+		ip_hdr->time_to_live = 64;
+		ip_hdr->next_proto_id = IPPROTO_UDP;
+		ip_hdr->packet_id = 0;
+		ip_hdr->total_length = RTE_BE16(MLX5_TXPP_TEST_PKT_SIZE -
+						sizeof(struct rte_ether_hdr));
+		/* use RFC5735 / RFC2544 reserved network test addresses */
+		ip_hdr->src_addr = RTE_BE32((198U << 24) | (18 << 16) |
+					    (0 << 8) | 1);
+		ip_hdr->dst_addr = RTE_BE32((198U << 24) | (18 << 16) |
+					    (0 << 8) | 2);
+		if (MLX5_TXPP_TEST_PKT_SIZE <
+					(sizeof(struct rte_ether_hdr) +
+					 sizeof(struct rte_ipv4_hdr) +
+					 sizeof(struct rte_udp_hdr)))
+			goto wcopy;
+		/* Build test packet L4 header (UDP). */
+		dst += sizeof(struct rte_ipv4_hdr);
+		udp_hdr = (struct rte_udp_hdr *)dst;
+		udp_hdr->src_port = RTE_BE16(9); /* RFC863 Discard. */
+		udp_hdr->dst_port = RTE_BE16(9);
+		udp_hdr->dgram_len = RTE_BE16(MLX5_TXPP_TEST_PKT_SIZE -
+					      sizeof(struct rte_ether_hdr) -
+					      sizeof(struct rte_ipv4_hdr));
+		udp_hdr->dgram_cksum = 0;
+		/* Fill the test packet data. */
+		dst += sizeof(struct rte_udp_hdr);
+		for (i = sizeof(struct rte_ether_hdr) +
+			sizeof(struct rte_ipv4_hdr) +
+			sizeof(struct rte_udp_hdr);
+				i < MLX5_TXPP_TEST_PKT_SIZE; i++)
+			*dst++ = (uint8_t)(i & 0xFF);
+	}
+wcopy:
+	/* Duplicate the pattern to the next WQEs. */
+	dst = (uint8_t *)(uintptr_t)wq->sq_buf;
+	for (i = 1; i < MLX5_TXPP_CLKQ_SIZE; i++) {
+		dst += wqe_size;
+		rte_memcpy(dst, (void *)(uintptr_t)wq->sq_buf, wqe_size);
+	}
+}
+
+/* Creates the Clock Queue for packet pacing, returns zero on success. */
+static int
+mlx5_txpp_create_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_devx_create_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+	struct mlx5_devx_cq_attr cq_attr = { 0 };
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	size_t page_size = sysconf(_SC_PAGESIZE);
+	uint32_t umem_size, umem_dbrec;
+	int ret;
+
+	/* Allocate memory buffer for CQEs and doorbell record. */
+	umem_size = sizeof(struct mlx5_cqe) * MLX5_TXPP_CLKQ_SIZE;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->cq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->cq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Clock Queue.");
+		return -ENOMEM;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->cq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->cq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->cq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Clock Queue.");
+		goto error;
+	}
+	/* Create completion queue object for Clock Queue. */
+	cq_attr.cqe_size = (sizeof(struct mlx5_cqe) == 128) ?
+			    MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B;
+	cq_attr.use_first_only = 1;
+	cq_attr.overrun_ignore = 1;
+	cq_attr.uar_page_id = sh->tx_uar->page_id;
+	cq_attr.eqn = sh->txpp.eqn;
+	cq_attr.q_umem_valid = 1;
+	cq_attr.q_umem_offset = 0;
+	cq_attr.q_umem_id = wq->cq_umem->umem_id;
+	cq_attr.db_umem_valid = 1;
+	cq_attr.db_umem_offset = umem_dbrec;
+	cq_attr.db_umem_id = wq->cq_umem->umem_id;
+	cq_attr.log_cq_size = rte_log2_u32(MLX5_TXPP_CLKQ_SIZE);
+	cq_attr.log_page_size = rte_log2_u32(page_size);
+	wq->cq = mlx5_devx_cmd_create_cq(sh->ctx, &cq_attr);
+	if (!wq->cq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ for Clock Queue.");
+		goto error;
+	}
+	wq->cq_dbrec = RTE_PTR_ADD(wq->cq_buf, umem_dbrec);
+	wq->cq_ci = 0;
+	/* Allocate memory buffer for Send Queue WQEs. */
+	if (sh->txpp.test) {
+		wq->sq_size = RTE_ALIGN(MLX5_TXPP_TEST_PKT_SIZE +
+					MLX5_WQE_CSEG_SIZE +
+					2 * MLX5_WQE_ESEG_SIZE -
+					MLX5_ESEG_MIN_INLINE_SIZE,
+					MLX5_WQE_SIZE) / MLX5_WQE_SIZE;
+		wq->sq_size *= MLX5_TXPP_CLKQ_SIZE;
+	} else {
+		wq->sq_size = MLX5_TXPP_CLKQ_SIZE;
+	}
+	/* There should not be WQE leftovers in the cyclic queue. */
+	MLX5_ASSERT(wq->sq_size == (1 << log2above(wq->sq_size)));
+	umem_size =  MLX5_WQE_SIZE * wq->sq_size;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->sq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->sq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Clock Queue.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->sq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->sq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->sq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Clock Queue.");
+		goto error;
+	}
+	/* Create send queue object for Clock Queue. */
+	if (sh->txpp.test) {
+		sq_attr.tis_lst_sz = 1;
+		sq_attr.tis_num = sh->tis->id;
+		sq_attr.non_wire = 0;
+		sq_attr.static_sq_wq = 1;
+	} else {
+		sq_attr.non_wire = 1;
+		sq_attr.static_sq_wq = 1;
+	}
+	sq_attr.state = MLX5_SQC_STATE_RST;
+	sq_attr.cqn = wq->cq->id;
+	sq_attr.wq_attr.cd_slave = 1;
+	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
+	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+	sq_attr.wq_attr.pd = sh->pdn;
+	sq_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+	sq_attr.wq_attr.log_wq_sz = rte_log2_u32(wq->sq_size);
+	sq_attr.wq_attr.dbr_umem_valid = 1;
+	sq_attr.wq_attr.dbr_addr = umem_dbrec;
+	sq_attr.wq_attr.dbr_umem_id = wq->sq_umem->umem_id;
+	sq_attr.wq_attr.wq_umem_valid = 1;
+	sq_attr.wq_attr.wq_umem_id = wq->sq_umem->umem_id;
+	/* umem_offset must be zero for static_sq_wq queue. */
+	sq_attr.wq_attr.wq_umem_offset = 0;
+	wq->sq = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);
+	if (!wq->sq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create SQ for Clock Queue.");
+		goto error;
+	}
+	wq->sq_dbrec = RTE_PTR_ADD(wq->sq_buf, umem_dbrec +
+				   MLX5_SND_DBR * sizeof(uint32_t));
+	/* Build the WQEs in the Send Queue before goto Ready state. */
+	mlx5_txpp_fill_wqe_clock_queue(sh);
+	/* Change queue state to ready. */
+	msq_attr.sq_state = MLX5_SQC_STATE_RST;
+	msq_attr.state = MLX5_SQC_STATE_RDY;
+	wq->sq_ci = 0;
+	ret = mlx5_devx_cmd_modify_sq(wq->sq, &msq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set SQ ready state Clock Queue.");
+		goto error;
+	}
+	return 0;
+error:
+	ret = -rte_errno;
+	mlx5_txpp_destroy_clock_queue(sh);
+	rte_errno = -ret;
+	return ret;
+}
+
+/*
+ * The routine initializes the packet pacing infrastructure:
+ * - allocates PP context
+ * - Clock CQ/SQ
+ * - Rearm CQ/SQ
+ * - attaches rearm interrupt handler
+ *
+ * Returns 0 on success, negative otherwise
+ */
+static int
+mlx5_txpp_create(struct mlx5_dev_ctx_shared *sh, struct mlx5_priv *priv)
+{
+	int tx_pp = priv->config.tx_pp;
+	int ret;
+
+	/* Store the requested pacing parameters. */
+	sh->txpp.tick = tx_pp >= 0 ? tx_pp : -tx_pp;
+	sh->txpp.test = !!(tx_pp < 0);
+	sh->txpp.skew = priv->config.tx_skew;
+	sh->txpp.freq = priv->config.hca_attr.dev_freq_khz;
+	ret = mlx5_txpp_create_eqn(sh);
+	if (ret)
+		goto exit;
+	ret = mlx5_txpp_create_clock_queue(sh);
+	if (ret)
+		goto exit;
+exit:
+	if (ret) {
+		mlx5_txpp_destroy_clock_queue(sh);
+		mlx5_txpp_destroy_eqn(sh);
+		sh->txpp.tick = 0;
+		sh->txpp.test = 0;
+		sh->txpp.skew = 0;
+	}
+	return ret;
+}
+
+/*
+ * The routine destroys the packet pacing infrastructure:
+ * - detaches rearm interrupt handler
+ * - Rearm CQ/SQ
+ * - Clock CQ/SQ
+ * - PP context
+ */
+static void
+mlx5_txpp_destroy(struct mlx5_dev_ctx_shared *sh)
+{
+	mlx5_txpp_destroy_clock_queue(sh);
+	mlx5_txpp_destroy_eqn(sh);
+	sh->txpp.tick = 0;
+	sh->txpp.test = 0;
+	sh->txpp.skew = 0;
+}
+
+/**
+ * Creates and starts packet pacing infrastructure on specified device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_txpp_start(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	int err = 0;
+	int ret;
+
+	if (!priv->config.tx_pp) {
+		/* Packet pacing is not requested for the device. */
+		MLX5_ASSERT(priv->txpp_en == 0);
+		return 0;
+	}
+	if (priv->txpp_en) {
+		/* Packet pacing is already enabled for the device. */
+		MLX5_ASSERT(sh->txpp.refcnt);
+		return 0;
+	}
+	if (priv->config.tx_pp > 0) {
+		ret = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+		if (ret < 0)
+			return 0;
+	}
+	ret = pthread_mutex_lock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+	if (sh->txpp.refcnt) {
+		priv->txpp_en = 1;
+		++sh->txpp.refcnt;
+	} else {
+		err = mlx5_txpp_create(sh, priv);
+		if (!err) {
+			MLX5_ASSERT(sh->txpp.tick);
+			priv->txpp_en = 1;
+			sh->txpp.refcnt = 1;
+		} else {
+			rte_errno = -err;
+		}
+	}
+	ret = pthread_mutex_unlock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+	return err;
+}
+
+/**
+ * Stops and destroys packet pacing infrastructure on specified device.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+void
+mlx5_txpp_stop(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	int ret;
+
+	if (!priv->txpp_en) {
+		/* Packet pacing is already disabled for the device. */
+		return;
+	}
+	priv->txpp_en = 0;
+	ret = pthread_mutex_lock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+	MLX5_ASSERT(sh->txpp.refcnt);
+	if (!sh->txpp.refcnt || --sh->txpp.refcnt)
+		return;
+	/* No references any more, do actual destroy. */
+	mlx5_txpp_destroy(sh);
+	ret = pthread_mutex_unlock(&sh->txpp.mutex);
+	MLX5_ASSERT(!ret);
+	RTE_SET_USED(ret);
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 06/17] net/mlx5: create rearm queue for packet pacing
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (4 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 05/17] net/mlx5: create clock queue for packet pacing Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX Viacheslav Ovsiienko
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The dedicated Rearm Queue is needed to fire the work requests to
the Clock Queue in realtime. The Clock Queue should never stop,
otherwise the clock synchronization mignt be broken and packet
send scheduling would fail. The Rearm Queue uses cross channel
SEND_EN/WAIT operations to provides the requests to the
CLock Queue in robust way.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |   1 +
 drivers/net/mlx5/mlx5_defs.h |   5 +-
 drivers/net/mlx5/mlx5_txpp.c | 203 ++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 205 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0b73b2a..61a93f9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -567,6 +567,7 @@ struct mlx5_dev_txpp {
 	struct rte_intr_handle intr_handle; /* Periodic interrupt. */
 	struct mlx5dv_devx_event_channel *echan; /* Event Channel. */
 	struct mlx5_txpp_wq clock_queue; /* Clock Queue. */
+	struct mlx5_txpp_wq rearm_queue; /* Clock Queue. */
 };
 
 /*
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 07a2b59..a8626a4 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -173,11 +173,14 @@
 
 /* Tx accurate scheduling on timestamps parameters. */
 #define MLX5_TXPP_CLKQ_SIZE 1
+#define MLX5_TXPP_REARM	((1UL << MLX5_WQ_INDEX_WIDTH) / 4)
+#define MLX5_TXPP_REARM_SQ_SIZE (((1UL << MLX5_CQ_INDEX_WIDTH) / \
+				  MLX5_TXPP_REARM) * 2)
+#define MLX5_TXPP_REARM_CQ_SIZE (MLX5_TXPP_REARM_SQ_SIZE / 2)
 /* The minimal size test packet to put into one WQE, padded by HW. */
 #define MLX5_TXPP_TEST_PKT_SIZE (sizeof(struct rte_ether_hdr) +	\
 				 sizeof(struct rte_ipv4_hdr))
 
-
 /* Size of the simple hash table for metadata register table. */
 #define MLX5_FLOW_MREG_HTABLE_SZ 4096
 #define MLX5_FLOW_MREG_HNAME "MARK_COPY_TABLE"
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 382bd20..f600fc5 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -9,6 +9,7 @@
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
+#include "mlx5_common_os.h"
 
 /* Destroy Event Queue Notification Channel. */
 static void
@@ -48,10 +49,8 @@
 }
 
 static void
-mlx5_txpp_destroy_clock_queue(struct mlx5_dev_ctx_shared *sh)
+mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
-	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
-
 	if (wq->sq)
 		claim_zero(mlx5_devx_cmd_destroy(wq->sq));
 	if (wq->sq_umem)
@@ -68,6 +67,199 @@
 }
 
 static void
+mlx5_txpp_destroy_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+
+	mlx5_txpp_destroy_send_queue(wq);
+}
+
+static void
+mlx5_txpp_destroy_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+
+	mlx5_txpp_destroy_send_queue(wq);
+}
+
+static void
+mlx5_txpp_fill_cqe_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	struct mlx5_cqe *cqe = (struct mlx5_cqe *)(uintptr_t)wq->cqes;
+	uint32_t i;
+
+	for (i = 0; i < MLX5_TXPP_REARM_CQ_SIZE; i++) {
+		cqe->op_own = (MLX5_CQE_INVALID << 4) | MLX5_CQE_OWNER_MASK;
+		++cqe;
+	}
+}
+
+static void
+mlx5_txpp_fill_wqe_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	struct mlx5_wqe *wqe = (struct mlx5_wqe *)(uintptr_t)wq->wqes;
+	uint32_t i;
+
+	for (i = 0; i < wq->sq_size; i += 2) {
+		struct mlx5_wqe_cseg *cs;
+		struct mlx5_wqe_qseg *qs;
+		uint32_t index;
+
+		/* Build SEND_EN request with slave WQE index. */
+		cs = &wqe[i + 0].cseg;
+		cs->opcode = RTE_BE32(MLX5_OPCODE_SEND_EN | 0);
+		cs->sq_ds = rte_cpu_to_be_32((wq->sq->id << 8) | 2);
+		cs->flags = RTE_BE32(MLX5_COMP_ALWAYS <<
+				     MLX5_COMP_MODE_OFFSET);
+		cs->misc = RTE_BE32(0);
+		qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+		index = (i * MLX5_TXPP_REARM / 2 + MLX5_TXPP_REARM) &
+			((1 << MLX5_WQ_INDEX_WIDTH) - 1);
+		qs->max_index = rte_cpu_to_be_32(index);
+		qs->qpn_cqn = rte_cpu_to_be_32(sh->txpp.clock_queue.sq->id);
+		/* Build WAIT request with slave CQE index. */
+		cs = &wqe[i + 1].cseg;
+		cs->opcode = RTE_BE32(MLX5_OPCODE_WAIT | 0);
+		cs->sq_ds = rte_cpu_to_be_32((wq->sq->id << 8) | 2);
+		cs->flags = RTE_BE32(MLX5_COMP_ONLY_ERR <<
+				     MLX5_COMP_MODE_OFFSET);
+		cs->misc = RTE_BE32(0);
+		qs = RTE_PTR_ADD(cs, sizeof(struct mlx5_wqe_cseg));
+		index = (i * MLX5_TXPP_REARM / 2 + MLX5_TXPP_REARM / 2) &
+			((1 << MLX5_CQ_INDEX_WIDTH) - 1);
+		qs->max_index = rte_cpu_to_be_32(index);
+		qs->qpn_cqn = rte_cpu_to_be_32(sh->txpp.clock_queue.cq->id);
+	}
+}
+
+/* Creates the Rearm Queue to fire the requests to Clock Queue in realtime. */
+static int
+mlx5_txpp_create_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_devx_create_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+	struct mlx5_devx_cq_attr cq_attr = { 0 };
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	size_t page_size = sysconf(_SC_PAGESIZE);
+	uint32_t umem_size, umem_dbrec;
+	int ret;
+
+	/* Allocate memory buffer for CQEs and doorbell record. */
+	umem_size = sizeof(struct mlx5_cqe) * MLX5_TXPP_REARM_CQ_SIZE;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->cq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->cq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Rearm Queue.");
+		return -ENOMEM;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->cq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->cq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->cq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Rearm Queue.");
+		goto error;
+	}
+	/* Create completion queue object for Rearm Queue. */
+	cq_attr.cqe_size = (sizeof(struct mlx5_cqe) == 128) ?
+			    MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B;
+	cq_attr.uar_page_id = sh->tx_uar->page_id;
+	cq_attr.eqn = sh->txpp.eqn;
+	cq_attr.q_umem_valid = 1;
+	cq_attr.q_umem_offset = 0;
+	cq_attr.q_umem_id = mlx5_os_get_umem_id(wq->cq_umem);
+	cq_attr.db_umem_valid = 1;
+	cq_attr.db_umem_offset = umem_dbrec;
+	cq_attr.db_umem_id = mlx5_os_get_umem_id(wq->cq_umem);
+	cq_attr.log_cq_size = rte_log2_u32(MLX5_TXPP_REARM_CQ_SIZE);
+	cq_attr.log_page_size = rte_log2_u32(page_size);
+	wq->cq = mlx5_devx_cmd_create_cq(sh->ctx, &cq_attr);
+	if (!wq->cq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create CQ for Rearm Queue.");
+		goto error;
+	}
+	wq->cq_dbrec = RTE_PTR_ADD(wq->cq_buf, umem_dbrec);
+	wq->cq_ci = 0;
+	wq->arm_sn = 0;
+	/* Mark all CQEs initially as invalid. */
+	mlx5_txpp_fill_cqe_rearm_queue(sh);
+	/*
+	 * Allocate memory buffer for Send Queue WQEs.
+	 * There should be no WQE leftovers in the cyclic queue.
+	 */
+	wq->sq_size = MLX5_TXPP_REARM_SQ_SIZE;
+	MLX5_ASSERT(wq->sq_size == (1 << log2above(wq->sq_size)));
+	umem_size =  MLX5_WQE_SIZE * wq->sq_size;
+	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
+	umem_size += MLX5_DBR_SIZE;
+	wq->sq_buf = rte_zmalloc_socket(__func__, umem_size,
+					page_size, sh->numa_node);
+	if (!wq->sq_buf) {
+		DRV_LOG(ERR, "Failed to allocate memory for Rearm Queue.");
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	/* Register allocated buffer in user space with DevX. */
+	wq->sq_umem = mlx5_glue->devx_umem_reg(sh->ctx,
+					       (void *)(uintptr_t)wq->sq_buf,
+					       umem_size,
+					       IBV_ACCESS_LOCAL_WRITE);
+	if (!wq->sq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to register umem for Rearm Queue.");
+		goto error;
+	}
+	/* Create send queue object for Rearm Queue. */
+	sq_attr.state = MLX5_SQC_STATE_RST;
+	sq_attr.tis_lst_sz = 1;
+	sq_attr.tis_num = sh->tis->id;
+	sq_attr.cqn = wq->cq->id;
+	sq_attr.cd_master = 1;
+	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
+	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+	sq_attr.wq_attr.pd = sh->pdn;
+	sq_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+	sq_attr.wq_attr.log_wq_sz = rte_log2_u32(wq->sq_size);
+	sq_attr.wq_attr.dbr_umem_valid = 1;
+	sq_attr.wq_attr.dbr_addr = umem_dbrec;
+	sq_attr.wq_attr.dbr_umem_id = mlx5_os_get_umem_id(wq->sq_umem);
+	sq_attr.wq_attr.wq_umem_valid = 1;
+	sq_attr.wq_attr.wq_umem_id = mlx5_os_get_umem_id(wq->sq_umem);
+	sq_attr.wq_attr.wq_umem_offset = 0;
+	wq->sq = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);
+	if (!wq->sq) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "Failed to create SQ for Rearm Queue.");
+		goto error;
+	}
+	wq->sq_dbrec = RTE_PTR_ADD(wq->sq_buf, umem_dbrec +
+				   MLX5_SND_DBR * sizeof(uint32_t));
+	/* Build the WQEs in the Send Queue before goto Ready state. */
+	mlx5_txpp_fill_wqe_rearm_queue(sh);
+	/* Change queue state to ready. */
+	msq_attr.sq_state = MLX5_SQC_STATE_RST;
+	msq_attr.state = MLX5_SQC_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_sq(wq->sq, &msq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to set SQ ready state Rearm Queue.");
+		goto error;
+	}
+	return 0;
+error:
+	ret = -rte_errno;
+	mlx5_txpp_destroy_rearm_queue(sh);
+	rte_errno = -ret;
+	return ret;
+}
+
+static void
 mlx5_txpp_fill_wqe_clock_queue(struct mlx5_dev_ctx_shared *sh)
 {
 	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
@@ -331,8 +523,12 @@
 	ret = mlx5_txpp_create_clock_queue(sh);
 	if (ret)
 		goto exit;
+	ret = mlx5_txpp_create_rearm_queue(sh);
+	if (ret)
+		goto exit;
 exit:
 	if (ret) {
+		mlx5_txpp_destroy_rearm_queue(sh);
 		mlx5_txpp_destroy_clock_queue(sh);
 		mlx5_txpp_destroy_eqn(sh);
 		sh->txpp.tick = 0;
@@ -352,6 +548,7 @@
 static void
 mlx5_txpp_destroy(struct mlx5_dev_ctx_shared *sh)
 {
+	mlx5_txpp_destroy_rearm_queue(sh);
 	mlx5_txpp_destroy_clock_queue(sh);
 	mlx5_txpp_destroy_eqn(sh);
 	sh->txpp.tick = 0;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (5 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 06/17] net/mlx5: create rearm " Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-20 14:18     ` Ferruh Yigit
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 08/17] net/mlx5: allocate packet pacing context Viacheslav Ovsiienko
                     ` (10 subsequent siblings)
  17 siblings, 1 reply; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

To provide the packet send schedule on mbuf timestamp the Tx
queue must be attached to the same UAR as Clock Queue is.
UAR is special hardware related resource mapped to the host
memory and provides doorbell registers, the assigning UAR
to the queue being created is provided via DevX API only.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c    | 108 +++++++++-----
 drivers/net/mlx5/mlx5_rxtx.h    |  14 ++
 drivers/net/mlx5/mlx5_trigger.c |   4 +-
 drivers/net/mlx5/mlx5_txq.c     | 306 +++++++++++++++++++++++++++++++++++++++-
 4 files changed, 392 insertions(+), 40 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 894f441..227289e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -944,43 +944,79 @@ enum mlx5_txcmp_code {
 		struct mlx5_txq_data *txq = (*priv->txqs)[sm->queue_id];
 		struct mlx5_txq_ctrl *txq_ctrl =
 			container_of(txq, struct mlx5_txq_ctrl, txq);
-		struct ibv_qp_attr mod = {
-			.qp_state = IBV_QPS_RESET,
-			.port_num = (uint8_t)priv->dev_port,
-		};
-		struct ibv_qp *qp = txq_ctrl->obj->qp;
 
-		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change the Tx QP state to RESET "
-				"%s", strerror(errno));
-			rte_errno = errno;
-			return ret;
-		}
-		mod.qp_state = IBV_QPS_INIT;
-		ret = mlx5_glue->modify_qp(qp, &mod,
-					   (IBV_QP_STATE | IBV_QP_PORT));
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change Tx QP state to INIT %s",
-				strerror(errno));
-			rte_errno = errno;
-			return ret;
-		}
-		mod.qp_state = IBV_QPS_RTR;
-		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change Tx QP state to RTR %s",
-				strerror(errno));
-			rte_errno = errno;
-			return ret;
-		}
-		mod.qp_state = IBV_QPS_RTS;
-		ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
-		if (ret) {
-			DRV_LOG(ERR, "Cannot change Tx QP state to RTS %s",
-				strerror(errno));
-			rte_errno = errno;
-			return ret;
+		if (txq_ctrl->obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ) {
+			struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+
+			/* Change queue state to reset. */
+			msq_attr.sq_state = MLX5_SQC_STATE_ERR;
+			msq_attr.state = MLX5_SQC_STATE_RST;
+			ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq_devx,
+						      &msq_attr);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RESET %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			/* Change queue state to ready. */
+			msq_attr.sq_state = MLX5_SQC_STATE_RST;
+			msq_attr.state = MLX5_SQC_STATE_RDY;
+			ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq_devx,
+						      &msq_attr);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to READY %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+		} else {
+			struct ibv_qp_attr mod = {
+				.qp_state = IBV_QPS_RESET,
+				.port_num = (uint8_t)priv->dev_port,
+			};
+			struct ibv_qp *qp = txq_ctrl->obj->qp;
+
+			MLX5_ASSERT
+				(txq_ctrl->obj->type == MLX5_TXQ_OBJ_TYPE_IBV);
+
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RESET %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			mod.qp_state = IBV_QPS_INIT;
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to INIT %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			mod.qp_state = IBV_QPS_RTR;
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RTR %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
+			mod.qp_state = IBV_QPS_RTS;
+			ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+			if (ret) {
+				DRV_LOG(ERR, "Cannot change the "
+					"Tx QP state to RTS %s",
+					strerror(errno));
+				rte_errno = errno;
+				return ret;
+			}
 		}
 	}
 	return 0;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 26621ff..1b797da 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -323,6 +323,7 @@ struct mlx5_txq_data {
 
 enum mlx5_txq_obj_type {
 	MLX5_TXQ_OBJ_TYPE_IBV,		/* mlx5_txq_obj with ibv_wq. */
+	MLX5_TXQ_OBJ_TYPE_DEVX_SQ,	/* mlx5_txq_obj with mlx5_devx_sq. */
 	MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN,
 	/* mlx5_txq_obj with mlx5_devx_tq and hairpin support. */
 };
@@ -349,6 +350,19 @@ struct mlx5_txq_obj {
 			/* DevX object for Sx queue. */
 			struct mlx5_devx_obj *tis; /* The TIS object. */
 		};
+		struct {
+			struct rte_eth_dev *dev;
+			struct mlx5_devx_obj *cq_devx;
+			struct mlx5dv_devx_umem *cq_umem;
+			void *cq_buf;
+			int64_t cq_dbrec_offset;
+			struct mlx5_devx_dbr_page *cq_dbrec_page;
+			struct mlx5_devx_obj *sq_devx;
+			struct mlx5dv_devx_umem *sq_umem;
+			void *sq_buf;
+			int64_t sq_dbrec_offset;
+			struct mlx5_devx_dbr_page *sq_dbrec_page;
+		};
 	};
 };
 
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index ca25ad9..e324319 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -59,7 +59,9 @@
 		} else {
 			txq_alloc_elts(txq_ctrl);
 			txq_ctrl->obj = mlx5_txq_obj_new
-				(dev, i, MLX5_TXQ_OBJ_TYPE_IBV);
+				(dev, i, priv->txpp_en ?
+				MLX5_TXQ_OBJ_TYPE_DEVX_SQ :
+				MLX5_TXQ_OBJ_TYPE_IBV);
 		}
 		if (!txq_ctrl->obj) {
 			rte_errno = ENOMEM;
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index e1fa24e..f9ed504 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -31,6 +31,7 @@
 #include <mlx5_devx_cmds.h>
 #include <mlx5_common.h>
 #include <mlx5_common_mr.h>
+#include <mlx5_common_os.h>
 
 #include "mlx5_defs.h"
 #include "mlx5_utils.h"
@@ -573,6 +574,297 @@
 }
 
 /**
+ * Destroy the Tx queue DevX object.
+ *
+ * @param txq_obj
+ *   Txq object to destroy
+ */
+static void
+txq_release_sq_resources(struct mlx5_txq_obj *txq_obj)
+{
+	MLX5_ASSERT(txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ);
+
+	if (txq_obj->sq_devx)
+		claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq_devx));
+	if (txq_obj->sq_dbrec_page)
+		claim_zero(mlx5_release_dbr
+				(&txq_obj->txq_ctrl->priv->dbrpgs,
+				mlx5_os_get_umem_id
+					(txq_obj->sq_dbrec_page->umem),
+				txq_obj->sq_dbrec_offset));
+	if (txq_obj->sq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(txq_obj->sq_umem));
+	if (txq_obj->sq_buf)
+		rte_free(txq_obj->sq_buf);
+	if (txq_obj->cq_devx)
+		claim_zero(mlx5_devx_cmd_destroy(txq_obj->cq_devx));
+	if (txq_obj->cq_dbrec_page)
+		claim_zero(mlx5_release_dbr
+				(&txq_obj->txq_ctrl->priv->dbrpgs,
+				mlx5_os_get_umem_id
+					(txq_obj->cq_dbrec_page->umem),
+				txq_obj->cq_dbrec_offset));
+	if (txq_obj->cq_umem)
+		claim_zero(mlx5_glue->devx_umem_dereg(txq_obj->cq_umem));
+	if (txq_obj->cq_buf)
+		rte_free(txq_obj->cq_buf);
+}
+
+/**
+ * Create the Tx queue DevX object.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param idx
+ *   Queue index in DPDK Tx queue array
+ *
+ * @return
+ *   The DevX object initialised, NULL otherwise and rte_errno is set.
+ */
+static struct mlx5_txq_obj *
+mlx5_txq_obj_devx_new(struct rte_eth_dev *dev, uint16_t idx)
+{
+#ifndef HAVE_MLX5DV_DEVX_UAR_OFFSET
+	DRV_LOG(ERR, "port %u Tx queue %u cannot create with DevX, no UAR",
+		     dev->data->port_id, idx);
+	rte_errno = ENOMEM;
+	return NULL;
+#else
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_data *txq_data = (*priv->txqs)[idx];
+	struct mlx5_txq_ctrl *txq_ctrl =
+		container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	struct mlx5_devx_create_sq_attr sq_attr = { 0 };
+	struct mlx5_devx_modify_sq_attr msq_attr = { 0 };
+	struct mlx5_devx_cq_attr cq_attr = { 0 };
+	struct mlx5_txq_obj *txq_obj = NULL;
+	size_t page_size = sysconf(_SC_PAGESIZE);
+	struct mlx5_cqe *cqe;
+	uint32_t i, nqe;
+	int ret = 0;
+
+	MLX5_ASSERT(txq_data);
+	MLX5_ASSERT(!txq_ctrl->obj);
+	txq_obj = rte_calloc_socket(__func__, 1,
+				    sizeof(struct mlx5_txq_obj), 0,
+				    txq_ctrl->socket);
+	if (!txq_obj) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory resources",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+	txq_obj->type = MLX5_TXQ_OBJ_TYPE_DEVX_SQ;
+	txq_obj->txq_ctrl = txq_ctrl;
+	txq_obj->dev = dev;
+	/* Create the Completion Queue. */
+	nqe = (1UL << txq_data->elts_n) / MLX5_TX_COMP_THRESH +
+	       1 + MLX5_TX_COMP_THRESH_INLINE_DIV;
+	nqe = 1UL << log2above(nqe);
+	if (nqe > UINT16_MAX) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u requests to many CQEs %u",
+			dev->data->port_id, txq_data->idx, nqe);
+		rte_errno = EINVAL;
+		goto error;
+	}
+	/* Allocate memory buffer for CQEs. */
+	txq_obj->cq_buf = rte_zmalloc_socket(__func__,
+					     nqe * sizeof(struct mlx5_cqe),
+					     MLX5_CQE_BUF_ALIGNMENT,
+					     sh->numa_node);
+	if (!txq_obj->cq_buf) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory (CQ)",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	txq_data->cqe_n = log2above(nqe);
+	txq_data->cqe_s = 1 << txq_data->cqe_n;
+	txq_data->cqe_m = txq_data->cqe_s - 1;
+	txq_data->cqes = (volatile struct mlx5_cqe *)txq_obj->cq_buf;
+	txq_data->cq_ci = 0;
+	txq_data->cq_pi = 0;
+	/* Register allocated buffer in user space with DevX. */
+	txq_obj->cq_umem = mlx5_glue->devx_umem_reg
+					(sh->ctx,
+					 (void *)txq_obj->cq_buf,
+					 nqe * sizeof(struct mlx5_cqe),
+					 IBV_ACCESS_LOCAL_WRITE);
+	if (!txq_obj->cq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot register memory (CQ)",
+			dev->data->port_id, txq_data->idx);
+		goto error;
+	}
+	/* Allocate doorbell record for completion queue. */
+	txq_obj->cq_dbrec_offset = mlx5_get_dbr(sh->ctx,
+						&priv->dbrpgs,
+						&txq_obj->cq_dbrec_page);
+	if (txq_obj->cq_dbrec_offset < 0)
+		goto error;
+	txq_data->cq_db = (volatile uint32_t *)(txq_obj->cq_dbrec_page->dbrs +
+						txq_obj->cq_dbrec_offset);
+	*txq_data->cq_db = 0;
+	/* Create completion queue object with DevX. */
+	cq_attr.cqe_size = (sizeof(struct mlx5_cqe) == 128) ?
+			    MLX5_CQE_SIZE_128B : MLX5_CQE_SIZE_64B;
+	cq_attr.uar_page_id = sh->tx_uar->page_id;
+	cq_attr.eqn = sh->txpp.eqn;
+	cq_attr.q_umem_valid = 1;
+	cq_attr.q_umem_offset = (uintptr_t)txq_obj->cq_buf % page_size;
+	cq_attr.q_umem_id = txq_obj->cq_umem->umem_id;
+	cq_attr.db_umem_valid = 1;
+	cq_attr.db_umem_offset = txq_obj->cq_dbrec_offset;
+	cq_attr.db_umem_id = mlx5_os_get_umem_id(txq_obj->cq_dbrec_page->umem);
+	cq_attr.log_cq_size = rte_log2_u32(nqe);
+	cq_attr.log_page_size = rte_log2_u32(page_size);
+	txq_obj->cq_devx = mlx5_devx_cmd_create_cq(sh->ctx, &cq_attr);
+	if (!txq_obj->cq_devx) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "port %u Tx queue %u CQ creation failure",
+			dev->data->port_id, idx);
+		goto error;
+	}
+	/* Initial fill CQ buffer with invalid CQE opcode. */
+	cqe = (struct mlx5_cqe *)txq_obj->cq_buf;
+	for (i = 0; i < txq_data->cqe_s; i++) {
+		cqe->op_own = (MLX5_CQE_INVALID << 4) | MLX5_CQE_OWNER_MASK;
+		++cqe;
+	}
+	/* Create the Work Queue. */
+	nqe = RTE_MIN(1UL << txq_data->elts_n,
+		      (uint32_t)sh->device_attr.max_qp_wr);
+	txq_obj->sq_buf = rte_zmalloc_socket(__func__,
+					     nqe * sizeof(struct mlx5_wqe),
+					     page_size,
+					     sh->numa_node);
+	if (!txq_obj->sq_buf) {
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot allocate memory (SQ)",
+			dev->data->port_id, txq_data->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+	txq_data->wqe_n = log2above(nqe);
+	txq_data->wqe_s = 1 << txq_data->wqe_n;
+	txq_data->wqe_m = txq_data->wqe_s - 1;
+	txq_data->wqes = (struct mlx5_wqe *)txq_obj->sq_buf;
+	txq_data->wqes_end = txq_data->wqes + txq_data->wqe_s;
+	txq_data->wqe_ci = 0;
+	txq_data->wqe_pi = 0;
+	txq_data->wqe_comp = 0;
+	txq_data->wqe_thres = txq_data->wqe_s / MLX5_TX_COMP_THRESH_INLINE_DIV;
+	/* Register allocated buffer in user space with DevX. */
+	txq_obj->sq_umem = mlx5_glue->devx_umem_reg
+					(sh->ctx,
+					 (void *)txq_obj->sq_buf,
+					 nqe * sizeof(struct mlx5_wqe),
+					 IBV_ACCESS_LOCAL_WRITE);
+	if (!txq_obj->sq_umem) {
+		rte_errno = errno;
+		DRV_LOG(ERR,
+			"port %u Tx queue %u cannot register memory (SQ)",
+			dev->data->port_id, txq_data->idx);
+		goto error;
+	}
+	/* Allocate doorbell record for completion queue. */
+	txq_obj->cq_dbrec_offset = mlx5_get_dbr(sh->ctx,
+						&priv->dbrpgs,
+						&txq_obj->sq_dbrec_page);
+	if (txq_obj->sq_dbrec_offset < 0)
+		goto error;
+	txq_data->qp_db = (volatile uint32_t *)
+					(txq_obj->sq_dbrec_page->dbrs +
+					 txq_obj->sq_dbrec_offset +
+					 MLX5_SND_DBR * sizeof(uint32_t));
+	*txq_data->qp_db = 0;
+	/* Create Send Queue object with DevX. */
+	sq_attr.tis_lst_sz = 1;
+	sq_attr.tis_num = sh->tis->id;
+	sq_attr.state = MLX5_SQC_STATE_RST;
+	sq_attr.cqn = txq_obj->cq_devx->id;
+	sq_attr.flush_in_error_en = 1;
+	sq_attr.allow_multi_pkt_send_wqe = !!priv->config.mps;
+	sq_attr.allow_swp = !!priv->config.swp;
+	sq_attr.min_wqe_inline_mode = priv->config.hca_attr.vport_inline_mode;
+	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
+	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
+	sq_attr.wq_attr.pd = sh->pdn;
+	sq_attr.wq_attr.log_wq_stride = rte_log2_u32(MLX5_WQE_SIZE);
+	sq_attr.wq_attr.log_wq_sz = txq_data->wqe_n;
+	sq_attr.wq_attr.dbr_umem_valid = 1;
+	sq_attr.wq_attr.dbr_addr = txq_obj->cq_dbrec_offset;
+	sq_attr.wq_attr.dbr_umem_id =
+			mlx5_os_get_umem_id(txq_obj->cq_dbrec_page->umem);
+	sq_attr.wq_attr.wq_umem_valid = 1;
+	sq_attr.wq_attr.wq_umem_id = txq_obj->sq_umem->umem_id;
+	sq_attr.wq_attr.wq_umem_offset = (uintptr_t)txq_obj->sq_buf % page_size;
+	txq_obj->sq_devx = mlx5_devx_cmd_create_sq(sh->ctx, &sq_attr);
+	if (!txq_obj->sq_devx) {
+		rte_errno = errno;
+		DRV_LOG(ERR, "port %u Tx queue %u SQ creation failure",
+			dev->data->port_id, idx);
+		goto error;
+	}
+	txq_data->qp_num_8s = txq_obj->sq_devx->id << 8;
+	/* Change Send Queue state to Ready-to-Send. */
+	msq_attr.sq_state = MLX5_SQC_STATE_RST;
+	msq_attr.state = MLX5_SQC_STATE_RDY;
+	ret = mlx5_devx_cmd_modify_sq(txq_obj->sq_devx, &msq_attr);
+	if (ret) {
+		rte_errno = errno;
+		DRV_LOG(ERR,
+			"port %u Tx queue %u SP state to SQC_STATE_RDY failed",
+			dev->data->port_id, idx);
+		goto error;
+	}
+	txq_data->fcqs = rte_calloc_socket(__func__,
+					   txq_data->cqe_s,
+					   sizeof(*txq_data->fcqs),
+					   RTE_CACHE_LINE_SIZE,
+					   txq_ctrl->socket);
+	if (!txq_data->fcqs) {
+		DRV_LOG(ERR, "port %u Tx queue %u cannot allocate memory (FCQ)",
+			dev->data->port_id, idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
+#ifdef HAVE_IBV_FLOW_DV_SUPPORT
+	/*
+	 * If using DevX need to query and store TIS transport domain value.
+	 * This is done once per port.
+	 * Will use this value on Rx, when creating matching TIR.
+	 */
+	if (priv->config.devx && !priv->sh->tdn)
+		priv->sh->tdn = priv->sh->td->id;
+#endif
+	MLX5_ASSERT(sh->tx_uar);
+	MLX5_ASSERT(sh->tx_uar->reg_addr);
+	txq_ctrl->bf_reg = sh->tx_uar->reg_addr;
+	txq_ctrl->uar_mmap_offset = sh->tx_uar->mmap_off;
+	rte_atomic32_set(&txq_obj->refcnt, 1);
+	txq_uar_init(txq_ctrl);
+	LIST_INSERT_HEAD(&priv->txqsobj, txq_obj, next);
+	return txq_obj;
+error:
+	ret = rte_errno; /* Save rte_errno before cleanup. */
+	txq_release_sq_resources(txq_obj);
+	if (txq_data->fcqs) {
+		rte_free(txq_data->fcqs);
+		txq_data->fcqs = NULL;
+	}
+	rte_free(txq_obj);
+	rte_errno = ret; /* Restore rte_errno. */
+	return NULL;
+#endif
+}
+
+/**
  * Create the Tx queue Verbs object.
  *
  * @param dev
@@ -609,6 +901,8 @@ struct mlx5_txq_obj *
 
 	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN)
 		return mlx5_txq_obj_hairpin_new(dev, idx);
+	if (type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ)
+		return mlx5_txq_obj_devx_new(dev, idx);
 #ifdef HAVE_IBV_FLOW_DV_SUPPORT
 	/* If using DevX, need additional mask to read tisn value. */
 	if (priv->config.devx && !priv->sh->tdn)
@@ -817,8 +1111,10 @@ struct mlx5_txq_obj *
 		claim_zero(mlx5_glue->destroy_cq(tmpl.cq));
 	if (tmpl.qp)
 		claim_zero(mlx5_glue->destroy_qp(tmpl.qp));
-	if (txq_data->fcqs)
+	if (txq_data && txq_data->fcqs) {
 		rte_free(txq_data->fcqs);
+		txq_data->fcqs = NULL;
+	}
 	if (txq_obj)
 		rte_free(txq_obj);
 	priv->verbs_alloc_ctx.type = MLX5_VERBS_ALLOC_TYPE_NONE;
@@ -870,11 +1166,15 @@ struct mlx5_txq_obj *
 		if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_HAIRPIN) {
 			if (txq_obj->tis)
 				claim_zero(mlx5_devx_cmd_destroy(txq_obj->tis));
+		} else if (txq_obj->type == MLX5_TXQ_OBJ_TYPE_DEVX_SQ) {
+			txq_release_sq_resources(txq_obj);
 		} else {
 			claim_zero(mlx5_glue->destroy_qp(txq_obj->qp));
 			claim_zero(mlx5_glue->destroy_cq(txq_obj->cq));
-				if (txq_obj->txq_ctrl->txq.fcqs)
-					rte_free(txq_obj->txq_ctrl->txq.fcqs);
+		}
+		if (txq_obj->txq_ctrl->txq.fcqs) {
+			rte_free(txq_obj->txq_ctrl->txq.fcqs);
+			txq_obj->txq_ctrl->txq.fcqs = NULL;
 		}
 		LIST_REMOVE(txq_obj, next);
 		rte_free(txq_obj);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 08/17] net/mlx5: allocate packet pacing context
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (6 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 09/17] net/mlx5: introduce clock queue service routine Viacheslav Ovsiienko
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch allocates the Packet Pacing context from the kernel,
configures one according to requested pace send scheduling
granularuty and assigns to Clock Queue.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_txpp.c | 71 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 61a93f9..e8a7b10 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -568,6 +568,8 @@ struct mlx5_dev_txpp {
 	struct mlx5dv_devx_event_channel *echan; /* Event Channel. */
 	struct mlx5_txpp_wq clock_queue; /* Clock Queue. */
 	struct mlx5_txpp_wq rearm_queue; /* Clock Queue. */
+	struct mlx5dv_pp *pp; /* Packet pacing context. */
+	uint16_t pp_id; /* Packet pacing context index. */
 };
 
 /*
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index f600fc5..a0ee872 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -6,6 +6,7 @@
 #include <rte_interrupts.h>
 #include <rte_alarm.h>
 #include <rte_malloc.h>
+#include <rte_cycles.h>
 
 #include "mlx5.h"
 #include "mlx5_rxtx.h"
@@ -49,6 +50,69 @@
 }
 
 static void
+mlx5_txpp_free_pp_index(struct mlx5_dev_ctx_shared *sh)
+{
+	if (sh->txpp.pp) {
+		mlx5_glue->dv_free_pp(sh->txpp.pp);
+		sh->txpp.pp = NULL;
+		sh->txpp.pp_id = 0;
+	}
+}
+
+/* Allocate Packet Pacing index from kernel via mlx5dv call. */
+static int
+mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
+	uint64_t rate;
+
+	MLX5_ASSERT(!sh->txpp.pp);
+	memset(&pp, 0, sizeof(pp));
+	rate = NS_PER_S / sh->txpp.tick;
+	if (rate * sh->txpp.tick != NS_PER_S)
+		DRV_LOG(WARNING, "Packet pacing frequency is not precise.");
+	if (sh->txpp.test) {
+		uint32_t len;
+
+		len = RTE_MAX(MLX5_TXPP_TEST_PKT_SIZE,
+			      (size_t)RTE_ETHER_MIN_LEN);
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, len);
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, len);
+		/* Convert packets per second into kilobits. */
+		rate = (rate * len) / (1000ul / CHAR_BIT);
+		DRV_LOG(INFO, "Packet pacing rate set to %" PRIu64, rate);
+	}
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, rate);
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode,
+		 sh->txpp.test ? MLX5_DATA_RATE : MLX5_WQE_RATE);
+	sh->txpp.pp = mlx5_glue->dv_alloc_pp
+				(sh->ctx, sizeof(pp), &pp,
+				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
+	if (sh->txpp.pp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate packet pacing index.");
+		rte_errno = errno;
+		return -errno;
+	}
+	if (!sh->txpp.pp->index) {
+		DRV_LOG(ERR, "Zero packet pacing index allocated.");
+		mlx5_txpp_free_pp_index(sh);
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	sh->txpp.pp_id = sh->txpp.pp->index;
+	return 0;
+#else
+	RTE_SET_USED(sh);
+	DRV_LOG(ERR, "Allocating pacing index is not supported.");
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+#endif
+}
+
+static void
 mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
 	if (wq->sq)
@@ -457,6 +521,7 @@
 	}
 	sq_attr.state = MLX5_SQC_STATE_RST;
 	sq_attr.cqn = wq->cq->id;
+	sq_attr.packet_pacing_rate_limit_index = sh->txpp.pp_id;
 	sq_attr.wq_attr.cd_slave = 1;
 	sq_attr.wq_attr.uar_page = sh->tx_uar->page_id;
 	sq_attr.wq_attr.wq_type = MLX5_WQ_TYPE_CYCLIC;
@@ -503,6 +568,7 @@
  * - Clock CQ/SQ
  * - Rearm CQ/SQ
  * - attaches rearm interrupt handler
+ * - starts Clock Queue
  *
  * Returns 0 on success, negative otherwise
  */
@@ -520,6 +586,9 @@
 	ret = mlx5_txpp_create_eqn(sh);
 	if (ret)
 		goto exit;
+	ret = mlx5_txpp_alloc_pp_index(sh);
+	if (ret)
+		goto exit;
 	ret = mlx5_txpp_create_clock_queue(sh);
 	if (ret)
 		goto exit;
@@ -530,6 +599,7 @@
 	if (ret) {
 		mlx5_txpp_destroy_rearm_queue(sh);
 		mlx5_txpp_destroy_clock_queue(sh);
+		mlx5_txpp_free_pp_index(sh);
 		mlx5_txpp_destroy_eqn(sh);
 		sh->txpp.tick = 0;
 		sh->txpp.test = 0;
@@ -550,6 +620,7 @@
 {
 	mlx5_txpp_destroy_rearm_queue(sh);
 	mlx5_txpp_destroy_clock_queue(sh);
+	mlx5_txpp_free_pp_index(sh);
 	mlx5_txpp_destroy_eqn(sh);
 	sh->txpp.tick = 0;
 	sh->txpp.test = 0;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 09/17] net/mlx5: introduce clock queue service routine
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (7 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 08/17] net/mlx5: allocate packet pacing context Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 10/17] net/mlx5: prepare Tx queue structures to support timestamp Viacheslav Ovsiienko
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

Service routine is invoked periodically on Rearm Queue
completion interrupts, typically once per some milliseconds
(1-16) to track clock jitter and wander in robust fashion.
It performs the following:

- fetches the completed CQEs for Rearm Queue
- restarts Rearm Queue on errors
- pushes new requests to Rearm Queue to make it
  continuously running and pushing cross-channel requests
  to Clock Queue
- reads and caches the Clock Queue CQE to be used in datapath
- gathers statistics to estimate clock jitter and wander
- gathers Clock Queue errors statistics

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  16 ++
 drivers/net/mlx5/mlx5_defs.h |   1 +
 drivers/net/mlx5/mlx5_rxtx.h |  20 +++
 drivers/net/mlx5/mlx5_txpp.c | 338 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 375 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e8a7b10..bb2c096 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -555,6 +555,12 @@ struct mlx5_txpp_wq {
 	volatile uint32_t *sq_dbrec;
 };
 
+/* Tx packet pacing internal timestamp. */
+struct mlx5_txpp_ts {
+	rte_atomic64_t ci_ts;
+	rte_atomic64_t ts;
+};
+
 /* Tx packet pacing structure. */
 struct mlx5_dev_txpp {
 	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
@@ -570,6 +576,15 @@ struct mlx5_dev_txpp {
 	struct mlx5_txpp_wq rearm_queue; /* Clock Queue. */
 	struct mlx5dv_pp *pp; /* Packet pacing context. */
 	uint16_t pp_id; /* Packet pacing context index. */
+	uint16_t ts_n; /* Number of captured timestamps. */
+	uint16_t ts_p; /* Pointer to statisticks timestamp. */
+	struct mlx5_txpp_ts *tsa; /* Timestamps sliding window stats. */
+	struct mlx5_txpp_ts ts; /* Cached completion id/timestamp. */
+	uint32_t sync_lost:1; /* ci/timestamp synchronization lost. */
+	/* Statistics counters. */
+	rte_atomic32_t err_miss_int; /* Missed service interrupt. */
+	rte_atomic32_t err_rearm_queue; /* Rearm Queue errors. */
+	rte_atomic32_t err_clock_queue; /* Clock Queue errors. */
 };
 
 /*
@@ -993,5 +1008,6 @@ void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
 
 int mlx5_txpp_start(struct rte_eth_dev *dev);
 void mlx5_txpp_stop(struct rte_eth_dev *dev);
+void mlx5_txpp_interrupt_handler(void *cb_arg);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index a8626a4..7ed3e88 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -172,6 +172,7 @@
 #define MLX5_TXDB_HEURISTIC 2
 
 /* Tx accurate scheduling on timestamps parameters. */
+#define MLX5_TXPP_WAIT_INIT_TS 1000ul /* How long to wait timestamp. */
 #define MLX5_TXPP_CLKQ_SIZE 1
 #define MLX5_TXPP_REARM	((1UL << MLX5_WQ_INDEX_WIDTH) / 4)
 #define MLX5_TXPP_REARM_SQ_SIZE (((1UL << MLX5_CQ_INDEX_WIDTH) / \
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 1b797da..8a8d2b5 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -30,6 +30,7 @@
 #include <rte_io.h>
 #include <rte_bus_pci.h>
 #include <rte_malloc.h>
+#include <rte_cycles.h>
 
 #include <mlx5_glue.h>
 #include <mlx5_prm.h>
@@ -695,4 +696,23 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 	mlx5_tx_dbrec_cond_wmb(txq, wqe, 1);
 }
 
+/**
+ * Convert timestamp from HW format to linear counter
+ * from Packet Pacing Clock Queue CQE timestamp format.
+ *
+ * @param sh
+ *   Pointer to the device shared context. Might be needed
+ *   to convert according current device configuration.
+ * @param ts
+ *   Timestamp from CQE to convert.
+ * @return
+ *   UTC in nanoseconds
+ */
+static __rte_always_inline uint64_t
+mlx5_txpp_convert_rx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t ts)
+{
+	RTE_SET_USED(sh);
+	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index a0ee872..7ed5d2c 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -1,6 +1,9 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright 2020 Mellanox Technologies, Ltd
  */
+#include <fcntl.h>
+#include <stdint.h>
+
 #include <rte_ether.h>
 #include <rte_ethdev_driver.h>
 #include <rte_interrupts.h>
@@ -144,6 +147,33 @@
 	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
 
 	mlx5_txpp_destroy_send_queue(wq);
+	if (sh->txpp.tsa) {
+		rte_free(sh->txpp.tsa);
+		sh->txpp.tsa = NULL;
+	}
+}
+
+static void
+mlx5_txpp_doorbell_rearm_queue(struct mlx5_dev_ctx_shared *sh, uint16_t ci)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	union {
+		uint32_t w32[2];
+		uint64_t w64;
+	} cs;
+
+	wq->sq_ci = ci + 1;
+	cs.w32[0] = rte_cpu_to_be_32(rte_be_to_cpu_32
+		   (wq->wqes[ci & (wq->sq_size - 1)].ctrl[0]) | (ci - 1) << 8);
+	cs.w32[1] = wq->wqes[ci & (wq->sq_size - 1)].ctrl[1];
+	/* Update SQ doorbell record with new SQ ci. */
+	rte_compiler_barrier();
+	*wq->sq_dbrec = rte_cpu_to_be_32(wq->sq_ci);
+	/* Make sure the doorbell record is updated. */
+	rte_wmb();
+	/* Write to doorbel register to start processing. */
+	__mlx5_uar_write64_relaxed(cs.w64, sh->tx_uar->reg_addr, NULL);
+	rte_wmb();
 }
 
 static void
@@ -433,6 +463,16 @@
 	uint32_t umem_size, umem_dbrec;
 	int ret;
 
+	sh->txpp.tsa = rte_zmalloc_socket(__func__,
+					   MLX5_TXPP_REARM_SQ_SIZE *
+					   sizeof(struct mlx5_txpp_ts),
+					   0, sh->numa_node);
+	if (!sh->txpp.tsa) {
+		DRV_LOG(ERR, "Failed to allocate memory for CQ stats.");
+		return -ENOMEM;
+	}
+	sh->txpp.ts_p = 0;
+	sh->txpp.ts_n = 0;
 	/* Allocate memory buffer for CQEs and doorbell record. */
 	umem_size = sizeof(struct mlx5_cqe) * MLX5_TXPP_CLKQ_SIZE;
 	umem_dbrec = RTE_ALIGN(umem_size, MLX5_DBR_SIZE);
@@ -562,6 +602,299 @@
 	return ret;
 }
 
+/* Enable notification from the Rearm Queue CQ. */
+static inline void
+mlx5_txpp_cq_arm(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *aq = &sh->txpp.rearm_queue;
+	uint32_t arm_sn = aq->arm_sn << MLX5_CQ_SQN_OFFSET;
+	uint32_t db_hi = arm_sn | MLX5_CQ_DBR_CMD_ALL | aq->cq_ci;
+	uint64_t db_be = rte_cpu_to_be_64(((uint64_t)db_hi << 32) | aq->cq->id);
+	uint32_t *addr = RTE_PTR_ADD(sh->tx_uar->base_addr, MLX5_CQ_DOORBELL);
+
+	rte_compiler_barrier();
+	aq->cq_dbrec[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(db_hi);
+	rte_wmb();
+#ifdef RTE_ARCH_64
+	*(uint64_t *)addr = db_be;
+#else
+	*(uint32_t *)addr = db_be;
+	rte_io_wmb();
+	*((uint32_t *)addr + 1) = db_be >> 32;
+#endif
+	aq->arm_sn++;
+}
+
+static inline void
+mlx5_atomic_read_cqe(rte_int128_t *from, rte_int128_t *ts)
+{
+	/*
+	 * The only CQE of Clock Queue is being continuously
+	 * update by hardware with soecified rate. We have to
+	 * read timestump and WQE completion index atomically.
+	 */
+#if defined(RTE_ARCH_PPC_64) || defined(RTE_ARCH_32)
+	rte_atomic64_t *cqe = (rte_atomic64_t *)from;
+
+	/* Power architecture does not support 16B compare-and-swap. */
+	for (;;) {
+		int64_t tm, op;
+		int64_t *ps;
+
+		rte_compiler_barrier();
+		tm = rte_atomic64_read(cqe + 0);
+		op = rte_atomic64_read(cqe + 1);
+		rte_compiler_barrier();
+		if (tm != rte_atomic64_read(cqe + 0))
+			continue;
+		if (op != rte_atomic64_read(cqe + 1))
+			continue;
+		ps = (int64_t *)ts;
+		ps[0] = tm;
+		ps[1] = op;
+		return;
+	}
+#else
+	rte_int128_t src;
+
+	memset(&src, 0, sizeof(src));
+	*ts = src;
+	/* if (*from == *ts) *from = *src else *ts = *from; */
+	rte_atomic128_cmp_exchange(from, ts, &src, 0,
+				   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+#endif
+}
+
+/* Stores timestamp in the cache structure to share data with datapath. */
+static inline void
+mlx5_txpp_cache_timestamp(struct mlx5_dev_ctx_shared *sh,
+			   uint64_t ts, uint64_t ci)
+{
+	ci = ci << (64 - MLX5_CQ_INDEX_WIDTH);
+	ci |= (ts << MLX5_CQ_INDEX_WIDTH) >> MLX5_CQ_INDEX_WIDTH;
+	rte_compiler_barrier();
+	rte_atomic64_set(&sh->txpp.ts.ts, ts);
+	rte_atomic64_set(&sh->txpp.ts.ci_ts, ci);
+	rte_wmb();
+}
+
+/* Reads timestamp from Clock Queue CQE and stores in the cache. */
+static inline void
+mlx5_txpp_update_timestamp(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	struct mlx5_cqe *cqe = (struct mlx5_cqe *)(uintptr_t)wq->cqes;
+	union {
+		rte_int128_t u128;
+		struct mlx5_cqe_ts cts;
+	} to;
+	uint64_t ts;
+	uint16_t ci;
+
+	static_assert(sizeof(struct mlx5_cqe_ts) == sizeof(rte_int128_t),
+		      "Wrong timestamp CQE part size");
+	mlx5_atomic_read_cqe((rte_int128_t *)&cqe->timestamp, &to.u128);
+	if (to.cts.op_own >> 4) {
+		DRV_LOG(DEBUG, "Clock Queue error sync lost.");
+		rte_atomic32_inc(&sh->txpp.err_clock_queue);
+		sh->txpp.sync_lost = 1;
+		return;
+	}
+	ci = rte_be_to_cpu_16(to.cts.wqe_counter);
+	ts = rte_be_to_cpu_64(to.cts.timestamp);
+	ts = mlx5_txpp_convert_rx_ts(sh, ts);
+	wq->cq_ci += (ci - wq->sq_ci) & UINT16_MAX;
+	wq->sq_ci = ci;
+	mlx5_txpp_cache_timestamp(sh, ts, wq->cq_ci);
+}
+
+/* Gather statistics for timestamp from Clock Queue CQE. */
+static inline void
+mlx5_txpp_gather_timestamp(struct mlx5_dev_ctx_shared *sh)
+{
+	/* Check whether we have a valid timestamp. */
+	if (!sh->txpp.clock_queue.sq_ci && !sh->txpp.ts_n)
+		return;
+	MLX5_ASSERT(sh->txpp.ts_p < MLX5_TXPP_REARM_SQ_SIZE);
+	sh->txpp.tsa[sh->txpp.ts_p] = sh->txpp.ts;
+	if (++sh->txpp.ts_p >= MLX5_TXPP_REARM_SQ_SIZE)
+		sh->txpp.ts_p = 0;
+	if (sh->txpp.ts_n < MLX5_TXPP_REARM_SQ_SIZE)
+		++sh->txpp.ts_n;
+}
+
+/* Waits for the first completion on Clock Queue to init timestamp. */
+static inline void
+mlx5_txpp_init_timestamp(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+	uint32_t wait;
+
+	sh->txpp.ts_p = 0;
+	sh->txpp.ts_n = 0;
+	for (wait = 0; wait < MLX5_TXPP_WAIT_INIT_TS; wait++) {
+		struct timespec onems;
+
+		mlx5_txpp_update_timestamp(sh);
+		if (wq->sq_ci)
+			return;
+		/* Wait one millisecond and try again. */
+		onems.tv_sec = 0;
+		onems.tv_nsec = NS_PER_S / MS_PER_S;
+		nanosleep(&onems, 0);
+	}
+	DRV_LOG(ERR, "Unable to initialize timestamp.");
+	sh->txpp.sync_lost = 1;
+}
+
+#ifdef HAVE_IBV_DEVX_EVENT
+/* Handles Rearm Queue completions in periodic service. */
+static __rte_always_inline void
+mlx5_txpp_handle_rearm_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	struct mlx5_txpp_wq *wq = &sh->txpp.rearm_queue;
+	uint32_t cq_ci = wq->cq_ci;
+	bool error = false;
+	int ret;
+
+	do {
+		volatile struct mlx5_cqe *cqe;
+
+		cqe = &wq->cqes[cq_ci & (MLX5_TXPP_REARM_CQ_SIZE - 1)];
+		ret = check_cqe(cqe, MLX5_TXPP_REARM_CQ_SIZE, cq_ci);
+		switch (ret) {
+		case MLX5_CQE_STATUS_ERR:
+			error = true;
+			++cq_ci;
+			break;
+		case MLX5_CQE_STATUS_SW_OWN:
+			wq->sq_ci += 2;
+			++cq_ci;
+			break;
+		case MLX5_CQE_STATUS_HW_OWN:
+			break;
+		default:
+			MLX5_ASSERT(false);
+			break;
+		}
+	} while (ret != MLX5_CQE_STATUS_HW_OWN);
+	if (likely(cq_ci != wq->cq_ci)) {
+		/* Check whether we have missed interrupts. */
+		if (cq_ci - wq->cq_ci != 1) {
+			DRV_LOG(DEBUG, "Rearm Queue missed interrupt.");
+			rte_atomic32_inc(&sh->txpp.err_miss_int);
+			/* Check sync lost on wqe index. */
+			if (cq_ci - wq->cq_ci >=
+				(((1UL << MLX5_WQ_INDEX_WIDTH) /
+				  MLX5_TXPP_REARM) - 1))
+				error = 1;
+		}
+		/* Update doorbell record to notify hardware. */
+		rte_compiler_barrier();
+		*wq->cq_dbrec = rte_cpu_to_be_32(cq_ci);
+		rte_wmb();
+		wq->cq_ci = cq_ci;
+		/* Fire new requests to Rearm Queue. */
+		if (error) {
+			DRV_LOG(DEBUG, "Rearm Queue error sync lost.");
+			rte_atomic32_inc(&sh->txpp.err_rearm_queue);
+			sh->txpp.sync_lost = 1;
+		}
+	}
+}
+
+/* Handles Clock Queue completions in periodic service. */
+static __rte_always_inline void
+mlx5_txpp_handle_clock_queue(struct mlx5_dev_ctx_shared *sh)
+{
+	mlx5_txpp_update_timestamp(sh);
+	mlx5_txpp_gather_timestamp(sh);
+}
+#endif
+
+/* Invoked periodically on Rearm Queue completions. */
+void
+mlx5_txpp_interrupt_handler(void *cb_arg)
+{
+#ifndef HAVE_IBV_DEVX_EVENT
+	RTE_SET_USED(cb_arg);
+	return;
+#else
+	struct mlx5_dev_ctx_shared *sh = cb_arg;
+	union {
+		struct mlx5dv_devx_async_event_hdr event_resp;
+		uint8_t buf[sizeof(struct mlx5dv_devx_async_event_hdr) + 128];
+	} out;
+
+	MLX5_ASSERT(rte_eal_process_type() == RTE_PROC_PRIMARY);
+	/* Process events in the loop. Only rearm completions are expected. */
+	while (mlx5_glue->devx_get_event
+				(sh->txpp.echan,
+				 &out.event_resp,
+				 sizeof(out.buf)) >=
+				 (ssize_t)sizeof(out.event_resp.cookie)) {
+		mlx5_txpp_handle_rearm_queue(sh);
+		mlx5_txpp_handle_clock_queue(sh);
+		mlx5_txpp_cq_arm(sh);
+		mlx5_txpp_doorbell_rearm_queue
+					(sh, sh->txpp.rearm_queue.sq_ci - 1);
+	}
+#endif /* HAVE_IBV_DEVX_ASYNC */
+}
+
+static void
+mlx5_txpp_stop_service(struct mlx5_dev_ctx_shared *sh)
+{
+	if (!sh->txpp.intr_handle.fd)
+		return;
+	mlx5_intr_callback_unregister(&sh->txpp.intr_handle,
+				      mlx5_txpp_interrupt_handler, sh);
+	sh->txpp.intr_handle.fd = 0;
+}
+
+/* Attach interrupt handler and fires first request to Rearm Queue. */
+static int
+mlx5_txpp_start_service(struct mlx5_dev_ctx_shared *sh)
+{
+	uint16_t event_nums[1] = {0};
+	int flags;
+	int ret;
+
+	/* Attach interrupt handler to process Rearm Queue completions. */
+	flags = fcntl(sh->txpp.echan->fd, F_GETFL);
+	ret = fcntl(sh->txpp.echan->fd, F_SETFL, flags | O_NONBLOCK);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to change event channel FD.");
+		rte_errno = errno;
+		return -rte_errno;
+	}
+	memset(&sh->txpp.intr_handle, 0, sizeof(sh->txpp.intr_handle));
+	sh->txpp.intr_handle.fd = sh->txpp.echan->fd;
+	sh->txpp.intr_handle.type = RTE_INTR_HANDLE_EXT;
+	if (rte_intr_callback_register(&sh->txpp.intr_handle,
+				       mlx5_txpp_interrupt_handler, sh)) {
+		sh->txpp.intr_handle.fd = 0;
+		DRV_LOG(ERR, "Failed to register CQE interrupt %d.", rte_errno);
+		return -rte_errno;
+	}
+	/* Subscribe CQ event to the event channel controlled by the driver. */
+	ret = mlx5_glue->devx_subscribe_devx_event(sh->txpp.echan,
+						   sh->txpp.rearm_queue.cq->obj,
+						   sizeof(event_nums),
+						   event_nums, 0);
+	if (ret) {
+		DRV_LOG(ERR, "Failed to subscribe CQE event.");
+		rte_errno = errno;
+		return -errno;
+	}
+	/* Enable interrupts in the CQ. */
+	mlx5_txpp_cq_arm(sh);
+	/* Fire the first request on Rearm Queue. */
+	mlx5_txpp_doorbell_rearm_queue(sh, sh->txpp.rearm_queue.sq_size - 1);
+	mlx5_txpp_init_timestamp(sh);
+	return 0;
+}
+
 /*
  * The routine initializes the packet pacing infrastructure:
  * - allocates PP context
@@ -595,8 +928,12 @@
 	ret = mlx5_txpp_create_rearm_queue(sh);
 	if (ret)
 		goto exit;
+	ret = mlx5_txpp_start_service(sh);
+	if (ret)
+		goto exit;
 exit:
 	if (ret) {
+		mlx5_txpp_stop_service(sh);
 		mlx5_txpp_destroy_rearm_queue(sh);
 		mlx5_txpp_destroy_clock_queue(sh);
 		mlx5_txpp_free_pp_index(sh);
@@ -618,6 +955,7 @@
 static void
 mlx5_txpp_destroy(struct mlx5_dev_ctx_shared *sh)
 {
+	mlx5_txpp_stop_service(sh);
 	mlx5_txpp_destroy_rearm_queue(sh);
 	mlx5_txpp_destroy_clock_queue(sh);
 	mlx5_txpp_free_pp_index(sh);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 10/17] net/mlx5: prepare Tx queue structures to support timestamp
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (8 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 09/17] net/mlx5: introduce clock queue service routine Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 11/17] net/mlx5: convert timestamp to completion index Viacheslav Ovsiienko
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The fields to support send scheduling on dynamic timestamp
field are introduced and initialized on device start.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.h    |  4 ++++
 drivers/net/mlx5/mlx5_trigger.c |  2 ++
 drivers/net/mlx5/mlx5_txq.c     | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 8a8d2b5..974a847 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -313,6 +313,9 @@ struct mlx5_txq_data {
 	volatile uint32_t *cq_db; /* Completion queue doorbell. */
 	uint16_t port_id; /* Port ID of device. */
 	uint16_t idx; /* Queue index. */
+	uint64_t ts_mask; /* Timestamp flag dynamic mask. */
+	int32_t ts_offset; /* Timestamp field dynamic offset. */
+	struct mlx5_dev_ctx_shared *sh; /* Shared context. */
 	struct mlx5_txq_stats stats; /* TX queue counters. */
 #ifndef RTE_ARCH_64
 	rte_spinlock_t *uar_lock;
@@ -468,6 +471,7 @@ struct mlx5_txq_ctrl *mlx5_txq_hairpin_new
 void txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
 uint64_t mlx5_get_tx_port_offloads(struct rte_eth_dev *dev);
+void mlx5_txq_dynf_timestamp_set(struct rte_eth_dev *dev);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index e324319..29aef54 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -331,6 +331,8 @@
 	}
 	/* Set a mask and offset of dynamic metadata flows into Rx queues*/
 	mlx5_flow_rxq_dynf_metadata_set(dev);
+	/* Set a mask and offset of scheduling on timestamp into Tx queues*/
+	mlx5_txq_dynf_timestamp_set(dev);
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
 	 * action and no flow created by application exists anymore.
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index f9ed504..7f6a40a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1785,3 +1785,35 @@ struct mlx5_txq_ctrl *
 	}
 	return ret;
 }
+
+/**
+ * Set the Tx queue dynamic timestamp (mask and offset)
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ */
+void
+mlx5_txq_dynf_timestamp_set(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_data *data;
+	int off, nbit;
+	unsigned int i;
+	uint64_t mask = 0;
+
+	nbit = rte_mbuf_dynflag_lookup
+				(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL);
+	off = rte_mbuf_dynfield_lookup
+				(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+	if (nbit > 0 && off >= 0 && sh->txpp.refcnt)
+		mask = 1ULL << nbit;
+	for (i = 0; i != priv->txqs_n; ++i) {
+		data = (*priv->txqs)[i];
+		if (!data)
+			continue;
+		data->sh = sh;
+		data->ts_mask = mask;
+		data->ts_offset = off;
+	}
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 11/17] net/mlx5: convert timestamp to completion index
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (9 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 10/17] net/mlx5: prepare Tx queue structures to support timestamp Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 12/17] net/mlx5: prepare Tx datapath to support sheduling Viacheslav Ovsiienko
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The application provides timestamps in Tx mbuf as clocks,
the hardware performs scheduling on Clock Queue completion index
match. This patch introduces the timestamp-to-completion-index
inline routine.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_rxtx.h | 55 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txpp.c |  5 ++++
 3 files changed, 62 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index bb2c096..3304f0d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -585,6 +585,8 @@ struct mlx5_dev_txpp {
 	rte_atomic32_t err_miss_int; /* Missed service interrupt. */
 	rte_atomic32_t err_rearm_queue; /* Rearm Queue errors. */
 	rte_atomic32_t err_clock_queue; /* Clock Queue errors. */
+	rte_atomic32_t err_ts_past; /* Timestamp in the past. */
+	rte_atomic32_t err_ts_future; /* Timestamp in the distant future. */
 };
 
 /*
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 974a847..d082cd7 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -719,4 +719,59 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
 	return (ts & UINT32_MAX) + (ts >> 32) * NS_PER_S;
 }
 
+/**
+ * Convert timestamp from mbuf format to linear counter
+ * of Clock Queue completions (24 bits)
+ *
+ * @param sh
+ *   Pointer to the device shared context to fetch Tx
+ *   packet pacing timestamp and parameters.
+ * @param ts
+ *   Timestamp from mbuf to convert.
+ * @return
+ *   positive or zero value - completion ID to wait
+ *   negative value - conversion error
+ */
+static __rte_always_inline int32_t
+mlx5_txpp_convert_tx_ts(struct mlx5_dev_ctx_shared *sh, uint64_t mts)
+{
+	uint64_t ts, ci;
+	uint32_t tick;
+
+	do {
+		/*
+		 * Read atomically two uint64_t fields and compare lsb bits.
+		 * It there is no match - the timestamp was updated in
+		 * the service thread, data should be re-read.
+		 */
+		rte_compiler_barrier();
+		ci = rte_atomic64_read(&sh->txpp.ts.ci_ts);
+		ts = rte_atomic64_read(&sh->txpp.ts.ts);
+		rte_compiler_barrier();
+		if (!((ts ^ ci) << (64 - MLX5_CQ_INDEX_WIDTH)))
+			break;
+	} while (true);
+	/* Perform the skew correction, positive value to send earlier. */
+	mts -= sh->txpp.skew;
+	mts -= ts;
+	if (unlikely(mts >= UINT64_MAX / 2)) {
+		/* We have negative integer, mts is in the past. */
+		rte_atomic32_inc(&sh->txpp.err_ts_past);
+		return -1;
+	}
+	tick = sh->txpp.tick;
+	MLX5_ASSERT(tick);
+	/* Convert delta to completions, round up. */
+	mts = (mts + tick - 1) / tick;
+	if (unlikely(mts >= (1 << MLX5_CQ_INDEX_WIDTH) / 2 - 1)) {
+		/* We have mts is too distant future. */
+		rte_atomic32_inc(&sh->txpp.err_ts_future);
+		return -1;
+	}
+	mts <<= 64 - MLX5_CQ_INDEX_WIDTH;
+	ci += mts;
+	ci >>= 64 - MLX5_CQ_INDEX_WIDTH;
+	return ci;
+}
+
 #endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 7ed5d2c..b7ba20a 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -860,6 +860,11 @@
 	int flags;
 	int ret;
 
+	rte_atomic32_set(&sh->txpp.err_miss_int, 0);
+	rte_atomic32_set(&sh->txpp.err_rearm_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_clock_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_past, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_future, 0);
 	/* Attach interrupt handler to process Rearm Queue completions. */
 	flags = fcntl(sh->txpp.echan->fd, F_GETFL);
 	ret = fcntl(sh->txpp.echan->fd, F_SETFL, flags | O_NONBLOCK);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 12/17] net/mlx5: prepare Tx datapath to support sheduling
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (10 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 11/17] net/mlx5: convert timestamp to completion index Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 13/17] net/mlx5: add scheduling support to send routine template Viacheslav Ovsiienko
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The new static control flag is introduced to control
routine generating from template, enabling the scheduling
on timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 97 +++++++++++++++++++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_txq.c  |  2 +
 2 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 227289e..76fe12b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -66,6 +66,7 @@ enum mlx5_txcmp_code {
 #define MLX5_TXOFF_CONFIG_METADATA (1u << 6) /* Flow metadata. */
 #define MLX5_TXOFF_CONFIG_EMPW (1u << 8) /* Enhanced MPW supported.*/
 #define MLX5_TXOFF_CONFIG_MPW (1u << 9) /* Legacy MPW supported.*/
+#define MLX5_TXOFF_CONFIG_TXPP (1u << 10) /* Scheduling on timestamp.*/
 
 /* The most common offloads groups. */
 #define MLX5_TXOFF_CONFIG_NONE 0
@@ -5268,6 +5269,45 @@ enum mlx5_txcmp_code {
 		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
 		MLX5_TXOFF_CONFIG_METADATA)
 
+/* Generate routines with timestamp scheduling. */
+MLX5_TXOFF_DECL(full_ts_nompw,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_DECL(full_ts_nompwi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_DECL(full_ts,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(full_ts_noi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(none_ts,
+		MLX5_TXOFF_CONFIG_NONE | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(mdi_ts,
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(mti_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_DECL(mtiv_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
+		MLX5_TXOFF_CONFIG_METADATA | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
 /*
  * Generate routines with Legacy Multi-Packet Write support.
  * This mode is supported by ConnectX-4 Lx only and imposes
@@ -5372,6 +5412,44 @@ enum mlx5_txcmp_code {
 		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
 		MLX5_TXOFF_CONFIG_METADATA | MLX5_TXOFF_CONFIG_EMPW)
 
+MLX5_TXOFF_INFO(full_ts_nompw,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_INFO(full_ts_nompwi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP)
+
+MLX5_TXOFF_INFO(full_ts,
+		MLX5_TXOFF_CONFIG_FULL | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(full_ts_noi,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_SWP | MLX5_TXOFF_CONFIG_CSUM |
+		MLX5_TXOFF_CONFIG_VLAN | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(none_ts,
+		MLX5_TXOFF_CONFIG_NONE | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(mdi_ts,
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(mti_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_METADATA |
+		MLX5_TXOFF_CONFIG_TXPP | MLX5_TXOFF_CONFIG_EMPW)
+
+MLX5_TXOFF_INFO(mtiv_ts,
+		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
+		MLX5_TXOFF_CONFIG_INLINE | MLX5_TXOFF_CONFIG_VLAN |
+		MLX5_TXOFF_CONFIG_METADATA | MLX5_TXOFF_CONFIG_TXPP |
+		MLX5_TXOFF_CONFIG_EMPW)
+
 MLX5_TXOFF_INFO(full,
 		MLX5_TXOFF_CONFIG_MULTI | MLX5_TXOFF_CONFIG_TSO |
 		MLX5_TXOFF_CONFIG_SWP |	MLX5_TXOFF_CONFIG_CSUM |
@@ -5518,6 +5596,14 @@ enum mlx5_txcmp_code {
 		/* We should support VLAN insertion. */
 		olx |= MLX5_TXOFF_CONFIG_VLAN;
 	}
+	if (tx_offloads & DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP &&
+	    rte_mbuf_dynflag_lookup
+			(RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME, NULL) > 0 &&
+	    rte_mbuf_dynfield_lookup
+			(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL) > 0) {
+		/* Offload configured, dynamic entities registered. */
+		olx |= MLX5_TXOFF_CONFIG_TXPP;
+	}
 	if (priv->txqs_n && (*priv->txqs)[0]) {
 		struct mlx5_txq_data *txd = (*priv->txqs)[0];
 
@@ -5587,6 +5673,9 @@ enum mlx5_txcmp_code {
 		if ((olx ^ tmp) & MLX5_TXOFF_CONFIG_INLINE)
 			/* Do not enable inlining if not configured. */
 			continue;
+		if ((olx ^ tmp) & MLX5_TXOFF_CONFIG_TXPP)
+			/* Do not enable scheduling if not configured. */
+			continue;
 		/*
 		 * Some routine meets the requirements.
 		 * Check whether it has minimal amount
@@ -5631,6 +5720,8 @@ enum mlx5_txcmp_code {
 		DRV_LOG(DEBUG, "\tVLANI (VLAN insertion)");
 	if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_METADATA)
 		DRV_LOG(DEBUG, "\tMETAD (tx Flow metadata)");
+	if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_TXPP)
+		DRV_LOG(DEBUG, "\tMETAD (tx Scheduling)");
 	if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_EMPW) {
 		if (txoff_func[m].olx & MLX5_TXOFF_CONFIG_MPW)
 			DRV_LOG(DEBUG, "\tMPW   (Legacy MPW)");
@@ -5705,7 +5796,7 @@ enum mlx5_txcmp_code {
 		if (pkt_burst == txoff_func[i].func) {
 			olx = txoff_func[i].olx;
 			snprintf(mode->info, sizeof(mode->info),
-				 "%s%s%s%s%s%s%s%s",
+				 "%s%s%s%s%s%s%s%s%s",
 				 (olx & MLX5_TXOFF_CONFIG_EMPW) ?
 				 ((olx & MLX5_TXOFF_CONFIG_MPW) ?
 				 "Legacy MPW" : "Enhanced MPW") : "No MPW",
@@ -5722,7 +5813,9 @@ enum mlx5_txcmp_code {
 				 (olx & MLX5_TXOFF_CONFIG_VLAN) ?
 				 " + VLAN" : "",
 				 (olx & MLX5_TXOFF_CONFIG_METADATA) ?
-				 " + METADATA" : "");
+				 " + METADATA" : "",
+				 (olx & MLX5_TXOFF_CONFIG_TXPP) ?
+				 " + TXPP" : "");
 			return 0;
 		}
 	}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 7f6a40a..4ab6ac1 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -119,6 +119,8 @@
 			     DEV_TX_OFFLOAD_TCP_CKSUM);
 	if (config->tso)
 		offloads |= DEV_TX_OFFLOAD_TCP_TSO;
+	if (config->tx_pp)
+		offloads |= DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP;
 	if (config->swp) {
 		if (config->hw_csum)
 			offloads |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 13/17] net/mlx5: add scheduling support to send routine template
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (11 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 12/17] net/mlx5: prepare Tx datapath to support sheduling Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 14/17] net/mlx5: add read device clock support Viacheslav Ovsiienko
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

This patch adds send scheduling on timestamps into tx_burst
routine template. The feature is controlled by static configuration
flag, the actual routines supporting the new feature are generated
over this updated template.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 162 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 161 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 76fe12b..e511142 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -2404,6 +2404,37 @@ enum mlx5_txcmp_code {
 }
 
 /**
+ * Build the Synchronize Queue Segment with specified completion index.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param wqe
+ *   Pointer to WQE to fill with built Control Segment.
+ * @param wci
+ *   Completion index in Clock Queue to wait.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ */
+static __rte_always_inline void
+mlx5_tx_wseg_init(struct mlx5_txq_data *restrict txq,
+		  struct mlx5_txq_local *restrict loc __rte_unused,
+		  struct mlx5_wqe *restrict wqe,
+		  unsigned int wci,
+		  unsigned int olx __rte_unused)
+{
+	struct mlx5_wqe_qseg *qs;
+
+	qs = RTE_PTR_ADD(wqe, MLX5_WSEG_SIZE);
+	qs->max_index = rte_cpu_to_be_32(wci);
+	qs->qpn_cqn = rte_cpu_to_be_32(txq->sh->txpp.clock_queue.cq->id);
+	qs->reserved0 = RTE_BE32(0);
+	qs->reserved1 = RTE_BE32(0);
+}
+
+/**
  * Build the Ethernet Segment without inlined data.
  * Supports Software Parser, Checksums and VLAN
  * insertion Tx offload features.
@@ -3241,6 +3272,59 @@ enum mlx5_txcmp_code {
 }
 
 /**
+ * The routine checks timestamp flag in the current packet,
+ * and push WAIT WQE into the queue if scheduling is required.
+ *
+ * @param txq
+ *   Pointer to TX queue structure.
+ * @param loc
+ *   Pointer to burst routine local context.
+ * @param olx
+ *   Configured Tx offloads mask. It is fully defined at
+ *   compile time and may be used for optimization.
+ *
+ * @return
+ *   MLX5_TXCMP_CODE_EXIT - sending is done or impossible.
+ *   MLX5_TXCMP_CODE_SINGLE - continue processing with the packet.
+ *   MLX5_TXCMP_CODE_MULTI - the WAIT inserted, continue processing.
+ * Local context variables partially updated.
+ */
+static __rte_always_inline enum mlx5_txcmp_code
+mlx5_tx_schedule_send(struct mlx5_txq_data *restrict txq,
+		      struct mlx5_txq_local *restrict loc,
+		      unsigned int olx)
+{
+	if (MLX5_TXOFF_CONFIG(TXPP) &&
+	    loc->mbuf->ol_flags & txq->ts_mask) {
+		struct mlx5_wqe *wqe;
+		uint64_t ts;
+		int32_t wci;
+
+		/*
+		 * Estimate the required space quickly and roughly.
+		 * We would like to ensure the packet can be pushed
+		 * to the queue and we won't get the orphan WAIT WQE.
+		 */
+		if (loc->wqe_free <= MLX5_WQE_SIZE_MAX / MLX5_WQE_SIZE ||
+		    loc->elts_free < NB_SEGS(loc->mbuf))
+			return MLX5_TXCMP_CODE_EXIT;
+		/* Convert the timestamp into completion to wait. */
+		ts = *RTE_MBUF_DYNFIELD(loc->mbuf, txq->ts_offset, uint64_t *);
+		wci = mlx5_txpp_convert_tx_ts(txq->sh, ts);
+		if (unlikely(wci < 0))
+			return MLX5_TXCMP_CODE_SINGLE;
+		/* Build the WAIT WQE with specified completion. */
+		wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
+		mlx5_tx_cseg_init(txq, loc, wqe, 2, MLX5_OPCODE_WAIT, olx);
+		mlx5_tx_wseg_init(txq, loc, wqe, wci, olx);
+		++txq->wqe_ci;
+		--loc->wqe_free;
+		return MLX5_TXCMP_CODE_MULTI;
+	}
+	return MLX5_TXCMP_CODE_SINGLE;
+}
+
+/**
  * Tx one packet function for multi-segment TSO. Supports all
  * types of Tx offloads, uses MLX5_OPCODE_TSO to build WQEs,
  * sends one packet per WQE.
@@ -3269,6 +3353,16 @@ enum mlx5_txcmp_code {
 	struct mlx5_wqe *__rte_restrict wqe;
 	unsigned int ds, dlen, inlen, ntcp, vlan = 0;
 
+	if (MLX5_TXOFF_CONFIG(TXPP)) {
+		enum mlx5_txcmp_code wret;
+
+		/* Generate WAIT for scheduling if requested. */
+		wret = mlx5_tx_schedule_send(txq, loc, olx);
+		if (wret == MLX5_TXCMP_CODE_EXIT)
+			return MLX5_TXCMP_CODE_EXIT;
+		if (wret == MLX5_TXCMP_CODE_ERROR)
+			return MLX5_TXCMP_CODE_ERROR;
+	}
 	/*
 	 * Calculate data length to be inlined to estimate
 	 * the required space in WQE ring buffer.
@@ -3360,6 +3454,16 @@ enum mlx5_txcmp_code {
 	unsigned int ds, nseg;
 
 	MLX5_ASSERT(NB_SEGS(loc->mbuf) > 1);
+	if (MLX5_TXOFF_CONFIG(TXPP)) {
+		enum mlx5_txcmp_code wret;
+
+		/* Generate WAIT for scheduling if requested. */
+		wret = mlx5_tx_schedule_send(txq, loc, olx);
+		if (wret == MLX5_TXCMP_CODE_EXIT)
+			return MLX5_TXCMP_CODE_EXIT;
+		if (wret == MLX5_TXCMP_CODE_ERROR)
+			return MLX5_TXCMP_CODE_ERROR;
+	}
 	/*
 	 * No inline at all, it means the CPU cycles saving
 	 * is prioritized at configuration, we should not
@@ -3468,6 +3572,16 @@ enum mlx5_txcmp_code {
 
 	MLX5_ASSERT(MLX5_TXOFF_CONFIG(INLINE));
 	MLX5_ASSERT(NB_SEGS(loc->mbuf) > 1);
+	if (MLX5_TXOFF_CONFIG(TXPP)) {
+		enum mlx5_txcmp_code wret;
+
+		/* Generate WAIT for scheduling if requested. */
+		wret = mlx5_tx_schedule_send(txq, loc, olx);
+		if (wret == MLX5_TXCMP_CODE_EXIT)
+			return MLX5_TXCMP_CODE_EXIT;
+		if (wret == MLX5_TXCMP_CODE_ERROR)
+			return MLX5_TXCMP_CODE_ERROR;
+	}
 	/*
 	 * First calculate data length to be inlined
 	 * to estimate the required space for WQE.
@@ -3730,6 +3844,16 @@ enum mlx5_txcmp_code {
 		uint8_t *dptr;
 
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		dlen = rte_pktmbuf_data_len(loc->mbuf);
 		if (MLX5_TXOFF_CONFIG(VLAN) &&
 		    loc->mbuf->ol_flags & PKT_TX_VLAN_PKT) {
@@ -3892,7 +4016,7 @@ enum mlx5_txcmp_code {
  *  false - no match, eMPW should be restarted.
  */
 static __rte_always_inline bool
-mlx5_tx_match_empw(struct mlx5_txq_data *__rte_restrict txq __rte_unused,
+mlx5_tx_match_empw(struct mlx5_txq_data *__rte_restrict txq,
 		   struct mlx5_wqe_eseg *__rte_restrict es,
 		   struct mlx5_txq_local *__rte_restrict loc,
 		   uint32_t dlen,
@@ -3921,6 +4045,10 @@ enum mlx5_txcmp_code {
 	/* There must be no VLAN packets in eMPW loop. */
 	if (MLX5_TXOFF_CONFIG(VLAN))
 		MLX5_ASSERT(!(loc->mbuf->ol_flags & PKT_TX_VLAN_PKT));
+	/* Check if the scheduling is requested. */
+	if (MLX5_TXOFF_CONFIG(TXPP) &&
+	    loc->mbuf->ol_flags & txq->ts_mask)
+		return false;
 	return true;
 }
 
@@ -4106,6 +4234,16 @@ enum mlx5_txcmp_code {
 
 next_empw:
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		part = RTE_MIN(pkts_n, MLX5_TXOFF_CONFIG(MPW) ?
 				       MLX5_MPW_MAX_PACKETS :
 				       MLX5_EMPW_MAX_PACKETS);
@@ -4201,6 +4339,7 @@ enum mlx5_txcmp_code {
 			 * - metadata value
 			 * - software parser settings
 			 * - packets length (legacy MPW only)
+			 * - scheduling is not required
 			 */
 			if (!mlx5_tx_match_empw(txq, eseg, loc, dlen, olx)) {
 				MLX5_ASSERT(loop);
@@ -4271,6 +4410,16 @@ enum mlx5_txcmp_code {
 		unsigned int slen = 0;
 
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		/*
 		 * Limits the amount of packets in one WQE
 		 * to improve CQE latency generation.
@@ -4496,6 +4645,7 @@ enum mlx5_txcmp_code {
 			 * - metadata value
 			 * - software parser settings
 			 * - packets length (legacy MPW only)
+			 * - scheduling is not required
 			 */
 			if (!mlx5_tx_match_empw(txq, &wqem->eseg,
 						loc, dlen, olx))
@@ -4545,6 +4695,16 @@ enum mlx5_txcmp_code {
 		enum mlx5_txcmp_code ret;
 
 		MLX5_ASSERT(NB_SEGS(loc->mbuf) == 1);
+		if (MLX5_TXOFF_CONFIG(TXPP)) {
+			enum mlx5_txcmp_code wret;
+
+			/* Generate WAIT for scheduling if requested. */
+			wret = mlx5_tx_schedule_send(txq, loc, olx);
+			if (wret == MLX5_TXCMP_CODE_EXIT)
+				return MLX5_TXCMP_CODE_EXIT;
+			if (wret == MLX5_TXCMP_CODE_ERROR)
+				return MLX5_TXCMP_CODE_ERROR;
+		}
 		if (MLX5_TXOFF_CONFIG(INLINE)) {
 			unsigned int inlen, vlan = 0;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 14/17] net/mlx5: add read device clock support
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (12 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 13/17] net/mlx5: add scheduling support to send routine template Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 15/17] net/mlx5: provide the send scheduling error statistics Viacheslav Ovsiienko
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

If send schedule feature is engaged there is the Clock Queue
created, that reports reliable the current device clock counter
value. The device clock counter can be read directly from the
Clock Queue CQE.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/linux/mlx5_os.c |  4 ++-
 drivers/net/mlx5/mlx5.h          |  1 +
 drivers/net/mlx5/mlx5_txpp.c     | 55 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index ea36309..e7241d8 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -2348,7 +2348,7 @@
 	.xstats_get_names = mlx5_xstats_get_names,
 	.fw_version_get = mlx5_fw_version_get,
 	.dev_infos_get = mlx5_dev_infos_get,
-	.read_clock = mlx5_read_clock,
+	.read_clock = mlx5_txpp_read_clock,
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
@@ -2397,6 +2397,7 @@
 	.xstats_get_names = mlx5_xstats_get_names,
 	.fw_version_get = mlx5_fw_version_get,
 	.dev_infos_get = mlx5_dev_infos_get,
+	.read_clock = mlx5_txpp_read_clock,
 	.rx_descriptor_status = mlx5_rx_descriptor_status,
 	.tx_descriptor_status = mlx5_tx_descriptor_status,
 	.rxq_info_get = mlx5_rxq_info_get,
@@ -2427,6 +2428,7 @@
 	.xstats_get_names = mlx5_xstats_get_names,
 	.fw_version_get = mlx5_fw_version_get,
 	.dev_infos_get = mlx5_dev_infos_get,
+	.read_clock = mlx5_txpp_read_clock,
 	.dev_supported_ptypes_get = mlx5_dev_supported_ptypes_get,
 	.vlan_filter_set = mlx5_vlan_filter_set,
 	.rx_queue_setup = mlx5_rx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3304f0d..15e35e8 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1010,6 +1010,7 @@ void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
 
 int mlx5_txpp_start(struct rte_eth_dev *dev);
 void mlx5_txpp_stop(struct rte_eth_dev *dev);
+int mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp);
 void mlx5_txpp_interrupt_handler(void *cb_arg);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index b7ba20a..3a592d9 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -1058,3 +1058,58 @@
 	MLX5_ASSERT(!ret);
 	RTE_SET_USED(ret);
 }
+
+/*
+ * Read the current clock counter of an Ethernet device
+ *
+ * This returns the current raw clock value of an Ethernet device. It is
+ * a raw amount of ticks, with no given time reference.
+ * The value returned here is from the same clock than the one
+ * filling timestamp field of Rx/Tx packets when using hardware timestamp
+ * offload. Therefore it can be used to compute a precise conversion of
+ * the device clock to the real time.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param clock
+ *   Pointer to the uint64_t that holds the raw clock value.
+ *
+ * @return
+ *   - 0: Success.
+ *   - -ENOTSUP: The function is not supported in this mode. Requires
+ *     packet pacing module configured and started (tx_pp devarg)
+ */
+int
+mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	int ret;
+
+	if (sh->txpp.refcnt) {
+		struct mlx5_txpp_wq *wq = &sh->txpp.clock_queue;
+		struct mlx5_cqe *cqe = (struct mlx5_cqe *)(uintptr_t)wq->cqes;
+		union {
+			rte_int128_t u128;
+			struct mlx5_cqe_ts cts;
+		} to;
+		uint64_t ts;
+
+		mlx5_atomic_read_cqe((rte_int128_t *)&cqe->timestamp, &to.u128);
+		if (to.cts.op_own >> 4) {
+			DRV_LOG(DEBUG, "Clock Queue error sync lost.");
+			rte_atomic32_inc(&sh->txpp.err_clock_queue);
+			sh->txpp.sync_lost = 1;
+			return -EIO;
+		}
+		ts = rte_be_to_cpu_64(to.cts.timestamp);
+		ts = mlx5_txpp_convert_rx_ts(sh, ts);
+		*timestamp = ts;
+		return 0;
+	}
+	/* Not supported in isolated mode - kernel does not see the CQEs. */
+	if (priv->isolated || rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return -ENOTSUP;
+	ret = mlx5_read_clock(dev, timestamp);
+	return ret;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 15/17] net/mlx5: provide the send scheduling error statistics
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (13 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 14/17] net/mlx5: add read device clock support Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 16/17] common/mlx5: add register access DevX routine Viacheslav Ovsiienko
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The mlx5 PMD exposes the following new introduced
extended statistics counter to report the errors
of packet send scheduling on timestamps:

  - txpp_err_miss_int - rearm queue interrupt was not handled
    was not handled in time and service routine might miss
    the completions

  - txpp_err_rearm_queue - reports errors in rearm queue
  - txpp_err_clock_queue - reports errors in clock queue

  - txpp_err_ts_past - timestamps in the packet being sent
    were found in the past, timestamps were ignored

  - txpp_err_ts_future - timestamps in the packet being sent
    were found in the too distant future (beyond HW/clock queue
    capabilities to schedule, typically it is about 16M of
    tx_pp devarg periods)

  - txpp_jitter - estimated jitter in device clocks between
    8K completions of Clock Queue.

  - txpp_wander - estimated wander in device clocks between
    16M completions of Clock Queue.

  - txpp_sync_lost - error flag, the Clock Queue completions
    synchronization is lost, accurate packet scheduling can
    not be handled, timestamps are being ignored, the restart
    of all ports using scheduling must be performed.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/mlx5.h       |   7 ++
 drivers/net/mlx5/mlx5_stats.c |   7 +-
 drivers/net/mlx5/mlx5_txpp.c  | 220 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 15e35e8..6e4f055 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1011,6 +1011,13 @@ void mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
 int mlx5_txpp_start(struct rte_eth_dev *dev);
 void mlx5_txpp_stop(struct rte_eth_dev *dev);
 int mlx5_txpp_read_clock(struct rte_eth_dev *dev, uint64_t *timestamp);
+int mlx5_txpp_xstats_get(struct rte_eth_dev *dev,
+			 struct rte_eth_xstat *stats,
+			 unsigned int n, unsigned int n_used);
+int mlx5_txpp_xstats_reset(struct rte_eth_dev *dev);
+int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
+			       struct rte_eth_xstat_name *xstats_names,
+			       unsigned int n, unsigned int n_used);
 void mlx5_txpp_interrupt_handler(void *cb_arg);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index a9b33ee..e30542e 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -75,6 +75,7 @@
 			}
 		}
 	}
+	mlx5_stats_n = mlx5_txpp_xstats_get(dev, stats, n, mlx5_stats_n);
 	return mlx5_stats_n;
 }
 
@@ -237,7 +238,7 @@
 		xstats_ctrl->base[i] = counters[i];
 		xstats_ctrl->hw_stats[i] = 0;
 	}
-
+	mlx5_txpp_xstats_reset(dev);
 	return 0;
 }
 
@@ -255,7 +256,7 @@
  *   Number of xstats names.
  */
 int
-mlx5_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
+mlx5_xstats_get_names(struct rte_eth_dev *dev,
 		      struct rte_eth_xstat_name *xstats_names, unsigned int n)
 {
 	unsigned int i;
@@ -271,5 +272,7 @@
 			xstats_names[i].name[RTE_ETH_XSTATS_NAME_SIZE - 1] = 0;
 		}
 	}
+	mlx5_xstats_n = mlx5_txpp_xstats_get_names(dev, xstats_names,
+						   n, mlx5_xstats_n);
 	return mlx5_xstats_n;
 }
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 3a592d9..15c9a8e 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -15,6 +15,17 @@
 #include "mlx5_rxtx.h"
 #include "mlx5_common_os.h"
 
+static const char * const mlx5_txpp_stat_names[] = {
+	"txpp_err_miss_int", /* Missed service interrupt. */
+	"txpp_err_rearm_queue",	/* Rearm Queue errors. */
+	"txpp_err_clock_queue", /* Clock Queue errors. */
+	"txpp_err_ts_past", /* Timestamp in the past. */
+	"txpp_err_ts_future", /* Timestamp in the distant future. */
+	"txpp_jitter", /* Timestamp jitter (one Clock Queue completion). */
+	"txpp_wander", /* Timestamp jitter (half of Clock Queue completions). */
+	"txpp_sync_lost", /* Scheduling synchronization lost. */
+};
+
 /* Destroy Event Queue Notification Channel. */
 static void
 mlx5_txpp_destroy_eqn(struct mlx5_dev_ctx_shared *sh)
@@ -1113,3 +1124,212 @@
 	ret = mlx5_read_clock(dev, timestamp);
 	return ret;
 }
+
+/**
+ * DPDK callback to clear device extended statistics.
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ *
+ * @return
+ *   0 on success and stats is reset, negative errno value otherwise and
+ *   rte_errno is set.
+ */
+int mlx5_txpp_xstats_reset(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+
+	rte_atomic32_set(&sh->txpp.err_miss_int, 0);
+	rte_atomic32_set(&sh->txpp.err_rearm_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_clock_queue, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_past, 0);
+	rte_atomic32_set(&sh->txpp.err_ts_future, 0);
+	return 0;
+}
+
+/**
+ * Routine to retrieve names of extended device statistics
+ * for packet send scheduling. It appends the specific stats names
+ * after the parts filled by preceding modules (eth stats, etc.)
+ *
+ * @param dev
+ *   Pointer to Ethernet device structure.
+ * @param[out] xstats_names
+ *   Buffer to insert names into.
+ * @param n
+ *   Number of names.
+ * @param n_used
+ *   Number of names filled by preceding statistics modules.
+ *
+ * @return
+ *   Number of xstats names.
+ */
+int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev __rte_unused,
+			       struct rte_eth_xstat_name *xstats_names,
+			       unsigned int n, unsigned int n_used)
+{
+	unsigned int n_txpp = RTE_DIM(mlx5_txpp_stat_names);
+	unsigned int i;
+
+	if (n >= n_used + n_txpp && xstats_names) {
+		for (i = 0; i < n_txpp; ++i) {
+			strncpy(xstats_names[i + n_used].name,
+				mlx5_txpp_stat_names[i],
+				RTE_ETH_XSTATS_NAME_SIZE);
+			xstats_names[i + n_used].name
+					[RTE_ETH_XSTATS_NAME_SIZE - 1] = 0;
+		}
+	}
+	return n_used + n_txpp;
+}
+
+static inline void
+mlx5_txpp_read_tsa(struct mlx5_dev_txpp *txpp,
+		   struct mlx5_txpp_ts *tsa, uint16_t idx)
+{
+	do {
+		int64_t ts, ci;
+
+		ts = rte_atomic64_read(&txpp->tsa[idx].ts);
+		ci = rte_atomic64_read(&txpp->tsa[idx].ci_ts);
+		rte_compiler_barrier();
+		if ((ci ^ ts) << MLX5_CQ_INDEX_WIDTH != 0)
+			continue;
+		if (rte_atomic64_read(&txpp->tsa[idx].ts) != ts)
+			continue;
+		if (rte_atomic64_read(&txpp->tsa[idx].ci_ts) != ci)
+			continue;
+		rte_atomic64_set(&tsa->ts, ts);
+		rte_atomic64_set(&tsa->ci_ts, ci);
+		return;
+	} while (true);
+}
+
+/*
+ * Jitter reflects the clock change between
+ * neighbours Clock Queue completions.
+ */
+static uint64_t
+mlx5_txpp_xstats_jitter(struct mlx5_dev_txpp *txpp)
+{
+	struct mlx5_txpp_ts tsa0, tsa1;
+	int64_t dts, dci;
+	uint16_t ts_p;
+
+	if (txpp->ts_n < 2) {
+		/* No gathered enough reports yet. */
+		return 0;
+	}
+	do {
+		int ts_0, ts_1;
+
+		ts_p = txpp->ts_p;
+		rte_compiler_barrier();
+		ts_0 = ts_p - 2;
+		if (ts_0 < 0)
+			ts_0 += MLX5_TXPP_REARM_SQ_SIZE;
+		ts_1 = ts_p - 1;
+		if (ts_1 < 0)
+			ts_1 += MLX5_TXPP_REARM_SQ_SIZE;
+		mlx5_txpp_read_tsa(txpp, &tsa0, ts_0);
+		mlx5_txpp_read_tsa(txpp, &tsa1, ts_1);
+		rte_compiler_barrier();
+	} while (ts_p != txpp->ts_p);
+	/* We have two neighbor reports, calculate the jitter. */
+	dts = rte_atomic64_read(&tsa1.ts) - rte_atomic64_read(&tsa0.ts);
+	dci = (rte_atomic64_read(&tsa1.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH)) -
+	      (rte_atomic64_read(&tsa0.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH));
+	if (dci < 0)
+		dci += 1 << MLX5_CQ_INDEX_WIDTH;
+	dci *= txpp->tick;
+	return (dts > dci) ? dts - dci : dci - dts;
+}
+
+/*
+ * Wander reflects the long-term clock change
+ * over the entire length of all Clock Queue completions.
+ */
+static uint64_t
+mlx5_txpp_xstats_wander(struct mlx5_dev_txpp *txpp)
+{
+	struct mlx5_txpp_ts tsa0, tsa1;
+	int64_t dts, dci;
+	uint16_t ts_p;
+
+	if (txpp->ts_n < MLX5_TXPP_REARM_SQ_SIZE) {
+		/* No gathered enough reports yet. */
+		return 0;
+	}
+	do {
+		int ts_0, ts_1;
+
+		ts_p = txpp->ts_p;
+		rte_compiler_barrier();
+		ts_0 = ts_p - MLX5_TXPP_REARM_SQ_SIZE / 2 - 1;
+		if (ts_0 < 0)
+			ts_0 += MLX5_TXPP_REARM_SQ_SIZE;
+		ts_1 = ts_p - 1;
+		if (ts_1 < 0)
+			ts_1 += MLX5_TXPP_REARM_SQ_SIZE;
+		mlx5_txpp_read_tsa(txpp, &tsa0, ts_0);
+		mlx5_txpp_read_tsa(txpp, &tsa1, ts_1);
+		rte_compiler_barrier();
+	} while (ts_p != txpp->ts_p);
+	/* We have two neighbor reports, calculate the jitter. */
+	dts = rte_atomic64_read(&tsa1.ts) - rte_atomic64_read(&tsa0.ts);
+	dci = (rte_atomic64_read(&tsa1.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH)) -
+	      (rte_atomic64_read(&tsa0.ci_ts) >> (64 - MLX5_CQ_INDEX_WIDTH));
+	dci += 1 << MLX5_CQ_INDEX_WIDTH;
+	dci *= txpp->tick;
+	return (dts > dci) ? dts - dci : dci - dts;
+}
+
+/**
+ * Routine to retrieve extended device statistics
+ * for packet send scheduling. It appends the specific statistics
+ * after the parts filled by preceding modules (eth stats, etc.)
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param[out] stats
+ *   Pointer to rte extended stats table.
+ * @param n
+ *   The size of the stats table.
+ * @param n_used
+ *   Number of stats filled by preceding statistics modules.
+ *
+ * @return
+ *   Number of extended stats on success and stats is filled,
+ *   negative on error and rte_errno is set.
+ */
+int
+mlx5_txpp_xstats_get(struct rte_eth_dev *dev,
+		     struct rte_eth_xstat *stats,
+		     unsigned int n, unsigned int n_used)
+{
+	unsigned int n_txpp = RTE_DIM(mlx5_txpp_stat_names);
+
+	if (n >= n_used + n_txpp && stats) {
+		struct mlx5_priv *priv = dev->data->dev_private;
+		struct mlx5_dev_ctx_shared *sh = priv->sh;
+		unsigned int i;
+
+		for (i = 0; i < n_txpp; ++i)
+			stats[n_used + i].id = n_used + i;
+		stats[n_used + 0].value =
+				rte_atomic32_read(&sh->txpp.err_miss_int);
+		stats[n_used + 1].value =
+				rte_atomic32_read(&sh->txpp.err_rearm_queue);
+		stats[n_used + 2].value =
+				rte_atomic32_read(&sh->txpp.err_clock_queue);
+		stats[n_used + 3].value =
+				rte_atomic32_read(&sh->txpp.err_ts_past);
+		stats[n_used + 4].value =
+				rte_atomic32_read(&sh->txpp.err_ts_future);
+		stats[n_used + 5].value = mlx5_txpp_xstats_jitter(&sh->txpp);
+		stats[n_used + 6].value = mlx5_txpp_xstats_wander(&sh->txpp);
+		stats[n_used + 7].value = sh->txpp.sync_lost;
+	}
+	return n_used + n_txpp;
+}
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 16/17] common/mlx5: add register access DevX routine
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (14 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 15/17] net/mlx5: provide the send scheduling error statistics Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 17/17] net/mlx5: convert Rx timestamps in realtime format Viacheslav Ovsiienko
  2020-07-16 20:20   ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Raslan Darawsheh
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The DevX routine to read/write NIC registers via DevX API is added.
This is the preparation step to check timestamp modes and units
and gather the extended statistics.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c            | 60 +++++++++++++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h            |  3 ++
 drivers/common/mlx5/mlx5_prm.h                  | 52 +++++++++++++++++++++
 drivers/common/mlx5/rte_common_mlx5_version.map |  1 +
 4 files changed, 116 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 093636c..13cd76a 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -12,6 +12,66 @@
 
 
 /**
+ * Perform read access to the registers. Reads data from register
+ * and writes ones to the specified buffer.
+ *
+ * @param[in] ctx
+ *   Context returned from mlx5 open_device() glue function.
+ * @param[in] reg_id
+ *   Register identifier according to the PRM.
+ * @param[in] arg
+ *   Register access auxiliary parameter according to the PRM.
+ * @param[out] data
+ *   Pointer to the buffer to store read data.
+ * @param[in] dw_cnt
+ *   Buffer size in double words.
+ *
+ * @return
+ *   0 on success, a negative value otherwise.
+ */
+int
+mlx5_devx_cmd_register_read(void *ctx, uint16_t reg_id, uint32_t arg,
+			    uint32_t *data, uint32_t dw_cnt)
+{
+	uint32_t in[MLX5_ST_SZ_DW(access_register_in)]   = {0};
+	uint32_t out[MLX5_ST_SZ_DW(access_register_out) +
+		     MLX5_ACCESS_REGISTER_DATA_DWORD_MAX] = {0};
+	int status, rc;
+
+	MLX5_ASSERT(data && dw_cnt);
+	MLX5_ASSERT(dw_cnt <= MLX5_ACCESS_REGISTER_DATA_DWORD_MAX);
+	if (dw_cnt  > MLX5_ACCESS_REGISTER_DATA_DWORD_MAX) {
+		DRV_LOG(ERR, "Not enough  buffer for register read data");
+		return -1;
+	}
+	MLX5_SET(access_register_in, in, opcode, MLX5_CMD_OP_ACCESS_REGISTER);
+	MLX5_SET(access_register_in, in, op_mod,
+					MLX5_ACCESS_REGISTER_IN_OP_MOD_READ);
+	MLX5_SET(access_register_in, in, register_id, reg_id);
+	MLX5_SET(access_register_in, in, argument, arg);
+	rc = mlx5_glue->devx_general_cmd(ctx, in, sizeof(in), out,
+					 MLX5_ST_SZ_DW(access_register_out) *
+					 sizeof(uint32_t) + dw_cnt);
+	if (rc)
+		goto error;
+	status = MLX5_GET(access_register_out, out, status);
+	if (status) {
+		int syndrome = MLX5_GET(access_register_out, out, syndrome);
+
+		DRV_LOG(DEBUG, "Failed to access NIC register 0x%X, "
+			       "status %x, syndrome = %x",
+			       reg_id, status, syndrome);
+		return -1;
+	}
+	memcpy(data, &out[MLX5_ST_SZ_DW(access_register_out)],
+	       dw_cnt * sizeof(uint32_t));
+	return 0;
+error:
+	rc = (rc > 0) ? -rc : rc;
+	return rc;
+}
+
+/**
  * Allocate flow counters via devx interface.
  *
  * @param[in] ctx
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index c79b349..34482e1 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -383,6 +383,9 @@ int mlx5_devx_cmd_modify_qp_state(struct mlx5_devx_obj *qp,
 int mlx5_devx_cmd_modify_rqt(struct mlx5_devx_obj *rqt,
 			     struct mlx5_devx_rqt_attr *rqt_attr);
 
+__rte_internal
+int mlx5_devx_cmd_register_read(void *ctx, uint16_t reg_id,
+				uint32_t arg, uint32_t *data, uint32_t dw_cnt);
 /**
  * Create virtio queue counters object DevX API.
  *
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index cf47103..b37be30 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -776,6 +776,7 @@ enum {
 	MLX5_CMD_OP_SUSPEND_QP = 0x50F,
 	MLX5_CMD_OP_RESUME_QP = 0x510,
 	MLX5_CMD_OP_QUERY_NIC_VPORT_CONTEXT = 0x754,
+	MLX5_CMD_OP_ACCESS_REGISTER = 0x805,
 	MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN = 0x816,
 	MLX5_CMD_OP_CREATE_TIR = 0x900,
 	MLX5_CMD_OP_CREATE_SQ = 0X904,
@@ -2545,6 +2546,57 @@ struct mlx5_ifc_set_pp_rate_limit_context_bits {
 	u8 reserved_at_60[0x120];
 };
 
+#define MLX5_ACCESS_REGISTER_DATA_DWORD_MAX 8u
+
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-Wpedantic"
+#endif
+struct mlx5_ifc_access_register_out_bits {
+	u8 status[0x8];
+	u8 reserved_at_8[0x18];
+	u8 syndrome[0x20];
+	u8 reserved_at_40[0x40];
+	u8 register_data[0][0x20];
+};
+
+struct mlx5_ifc_access_register_in_bits {
+	u8 opcode[0x10];
+	u8 reserved_at_10[0x10];
+	u8 reserved_at_20[0x10];
+	u8 op_mod[0x10];
+	u8 reserved_at_40[0x10];
+	u8 register_id[0x10];
+	u8 argument[0x20];
+	u8 register_data[0][0x20];
+};
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-Wpedantic"
+#endif
+
+enum {
+	MLX5_ACCESS_REGISTER_IN_OP_MOD_WRITE  = 0x0,
+	MLX5_ACCESS_REGISTER_IN_OP_MOD_READ   = 0x1,
+};
+
+enum {
+	MLX5_REGISTER_ID_MTUTC  = 0x9055,
+};
+
+struct mlx5_ifc_register_mtutc_bits {
+	u8 time_stamp_mode[0x2];
+	u8 time_stamp_state[0x2];
+	u8 reserved_at_4[0x18];
+	u8 operation[0x4];
+	u8 freq_adjustment[0x20];
+	u8 reserved_at_40[0x40];
+	u8 utc_sec[0x20];
+	u8 utc_nsec[0x20];
+	u8 time_adjustment[0x20];
+};
+
+#define MLX5_MTUTC_TIMESTAMP_MODE_INTERNAL_TIMER 0
+#define MLX5_MTUTC_TIMESTAMP_MODE_REAL_TIME 1
+
 /* CQE format mask. */
 #define MLX5E_CQE_FORMAT_MASK 0xc
 
diff --git a/drivers/common/mlx5/rte_common_mlx5_version.map b/drivers/common/mlx5/rte_common_mlx5_version.map
index ae57ebd..68007ef 100644
--- a/drivers/common/mlx5/rte_common_mlx5_version.map
+++ b/drivers/common/mlx5/rte_common_mlx5_version.map
@@ -34,6 +34,7 @@ INTERNAL {
 	mlx5_devx_cmd_query_hca_attr;
 	mlx5_devx_cmd_query_virtio_q_counters;
 	mlx5_devx_cmd_query_virtq;
+	mlx5_devx_cmd_register_read;
 	mlx5_devx_get_out_command_status;
 
 	mlx5_get_ifname_sysfs;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH v3 17/17] net/mlx5: convert Rx timestamps in realtime format
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (15 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 16/17] common/mlx5: add register access DevX routine Viacheslav Ovsiienko
@ 2020-07-16  8:23   ` Viacheslav Ovsiienko
  2020-07-16 20:20   ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Raslan Darawsheh
  17 siblings, 0 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-16  8:23 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, thomas, ferruh.yigit

The ConnectX-6DX supports the timestamps in various formats,
the new realtime format is introduced - the upper 32-bit word
of timestamp contains the UTC seconds and the lower 32-bit word
contains the nanoseconds. This patch detects what format is
configured in the NIC and performs the conversion accordingly.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/mlx5/linux/mlx5_os.c         | 21 +++++++++++++
 drivers/net/mlx5/mlx5.h                  |  1 +
 drivers/net/mlx5/mlx5_rxq.c              | 24 ++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.c             |  6 +++-
 drivers/net/mlx5/mlx5_rxtx.h             |  4 +++
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 34 +++++++++++++++-----
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h    | 54 ++++++++++++++++++++++----------
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h     | 34 +++++++++++++++-----
 drivers/net/mlx5/mlx5_trigger.c          |  6 ++--
 9 files changed, 149 insertions(+), 35 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index e7241d8..f228bab 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -936,6 +936,27 @@
 		goto error;
 #endif
 	}
+	if (config.devx) {
+		uint32_t reg[MLX5_ST_SZ_DW(register_mtutc)];
+
+		err = mlx5_devx_cmd_register_read
+			(sh->ctx, MLX5_REGISTER_ID_MTUTC, 0,
+			reg, MLX5_ST_SZ_DW(register_mtutc));
+		if (!err) {
+			uint32_t ts_mode;
+
+			/* MTUTC register is read successfully. */
+			ts_mode = MLX5_GET(register_mtutc, reg,
+					   time_stamp_mode);
+			if (ts_mode == MLX5_MTUTC_TIMESTAMP_MODE_REAL_TIME)
+				config.rt_timestamp = 1;
+		} else {
+			/* Kernel does not support register reading. */
+			if (config.hca_attr.dev_freq_khz ==
+						 (NS_PER_S / MS_PER_S))
+				config.rt_timestamp = 1;
+		}
+	}
 	if (config.mprq.enabled && mprq) {
 		if (config.mprq.stride_num_n &&
 		    (config.mprq.stride_num_n > mprq_max_stride_num_n ||
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6e4f055..1aa3a3e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -216,6 +216,7 @@ struct mlx5_dev_config {
 	unsigned int devx:1; /* Whether devx interface is available or not. */
 	unsigned int dest_tir:1; /* Whether advanced DR API is available. */
 	unsigned int reclaim_mode:2; /* Memory reclaim mode. */
+	unsigned int rt_timestamp:1; /* realtime timestamp format. */
 	struct {
 		unsigned int enabled:1; /* Whether MPRQ is enabled. */
 		unsigned int stride_num_n; /* Number of strides. */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 2681322..7dd06e8 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -2972,3 +2972,27 @@ struct mlx5_hrxq *
 		priv->drop_queue.hrxq = NULL;
 	}
 }
+
+
+/**
+ * Set the Rx queue timestamp conversion parameters
+ *
+ * @param[in] dev
+ *   Pointer to the Ethernet device structure.
+ */
+void
+mlx5_rxq_timestamp_set(struct rte_eth_dev *dev)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_rxq_data *data;
+	unsigned int i;
+
+	for (i = 0; i != priv->rxqs_n; ++i) {
+		if (!(*priv->rxqs)[i])
+			continue;
+		data = (*priv->rxqs)[i];
+		data->sh = sh;
+		data->rt_timestamp = priv->config.rt_timestamp;
+	}
+}
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index e511142..65239f9 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1389,7 +1389,11 @@ enum mlx5_txcmp_code {
 		pkt->vlan_tci = rte_be_to_cpu_16(cqe->vlan_info);
 	}
 	if (rxq->hw_timestamp) {
-		pkt->timestamp = rte_be_to_cpu_64(cqe->timestamp);
+		uint64_t ts = rte_be_to_cpu_64(cqe->timestamp);
+
+		if (rxq->rt_timestamp)
+			ts = mlx5_txpp_convert_rx_ts(rxq->sh, ts);
+		pkt->timestamp = ts;
 		pkt->ol_flags |= PKT_RX_TIMESTAMP;
 	}
 }
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d082cd7..5116a15 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -109,6 +109,7 @@ enum mlx5_rxq_err_state {
 struct mlx5_rxq_data {
 	unsigned int csum:1; /* Enable checksum offloading. */
 	unsigned int hw_timestamp:1; /* Enable HW timestamp. */
+	unsigned int rt_timestamp:1; /* Realtime timestamp format. */
 	unsigned int vlan_strip:1; /* Enable VLAN stripping. */
 	unsigned int crc_present:1; /* CRC must be subtracted. */
 	unsigned int sges_n:3; /* Log 2 of SGEs (max buffers per packet). */
@@ -148,6 +149,7 @@ struct mlx5_rxq_data {
 	struct rte_mempool *mp;
 	struct rte_mempool *mprq_mp; /* Mempool for Multi-Packet RQ. */
 	struct mlx5_mprq_buf *mprq_repl; /* Stashed mbuf for replenish. */
+	struct mlx5_dev_ctx_shared *sh; /* Shared context. */
 	uint16_t idx; /* Queue index. */
 	struct mlx5_rxq_stats stats;
 	rte_xmm_t mbuf_initializer; /* Default rearm/flags for vectorized Rx. */
@@ -442,6 +444,8 @@ uint32_t mlx5_hrxq_get(struct rte_eth_dev *dev,
 void mlx5_hrxq_drop_release(struct rte_eth_dev *dev);
 uint64_t mlx5_get_rx_port_offloads(void);
 uint64_t mlx5_get_rx_queue_offloads(struct rte_eth_dev *dev);
+void mlx5_rxq_timestamp_set(struct rte_eth_dev *dev);
+
 
 /* mlx5_txq.c */
 
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
index b55138a..f5414ee 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h
@@ -1024,14 +1024,32 @@
 		/* D.5 fill in mbuf - rearm_data and packet_type. */
 		rxq_cq_to_ptype_oflags_v(rxq, cqes, opcode, &pkts[pos]);
 		if (rxq->hw_timestamp) {
-			pkts[pos]->timestamp =
-				rte_be_to_cpu_64(cq[pos].timestamp);
-			pkts[pos + 1]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p1].timestamp);
-			pkts[pos + 2]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p2].timestamp);
-			pkts[pos + 3]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p3].timestamp);
+			if (rxq->rt_timestamp) {
+				struct mlx5_dev_ctx_shared *sh = rxq->sh;
+				uint64_t ts;
+
+				ts = rte_be_to_cpu_64(cq[pos].timestamp);
+				pkts[pos]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p1].timestamp);
+				pkts[pos + 1]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p2].timestamp);
+				pkts[pos + 2]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p3].timestamp);
+				pkts[pos + 3]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+			} else {
+				pkts[pos]->timestamp = rte_be_to_cpu_64
+						(cq[pos].timestamp);
+				pkts[pos + 1]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p1].timestamp);
+				pkts[pos + 2]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p2].timestamp);
+				pkts[pos + 3]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p3].timestamp);
+			}
 		}
 		if (rxq->dynf_meta) {
 			uint64_t flag = rxq->flow_meta_mask;
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 3007c03..555c342 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -694,22 +694,44 @@
 		rxq_cq_to_ptype_oflags_v(rxq, ptype_info, flow_tag,
 					 opcode, &elts[pos]);
 		if (rxq->hw_timestamp) {
-			elts[pos]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p0, struct mlx5_cqe,
-						     pkt_info)->timestamp);
-			elts[pos + 1]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p1, struct mlx5_cqe,
-						     pkt_info)->timestamp);
-			elts[pos + 2]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p2, struct mlx5_cqe,
-						     pkt_info)->timestamp);
-			elts[pos + 3]->timestamp =
-				rte_be_to_cpu_64(
-					container_of(p3, struct mlx5_cqe,
-						     pkt_info)->timestamp);
+			if (rxq->rt_timestamp) {
+				struct mlx5_dev_ctx_shared *sh = rxq->sh;
+				uint64_t ts;
+
+				ts = rte_be_to_cpu_64
+					(container_of(p0, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64
+					(container_of(p1, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 1]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64
+					(container_of(p2, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 2]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64
+					(container_of(p3, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 3]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+			} else {
+				elts[pos]->timestamp = rte_be_to_cpu_64
+					(container_of(p0, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 1]->timestamp = rte_be_to_cpu_64
+					(container_of(p1, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 2]->timestamp = rte_be_to_cpu_64
+					(container_of(p2, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+				elts[pos + 3]->timestamp = rte_be_to_cpu_64
+					(container_of(p3, struct mlx5_cqe,
+						      pkt_info)->timestamp);
+			}
 		}
 		if (!!rxq->flow_meta_mask) {
 			/* This code is subject for futher optimization. */
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index da5960a..34e3397 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -655,14 +655,32 @@
 		/* D.5 fill in mbuf - rearm_data and packet_type. */
 		rxq_cq_to_ptype_oflags_v(rxq, cqes, opcode, &pkts[pos]);
 		if (rxq->hw_timestamp) {
-			pkts[pos]->timestamp =
-				rte_be_to_cpu_64(cq[pos].timestamp);
-			pkts[pos + 1]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p1].timestamp);
-			pkts[pos + 2]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p2].timestamp);
-			pkts[pos + 3]->timestamp =
-				rte_be_to_cpu_64(cq[pos + p3].timestamp);
+			if (rxq->rt_timestamp) {
+				struct mlx5_dev_ctx_shared *sh = rxq->sh;
+				uint64_t ts;
+
+				ts = rte_be_to_cpu_64(cq[pos].timestamp);
+				pkts[pos]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p1].timestamp);
+				pkts[pos + 1]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p2].timestamp);
+				pkts[pos + 2]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+				ts = rte_be_to_cpu_64(cq[pos + p3].timestamp);
+				pkts[pos + 3]->timestamp =
+					mlx5_txpp_convert_rx_ts(sh, ts);
+			} else {
+				pkts[pos]->timestamp = rte_be_to_cpu_64
+						(cq[pos].timestamp);
+				pkts[pos + 1]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p1].timestamp);
+				pkts[pos + 2]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p2].timestamp);
+				pkts[pos + 3]->timestamp = rte_be_to_cpu_64
+						(cq[pos + p3].timestamp);
+			}
 		}
 		if (rxq->dynf_meta) {
 			/* This code is subject for futher optimization. */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 29aef54..6e5a730 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -329,9 +329,11 @@
 			dev->data->port_id);
 		goto error;
 	}
-	/* Set a mask and offset of dynamic metadata flows into Rx queues*/
+	/* Set a mask and offset of dynamic metadata flows into Rx queues. */
 	mlx5_flow_rxq_dynf_metadata_set(dev);
-	/* Set a mask and offset of scheduling on timestamp into Tx queues*/
+	/* Set flags and context to convert Rx timestamps. */
+	mlx5_rxq_timestamp_set(dev);
+	/* Set a mask and offset of scheduling on timestamp into Tx queues. */
 	mlx5_txq_dynf_timestamp_set(dev);
 	/*
 	 * In non-cached mode, it only needs to start the default mreg copy
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                     ` (16 preceding siblings ...)
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 17/17] net/mlx5: convert Rx timestamps in realtime format Viacheslav Ovsiienko
@ 2020-07-16 20:20   ` Raslan Darawsheh
  17 siblings, 0 replies; 86+ messages in thread
From: Raslan Darawsheh @ 2020-07-16 20:20 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, olivier.matz, Thomas Monjalon, ferruh.yigit

Hi,

> -----Original Message-----
> From: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Sent: Thursday, July 16, 2020 11:23 AM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com; Thomas Monjalon
> <thomas@monjalon.net>; ferruh.yigit@intel.com
> Subject: [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling
> 
> The accurate packet Tx scheduling on timestamps was introduced [1].
> This patchset provides the feature implementation on mlx5 PMD side.
> 
> [1] http://patches.dpdk.org/patch/73742/
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Matan Azrad <matan@mellanox.com>
> 
> ---
> v3: - 32-bit systems rte_atomic128_cmp_exchange() fix
>     - strncpy coverity issue (bug)fix
>     - codestyle compilation issues
>     - negative tx_pp is always enabled
> 
> Viacheslav Ovsiienko (17):
>   common/mlx5: update common part to support packet pacing
>   net/mlx5: introduce send scheduling devargs
>   net/mlx5: fix UAR lock sharing for multiport devices
>   net/mlx5: introduce shared UAR resource
>   net/mlx5: create clock queue for packet pacing
>   net/mlx5: create rearm queue for packet pacing
>   net/mlx5: create Tx queues with DevX
>   net/mlx5: allocate packet pacing context
>   net/mlx5: introduce clock queue service routine
>   net/mlx5: prepare Tx queue structures to support timestamp
>   net/mlx5: convert timestamp to completion index
>   net/mlx5: prepare Tx datapath to support sheduling
>   net/mlx5: add scheduling support to send routine template
>   net/mlx5: add read device clock support
>   net/mlx5: provide the send scheduling error statistics
>   common/mlx5: add register access DevX routine
>   net/mlx5: convert Rx timestamps in realtime format
> 
>  doc/guides/nics/mlx5.rst                        |   37 +
>  drivers/common/mlx5/Makefile                    |   25 +
>  drivers/common/mlx5/linux/meson.build           |   10 +
>  drivers/common/mlx5/linux/mlx5_glue.c           |   31 +-
>  drivers/common/mlx5/linux/mlx5_glue.h           |    5 +
>  drivers/common/mlx5/mlx5_devx_cmds.c            |   79 +-
>  drivers/common/mlx5/mlx5_devx_cmds.h            |   13 +
>  drivers/common/mlx5/mlx5_prm.h                  |  121 +-
>  drivers/common/mlx5/rte_common_mlx5_version.map |    1 +
>  drivers/net/mlx5/Makefile                       |    1 +
>  drivers/net/mlx5/linux/mlx5_os.c                |   97 +-
>  drivers/net/mlx5/meson.build                    |    1 +
>  drivers/net/mlx5/mlx5.c                         |   61 +-
>  drivers/net/mlx5/mlx5.h                         |   90 +-
>  drivers/net/mlx5/mlx5_defs.h                    |   11 +
>  drivers/net/mlx5/mlx5_rxq.c                     |   26 +-
>  drivers/net/mlx5/mlx5_rxtx.c                    |  373 ++++++-
>  drivers/net/mlx5/mlx5_rxtx.h                    |   97 ++
>  drivers/net/mlx5/mlx5_rxtx_vec_altivec.h        |   34 +-
>  drivers/net/mlx5/mlx5_rxtx_vec_neon.h           |   54 +-
>  drivers/net/mlx5/mlx5_rxtx_vec_sse.h            |   34 +-
>  drivers/net/mlx5/mlx5_stats.c                   |    7 +-
>  drivers/net/mlx5/mlx5_trigger.c                 |   26 +-
>  drivers/net/mlx5/mlx5_txpp.c                    | 1335
> +++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_txq.c                     |  342 +++++-
>  25 files changed, 2802 insertions(+), 109 deletions(-)
>  create mode 100644 drivers/net/mlx5/mlx5_txpp.c
> 
> --
> 1.8.3.1


Series applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event
  2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (9 preceding siblings ...)
  2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-17 14:28 ` Viacheslav Ovsiienko
  2020-07-17 14:28   ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange Viacheslav Ovsiienko
                     ` (2 more replies)
  10 siblings, 3 replies; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-17 14:28 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, thomas

If there is no support for DevX event the mlx5_txpp_gather_timestamp()
is reported as "not used" causing the compilation warning/error.
This patch moves the routine to the conditional block.

Fixes: 244788055199 ("net/mlx5: introduce clock queue service routine")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_txpp.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 15c9a8e..42776b8 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -719,21 +719,6 @@
 	mlx5_txpp_cache_timestamp(sh, ts, wq->cq_ci);
 }
 
-/* Gather statistics for timestamp from Clock Queue CQE. */
-static inline void
-mlx5_txpp_gather_timestamp(struct mlx5_dev_ctx_shared *sh)
-{
-	/* Check whether we have a valid timestamp. */
-	if (!sh->txpp.clock_queue.sq_ci && !sh->txpp.ts_n)
-		return;
-	MLX5_ASSERT(sh->txpp.ts_p < MLX5_TXPP_REARM_SQ_SIZE);
-	sh->txpp.tsa[sh->txpp.ts_p] = sh->txpp.ts;
-	if (++sh->txpp.ts_p >= MLX5_TXPP_REARM_SQ_SIZE)
-		sh->txpp.ts_p = 0;
-	if (sh->txpp.ts_n < MLX5_TXPP_REARM_SQ_SIZE)
-		++sh->txpp.ts_n;
-}
-
 /* Waits for the first completion on Clock Queue to init timestamp. */
 static inline void
 mlx5_txpp_init_timestamp(struct mlx5_dev_ctx_shared *sh)
@@ -759,6 +744,21 @@
 }
 
 #ifdef HAVE_IBV_DEVX_EVENT
+/* Gather statistics for timestamp from Clock Queue CQE. */
+static inline void
+mlx5_txpp_gather_timestamp(struct mlx5_dev_ctx_shared *sh)
+{
+	/* Check whether we have a valid timestamp. */
+	if (!sh->txpp.clock_queue.sq_ci && !sh->txpp.ts_n)
+		return;
+	MLX5_ASSERT(sh->txpp.ts_p < MLX5_TXPP_REARM_SQ_SIZE);
+	sh->txpp.tsa[sh->txpp.ts_p] = sh->txpp.ts;
+	if (++sh->txpp.ts_p >= MLX5_TXPP_REARM_SQ_SIZE)
+		sh->txpp.ts_p = 0;
+	if (sh->txpp.ts_n < MLX5_TXPP_REARM_SQ_SIZE)
+		++sh->txpp.ts_n;
+}
+
 /* Handles Rearm Queue completions in periodic service. */
 static __rte_always_inline void
 mlx5_txpp_handle_rearm_queue(struct mlx5_dev_ctx_shared *sh)
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange
  2020-07-17 14:28 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Viacheslav Ovsiienko
@ 2020-07-17 14:28   ` Viacheslav Ovsiienko
  2020-07-17 15:08     ` Thomas Monjalon
  2020-07-17 14:28   ` [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode Viacheslav Ovsiienko
  2020-07-18 13:38   ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Raslan Darawsheh
  2 siblings, 1 reply; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-17 14:28 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, thomas

For naw the rte_atomic128_cmp_exchange() is available on x86-64
and ARM64 architectures. The patch fixes the compilation condition
for the code using this atomic transaction.

Fixes: 244788055199 ("net/mlx5: introduce clock queue service routine")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_txpp.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 42776b8..cdb0079 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -644,7 +644,15 @@
 	 * update by hardware with soecified rate. We have to
 	 * read timestump and WQE completion index atomically.
 	 */
-#if defined(RTE_ARCH_PPC_64) || defined(RTE_ARCH_32)
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
+	rte_int128_t src;
+
+	memset(&src, 0, sizeof(src));
+	*ts = src;
+	/* if (*from == *ts) *from = *src else *ts = *from; */
+	rte_atomic128_cmp_exchange(from, ts, &src, 0,
+				   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+#else
 	rte_atomic64_t *cqe = (rte_atomic64_t *)from;
 
 	/* Power architecture does not support 16B compare-and-swap. */
@@ -665,14 +673,6 @@
 		ps[1] = op;
 		return;
 	}
-#else
-	rte_int128_t src;
-
-	memset(&src, 0, sizeof(src));
-	*ts = src;
-	/* if (*from == *ts) *from = *src else *ts = *from; */
-	rte_atomic128_cmp_exchange(from, ts, &src, 0,
-				   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
 #endif
 }
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode
  2020-07-17 14:28 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Viacheslav Ovsiienko
  2020-07-17 14:28   ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange Viacheslav Ovsiienko
@ 2020-07-17 14:28   ` Viacheslav Ovsiienko
  2020-07-17 15:05     ` Thomas Monjalon
  2020-07-18 13:38   ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Raslan Darawsheh
  2 siblings, 1 reply; 86+ messages in thread
From: Viacheslav Ovsiienko @ 2020-07-17 14:28 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, thomas

The dedicated MLX5_CMD_OP_ACCESS_REGISTER_USER opcode
must be used to read hardware register content from
unprotected mode.

Fixes: 737f44a25d97 ("common/mlx5: add register access DevX routine")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 3 ++-
 drivers/common/mlx5/mlx5_prm.h       | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 0cfa4dc..9f2f706 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -44,7 +44,8 @@
 		DRV_LOG(ERR, "Not enough  buffer for register read data");
 		return -1;
 	}
-	MLX5_SET(access_register_in, in, opcode, MLX5_CMD_OP_ACCESS_REGISTER);
+	MLX5_SET(access_register_in, in, opcode,
+		 MLX5_CMD_OP_ACCESS_REGISTER_USER);
 	MLX5_SET(access_register_in, in, op_mod,
 					MLX5_ACCESS_REGISTER_IN_OP_MOD_READ);
 	MLX5_SET(access_register_in, in, register_id, reg_id);
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ec3b600..aba0368 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -806,6 +806,7 @@ enum {
 	MLX5_CMD_OP_CREATE_GENERAL_OBJECT = 0xa00,
 	MLX5_CMD_OP_MODIFY_GENERAL_OBJECT = 0xa01,
 	MLX5_CMD_OP_QUERY_GENERAL_OBJECT = 0xa02,
+	MLX5_CMD_OP_ACCESS_REGISTER_USER = 0xB0C,
 };
 
 enum {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode
  2020-07-17 14:28   ` [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode Viacheslav Ovsiienko
@ 2020-07-17 15:05     ` Thomas Monjalon
  2020-07-17 15:11       ` Slava Ovsiienko
  0 siblings, 1 reply; 86+ messages in thread
From: Thomas Monjalon @ 2020-07-17 15:05 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland

17/07/2020 16:28, Viacheslav Ovsiienko:
> The dedicated MLX5_CMD_OP_ACCESS_REGISTER_USER opcode
> must be used to read hardware register content from
> unprotected mode.

Otherwise? What was broken?

> 
> Fixes: 737f44a25d97 ("common/mlx5: add register access DevX routine")





^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange
  2020-07-17 14:28   ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange Viacheslav Ovsiienko
@ 2020-07-17 15:08     ` Thomas Monjalon
  2020-07-17 15:15       ` Slava Ovsiienko
  0 siblings, 1 reply; 86+ messages in thread
From: Thomas Monjalon @ 2020-07-17 15:08 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland

17/07/2020 16:28, Viacheslav Ovsiienko:
> For naw the rte_atomic128_cmp_exchange() is available on x86-64

Typo: now

> and ARM64 architectures. The patch fixes the compilation condition
> for the code using this atomic transaction.

What is fixed exactly?
How "not (ppc or 32)" is different of "x86_64 or arm64"?

> -#if defined(RTE_ARCH_PPC_64) || defined(RTE_ARCH_32)
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> +	rte_int128_t src;
> +
> +	memset(&src, 0, sizeof(src));
> +	*ts = src;
> +	/* if (*from == *ts) *from = *src else *ts = *from; */
> +	rte_atomic128_cmp_exchange(from, ts, &src, 0,
> +				   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
> +#else
>  	rte_atomic64_t *cqe = (rte_atomic64_t *)from;
>  
>  	/* Power architecture does not support 16B compare-and-swap. */
> @@ -665,14 +673,6 @@
>  		ps[1] = op;
>  		return;
>  	}
> -#else
> -	rte_int128_t src;
> -
> -	memset(&src, 0, sizeof(src));
> -	*ts = src;
> -	/* if (*from == *ts) *from = *src else *ts = *from; */
> -	rte_atomic128_cmp_exchange(from, ts, &src, 0,
> -				   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
>  #endif




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode
  2020-07-17 15:05     ` Thomas Monjalon
@ 2020-07-17 15:11       ` Slava Ovsiienko
  2020-07-17 15:19         ` Thomas Monjalon
  0 siblings, 1 reply; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-17 15:11 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Matan Azrad, Raslan Darawsheh

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, July 17, 2020 18:06
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Raslan
> Darawsheh <rasland@mellanox.com>
> Subject: Re: [PATCH 3/3] common/mlx5: fix DevX register access opcode
> 
> 17/07/2020 16:28, Viacheslav Ovsiienko:
> > The dedicated MLX5_CMD_OP_ACCESS_REGISTER_USER opcode must be
> used to
> > read hardware register content from unprotected mode.
> 
> Otherwise? What was broken?

Otherwise the MLX5_CMD_OP_ACCESS_REGISTER  was used, it returned EINVAL
and register value was not read. It was supposed to enable ACCESS_REGISTER
operation from user mode in kernel driver to read registers, but eventually
it was replaced with ACCESS_REGISTER_USER dedicated operation.

mlx5 PMD does not rely on this feature strongly, if register reading fails
it deduces the timestamp mode from reported timestamp counter
frequency.

With best regards, Slava



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange
  2020-07-17 15:08     ` Thomas Monjalon
@ 2020-07-17 15:15       ` Slava Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-17 15:15 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Matan Azrad, Raslan Darawsheh

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, July 17, 2020 18:08
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Raslan
> Darawsheh <rasland@mellanox.com>
> Subject: Re: [PATCH 2/3] net/mlx5: fix compilation issue with atomic128
> exchange
> 
> 17/07/2020 16:28, Viacheslav Ovsiienko:
> > For naw the rte_atomic128_cmp_exchange() is available on x86-64
> 
> Typo: now
> 
> > and ARM64 architectures. The patch fixes the compilation condition for
> > the code using this atomic transaction.
> 
> What is fixed exactly?
> How "not (ppc or 32)" is different of "x86_64 or arm64"?

CI detected the cases when rte_atomic128_cmp_exchange() is not defined.
So, we changed the case:

 "do not use cas_128 for the PPC and 32-bit and engage one otherwise"
 to
 "use cas_128 for x86_64 and ARM64 only, where it is known this one is defined, otherwise - do not engage"

With best regards, Slava

> 
> > -#if defined(RTE_ARCH_PPC_64) || defined(RTE_ARCH_32)
> > +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
> > +	rte_int128_t src;
> > +
> > +	memset(&src, 0, sizeof(src));
> > +	*ts = src;
> > +	/* if (*from == *ts) *from = *src else *ts = *from; */
> > +	rte_atomic128_cmp_exchange(from, ts, &src, 0,
> > +				   __ATOMIC_RELAXED,
> __ATOMIC_RELAXED); #else
> >  	rte_atomic64_t *cqe = (rte_atomic64_t *)from;
> >
> >  	/* Power architecture does not support 16B compare-and-swap. */
> @@
> > -665,14 +673,6 @@
> >  		ps[1] = op;
> >  		return;
> >  	}
> > -#else
> > -	rte_int128_t src;
> > -
> > -	memset(&src, 0, sizeof(src));
> > -	*ts = src;
> > -	/* if (*from == *ts) *from = *src else *ts = *from; */
> > -	rte_atomic128_cmp_exchange(from, ts, &src, 0,
> > -				   __ATOMIC_RELAXED,
> __ATOMIC_RELAXED);
> >  #endif
> 
> 


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode
  2020-07-17 15:11       ` Slava Ovsiienko
@ 2020-07-17 15:19         ` Thomas Monjalon
  2020-07-17 15:23           ` Slava Ovsiienko
  0 siblings, 1 reply; 86+ messages in thread
From: Thomas Monjalon @ 2020-07-17 15:19 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: dev, Matan Azrad, Raslan Darawsheh

17/07/2020 17:11, Slava Ovsiienko:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 17/07/2020 16:28, Viacheslav Ovsiienko:
> > > The dedicated MLX5_CMD_OP_ACCESS_REGISTER_USER opcode must be
> > used to
> > > read hardware register content from unprotected mode.
> > 
> > Otherwise? What was broken?
> 
> Otherwise the MLX5_CMD_OP_ACCESS_REGISTER  was used, it returned EINVAL
> and register value was not read. It was supposed to enable ACCESS_REGISTER
> operation from user mode in kernel driver to read registers, but eventually
> it was replaced with ACCESS_REGISTER_USER dedicated operation.
> 
> mlx5 PMD does not rely on this feature strongly, if register reading fails
> it deduces the timestamp mode from reported timestamp counter
> frequency.

OK I think some of these explanations deserve to be in the commit log.



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode
  2020-07-17 15:19         ` Thomas Monjalon
@ 2020-07-17 15:23           ` Slava Ovsiienko
  2020-07-17 15:59             ` Thomas Monjalon
  0 siblings, 1 reply; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-17 15:23 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Matan Azrad, Raslan Darawsheh



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, July 17, 2020 18:19
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Raslan
> Darawsheh <rasland@mellanox.com>
> Subject: Re: [PATCH 3/3] common/mlx5: fix DevX register access opcode
> 
> 17/07/2020 17:11, Slava Ovsiienko:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 17/07/2020 16:28, Viacheslav Ovsiienko:
> > > > The dedicated MLX5_CMD_OP_ACCESS_REGISTER_USER opcode must
> be
> > > used to
> > > > read hardware register content from unprotected mode.
> > >
> > > Otherwise? What was broken?
> >
> > Otherwise the MLX5_CMD_OP_ACCESS_REGISTER  was used, it returned
> > EINVAL and register value was not read. It was supposed to enable
> > ACCESS_REGISTER operation from user mode in kernel driver to read
> > registers, but eventually it was replaced with ACCESS_REGISTER_USER
> dedicated operation.
> >
> > mlx5 PMD does not rely on this feature strongly, if register reading
> > fails it deduces the timestamp mode from reported timestamp counter
> > frequency.
> 
> OK I think some of these explanations deserve to be in the commit log.

 M-m-m-m, this is quite tiny 2-lines fix, we are going to squash it before rc2 😊
OK, will add the comment about dedicated opcode.

With best regards, Slava


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode
  2020-07-17 15:23           ` Slava Ovsiienko
@ 2020-07-17 15:59             ` Thomas Monjalon
  0 siblings, 0 replies; 86+ messages in thread
From: Thomas Monjalon @ 2020-07-17 15:59 UTC (permalink / raw)
  To: Slava Ovsiienko; +Cc: dev, Matan Azrad, Raslan Darawsheh

17/07/2020 17:23, Slava Ovsiienko:
> 
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Friday, July 17, 2020 18:19
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Raslan
> > Darawsheh <rasland@mellanox.com>
> > Subject: Re: [PATCH 3/3] common/mlx5: fix DevX register access opcode
> > 
> > 17/07/2020 17:11, Slava Ovsiienko:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 17/07/2020 16:28, Viacheslav Ovsiienko:
> > > > > The dedicated MLX5_CMD_OP_ACCESS_REGISTER_USER opcode must
> > be
> > > > used to
> > > > > read hardware register content from unprotected mode.
> > > >
> > > > Otherwise? What was broken?
> > >
> > > Otherwise the MLX5_CMD_OP_ACCESS_REGISTER  was used, it returned
> > > EINVAL and register value was not read. It was supposed to enable
> > > ACCESS_REGISTER operation from user mode in kernel driver to read
> > > registers, but eventually it was replaced with ACCESS_REGISTER_USER
> > dedicated operation.
> > >
> > > mlx5 PMD does not rely on this feature strongly, if register reading
> > > fails it deduces the timestamp mode from reported timestamp counter
> > > frequency.
> > 
> > OK I think some of these explanations deserve to be in the commit log.
> 
>  M-m-m-m, this is quite tiny 2-lines fix, we are going to squash it before rc2 😊
> OK, will add the comment about dedicated opcode.

If the patch is squashed with the root cause, no need to explain indeed.




^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event
  2020-07-17 14:28 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Viacheslav Ovsiienko
  2020-07-17 14:28   ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange Viacheslav Ovsiienko
  2020-07-17 14:28   ` [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode Viacheslav Ovsiienko
@ 2020-07-18 13:38   ` Raslan Darawsheh
  2 siblings, 0 replies; 86+ messages in thread
From: Raslan Darawsheh @ 2020-07-18 13:38 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: Matan Azrad, Thomas Monjalon

Hi,

> -----Original Message-----
> From: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Sent: Friday, July 17, 2020 5:28 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event
> 
> If there is no support for DevX event the mlx5_txpp_gather_timestamp()
> is reported as "not used" causing the compilation warning/error.
> This patch moves the routine to the conditional block.
> 
> Fixes: 244788055199 ("net/mlx5: introduce clock queue service routine")
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_txpp.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
> index 15c9a8e..42776b8 100644
> --- a/drivers/net/mlx5/mlx5_txpp.c
> +++ b/drivers/net/mlx5/mlx5_txpp.c
> @@ -719,21 +719,6 @@
>  	mlx5_txpp_cache_timestamp(sh, ts, wq->cq_ci);
>  }
> 
> -/* Gather statistics for timestamp from Clock Queue CQE. */
> -static inline void
> -mlx5_txpp_gather_timestamp(struct mlx5_dev_ctx_shared *sh)
> -{
> -	/* Check whether we have a valid timestamp. */
> -	if (!sh->txpp.clock_queue.sq_ci && !sh->txpp.ts_n)
> -		return;
> -	MLX5_ASSERT(sh->txpp.ts_p < MLX5_TXPP_REARM_SQ_SIZE);
> -	sh->txpp.tsa[sh->txpp.ts_p] = sh->txpp.ts;
> -	if (++sh->txpp.ts_p >= MLX5_TXPP_REARM_SQ_SIZE)
> -		sh->txpp.ts_p = 0;
> -	if (sh->txpp.ts_n < MLX5_TXPP_REARM_SQ_SIZE)
> -		++sh->txpp.ts_n;
> -}
> -
>  /* Waits for the first completion on Clock Queue to init timestamp. */
>  static inline void
>  mlx5_txpp_init_timestamp(struct mlx5_dev_ctx_shared *sh)
> @@ -759,6 +744,21 @@
>  }
> 
>  #ifdef HAVE_IBV_DEVX_EVENT
> +/* Gather statistics for timestamp from Clock Queue CQE. */
> +static inline void
> +mlx5_txpp_gather_timestamp(struct mlx5_dev_ctx_shared *sh)
> +{
> +	/* Check whether we have a valid timestamp. */
> +	if (!sh->txpp.clock_queue.sq_ci && !sh->txpp.ts_n)
> +		return;
> +	MLX5_ASSERT(sh->txpp.ts_p < MLX5_TXPP_REARM_SQ_SIZE);
> +	sh->txpp.tsa[sh->txpp.ts_p] = sh->txpp.ts;
> +	if (++sh->txpp.ts_p >= MLX5_TXPP_REARM_SQ_SIZE)
> +		sh->txpp.ts_p = 0;
> +	if (sh->txpp.ts_n < MLX5_TXPP_REARM_SQ_SIZE)
> +		++sh->txpp.ts_n;
> +}
> +
>  /* Handles Rearm Queue completions in periodic service. */
>  static __rte_always_inline void
>  mlx5_txpp_handle_rearm_queue(struct mlx5_dev_ctx_shared *sh)
> --
> 1.8.3.1

Series applied and squashed into relevant commits in master-net-mlx,

Kindest regards,
Raslan Darawsheh

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX
  2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX Viacheslav Ovsiienko
@ 2020-07-20 14:18     ` Ferruh Yigit
  2020-07-20 15:25       ` Ferruh Yigit
  0 siblings, 1 reply; 86+ messages in thread
From: Ferruh Yigit @ 2020-07-20 14:18 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, thomas, David Marchand, Phil Yang,
	Ruifeng Wang

On 7/16/2020 9:23 AM, Viacheslav Ovsiienko wrote:
> To provide the packet send schedule on mbuf timestamp the Tx
> queue must be attached to the same UAR as Clock Queue is.
> UAR is special hardware related resource mapped to the host
> memory and provides doorbell registers, the assigning UAR
> to the queue being created is provided via DevX API only.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Matan Azrad <matan@mellanox.com>

<...>

> +	MLX5_ASSERT(sh->tx_uar);
> +	MLX5_ASSERT(sh->tx_uar->reg_addr);
> +	txq_ctrl->bf_reg = sh->tx_uar->reg_addr;
> +	txq_ctrl->uar_mmap_offset = sh->tx_uar->mmap_off;
> +	rte_atomic32_set(&txq_obj->refcnt, 1);

This is using a function we plan to deprecate in long term, and checkpatch has a
warning for it [1].

To prevent this being blocker, I will preceed with the patchset and can you
please send an increamental patch to fix it, I can squash it before -rc2.

Thanks,
ferruh


[1]
Warning in drivers/net/mlx5/mlx5_txq.c:
Using rte_atomicNN_xxx


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX
  2020-07-20 14:18     ` Ferruh Yigit
@ 2020-07-20 15:25       ` Ferruh Yigit
  2020-07-21 11:35         ` Slava Ovsiienko
  0 siblings, 1 reply; 86+ messages in thread
From: Ferruh Yigit @ 2020-07-20 15:25 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, thomas, David Marchand, Phil Yang,
	Ruifeng Wang, Honnappa Nagarahalli

On 7/20/2020 3:18 PM, Ferruh Yigit wrote:
> On 7/16/2020 9:23 AM, Viacheslav Ovsiienko wrote:
>> To provide the packet send schedule on mbuf timestamp the Tx
>> queue must be attached to the same UAR as Clock Queue is.
>> UAR is special hardware related resource mapped to the host
>> memory and provides doorbell registers, the assigning UAR
>> to the queue being created is provided via DevX API only.
>>
>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
>> Acked-by: Matan Azrad <matan@mellanox.com>
> 
> <...>
> 
>> +	MLX5_ASSERT(sh->tx_uar);
>> +	MLX5_ASSERT(sh->tx_uar->reg_addr);
>> +	txq_ctrl->bf_reg = sh->tx_uar->reg_addr;
>> +	txq_ctrl->uar_mmap_offset = sh->tx_uar->mmap_off;
>> +	rte_atomic32_set(&txq_obj->refcnt, 1);
> 
> This is using a function we plan to deprecate in long term, and checkpatch has a
> warning for it [1].
> 
> To prevent this being blocker, I will preceed with the patchset and can you
> please send an increamental patch to fix it, I can squash it before -rc2.
> 
> Thanks,
> ferruh
> 
> 
> [1]
> Warning in drivers/net/mlx5/mlx5_txq.c:
> Using rte_atomicNN_xxx
> 

cc'ed Honnapa too, from techboard discussion [2] I understand we won't accept
new code with old API. But also to be fair the check was not there until last
week so this was easy to miss by developers.

@Slava, can you please do your best to replace them before the release? Perhaps
@Honnappa can support on the effort?

And if this can't be done withing the release not sure what to do, one option is
to get them with the commitment from Mellanox to fix this on 20.11?

[2]
https://mails.dpdk.org/archives/dev/2020-April/165143.html


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX
  2020-07-20 15:25       ` Ferruh Yigit
@ 2020-07-21 11:35         ` Slava Ovsiienko
  0 siblings, 0 replies; 86+ messages in thread
From: Slava Ovsiienko @ 2020-07-21 11:35 UTC (permalink / raw)
  To: Ferruh Yigit, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, Thomas Monjalon,
	David Marchand, Phil Yang, Ruifeng Wang, Honnappa Nagarahalli

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Monday, July 20, 2020 18:26
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com; Thomas Monjalon
> <thomas@monjalon.net>; David Marchand <david.marchand@redhat.com>;
> Phil Yang <phil.yang@arm.com>; Ruifeng Wang <ruifeng.wang@arm.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with
> DevX
> 
> On 7/20/2020 3:18 PM, Ferruh Yigit wrote:
> > On 7/16/2020 9:23 AM, Viacheslav Ovsiienko wrote:
> >> To provide the packet send schedule on mbuf timestamp the Tx queue
> >> must be attached to the same UAR as Clock Queue is.
> >> UAR is special hardware related resource mapped to the host memory
> >> and provides doorbell registers, the assigning UAR to the queue being
> >> created is provided via DevX API only.
> >>
> >> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> >> Acked-by: Matan Azrad <matan@mellanox.com>
> >
> > <...>
> >
> >> +	MLX5_ASSERT(sh->tx_uar);
> >> +	MLX5_ASSERT(sh->tx_uar->reg_addr);
> >> +	txq_ctrl->bf_reg = sh->tx_uar->reg_addr;
> >> +	txq_ctrl->uar_mmap_offset = sh->tx_uar->mmap_off;
> >> +	rte_atomic32_set(&txq_obj->refcnt, 1);
> >
> > This is using a function we plan to deprecate in long term, and
> > checkpatch has a warning for it [1].
> >
> > To prevent this being blocker, I will preceed with the patchset and
> > can you please send an increamental patch to fix it, I can squash it before -
> rc2.
> >
> > Thanks,
> > ferruh
> >
> >
> > [1]
> > Warning in drivers/net/mlx5/mlx5_txq.c:
> > Using rte_atomicNN_xxx
> >
> 
> cc'ed Honnapa too, from techboard discussion [2] I understand we won't
> accept new code with old API. But also to be fair the check was not there
> until last week so this was easy to miss by developers.
> 
> @Slava, can you please do your best to replace them before the release?
> Perhaps @Honnappa can support on the effort?


> And if this can't be done withing the release not sure what to do, one option
> is to get them with the commitment from Mellanox to fix this on 20.11?

I try to do the best before the release personally, but not sure we can pass the
full testing/verification cycle. So, I suppose to take the commitment to fix it on 20.11
(there is no other way due to "atomic"  deprecation). If we have the update before
20.08rc3, we'll push it.

With best regards, Slava

> 
> [2]
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.
> dpdk.org%2Farchives%2Fdev%2F2020-
> April%2F165143.html&amp;data=02%7C01%7Cviacheslavo%40mellanox.co
> m%7Cea28f4d3c667404c4cdd08d82cc137fe%7Ca652971c7d2e4d9ba6a4d14
> 9256f461b%7C0%7C0%7C637308555673206316&amp;sdata=iNXzvMc%2FPh
> TnrLuW52F9z2t0bag9%2Ftw7AOTxwp8rkfI%3D&amp;reserved=0


^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2020-07-21 11:35 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-10  6:38 [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
2020-06-10 13:33 ` Harman Kalra
2020-06-10 15:16   ` Slava Ovsiienko
2020-06-17 15:57     ` [dpdk-dev] [EXT] " Harman Kalra
2020-07-01 15:46       ` Slava Ovsiienko
2020-07-01 15:36 ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
2020-07-01 15:36   ` [dpdk-dev] [PATCH 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
2020-07-07 11:50   ` [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
2020-07-07 12:46     ` Slava Ovsiienko
2020-07-07 12:59 ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
2020-07-07 12:59   ` [dpdk-dev] [PATCH v2 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
2020-07-07 13:08 ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
2020-07-07 13:08   ` [dpdk-dev] [PATCH v3 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
2020-07-07 14:32   ` [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
2020-07-07 14:57 ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
2020-07-07 14:57   ` [dpdk-dev] [PATCH v4 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
2020-07-07 15:23   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling Olivier Matz
2020-07-08 14:16   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
2020-07-08 14:54     ` Slava Ovsiienko
2020-07-08 15:27       ` Morten Brørup
2020-07-08 15:51         ` Slava Ovsiienko
2020-07-08 15:47 ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
2020-07-08 15:47   ` [dpdk-dev] [PATCH v5 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
2020-07-08 16:05   ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Slava Ovsiienko
2020-07-09 12:26   ` Thomas Monjalon
2020-07-09 12:36 ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
2020-07-09 12:36   ` [dpdk-dev] [PATCH v6 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
2020-07-09 23:58     ` Ferruh Yigit
2020-07-10 12:41       ` Slava Ovsiienko
2020-07-09 23:47   ` [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling Ferruh Yigit
2020-07-10 12:32     ` Slava Ovsiienko
2020-07-10 12:39 ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
2020-07-10 12:39   ` [dpdk-dev] [PATCH v7 2/2] app/testpmd: add send scheduling test capability Viacheslav Ovsiienko
2020-07-10 15:46   ` [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling Slava Ovsiienko
2020-07-10 22:07     ` Ferruh Yigit
2020-07-15  6:21 ` [dpdk-dev] [PATCH v2 00/17] net/mlx5: " Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 02/17] net/mlx5: introduce send scheduling devargs Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 03/17] net/mlx5: fix UAR lock sharing for multiport devices Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 04/17] net/mlx5: introduce shared UAR resource Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 05/17] net/mlx5: create clock queue for packet pacing Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 06/17] net/mlx5: create rearm " Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 07/17] net/mlx5: create Tx queues with DevX Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 08/17] net/mlx5: allocate packet pacing context Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 09/17] net/mlx5: introduce clock queue service routine Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 10/17] net/mlx5: prepare Tx queue structures to support timestamp Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 11/17] net/mlx5: convert timestamp to completion index Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 12/17] net/mlx5: prepare Tx datapath to support sheduling Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 13/17] net/mlx5: add scheduling support to send routine template Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 14/17] net/mlx5: add read device clock support Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 15/17] net/mlx5: provide the send scheduling error statistics Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 16/17] common/mlx5: add register access DevX routine Viacheslav Ovsiienko
2020-07-15  6:21   ` [dpdk-dev] [PATCH v2 17/17] net/mlx5: convert Rx timestamps in realtime format Viacheslav Ovsiienko
2020-07-16  8:23 ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 01/17] common/mlx5: update common part to support packet pacing Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 02/17] net/mlx5: introduce send scheduling devargs Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 03/17] net/mlx5: fix UAR lock sharing for multiport devices Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 04/17] net/mlx5: introduce shared UAR resource Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 05/17] net/mlx5: create clock queue for packet pacing Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 06/17] net/mlx5: create rearm " Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 07/17] net/mlx5: create Tx queues with DevX Viacheslav Ovsiienko
2020-07-20 14:18     ` Ferruh Yigit
2020-07-20 15:25       ` Ferruh Yigit
2020-07-21 11:35         ` Slava Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 08/17] net/mlx5: allocate packet pacing context Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 09/17] net/mlx5: introduce clock queue service routine Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 10/17] net/mlx5: prepare Tx queue structures to support timestamp Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 11/17] net/mlx5: convert timestamp to completion index Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 12/17] net/mlx5: prepare Tx datapath to support sheduling Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 13/17] net/mlx5: add scheduling support to send routine template Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 14/17] net/mlx5: add read device clock support Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 15/17] net/mlx5: provide the send scheduling error statistics Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 16/17] common/mlx5: add register access DevX routine Viacheslav Ovsiienko
2020-07-16  8:23   ` [dpdk-dev] [PATCH v3 17/17] net/mlx5: convert Rx timestamps in realtime format Viacheslav Ovsiienko
2020-07-16 20:20   ` [dpdk-dev] [PATCH v3 00/17] net/mlx5: introduce accurate packet Tx scheduling Raslan Darawsheh
2020-07-17 14:28 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Viacheslav Ovsiienko
2020-07-17 14:28   ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix compilation issue with atomic128 exchange Viacheslav Ovsiienko
2020-07-17 15:08     ` Thomas Monjalon
2020-07-17 15:15       ` Slava Ovsiienko
2020-07-17 14:28   ` [dpdk-dev] [PATCH 3/3] common/mlx5: fix DevX register access opcode Viacheslav Ovsiienko
2020-07-17 15:05     ` Thomas Monjalon
2020-07-17 15:11       ` Slava Ovsiienko
2020-07-17 15:19         ` Thomas Monjalon
2020-07-17 15:23           ` Slava Ovsiienko
2020-07-17 15:59             ` Thomas Monjalon
2020-07-18 13:38   ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix compilation issue with missing DevX event Raslan Darawsheh

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git