DPDK patches and discussions
* Re: [PATCH v2 3/3] vhost: add device op to offload the interrupt kick
  @ 2023-05-16 10:12  3%       ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-05-16 10:12 UTC (permalink / raw)
  To: Eelco Chaudron, maxime.coquelin, chenbo.xia; +Cc: dev

On Tue, May 16, 2023 at 10:53 AM Eelco Chaudron <echaudro@redhat.com> wrote:
> On 10 May 2023, at 13:44, David Marchand wrote:

[snip]

> >> @@ -846,6 +848,14 @@ vhost_user_socket_mem_free(struct vhost_user_socket *vsocket)
> >>                 vsocket->path = NULL;
> >>         }
> >>
> >> +       if (vsocket && vsocket->alloc_notify_ops) {
> >> +#pragma GCC diagnostic push
> >> +#pragma GCC diagnostic ignored "-Wcast-qual"
> >> +               free((struct rte_vhost_device_ops *)vsocket->notify_ops);
> >> +#pragma GCC diagnostic pop
> >> +               vsocket->notify_ops = NULL;
> >> +       }
> >
> > Rather than select the behavior based on a boolean (and here force the
> > compiler to close its eyes), I would instead add a non const pointer
> > to ops (let's say alloc_notify_ops) in vhost_user_socket.
> > The code can then unconditionally call free(vsocket->alloc_notify_ops);
>
> Good idea, I will make the change in v3.

Feel free to use a better name for this field :-).
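
A minimal sketch of the suggestion above, assuming trimmed-down stand-in
structures (struct device_ops, struct user_socket and both helper functions
are hypothetical, not the vhost library's types). The read-only pointer is
what the rest of the code dereferences, while a second non-const pointer
records ownership, so no boolean and no cast are needed at free time:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for the device ops structure. */
struct device_ops {
	int (*guest_notify)(int vid, unsigned short queue_id);
};

/* Hypothetical stand-in for the socket structure. */
struct user_socket {
	const struct device_ops *notify_ops; /* what the library reads */
	struct device_ops *alloc_notify_ops; /* owned copy, or NULL */
};

/* Register ops; take a private copy only when a field must be overridden. */
static int
socket_set_ops(struct user_socket *s, const struct device_ops *ops,
	       int need_copy)
{
	s->alloc_notify_ops = NULL;
	s->notify_ops = ops;
	if (need_copy) {
		struct device_ops *copy = malloc(sizeof(*copy));

		if (copy == NULL)
			return -1;
		*copy = *ops;
		copy->guest_notify = NULL; /* override the old reserved slot */
		s->alloc_notify_ops = copy;
		s->notify_ops = copy;
	}
	return 0;
}

/* No flag and no cast: free(NULL) is a no-op, so this is unconditional. */
static void
socket_free_ops(struct user_socket *s)
{
	free(s->alloc_notify_ops);
	s->alloc_notify_ops = NULL;
	s->notify_ops = NULL;
}
```

The cost is one extra pointer per socket; in exchange the -Wcast-qual
pragmas can go away.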

>
> >> +
> >>         if (vsocket) {
> >>                 free(vsocket);
> >>                 vsocket = NULL;

[snip]

> >> +       /*
> >> +        * Although the ops structure is a const structure, we do need to
> >> +        * override the guest_notify operation. This is because with the
> >> +        * previous APIs it was "reserved" and if any garbage value was passed,
> >> +        * it could crash the application.
> >> +        */
> >> +       if (ops && !ops->guest_notify) {
> >
> > Hum, as described in the comment above, I don't think we should look
> > at ops->guest_notify value at all.
> > Checking ops != NULL should be enough.
>
> Not sure I get you here. If the guest_notify passed by the user is NULL, it means the previously ‘reserved[1]’ field is NULL, so we do not need to use a new structure.
>
> I guess your comment would be true if we would introduce a new field in the data structure, not replacing a reserved one.

Hum, I don't understand my comment either o_O'.
Too many days off... or maybe my evil twin took over the keyboard.


>
> >> +               struct rte_vhost_device_ops *new_ops;
> >> +
> >> +               new_ops = malloc(sizeof(*new_ops));
> >
> > Strictly speaking, we lose the numa affinity of "ops" by calling malloc.
> > I am unclear of the impact though.
>
> Don’t think there is a portable API that we can use to determine the NUMA for the ops memory and then allocate this on the same numa?
>
> Any thoughts or ideas on how to solve this? I hope most people will memset() the ops structure and the reserved[1] part is zero, but it might be a problem in the future if more extensions get added.

Determining the current NUMA node is doable by calling
get_mempolicy(MPOL_F_NODE | MPOL_F_ADDR) on 'ops', like what is done for
vq in numa_realloc().
The problem is how to allocate on this NUMA node with the libc
allocator, for which I have no idea...
We could go with the DPDK allocator (again, like numa_realloc()).


In practice, the passed ops will probably come from a const variable in
the program .data section (for which I think fields are set to 0
unless explicitly initialised), or good citizens will call memset() on
a dynamic allocation.
So we can probably live with the current proposal.
Plus, this is only for one release, since in 23.11 with the ABI bump,
we will drop this compat code.
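
To illustrate the two "good citizen" cases, here is a small sketch with a
hypothetical struct demo_ops (not the real rte_vhost_device_ops). In C, an
object with static storage duration, or any object given a partial
initializer, has its unnamed members zero-initialised; a dynamically
allocated object needs an explicit memset() (which yields NULL pointers on
the platforms DPDK targets, although the C standard does not strictly
promise that all-bits-zero is a null pointer):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops structure with reserved slots for future callbacks. */
struct demo_ops {
	int (*new_device)(int vid);
	void *reserved[2];
};

static int
demo_new_device(int vid)
{
	return vid;
}

/* Static storage: members that are not named are zero-initialised. */
static const struct demo_ops static_ops = {
	.new_device = demo_new_device,
};

/* True when no reserved slot carries a (possibly garbage) value. */
static int
reserved_is_clear(const struct demo_ops *ops)
{
	return ops->reserved[0] == NULL && ops->reserved[1] == NULL;
}

/* Stack or heap storage: clear everything before setting callbacks. */
static void
init_dynamic_ops(struct demo_ops *ops)
{
	memset(ops, 0, sizeof(*ops));
	ops->new_device = demo_new_device;
}
```

An application that fills such a structure on the stack without either
step is the case the compat code has to guard against.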

Maxime, Chenbo, what do you think?


[snip]

> >
> > But putting indentation aside, is this change equivalent?
> > -               if ((vhost_need_event(vhost_used_event(vq), new, old) &&
> > -                                       (vq->callfd >= 0)) ||
> > -                               unlikely(!signalled_used_valid)) {
> > +               if ((vhost_need_event(vhost_used_event(vq), new, old) ||
> > +                               unlikely(!signalled_used_valid)) &&
> > +                               vq->callfd >= 0) {
>
> They are not equal, but in the past eventfd_write() should also not have been called with callfd < 0, guess this was an existing bug ;)

I think this should be a separate fix.
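
For reference, the difference between the two conditions can be checked by
enumerating the inputs. This is only a sketch: kick_old() and kick_new() are
hypothetical helpers that take the result of vhost_need_event() as a plain
boolean instead of calling the library.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition before the patch. */
static bool
kick_old(bool need_event, int callfd, bool signalled_used_valid)
{
	return (need_event && callfd >= 0) || !signalled_used_valid;
}

/* Condition after the patch: never kick without a valid callfd. */
static bool
kick_new(bool need_event, int callfd, bool signalled_used_valid)
{
	return (need_event || !signalled_used_valid) && callfd >= 0;
}

/* Count the input combinations on which the two conditions disagree. */
static int
count_differences(void)
{
	int diff = 0;

	for (int need = 0; need <= 1; need++)
		for (int fd = -1; fd <= 0; fd++)
			for (int valid = 0; valid <= 1; valid++)
				if (kick_old(need, fd, valid) !=
				    kick_new(need, fd, valid))
					diff++;
	return diff;
}
```

The two only disagree when callfd < 0 and signalled_used_valid is false:
the old condition would kick on an invalid fd, the new one does not.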

>
> >> +                       vhost_vring_inject_irq(dev, vq);


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* [PATCH v1 5/7] ethdev: add GENEVE TLV option modification support
  @ 2023-05-16  6:37  3% ` Michael Baum
  0 siblings, 0 replies; 200+ results
From: Michael Baum @ 2023-05-16  6:37 UTC (permalink / raw)
  To: dev; +Cc: Ori Kam, Aman Singh, Yuying Zhang, Ferruh Yigit, Thomas Monjalon

Add modify field support for GENEVE option fields:
 - "RTE_FLOW_FIELD_GENEVE_OPT_TYPE"
 - "RTE_FLOW_FIELD_GENEVE_OPT_CLASS"
 - "RTE_FLOW_FIELD_GENEVE_OPT_DATA"

Each GENEVE TLV option is identified by both its "class" and "type", so
2 new fields were added to "rte_flow_action_modify_data" structure to
help specify which option to modify.

To make room for those 2 new fields, the "level" field was changed to
"uint8_t", which is more than enough for an encapsulation level.

Signed-off-by: Michael Baum <michaelba@nvidia.com>
---
 app/test-pmd/cmdline_flow.c            | 48 +++++++++++++++++++++++-
 doc/guides/prog_guide/rte_flow.rst     | 12 ++++++
 doc/guides/rel_notes/release_23_07.rst |  3 ++
 lib/ethdev/rte_flow.h                  | 51 +++++++++++++++++++++++++-
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 58939ec321..8c1dea53c0 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -636,11 +636,15 @@ enum index {
 	ACTION_MODIFY_FIELD_DST_TYPE_VALUE,
 	ACTION_MODIFY_FIELD_DST_LEVEL,
 	ACTION_MODIFY_FIELD_DST_LEVEL_VALUE,
+	ACTION_MODIFY_FIELD_DST_TYPE_ID,
+	ACTION_MODIFY_FIELD_DST_CLASS_ID,
 	ACTION_MODIFY_FIELD_DST_OFFSET,
 	ACTION_MODIFY_FIELD_SRC_TYPE,
 	ACTION_MODIFY_FIELD_SRC_TYPE_VALUE,
 	ACTION_MODIFY_FIELD_SRC_LEVEL,
 	ACTION_MODIFY_FIELD_SRC_LEVEL_VALUE,
+	ACTION_MODIFY_FIELD_SRC_TYPE_ID,
+	ACTION_MODIFY_FIELD_SRC_CLASS_ID,
 	ACTION_MODIFY_FIELD_SRC_OFFSET,
 	ACTION_MODIFY_FIELD_SRC_VALUE,
 	ACTION_MODIFY_FIELD_SRC_POINTER,
@@ -854,7 +858,9 @@ static const char *const modify_field_ids[] = {
 	"ipv4_ecn", "ipv6_ecn", "gtp_psc_qfi", "meter_color",
 	"ipv6_proto",
 	"flex_item",
-	"hash_result", NULL
+	"hash_result",
+	"geneve_opt_type", "geneve_opt_class", "geneve_opt_data",
+	NULL
 };
 
 static const char *const meter_colors[] = {
@@ -2295,6 +2301,8 @@ static const enum index next_action_sample[] = {
 
 static const enum index action_modify_field_dst[] = {
 	ACTION_MODIFY_FIELD_DST_LEVEL,
+	ACTION_MODIFY_FIELD_DST_TYPE_ID,
+	ACTION_MODIFY_FIELD_DST_CLASS_ID,
 	ACTION_MODIFY_FIELD_DST_OFFSET,
 	ACTION_MODIFY_FIELD_SRC_TYPE,
 	ZERO,
@@ -2302,6 +2310,8 @@ static const enum index action_modify_field_dst[] = {
 
 static const enum index action_modify_field_src[] = {
 	ACTION_MODIFY_FIELD_SRC_LEVEL,
+	ACTION_MODIFY_FIELD_SRC_TYPE_ID,
+	ACTION_MODIFY_FIELD_SRC_CLASS_ID,
 	ACTION_MODIFY_FIELD_SRC_OFFSET,
 	ACTION_MODIFY_FIELD_SRC_VALUE,
 	ACTION_MODIFY_FIELD_SRC_POINTER,
@@ -6388,6 +6398,24 @@ static const struct token token_list[] = {
 		.call = parse_vc_modify_field_level,
 		.comp = comp_none,
 	},
+	[ACTION_MODIFY_FIELD_DST_TYPE_ID] = {
+		.name = "dst_type_id",
+		.help = "destination field type ID",
+		.next = NEXT(action_modify_field_dst,
+			     NEXT_ENTRY(COMMON_UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+					dst.type)),
+		.call = parse_vc_conf,
+	},
+	[ACTION_MODIFY_FIELD_DST_CLASS_ID] = {
+		.name = "dst_class",
+		.help = "destination field class ID",
+		.next = NEXT(action_modify_field_dst,
+			     NEXT_ENTRY(COMMON_UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_action_modify_field,
+					     dst.class_id)),
+		.call = parse_vc_conf,
+	},
 	[ACTION_MODIFY_FIELD_DST_OFFSET] = {
 		.name = "dst_offset",
 		.help = "destination field bit offset",
@@ -6423,6 +6451,24 @@ static const struct token token_list[] = {
 		.call = parse_vc_modify_field_level,
 		.comp = comp_none,
 	},
+	[ACTION_MODIFY_FIELD_SRC_TYPE_ID] = {
+		.name = "src_type_id",
+		.help = "source field type ID",
+		.next = NEXT(action_modify_field_src,
+			     NEXT_ENTRY(COMMON_UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY(struct rte_flow_action_modify_field,
+					src.type)),
+		.call = parse_vc_conf,
+	},
+	[ACTION_MODIFY_FIELD_SRC_CLASS_ID] = {
+		.name = "src_class",
+		.help = "source field class ID",
+		.next = NEXT(action_modify_field_src,
+			     NEXT_ENTRY(COMMON_UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY_HTON(struct rte_flow_action_modify_field,
+					     src.class_id)),
+		.call = parse_vc_conf,
+	},
 	[ACTION_MODIFY_FIELD_SRC_OFFSET] = {
 		.name = "src_offset",
 		.help = "source field bit offset",
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 25b57bf86d..cd38f0de46 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -2937,6 +2937,14 @@ as well as any tag element in the tag array:
 For the tag array (in case of multiple tags are supported and present)
 ``level`` translates directly into the array index.
 
+``type`` is used to specify (along with ``class_id``) the Geneve option which
+is being modified.
+This field is relevant only for ``RTE_FLOW_FIELD_GENEVE_OPT_XXXX`` type.
+
+``class_id`` is used to specify (along with ``type``) the Geneve option which
+is being modified.
+This field is relevant only for ``RTE_FLOW_FIELD_GENEVE_OPT_XXXX`` type.
+
 ``flex_handle`` is used to specify the flex item pointer which is being
 modified. ``flex_handle`` and ``level`` are mutually exclusive.
 
@@ -2994,6 +3002,10 @@ value as sequence of bytes {xxx, xxx, 0x85, xxx, xxx, xxx}.
    +-----------------+----------------------------------------------------------+
    | ``level``       | encapsulation level of a packet field or tag array index |
    +-----------------+----------------------------------------------------------+
+   | ``type``        | geneve option type                                       |
+   +-----------------+----------------------------------------------------------+
+   | ``class_id``    | geneve option class ID                                   |
+   +-----------------+----------------------------------------------------------+
    | ``flex_handle`` | flex item handle of a packet field                       |
    +-----------------+----------------------------------------------------------+
    | ``offset``      | number of bits to skip at the beginning                  |
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..ce1755096f 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -84,6 +84,9 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* The ``level`` field in experimental structure
+  ``struct rte_flow_action_modify_data`` was reduced to 8 bits.
+
 
 ABI Changes
 -----------
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 713ba8b65c..b82eb0c0a8 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3773,6 +3773,9 @@ enum rte_flow_field_id {
 	RTE_FLOW_FIELD_IPV6_PROTO,	/**< IPv6 next header. */
 	RTE_FLOW_FIELD_FLEX_ITEM,	/**< Flex item. */
 	RTE_FLOW_FIELD_HASH_RESULT,	/**< Hash result. */
+	RTE_FLOW_FIELD_GENEVE_OPT_TYPE,	/**< GENEVE option type */
+	RTE_FLOW_FIELD_GENEVE_OPT_CLASS,/**< GENEVE option class */
+	RTE_FLOW_FIELD_GENEVE_OPT_DATA	/**< GENEVE option data */
 };
 
 /**
@@ -3788,7 +3791,53 @@ struct rte_flow_action_modify_data {
 		struct {
 			/** Encapsulation level or tag index or flex item handle. */
 			union {
-				uint32_t level;
+				struct {
+					/**
+					 * Packet encapsulation level containing
+					 * the field modify to.
+					 *
+					 * - @p 0 requests the default behavior.
+					 *   Depending on the packet type, it
+					 *   can mean outermost, innermost or
+					 *   anything in between.
+					 *
+					 *   It basically stands for the
+					 *   innermost encapsulation level
+					 *   modification can be performed on
+					 *   according to PMD and device
+					 *   capabilities.
+					 *
+					 * - @p 1 requests modification to be
+					 *   performed on the outermost packet
+					 *   encapsulation level.
+					 *
+					 * - @p 2 and subsequent values request
+					 *   modification to be performed on
+					 *   the specified inner packet
+					 *   encapsulation level, from
+					 *   outermost to innermost (lower to
+					 *   higher values).
+					 *
+					 * Values other than @p 0 are not
+					 * necessarily supported.
+					 *
+					 * For RTE_FLOW_FIELD_TAG it represents
+					 * the tag element in the tag array.
+					 */
+					uint8_t level;
+					/**
+					 * Geneve option type. relevant only
+					 * for RTE_FLOW_FIELD_GENEVE_OPT_XXXX
+					 * modification type.
+					 */
+					uint8_t type;
+					/**
+					 * Geneve option class. relevant only
+					 * for RTE_FLOW_FIELD_GENEVE_OPT_XXXX
+					 * modification type.
+					 */
+					rte_be16_t class_id;
+				};
 				struct rte_flow_item_flex_handle *flex_handle;
 			};
 			/** Number of bits to skip from a field. */
-- 
2.25.1


^ permalink raw reply	[relevance 3%]
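
The layout change in the GENEVE patch above can be sketched as follows. The
union is a hypothetical stand-in (be16_t replaces rte_be16_t, and the flex
handle is an opaque type); it only shows that the three new members occupy
the 4 bytes previously used by the 32-bit level, and that class_id is
stored in network byte order, matching the ARGS_ENTRY_HTON() use in testpmd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t be16_t;   /* stand-in for rte_be16_t */
struct flex_handle;        /* opaque stand-in for the flex item handle */

/* Hypothetical mirror of the union inside rte_flow_action_modify_data. */
union modify_selector {
	struct {
		uint8_t level;   /* encapsulation level or tag index */
		uint8_t type;    /* GENEVE option type */
		be16_t class_id; /* GENEVE option class, big-endian */
	};
	struct flex_handle *flex_handle;
};

/* level + type + class_id end exactly where the old uint32_t level did. */
_Static_assert(offsetof(union modify_selector, class_id) + sizeof(be16_t) ==
	       sizeof(uint32_t), "selector members must fit in 32 bits");

/* Convert a CPU-order value to big-endian without relying on DPDK. */
static be16_t
to_be16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xff) };
	be16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Build a selector identifying one GENEVE option by class and type. */
static union modify_selector
geneve_selector(uint8_t type, uint16_t class_id)
{
	union modify_selector sel = { .flex_handle = NULL };

	sel.level = 0; /* default encapsulation level */
	sel.type = type;
	sel.class_id = to_be16(class_id);
	return sel;
}
```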

* Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts
  @ 2023-05-15 20:52  3%         ` Mattias Rönnblom
  0 siblings, 0 replies; 200+ results
From: Mattias Rönnblom @ 2023-05-15 20:52 UTC (permalink / raw)
  To: Jerin Jacob, Mattias Rönnblom; +Cc: jerinj, dev, Morten Brørup

On 2023-05-15 14:38, Jerin Jacob wrote:
> On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>>
>> On 2023-05-12 13:59, Jerin Jacob wrote:
>>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>>
>>>> Use non-burst event enqueue and dequeue calls from burst enqueue and
>>>> dequeue only when the burst size is compile-time constant (and equal
>>>> to one).
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>
>>>> ---
>>>>
>>>> v3: Actually include the change v2 claimed to contain.
>>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if
>>>>       application is compiled with -pedantic. (Morten Brørup)
>>>> ---
>>>>    lib/eventdev/rte_eventdev.h | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>>> index a90e23ac8b..a471caeb6d 100644
>>>> --- a/lib/eventdev/rte_eventdev.h
>>>> +++ b/lib/eventdev/rte_eventdev.h
>>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
>>>>            * Allow zero cost non burst mode routine invocation if application
>>>>            * requests nb_events as const one
>>>>            */
>>>> -       if (nb_events == 1)
>>>> +       if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>
>>> "Why" part is not clear from the commit message. Is this to avoid
>>> nb_events read if it is built-in const.
>>
>> The __builtin_constant_p() is introduced to avoid having the compiler
>> generate a conditional branch and two different code paths in case
>> nb_events is a run-time variable.
>>
>> In particular, this matters if nb_events is a run-time variable and varies
>> between 1 and some larger value.
>>
>> I should have mentioned this in the commit message.
>>
>> A very slight performance improvement. It also makes the code better
>> match the comment, imo. Zero cost for const one enqueues, but no impact
>> on non-compile-time-constant-length enqueues.
>>
>> Feel free to ignore.
> 
> 
> I did some performance comparison of the patch.
> A low-end ARM machine shows a 0.7% drop in the single event case. No
> regression is seen on high-end ARM cores in the single event case.
> 
> IMO, optimizing the check for burst mode (the new patch) may not show
> any real improvement, as the cost is divided by the number of events.
> Whereas optimizing the check for the single event case (the current code)
> shows better performance in the single event case and no regression
> in burst mode, as the cost is divided by the number of events.

I ran some tests on an AMD Zen 3 with DSW.

In the below tests the enqueue burst size is not compile-time constant.

Enqueue burst size      Performance improvement
Run-time constant 1     ~5%
Run-time constant 2     ~0%
Run-time variable 1-2   ~9%
Run-time variable 1-16  ~0%

The run-time variable enqueue sizes are randomly (uniformly) distributed in
the specified range.

The first result may come as a surprise. The benchmark is using 
RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type 
in most apps). The single-event enqueue function only exists in a 
generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent). 
I suspect that is the reason for the performance improvement.

This effect is large-enough to make it somewhat beneficial (+~1%) to use 
run-time variable single-event enqueue compared to keeping the burst 
size compile-time constant.

The performance gain is counted toward both enqueue and dequeue costs
(+benchmark app overhead), so it is an underestimate if you see this as an
enqueue performance improvement.

> If you agree, then we can skip this patch.
>

I have no strong opinion if this should be included or not.

If it was up to me, I would drop the single-enqueue special case handling 
altogether in the next ABI update.
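
For readers following the exchange, the guarded shortcut can be sketched
outside of DPDK as below. enqueue_single() and enqueue_burst() are
hypothetical stand-ins for the fp_ops callbacks, and the builtin is
GCC/Clang specific. Whether __builtin_constant_p() evaluates to 1 depends on
inlining and optimisation level, so the sketch only relies on both paths
returning the same result.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int single_calls; /* times the non-burst path ran */
static unsigned int burst_calls;  /* times the burst path ran */

/* Hypothetical stand-in for the driver's single-event enqueue. */
static uint16_t
enqueue_single(const int *ev)
{
	(void)ev;
	single_calls++;
	return 1;
}

/* Hypothetical stand-in for the driver's burst enqueue. */
static uint16_t
enqueue_burst(const int *ev, uint16_t nb_events)
{
	(void)ev;
	burst_calls++;
	return nb_events;
}

/*
 * Take the non-burst shortcut only when the compiler can prove that
 * nb_events is the constant 1. For a run-time value the whole test folds
 * to false at compile time, so no branch is emitted.
 */
static inline uint16_t
enqueue(const int *ev, uint16_t nb_events)
{
	if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
		return enqueue_single(ev);
	return enqueue_burst(ev, nb_events);
}
```

Note that the alternative quoted below, (constant && nb_events == 1) ||
nb_events == 1, reduces logically to nb_events == 1, i.e. the code before
the patch.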

> 
>>
>>> If so, check should be following. Right?
>>>
>>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1)
>>> || nb_events  == 1)
>>>
>>> At least, It was my original intention in the code.
>>>
>>>
>>>
>>>>                   return (fp_ops->enqueue)(port, ev);
>>>>           else
>>>>                   return fn(port, ev, nb_events);
>>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[],
>>>>            * Allow zero cost non burst mode routine invocation if application
>>>>            * requests nb_events as const one
>>>>            */
>>>> -       if (nb_events == 1)
>>>> +       if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1)
>>>>                   return (fp_ops->dequeue)(port, ev, timeout_ticks);
>>>>           else
>>>>                   return (fp_ops->dequeue_burst)(port, ev, nb_events,
>>>> --
>>>> 2.34.1
>>>>
>>

^ permalink raw reply	[relevance 3%]

* [PATCH v6 1/3] ring: fix unmatched type definition and usage
  2023-05-09  9:24  3%   ` [PATCH v6 0/3] add telemetry cmds for ring Jie Hai
@ 2023-05-09  9:24  3%     ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2023-05-09  9:24 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev; +Cc: dev, liudongdong3

Field 'flags' of struct rte_ring is defined as int type. However,
it is used as unsigned int. To ensure consistency, change the
type of flags to unsigned int. Since these two types have the
same byte size, this change is not an ABI change.

Fixes: af75078fece3 ("first public release")

Signed-off-by: Jie Hai <haijie1@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/ring/rte_ring_core.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 82b237091b71..1c809abeb531 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -120,7 +120,7 @@ struct rte_ring_hts_headtail {
 struct rte_ring {
 	char name[RTE_RING_NAMESIZE] __rte_cache_aligned;
 	/**< Name of the ring. */
-	int flags;               /**< Flags supplied at creation. */
+	uint32_t flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
 	uint32_t size;           /**< Size of ring. */
-- 
2.33.0


^ permalink raw reply	[relevance 3%]
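
The "same byte size" argument in the patch above can be checked with a
small sketch. Both structures are hypothetical, shortened mirrors of the
start of struct rte_ring (the name length of 32 and the omission of the
cache-alignment attribute are assumptions made for brevity); only the type
of flags differs between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shortened mirror of the layout before the patch. */
struct ring_before {
	char name[32];
	int flags;
	const void *memzone;
	uint32_t size;
};

/* Shortened mirror of the layout after the patch. */
struct ring_after {
	char name[32];
	uint32_t flags;
	const void *memzone;
	uint32_t size;
};

/* Holds on the platforms DPDK supports; the argument depends on it. */
_Static_assert(sizeof(int) == sizeof(uint32_t),
	       "int and uint32_t must have the same size");

/* Returns 1 when total size and every member offset are unchanged. */
static int
layout_unchanged(void)
{
	return sizeof(struct ring_before) == sizeof(struct ring_after) &&
	       offsetof(struct ring_before, flags) ==
	       offsetof(struct ring_after, flags) &&
	       offsetof(struct ring_before, memzone) ==
	       offsetof(struct ring_after, memzone) &&
	       offsetof(struct ring_before, size) ==
	       offsetof(struct ring_after, size);
}
```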

* [PATCH v6 0/3] add telemetry cmds for ring
  2023-05-09  1:29  3% ` [PATCH v5 " Jie Hai
  2023-05-09  1:29  3%   ` [PATCH v5 1/3] ring: fix unmatched type definition and usage Jie Hai
@ 2023-05-09  9:24  3%   ` Jie Hai
  2023-05-09  9:24  3%     ` [PATCH v6 1/3] ring: fix unmatched type definition and usage Jie Hai
  1 sibling, 1 reply; 200+ results
From: Jie Hai @ 2023-05-09  9:24 UTC (permalink / raw)
  Cc: dev, liudongdong3

This patch set adds telemetry commands to list rings and to dump
information about a ring by its name.

v1->v2:
1. Add space after "switch".
2. Fix wrong strlen parameter.

v2->v3:
1. Remove prefix "rte_" for static function.
2. Add Acked-by Konstantin Ananyev for PATCH 1.
3. Introduce functions to return strings instead of copying strings.
4. Check pointer to memzone of ring.
5. Remove redundant variable.
6. Hold lock when accessing ring data.

v3->v4:
1. Update changelog according to reviews of Honnappa Nagarahalli.
2. Add Reviewed-by Honnappa Nagarahalli.
3. Correct grammar in help information.
4. Correct spell warning on "te" reported by checkpatch.pl.
5. Use ring_walk() to query ring info instead of rte_ring_lookup().
6. Fix the type definition of the flags field of rte_ring, which did not match its usage.
7. Use rte_tel_data_add_dict_uint_hex instead of rte_tel_data_add_dict_u64
   for mask and flags.

v4->v5:
1. Add Acked-by Konstantin Ananyev and Chengwen Feng.
2. Add ABI change explanation for commit message of patch 1/3.

v5->v6:
1. Add Acked-by Morten Brørup.
2. Fix incorrect reference of commit.

Jie Hai (3):
  ring: fix unmatched type definition and usage
  ring: add telemetry cmd to list rings
  ring: add telemetry cmd for ring info

 lib/ring/meson.build     |   1 +
 lib/ring/rte_ring.c      | 139 +++++++++++++++++++++++++++++++++++++++
 lib/ring/rte_ring_core.h |   2 +-
 3 files changed, 141 insertions(+), 1 deletion(-)

-- 
2.33.0


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v5 1/3] ring: fix unmatched type definition and usage
  2023-05-09  6:23  0%     ` Ruifeng Wang
@ 2023-05-09  8:15  0%       ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2023-05-09  8:15 UTC (permalink / raw)
  To: Ruifeng Wang, Honnappa Nagarahalli, Konstantin Ananyev,
	Olivier Matz, Dharmik Jayesh Thakkar
  Cc: dev, liudongdong3, nd

On 2023/5/9 14:23, Ruifeng Wang wrote:
>> -----Original Message-----
>> From: Jie Hai <haijie1@huawei.com>
>> Sent: Tuesday, May 9, 2023 9:29 AM
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin Ananyev
>> <konstantin.v.ananyev@yandex.ru>; Ruifeng Wang <Ruifeng.Wang@arm.com>; Gavin Hu
>> <Gavin.Hu@arm.com>; Olivier Matz <olivier.matz@6wind.com>; Dharmik Jayesh Thakkar
>> <DharmikJayesh.Thakkar@arm.com>
>> Cc: dev@dpdk.org; liudongdong3@huawei.com
>> Subject: [PATCH v5 1/3] ring: fix unmatched type definition and usage
>>
>> Field 'flags' of struct rte_ring is defined as int type. However, it is used as unsigned
>> int. To ensure consistency, change the type of flags to unsigned int. Since these two
>> types have the same byte size, this change is not an ABI change.
>>
>> Fixes: cc4b218790f6 ("ring: support configurable element size")
> 
> The change looks good.
> However, I think the fix line is not accurate.
> I suppose it fixes af75078fece3 ("first public release").
> 
Thanks for your review. Sorry for quoting the wrong commit.
This issue was indeed introduced by commit af75078fece3 ("first public 
release").
I will fix this in the next version.
>>
>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
>> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
>> ---
>>   lib/ring/rte_ring_core.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h index
>> 82b237091b71..1c809abeb531 100644
>> --- a/lib/ring/rte_ring_core.h
>> +++ b/lib/ring/rte_ring_core.h
>> @@ -120,7 +120,7 @@ struct rte_ring_hts_headtail {  struct rte_ring {
>>   	char name[RTE_RING_NAMESIZE] __rte_cache_aligned;
>>   	/**< Name of the ring. */
>> -	int flags;               /**< Flags supplied at creation. */
>> +	uint32_t flags;               /**< Flags supplied at creation. */
>>   	const struct rte_memzone *memzone;
>>   			/**< Memzone, if any, containing the rte_ring */
>>   	uint32_t size;           /**< Size of ring. */
>> --
>> 2.33.0
> 
> .

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v5 1/3] ring: fix unmatched type definition and usage
  2023-05-09  1:29  3%   ` [PATCH v5 1/3] ring: fix unmatched type definition and usage Jie Hai
@ 2023-05-09  6:23  0%     ` Ruifeng Wang
  2023-05-09  8:15  0%       ` Jie Hai
  0 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2023-05-09  6:23 UTC (permalink / raw)
  To: Jie Hai, Honnappa Nagarahalli, Konstantin Ananyev, Olivier Matz,
	Dharmik Jayesh Thakkar
  Cc: dev, liudongdong3, nd

> -----Original Message-----
> From: Jie Hai <haijie1@huawei.com>
> Sent: Tuesday, May 9, 2023 9:29 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin Ananyev
> <konstantin.v.ananyev@yandex.ru>; Ruifeng Wang <Ruifeng.Wang@arm.com>; Gavin Hu
> <Gavin.Hu@arm.com>; Olivier Matz <olivier.matz@6wind.com>; Dharmik Jayesh Thakkar
> <DharmikJayesh.Thakkar@arm.com>
> Cc: dev@dpdk.org; liudongdong3@huawei.com
> Subject: [PATCH v5 1/3] ring: fix unmatched type definition and usage
> 
> Field 'flags' of struct rte_ring is defined as int type. However, it is used as unsigned
> int. To ensure consistency, change the type of flags to unsigned int. Since these two
> types have the same byte size, this change is not an ABI change.
> 
> Fixes: cc4b218790f6 ("ring: support configurable element size")

The change looks good.
However, I think the fix line is not accurate. 
I suppose it fixes af75078fece3 ("first public release").

> 
> Signed-off-by: Jie Hai <haijie1@huawei.com>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
>  lib/ring/rte_ring_core.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h index
> 82b237091b71..1c809abeb531 100644
> --- a/lib/ring/rte_ring_core.h
> +++ b/lib/ring/rte_ring_core.h
> @@ -120,7 +120,7 @@ struct rte_ring_hts_headtail {  struct rte_ring {
>  	char name[RTE_RING_NAMESIZE] __rte_cache_aligned;
>  	/**< Name of the ring. */
> -	int flags;               /**< Flags supplied at creation. */
> +	uint32_t flags;               /**< Flags supplied at creation. */
>  	const struct rte_memzone *memzone;
>  			/**< Memzone, if any, containing the rte_ring */
>  	uint32_t size;           /**< Size of ring. */
> --
> 2.33.0


^ permalink raw reply	[relevance 0%]

* [PATCH v5 1/3] ring: fix unmatched type definition and usage
  2023-05-09  1:29  3% ` [PATCH v5 " Jie Hai
@ 2023-05-09  1:29  3%   ` Jie Hai
  2023-05-09  6:23  0%     ` Ruifeng Wang
  2023-05-09  9:24  3%   ` [PATCH v6 0/3] add telemetry cmds for ring Jie Hai
  1 sibling, 1 reply; 200+ results
From: Jie Hai @ 2023-05-09  1:29 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Ruifeng Wang, Gavin Hu,
	Olivier Matz, Dharmik Thakkar
  Cc: dev, liudongdong3

Field 'flags' of struct rte_ring is defined as int type. However,
it is used as unsigned int. To ensure consistency, change the
type of flags to unsigned int. Since these two types have the
same byte size, this change is not an ABI change.

Fixes: cc4b218790f6 ("ring: support configurable element size")

Signed-off-by: Jie Hai <haijie1@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
 lib/ring/rte_ring_core.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 82b237091b71..1c809abeb531 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -120,7 +120,7 @@ struct rte_ring_hts_headtail {
 struct rte_ring {
 	char name[RTE_RING_NAMESIZE] __rte_cache_aligned;
 	/**< Name of the ring. */
-	int flags;               /**< Flags supplied at creation. */
+	uint32_t flags;               /**< Flags supplied at creation. */
 	const struct rte_memzone *memzone;
 			/**< Memzone, if any, containing the rte_ring */
 	uint32_t size;           /**< Size of ring. */
-- 
2.33.0


^ permalink raw reply	[relevance 3%]

* [PATCH v5 0/3] add telemetry cmds for ring
  @ 2023-05-09  1:29  3% ` Jie Hai
  2023-05-09  1:29  3%   ` [PATCH v5 1/3] ring: fix unmatched type definition and usage Jie Hai
  2023-05-09  9:24  3%   ` [PATCH v6 0/3] add telemetry cmds for ring Jie Hai
  0 siblings, 2 replies; 200+ results
From: Jie Hai @ 2023-05-09  1:29 UTC (permalink / raw)
  Cc: dev, liudongdong3

This patch set adds telemetry commands to list rings and to dump
information about a ring by its name.

v1->v2:
1. Add space after "switch".
2. Fix wrong strlen parameter.

v2->v3:
1. Remove prefix "rte_" for static function.
2. Add Acked-by Konstantin Ananyev for PATCH 1.
3. Introduce functions to return strings instead of copying strings.
4. Check pointer to memzone of ring.
5. Remove redundant variable.
6. Hold lock when accessing ring data.

v3->v4:
1. Update changelog according to reviews of Honnappa Nagarahalli.
2. Add Reviewed-by Honnappa Nagarahalli.
3. Correct grammar in help information.
4. Correct spell warning on "te" reported by checkpatch.pl.
5. Use ring_walk() to query ring info instead of rte_ring_lookup().
6. Fix the type definition of the flags field of rte_ring, which did not match its usage.
7. Use rte_tel_data_add_dict_uint_hex instead of rte_tel_data_add_dict_u64
   for mask and flags.

v4->v5:
1. Add Acked-by Konstantin Ananyev and Chengwen Feng.
2. Add ABI change explanation for commit message of patch 1/3.

Jie Hai (3):
  ring: fix unmatched type definition and usage
  ring: add telemetry cmd to list rings
  ring: add telemetry cmd for ring info

 lib/ring/meson.build     |   1 +
 lib/ring/rte_ring.c      | 139 +++++++++++++++++++++++++++++++++++++++
 lib/ring/rte_ring_core.h |   2 +-
 3 files changed, 141 insertions(+), 1 deletion(-)

-- 
2.33.0


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 0/3] vhost: add device op to offload the interrupt kick
  2023-04-05 12:40  3% [PATCH v2 0/3] vhost: add device op to offload the interrupt kick Eelco Chaudron
  @ 2023-05-08 13:58  0% ` Eelco Chaudron
  1 sibling, 0 replies; 200+ results
From: Eelco Chaudron @ 2023-05-08 13:58 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia; +Cc: dev



On 5 Apr 2023, at 14:40, Eelco Chaudron wrote:

> This series adds an operation callback which gets called every time the
> library wants to call eventfd_write(). This eventfd_write() call could
> result in a system call, which could potentially block the PMD thread.
>
> The callback function can decide whether it's ok to handle the
> eventfd_write() now or have the newly introduced function,
> rte_vhost_notify_guest(), called at a later time.
>
> This can be used by 3rd party applications, like OVS, to avoid system
> calls being called as part of the PMD threads.


Wondering if anyone had a chance to look at this patchset.

Cheers,

Eelco

> v2: - Used vhost_virtqueue->index to find index for operation.
>     - Aligned function name to VDUSE RFC patchset.
>     - Added error and offload statistics counter.
>     - Mark new API as experimental.
>     - Change the virtual queue spin lock to read/write spin lock.
>     - Made shared counters atomic.
>     - Add versioned rte_vhost_driver_callback_register() for
>       ABI compliance.
>
> Eelco Chaudron (3):
>       vhost: Change vhost_virtqueue access lock to a read/write one.
>       vhost: make the guest_notifications statistic counter atomic.
>       vhost: add device op to offload the interrupt kick
>
>
>  lib/eal/include/generic/rte_rwlock.h | 17 +++++
>  lib/vhost/meson.build                |  2 +
>  lib/vhost/rte_vhost.h                | 23 ++++++-
>  lib/vhost/socket.c                   | 72 ++++++++++++++++++++--
>  lib/vhost/version.map                |  9 +++
>  lib/vhost/vhost.c                    | 92 +++++++++++++++++++++-------
>  lib/vhost/vhost.h                    | 70 ++++++++++++++-------
>  lib/vhost/vhost_user.c               | 14 ++---
>  lib/vhost/virtio_net.c               | 90 +++++++++++++--------------
>  9 files changed, 288 insertions(+), 101 deletions(-)


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] net/liquidio: remove LiquidIO ethdev driver
    2023-05-02 14:18  5% ` Ferruh Yigit
@ 2023-05-08 13:44  1% ` jerinj
  1 sibling, 0 replies; 200+ results
From: jerinj @ 2023-05-08 13:44 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Anatoly Burakov
  Cc: david.marchand, ferruh.yigit, Jerin Jacob

From: Jerin Jacob <jerinj@marvell.com>

The LiquidIO product line has been substituted with CN9K/CN10K
OCTEON product line smart NICs located at drivers/net/octeon_ep/.

DPDK 20.08 has categorized the LiquidIO driver as UNMAINTAINED
because of the absence of updates in the driver.

For these reasons, the driver is removed from DPDK 23.07.

Also remove the deprecation notice entry for the removal from
doc/guides/rel_notes/deprecation.rst and skip the removed
driver library in the ABI check via devtools/libabigail.abignore.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
v2:
- Skip driver ABI check (Ferruh)
- Addressed the review comments in
  http://patches.dpdk.org/project/dpdk/patch/20230428103127.1059989-1-jerinj@marvell.com/ (Ferruh)

 MAINTAINERS                              |    8 -
 devtools/libabigail.abignore             |    1 +
 doc/guides/nics/features/liquidio.ini    |   29 -
 doc/guides/nics/index.rst                |    1 -
 doc/guides/nics/liquidio.rst             |  169 --
 doc/guides/rel_notes/deprecation.rst     |    7 -
 doc/guides/rel_notes/release_23_07.rst   |    2 +
 drivers/net/liquidio/base/lio_23xx_reg.h |  165 --
 drivers/net/liquidio/base/lio_23xx_vf.c  |  513 ------
 drivers/net/liquidio/base/lio_23xx_vf.h  |   63 -
 drivers/net/liquidio/base/lio_hw_defs.h  |  239 ---
 drivers/net/liquidio/base/lio_mbox.c     |  246 ---
 drivers/net/liquidio/base/lio_mbox.h     |  102 -
 drivers/net/liquidio/lio_ethdev.c        | 2147 ----------------------
 drivers/net/liquidio/lio_ethdev.h        |  179 --
 drivers/net/liquidio/lio_logs.h          |   58 -
 drivers/net/liquidio/lio_rxtx.c          | 1804 ------------------
 drivers/net/liquidio/lio_rxtx.h          |  740 --------
 drivers/net/liquidio/lio_struct.h        |  661 -------
 drivers/net/liquidio/meson.build         |   16 -
 drivers/net/meson.build                  |    1 -
 21 files changed, 3 insertions(+), 7148 deletions(-)
 delete mode 100644 doc/guides/nics/features/liquidio.ini
 delete mode 100644 doc/guides/nics/liquidio.rst
 delete mode 100644 drivers/net/liquidio/base/lio_23xx_reg.h
 delete mode 100644 drivers/net/liquidio/base/lio_23xx_vf.c
 delete mode 100644 drivers/net/liquidio/base/lio_23xx_vf.h
 delete mode 100644 drivers/net/liquidio/base/lio_hw_defs.h
 delete mode 100644 drivers/net/liquidio/base/lio_mbox.c
 delete mode 100644 drivers/net/liquidio/base/lio_mbox.h
 delete mode 100644 drivers/net/liquidio/lio_ethdev.c
 delete mode 100644 drivers/net/liquidio/lio_ethdev.h
 delete mode 100644 drivers/net/liquidio/lio_logs.h
 delete mode 100644 drivers/net/liquidio/lio_rxtx.c
 delete mode 100644 drivers/net/liquidio/lio_rxtx.h
 delete mode 100644 drivers/net/liquidio/lio_struct.h
 delete mode 100644 drivers/net/liquidio/meson.build

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e5099..0157c26dd2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -681,14 +681,6 @@ F: drivers/net/thunderx/
 F: doc/guides/nics/thunderx.rst
 F: doc/guides/nics/features/thunderx.ini
 
-Cavium LiquidIO - UNMAINTAINED
-M: Shijith Thotton <sthotton@marvell.com>
-M: Srisivasubramanian Srinivasan <srinivasan@marvell.com>
-T: git://dpdk.org/next/dpdk-next-net-mrvl
-F: drivers/net/liquidio/
-F: doc/guides/nics/liquidio.rst
-F: doc/guides/nics/features/liquidio.ini
-
 Cavium OCTEON TX
 M: Harman Kalra <hkalra@marvell.com>
 T: git://dpdk.org/next/dpdk-next-net-mrvl
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 3ff51509de..c0361bfc7b 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -25,6 +25,7 @@
 ;
 ; SKIP_LIBRARY=librte_common_mlx5_glue
 ; SKIP_LIBRARY=librte_net_mlx4_glue
+; SKIP_LIBRARY=librte_net_liquidio
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Experimental APIs exceptions ;
diff --git a/doc/guides/nics/features/liquidio.ini b/doc/guides/nics/features/liquidio.ini
deleted file mode 100644
index a8bde282e0..0000000000
--- a/doc/guides/nics/features/liquidio.ini
+++ /dev/null
@@ -1,29 +0,0 @@
-;
-; Supported features of the 'LiquidIO' network poll mode driver.
-;
-; Refer to default.ini for the full list of available PMD features.
-;
-[Features]
-Speed capabilities   = Y
-Link status          = Y
-Link status event    = Y
-MTU update           = Y
-Scattered Rx         = Y
-Promiscuous mode     = Y
-Allmulticast mode    = Y
-RSS hash             = Y
-RSS key update       = Y
-RSS reta update      = Y
-VLAN filter          = Y
-CRC offload          = Y
-VLAN offload         = P
-L3 checksum offload  = Y
-L4 checksum offload  = Y
-Inner L3 checksum    = Y
-Inner L4 checksum    = Y
-Basic stats          = Y
-Extended stats       = Y
-Multiprocess aware   = Y
-Linux                = Y
-x86-64               = Y
-Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c9d1edf5e..31296822e5 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -44,7 +44,6 @@ Network Interface Controller Drivers
     ipn3ke
     ixgbe
     kni
-    liquidio
     mana
     memif
     mlx4
diff --git a/doc/guides/nics/liquidio.rst b/doc/guides/nics/liquidio.rst
deleted file mode 100644
index f893b3b539..0000000000
--- a/doc/guides/nics/liquidio.rst
+++ /dev/null
@@ -1,169 +0,0 @@
-..  SPDX-License-Identifier: BSD-3-Clause
-    Copyright(c) 2017 Cavium, Inc
-
-LiquidIO VF Poll Mode Driver
-============================
-
-The LiquidIO VF PMD library (**librte_net_liquidio**) provides poll mode driver support for
-Cavium LiquidIO® II server adapter VFs. PF management and VF creation can be
-done using kernel driver.
-
-More information can be found at `Cavium Official Website
-<http://cavium.com/LiquidIO_Adapters.html>`_.
-
-Supported LiquidIO Adapters
------------------------------
-
-- LiquidIO II CN2350 210SV/225SV
-- LiquidIO II CN2350 210SVPT
-- LiquidIO II CN2360 210SV/225SV
-- LiquidIO II CN2360 210SVPT
-
-
-SR-IOV: Prerequisites and Sample Application Notes
---------------------------------------------------
-
-This section provides instructions to configure SR-IOV with Linux OS.
-
-#. Verify SR-IOV and ARI capabilities are enabled on the adapter using ``lspci``:
-
-   .. code-block:: console
-
-      lspci -s <slot> -vvv
-
-   Example output:
-
-   .. code-block:: console
-
-      [...]
-      Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
-      [...]
-      Capabilities: [178 v1] Single Root I/O Virtualization (SR-IOV)
-      [...]
-      Kernel driver in use: LiquidIO
-
-#. Load the kernel module:
-
-   .. code-block:: console
-
-      modprobe liquidio
-
-#. Bring up the PF ports:
-
-   .. code-block:: console
-
-      ifconfig p4p1 up
-      ifconfig p4p2 up
-
-#. Change PF MTU if required:
-
-   .. code-block:: console
-
-      ifconfig p4p1 mtu 9000
-      ifconfig p4p2 mtu 9000
-
-#. Create VF device(s):
-
-   Echo number of VFs to be created into ``"sriov_numvfs"`` sysfs entry
-   of the parent PF.
-
-   .. code-block:: console
-
-      echo 1 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
-      echo 1 > /sys/bus/pci/devices/0000:03:00.1/sriov_numvfs
-
-#. Assign VF MAC address:
-
-   Assign MAC address to the VF using iproute2 utility. The syntax is::
-
-      ip link set <PF iface> vf <VF id> mac <macaddr>
-
-   Example output:
-
-   .. code-block:: console
-
-      ip link set p4p1 vf 0 mac F2:A8:1B:5E:B4:66
-
-#. Assign VF(s) to VM.
-
-   The VF devices may be passed through to the guest VM using qemu or
-   virt-manager or virsh etc.
-
-   Example qemu guest launch command:
-
-   .. code-block:: console
-
-      ./qemu-system-x86_64 -name lio-vm -machine accel=kvm \
-      -cpu host -m 4096 -smp 4 \
-      -drive file=<disk_file>,if=none,id=disk1,format=<type> \
-      -device virtio-blk-pci,scsi=off,drive=disk1,id=virtio-disk1,bootindex=1 \
-      -device vfio-pci,host=03:00.3 -device vfio-pci,host=03:08.3
-
-#. Running testpmd
-
-   Refer to the document
-   :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` to run
-   ``testpmd`` application.
-
-   .. note::
-
-      Use ``igb_uio`` instead of ``vfio-pci`` in VM.
-
-   Example output:
-
-   .. code-block:: console
-
-      [...]
-      EAL: PCI device 0000:03:00.3 on NUMA socket 0
-      EAL:   probe driver: 177d:9712 net_liovf
-      EAL:   using IOMMU type 1 (Type 1)
-      PMD: net_liovf[03:00.3]INFO: DEVICE : CN23XX VF
-      EAL: PCI device 0000:03:08.3 on NUMA socket 0
-      EAL:   probe driver: 177d:9712 net_liovf
-      PMD: net_liovf[03:08.3]INFO: DEVICE : CN23XX VF
-      Interactive-mode selected
-      USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=171456, size=2176, socket=0
-      Configuring Port 0 (socket 0)
-      PMD: net_liovf[03:00.3]INFO: Starting port 0
-      Port 0: F2:A8:1B:5E:B4:66
-      Configuring Port 1 (socket 0)
-      PMD: net_liovf[03:08.3]INFO: Starting port 1
-      Port 1: 32:76:CC:EE:56:D7
-      Checking link statuses...
-      Port 0 Link Up - speed 10000 Mbps - full-duplex
-      Port 1 Link Up - speed 10000 Mbps - full-duplex
-      Done
-      testpmd>
-
-#. Enabling VF promiscuous mode
-
-   One VF per PF can be marked as trusted for promiscuous mode.
-
-   .. code-block:: console
-
-      ip link set dev <PF iface> vf <VF id> trust on
-
-
-Limitations
------------
-
-VF MTU
-~~~~~~
-
-VF MTU is limited by PF MTU. Raise PF value before configuring VF for larger packet size.
-
-VLAN offload
-~~~~~~~~~~~~
-
-Tx VLAN insertion is not supported and consequently VLAN offload feature is
-marked partial.
-
-Ring size
-~~~~~~~~~
-
-Number of descriptors for Rx/Tx ring should be in the range 128 to 512.
-
-CRC stripping
-~~~~~~~~~~~~~
-
-LiquidIO adapters strip ethernet FCS of every packet coming to the host interface.
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index dcc1ca1696..8e1cdd677a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -121,13 +121,6 @@ Deprecation Notices
 * net/bnx2x: Starting from DPDK 23.07, the Marvell QLogic bnx2x driver will be removed.
   This decision has been made to alleviate the burden of maintaining a discontinued product.
 
-* net/liquidio: Remove LiquidIO ethdev driver.
-  The LiquidIO product line has been substituted
-  with CN9K/CN10K OCTEON product line smart NICs located in ``drivers/net/octeon_ep/``.
-  DPDK 20.08 has categorized the LiquidIO driver as UNMAINTAINED
-  because of the absence of updates in the driver.
-  Due to the above reasons, the driver will be unavailable from DPDK 23.07.
-
 * cryptodev: The function ``rte_cryptodev_cb_fn`` will be updated
   to have another parameter ``qp_id`` to return the queue pair ID
   which got error interrupt to the application,
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
index a9b1293689..f13a7b32b6 100644
--- a/doc/guides/rel_notes/release_23_07.rst
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -68,6 +68,8 @@ Removed Items
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* Removed LiquidIO ethdev driver located at ``drivers/net/liquidio/``
+
 
 API Changes
 -----------
diff --git a/drivers/net/liquidio/base/lio_23xx_reg.h b/drivers/net/liquidio/base/lio_23xx_reg.h
deleted file mode 100644
index 9f28504b53..0000000000
--- a/drivers/net/liquidio/base/lio_23xx_reg.h
+++ /dev/null
@@ -1,165 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_23XX_REG_H_
-#define _LIO_23XX_REG_H_
-
-/* ###################### REQUEST QUEUE ######################### */
-
-/* 64 registers for Input Queues Start Addr - SLI_PKT(0..63)_INSTR_BADDR */
-#define CN23XX_SLI_PKT_INSTR_BADDR_START64	0x10010
-
-/* 64 registers for Input Doorbell - SLI_PKT(0..63)_INSTR_BAOFF_DBELL */
-#define CN23XX_SLI_PKT_INSTR_BADDR_DBELL_START	0x10020
-
-/* 64 registers for Input Queue size - SLI_PKT(0..63)_INSTR_FIFO_RSIZE */
-#define CN23XX_SLI_PKT_INSTR_FIFO_RSIZE_START	0x10030
-
-/* 64 registers for Input Queue Instr Count - SLI_PKT_IN_DONE(0..63)_CNTS */
-#define CN23XX_SLI_PKT_IN_DONE_CNTS_START64	0x10040
-
-/* 64 registers (64-bit) - ES, RO, NS, Arbitration for Input Queue Data &
- * gather list fetches. SLI_PKT(0..63)_INPUT_CONTROL.
- */
-#define CN23XX_SLI_PKT_INPUT_CONTROL_START64	0x10000
-
-/* ------- Request Queue Macros --------- */
-
-/* Each Input Queue register is at a 16-byte Offset in BAR0 */
-#define CN23XX_IQ_OFFSET			0x20000
-
-#define CN23XX_SLI_IQ_PKT_CONTROL64(iq)					\
-	(CN23XX_SLI_PKT_INPUT_CONTROL_START64 + ((iq) * CN23XX_IQ_OFFSET))
-
-#define CN23XX_SLI_IQ_BASE_ADDR64(iq)					\
-	(CN23XX_SLI_PKT_INSTR_BADDR_START64 + ((iq) * CN23XX_IQ_OFFSET))
-
-#define CN23XX_SLI_IQ_SIZE(iq)						\
-	(CN23XX_SLI_PKT_INSTR_FIFO_RSIZE_START + ((iq) * CN23XX_IQ_OFFSET))
-
-#define CN23XX_SLI_IQ_DOORBELL(iq)					\
-	(CN23XX_SLI_PKT_INSTR_BADDR_DBELL_START + ((iq) * CN23XX_IQ_OFFSET))
-
-#define CN23XX_SLI_IQ_INSTR_COUNT64(iq)					\
-	(CN23XX_SLI_PKT_IN_DONE_CNTS_START64 + ((iq) * CN23XX_IQ_OFFSET))
-
-/* Number of instructions to be read in one MAC read request.
- * setting to Max value(4)
- */
-#define CN23XX_PKT_INPUT_CTL_RDSIZE			(3 << 25)
-#define CN23XX_PKT_INPUT_CTL_IS_64B			(1 << 24)
-#define CN23XX_PKT_INPUT_CTL_RST			(1 << 23)
-#define CN23XX_PKT_INPUT_CTL_QUIET			(1 << 28)
-#define CN23XX_PKT_INPUT_CTL_RING_ENB			(1 << 22)
-#define CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP		(1 << 6)
-#define CN23XX_PKT_INPUT_CTL_USE_CSR			(1 << 4)
-#define CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP		(2)
-
-/* These bits[47:44] select the Physical function number within the MAC */
-#define CN23XX_PKT_INPUT_CTL_PF_NUM_POS		45
-/* These bits[43:32] select the function number within the PF */
-#define CN23XX_PKT_INPUT_CTL_VF_NUM_POS		32
-
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-#define CN23XX_PKT_INPUT_CTL_MASK			\
-	(CN23XX_PKT_INPUT_CTL_RDSIZE |			\
-	 CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP |	\
-	 CN23XX_PKT_INPUT_CTL_USE_CSR)
-#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-#define CN23XX_PKT_INPUT_CTL_MASK			\
-	(CN23XX_PKT_INPUT_CTL_RDSIZE |			\
-	 CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP |	\
-	 CN23XX_PKT_INPUT_CTL_USE_CSR |			\
-	 CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP)
-#endif
-
-/* ############################ OUTPUT QUEUE ######################### */
-
-/* 64 registers for Output queue control - SLI_PKT(0..63)_OUTPUT_CONTROL */
-#define CN23XX_SLI_PKT_OUTPUT_CONTROL_START	0x10050
-
-/* 64 registers for Output queue buffer and info size
- * SLI_PKT(0..63)_OUT_SIZE
- */
-#define CN23XX_SLI_PKT_OUT_SIZE			0x10060
-
-/* 64 registers for Output Queue Start Addr - SLI_PKT(0..63)_SLIST_BADDR */
-#define CN23XX_SLI_SLIST_BADDR_START64		0x10070
-
-/* 64 registers for Output Queue Packet Credits
- * SLI_PKT(0..63)_SLIST_BAOFF_DBELL
- */
-#define CN23XX_SLI_PKT_SLIST_BAOFF_DBELL_START	0x10080
-
-/* 64 registers for Output Queue size - SLI_PKT(0..63)_SLIST_FIFO_RSIZE */
-#define CN23XX_SLI_PKT_SLIST_FIFO_RSIZE_START	0x10090
-
-/* 64 registers for Output Queue Packet Count - SLI_PKT(0..63)_CNTS */
-#define CN23XX_SLI_PKT_CNTS_START		0x100B0
-
-/* Each Output Queue register is at a 16-byte Offset in BAR0 */
-#define CN23XX_OQ_OFFSET			0x20000
-
-/* ------- Output Queue Macros --------- */
-
-#define CN23XX_SLI_OQ_PKT_CONTROL(oq)					\
-	(CN23XX_SLI_PKT_OUTPUT_CONTROL_START + ((oq) * CN23XX_OQ_OFFSET))
-
-#define CN23XX_SLI_OQ_BASE_ADDR64(oq)					\
-	(CN23XX_SLI_SLIST_BADDR_START64 + ((oq) * CN23XX_OQ_OFFSET))
-
-#define CN23XX_SLI_OQ_SIZE(oq)						\
-	(CN23XX_SLI_PKT_SLIST_FIFO_RSIZE_START + ((oq) * CN23XX_OQ_OFFSET))
-
-#define CN23XX_SLI_OQ_BUFF_INFO_SIZE(oq)				\
-	(CN23XX_SLI_PKT_OUT_SIZE + ((oq) * CN23XX_OQ_OFFSET))
-
-#define CN23XX_SLI_OQ_PKTS_SENT(oq)					\
-	(CN23XX_SLI_PKT_CNTS_START + ((oq) * CN23XX_OQ_OFFSET))
-
-#define CN23XX_SLI_OQ_PKTS_CREDIT(oq)					\
-	(CN23XX_SLI_PKT_SLIST_BAOFF_DBELL_START + ((oq) * CN23XX_OQ_OFFSET))
-
-/* ------------------ Masks ---------------- */
-#define CN23XX_PKT_OUTPUT_CTL_IPTR		(1 << 11)
-#define CN23XX_PKT_OUTPUT_CTL_ES		(1 << 9)
-#define CN23XX_PKT_OUTPUT_CTL_NSR		(1 << 8)
-#define CN23XX_PKT_OUTPUT_CTL_ROR		(1 << 7)
-#define CN23XX_PKT_OUTPUT_CTL_DPTR		(1 << 6)
-#define CN23XX_PKT_OUTPUT_CTL_BMODE		(1 << 5)
-#define CN23XX_PKT_OUTPUT_CTL_ES_P		(1 << 3)
-#define CN23XX_PKT_OUTPUT_CTL_NSR_P		(1 << 2)
-#define CN23XX_PKT_OUTPUT_CTL_ROR_P		(1 << 1)
-#define CN23XX_PKT_OUTPUT_CTL_RING_ENB		(1 << 0)
-
-/* Rings per Virtual Function [RO] */
-#define CN23XX_PKT_INPUT_CTL_RPVF_MASK		0x3F
-#define CN23XX_PKT_INPUT_CTL_RPVF_POS		48
-
-/* These bits[47:44][RO] give the Physical function
- * number info within the MAC
- */
-#define CN23XX_PKT_INPUT_CTL_PF_NUM_MASK	0x7
-
-/* These bits[43:32][RO] give the virtual function
- * number info within the PF
- */
-#define CN23XX_PKT_INPUT_CTL_VF_NUM_MASK	0x1FFF
-
-/* ######################### Mailbox Reg Macros ######################## */
-#define CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START	0x10200
-#define CN23XX_VF_SLI_PKT_MBOX_INT_START	0x10210
-
-#define CN23XX_SLI_MBOX_OFFSET			0x20000
-#define CN23XX_SLI_MBOX_SIG_IDX_OFFSET		0x8
-
-#define CN23XX_SLI_PKT_PF_VF_MBOX_SIG(q, idx)				\
-	(CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START +				\
-	 ((q) * CN23XX_SLI_MBOX_OFFSET +				\
-	  (idx) * CN23XX_SLI_MBOX_SIG_IDX_OFFSET))
-
-#define CN23XX_VF_SLI_PKT_MBOX_INT(q)					\
-	(CN23XX_VF_SLI_PKT_MBOX_INT_START + ((q) * CN23XX_SLI_MBOX_OFFSET))
-
-#endif /* _LIO_23XX_REG_H_ */
diff --git a/drivers/net/liquidio/base/lio_23xx_vf.c b/drivers/net/liquidio/base/lio_23xx_vf.c
deleted file mode 100644
index c6b8310b71..0000000000
--- a/drivers/net/liquidio/base/lio_23xx_vf.c
+++ /dev/null
@@ -1,513 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#include <string.h>
-
-#include <ethdev_driver.h>
-#include <rte_cycles.h>
-#include <rte_malloc.h>
-
-#include "lio_logs.h"
-#include "lio_23xx_vf.h"
-#include "lio_23xx_reg.h"
-#include "lio_mbox.h"
-
-static int
-cn23xx_vf_reset_io_queues(struct lio_device *lio_dev, uint32_t num_queues)
-{
-	uint32_t loop = CN23XX_VF_BUSY_READING_REG_LOOP_COUNT;
-	uint64_t d64, q_no;
-	int ret_val = 0;
-
-	PMD_INIT_FUNC_TRACE();
-
-	for (q_no = 0; q_no < num_queues; q_no++) {
-		/* set RST bit to 1. This bit applies to both IQ and OQ */
-		d64 = lio_read_csr64(lio_dev,
-				     CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
-		d64 = d64 | CN23XX_PKT_INPUT_CTL_RST;
-		lio_write_csr64(lio_dev, CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
-				d64);
-	}
-
-	/* wait until the RST bit is clear or the RST and QUIET bits are set */
-	for (q_no = 0; q_no < num_queues; q_no++) {
-		volatile uint64_t reg_val;
-
-		reg_val	= lio_read_csr64(lio_dev,
-					 CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
-		while ((reg_val & CN23XX_PKT_INPUT_CTL_RST) &&
-				!(reg_val & CN23XX_PKT_INPUT_CTL_QUIET) &&
-				loop) {
-			reg_val = lio_read_csr64(
-					lio_dev,
-					CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
-			loop = loop - 1;
-		}
-
-		if (loop == 0) {
-			lio_dev_err(lio_dev,
-				    "clearing the reset reg failed or setting the quiet reg failed for qno: %lu\n",
-				    (unsigned long)q_no);
-			return -1;
-		}
-
-		reg_val = reg_val & ~CN23XX_PKT_INPUT_CTL_RST;
-		lio_write_csr64(lio_dev, CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
-				reg_val);
-
-		reg_val = lio_read_csr64(
-		    lio_dev, CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
-		if (reg_val & CN23XX_PKT_INPUT_CTL_RST) {
-			lio_dev_err(lio_dev,
-				    "clearing the reset failed for qno: %lu\n",
-				    (unsigned long)q_no);
-			ret_val = -1;
-		}
-	}
-
-	return ret_val;
-}
-
-static int
-cn23xx_vf_setup_global_input_regs(struct lio_device *lio_dev)
-{
-	uint64_t q_no;
-	uint64_t d64;
-
-	PMD_INIT_FUNC_TRACE();
-
-	if (cn23xx_vf_reset_io_queues(lio_dev,
-				      lio_dev->sriov_info.rings_per_vf))
-		return -1;
-
-	for (q_no = 0; q_no < (lio_dev->sriov_info.rings_per_vf); q_no++) {
-		lio_write_csr64(lio_dev, CN23XX_SLI_IQ_DOORBELL(q_no),
-				0xFFFFFFFF);
-
-		d64 = lio_read_csr64(lio_dev,
-				     CN23XX_SLI_IQ_INSTR_COUNT64(q_no));
-
-		d64 &= 0xEFFFFFFFFFFFFFFFL;
-
-		lio_write_csr64(lio_dev, CN23XX_SLI_IQ_INSTR_COUNT64(q_no),
-				d64);
-
-		/* Select ES, RO, NS, RDSIZE,DPTR Fomat#0 for
-		 * the Input Queues
-		 */
-		lio_write_csr64(lio_dev, CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
-				CN23XX_PKT_INPUT_CTL_MASK);
-	}
-
-	return 0;
-}
-
-static void
-cn23xx_vf_setup_global_output_regs(struct lio_device *lio_dev)
-{
-	uint32_t reg_val;
-	uint32_t q_no;
-
-	PMD_INIT_FUNC_TRACE();
-
-	for (q_no = 0; q_no < lio_dev->sriov_info.rings_per_vf; q_no++) {
-		lio_write_csr(lio_dev, CN23XX_SLI_OQ_PKTS_CREDIT(q_no),
-			      0xFFFFFFFF);
-
-		reg_val =
-		    lio_read_csr(lio_dev, CN23XX_SLI_OQ_PKTS_SENT(q_no));
-
-		reg_val &= 0xEFFFFFFFFFFFFFFFL;
-
-		lio_write_csr(lio_dev, CN23XX_SLI_OQ_PKTS_SENT(q_no), reg_val);
-
-		reg_val =
-		    lio_read_csr(lio_dev, CN23XX_SLI_OQ_PKT_CONTROL(q_no));
-
-		/* set IPTR & DPTR */
-		reg_val |=
-		    (CN23XX_PKT_OUTPUT_CTL_IPTR | CN23XX_PKT_OUTPUT_CTL_DPTR);
-
-		/* reset BMODE */
-		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_BMODE);
-
-		/* No Relaxed Ordering, No Snoop, 64-bit Byte swap
-		 * for Output Queue Scatter List
-		 * reset ROR_P, NSR_P
-		 */
-		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR_P);
-		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR_P);
-
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ES_P);
-#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES_P);
-#endif
-		/* No Relaxed Ordering, No Snoop, 64-bit Byte swap
-		 * for Output Queue Data
-		 * reset ROR, NSR
-		 */
-		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR);
-		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR);
-		/* set the ES bit */
-		reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES);
-
-		/* write all the selected settings */
-		lio_write_csr(lio_dev, CN23XX_SLI_OQ_PKT_CONTROL(q_no),
-			      reg_val);
-	}
-}
-
-static int
-cn23xx_vf_setup_device_regs(struct lio_device *lio_dev)
-{
-	PMD_INIT_FUNC_TRACE();
-
-	if (cn23xx_vf_setup_global_input_regs(lio_dev))
-		return -1;
-
-	cn23xx_vf_setup_global_output_regs(lio_dev);
-
-	return 0;
-}
-
-static void
-cn23xx_vf_setup_iq_regs(struct lio_device *lio_dev, uint32_t iq_no)
-{
-	struct lio_instr_queue *iq = lio_dev->instr_queue[iq_no];
-	uint64_t pkt_in_done = 0;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* Write the start of the input queue's ring and its size */
-	lio_write_csr64(lio_dev, CN23XX_SLI_IQ_BASE_ADDR64(iq_no),
-			iq->base_addr_dma);
-	lio_write_csr(lio_dev, CN23XX_SLI_IQ_SIZE(iq_no), iq->nb_desc);
-
-	/* Remember the doorbell & instruction count register addr
-	 * for this queue
-	 */
-	iq->doorbell_reg = (uint8_t *)lio_dev->hw_addr +
-				CN23XX_SLI_IQ_DOORBELL(iq_no);
-	iq->inst_cnt_reg = (uint8_t *)lio_dev->hw_addr +
-				CN23XX_SLI_IQ_INSTR_COUNT64(iq_no);
-	lio_dev_dbg(lio_dev, "InstQ[%d]:dbell reg @ 0x%p instcnt_reg @ 0x%p\n",
-		    iq_no, iq->doorbell_reg, iq->inst_cnt_reg);
-
-	/* Store the current instruction counter (used in flush_iq
-	 * calculation)
-	 */
-	pkt_in_done = rte_read64(iq->inst_cnt_reg);
-
-	/* Clear the count by writing back what we read, but don't
-	 * enable data traffic here
-	 */
-	rte_write64(pkt_in_done, iq->inst_cnt_reg);
-}
-
-static void
-cn23xx_vf_setup_oq_regs(struct lio_device *lio_dev, uint32_t oq_no)
-{
-	struct lio_droq *droq = lio_dev->droq[oq_no];
-
-	PMD_INIT_FUNC_TRACE();
-
-	lio_write_csr64(lio_dev, CN23XX_SLI_OQ_BASE_ADDR64(oq_no),
-			droq->desc_ring_dma);
-	lio_write_csr(lio_dev, CN23XX_SLI_OQ_SIZE(oq_no), droq->nb_desc);
-
-	lio_write_csr(lio_dev, CN23XX_SLI_OQ_BUFF_INFO_SIZE(oq_no),
-		      (droq->buffer_size | (OCTEON_RH_SIZE << 16)));
-
-	/* Get the mapped address of the pkt_sent and pkts_credit regs */
-	droq->pkts_sent_reg = (uint8_t *)lio_dev->hw_addr +
-					CN23XX_SLI_OQ_PKTS_SENT(oq_no);
-	droq->pkts_credit_reg = (uint8_t *)lio_dev->hw_addr +
-					CN23XX_SLI_OQ_PKTS_CREDIT(oq_no);
-}
-
-static void
-cn23xx_vf_free_mbox(struct lio_device *lio_dev)
-{
-	PMD_INIT_FUNC_TRACE();
-
-	rte_free(lio_dev->mbox[0]);
-	lio_dev->mbox[0] = NULL;
-
-	rte_free(lio_dev->mbox);
-	lio_dev->mbox = NULL;
-}
-
-static int
-cn23xx_vf_setup_mbox(struct lio_device *lio_dev)
-{
-	struct lio_mbox *mbox;
-
-	PMD_INIT_FUNC_TRACE();
-
-	if (lio_dev->mbox == NULL) {
-		lio_dev->mbox = rte_zmalloc(NULL, sizeof(void *), 0);
-		if (lio_dev->mbox == NULL)
-			return -ENOMEM;
-	}
-
-	mbox = rte_zmalloc(NULL, sizeof(struct lio_mbox), 0);
-	if (mbox == NULL) {
-		rte_free(lio_dev->mbox);
-		lio_dev->mbox = NULL;
-		return -ENOMEM;
-	}
-
-	rte_spinlock_init(&mbox->lock);
-
-	mbox->lio_dev = lio_dev;
-
-	mbox->q_no = 0;
-
-	mbox->state = LIO_MBOX_STATE_IDLE;
-
-	/* VF mbox interrupt reg */
-	mbox->mbox_int_reg = (uint8_t *)lio_dev->hw_addr +
-				CN23XX_VF_SLI_PKT_MBOX_INT(0);
-	/* VF reads from SIG0 reg */
-	mbox->mbox_read_reg = (uint8_t *)lio_dev->hw_addr +
-				CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 0);
-	/* VF writes into SIG1 reg */
-	mbox->mbox_write_reg = (uint8_t *)lio_dev->hw_addr +
-				CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 1);
-
-	lio_dev->mbox[0] = mbox;
-
-	rte_write64(LIO_PFVFSIG, mbox->mbox_read_reg);
-
-	return 0;
-}
-
-static int
-cn23xx_vf_enable_io_queues(struct lio_device *lio_dev)
-{
-	uint32_t q_no;
-
-	PMD_INIT_FUNC_TRACE();
-
-	for (q_no = 0; q_no < lio_dev->num_iqs; q_no++) {
-		uint64_t reg_val;
-
-		/* set the corresponding IQ IS_64B bit */
-		if (lio_dev->io_qmask.iq64B & (1ULL << q_no)) {
-			reg_val = lio_read_csr64(
-					lio_dev,
-					CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
-			reg_val = reg_val | CN23XX_PKT_INPUT_CTL_IS_64B;
-			lio_write_csr64(lio_dev,
-					CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
-					reg_val);
-		}
-
-		/* set the corresponding IQ ENB bit */
-		if (lio_dev->io_qmask.iq & (1ULL << q_no)) {
-			reg_val = lio_read_csr64(
-					lio_dev,
-					CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
-			reg_val = reg_val | CN23XX_PKT_INPUT_CTL_RING_ENB;
-			lio_write_csr64(lio_dev,
-					CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
-					reg_val);
-		}
-	}
-	for (q_no = 0; q_no < lio_dev->num_oqs; q_no++) {
-		uint32_t reg_val;
-
-		/* set the corresponding OQ ENB bit */
-		if (lio_dev->io_qmask.oq & (1ULL << q_no)) {
-			reg_val = lio_read_csr(
-					lio_dev,
-					CN23XX_SLI_OQ_PKT_CONTROL(q_no));
-			reg_val = reg_val | CN23XX_PKT_OUTPUT_CTL_RING_ENB;
-			lio_write_csr(lio_dev,
-				      CN23XX_SLI_OQ_PKT_CONTROL(q_no),
-				      reg_val);
-		}
-	}
-
-	return 0;
-}
-
-static void
-cn23xx_vf_disable_io_queues(struct lio_device *lio_dev)
-{
-	uint32_t num_queues;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* per HRM, rings can only be disabled via reset operation,
-	 * NOT via SLI_PKT()_INPUT/OUTPUT_CONTROL[ENB]
-	 */
-	num_queues = lio_dev->num_iqs;
-	if (num_queues < lio_dev->num_oqs)
-		num_queues = lio_dev->num_oqs;
-
-	cn23xx_vf_reset_io_queues(lio_dev, num_queues);
-}
-
-void
-cn23xx_vf_ask_pf_to_do_flr(struct lio_device *lio_dev)
-{
-	struct lio_mbox_cmd mbox_cmd;
-
-	memset(&mbox_cmd, 0, sizeof(struct lio_mbox_cmd));
-	mbox_cmd.msg.s.type = LIO_MBOX_REQUEST;
-	mbox_cmd.msg.s.resp_needed = 0;
-	mbox_cmd.msg.s.cmd = LIO_VF_FLR_REQUEST;
-	mbox_cmd.msg.s.len = 1;
-	mbox_cmd.q_no = 0;
-	mbox_cmd.recv_len = 0;
-	mbox_cmd.recv_status = 0;
-	mbox_cmd.fn = NULL;
-	mbox_cmd.fn_arg = 0;
-
-	lio_mbox_write(lio_dev, &mbox_cmd);
-}
-
-static void
-cn23xx_pfvf_hs_callback(struct lio_device *lio_dev,
-			struct lio_mbox_cmd *cmd, void *arg)
-{
-	uint32_t major = 0;
-
-	PMD_INIT_FUNC_TRACE();
-
-	rte_memcpy((uint8_t *)&lio_dev->pfvf_hsword, cmd->msg.s.params, 6);
-	if (cmd->recv_len > 1) {
-		struct lio_version *lio_ver = (struct lio_version *)cmd->data;
-
-		major = lio_ver->major;
-		major = major << 16;
-	}
-
-	rte_atomic64_set((rte_atomic64_t *)arg, major | 1);
-}
-
-int
-cn23xx_pfvf_handshake(struct lio_device *lio_dev)
-{
-	struct lio_mbox_cmd mbox_cmd;
-	struct lio_version *lio_ver = (struct lio_version *)&mbox_cmd.data[0];
-	uint32_t q_no, count = 0;
-	rte_atomic64_t status;
-	uint32_t pfmajor;
-	uint32_t vfmajor;
-	uint32_t ret;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* Sending VF_ACTIVE indication to the PF driver */
-	lio_dev_dbg(lio_dev, "requesting info from PF\n");
-
-	mbox_cmd.msg.mbox_msg64 = 0;
-	mbox_cmd.msg.s.type = LIO_MBOX_REQUEST;
-	mbox_cmd.msg.s.resp_needed = 1;
-	mbox_cmd.msg.s.cmd = LIO_VF_ACTIVE;
-	mbox_cmd.msg.s.len = 2;
-	mbox_cmd.data[0] = 0;
-	lio_ver->major = LIO_BASE_MAJOR_VERSION;
-	lio_ver->minor = LIO_BASE_MINOR_VERSION;
-	lio_ver->micro = LIO_BASE_MICRO_VERSION;
-	mbox_cmd.q_no = 0;
-	mbox_cmd.recv_len = 0;
-	mbox_cmd.recv_status = 0;
-	mbox_cmd.fn = (lio_mbox_callback)cn23xx_pfvf_hs_callback;
-	mbox_cmd.fn_arg = (void *)&status;
-
-	if (lio_mbox_write(lio_dev, &mbox_cmd)) {
-		lio_dev_err(lio_dev, "Write to mailbox failed\n");
-		return -1;
-	}
-
-	rte_atomic64_set(&status, 0);
-
-	do {
-		rte_delay_ms(1);
-	} while ((rte_atomic64_read(&status) == 0) && (count++ < 10000));
-
-	ret = rte_atomic64_read(&status);
-	if (ret == 0) {
-		lio_dev_err(lio_dev, "cn23xx_pfvf_handshake timeout\n");
-		return -1;
-	}
-
-	for (q_no = 0; q_no < lio_dev->num_iqs; q_no++)
-		lio_dev->instr_queue[q_no]->txpciq.s.pkind =
-						lio_dev->pfvf_hsword.pkind;
-
-	vfmajor = LIO_BASE_MAJOR_VERSION;
-	pfmajor = ret >> 16;
-	if (pfmajor != vfmajor) {
-		lio_dev_err(lio_dev,
-			    "VF LiquidIO driver (major version %d) is not compatible with LiquidIO PF driver (major version %d)\n",
-			    vfmajor, pfmajor);
-		ret = -EPERM;
-	} else {
-		lio_dev_dbg(lio_dev,
-			    "VF LiquidIO driver (major version %d), LiquidIO PF driver (major version %d)\n",
-			    vfmajor, pfmajor);
-		ret = 0;
-	}
-
-	lio_dev_dbg(lio_dev, "got data from PF pkind is %d\n",
-		    lio_dev->pfvf_hsword.pkind);
-
-	return ret;
-}
-
-void
-cn23xx_vf_handle_mbox(struct lio_device *lio_dev)
-{
-	uint64_t mbox_int_val;
-
-	/* read and clear by writing 1 */
-	mbox_int_val = rte_read64(lio_dev->mbox[0]->mbox_int_reg);
-	rte_write64(mbox_int_val, lio_dev->mbox[0]->mbox_int_reg);
-	if (lio_mbox_read(lio_dev->mbox[0]))
-		lio_mbox_process_message(lio_dev->mbox[0]);
-}
-
-int
-cn23xx_vf_setup_device(struct lio_device *lio_dev)
-{
-	uint64_t reg_val;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* INPUT_CONTROL[RPVF] gives the VF IOq count */
-	reg_val = lio_read_csr64(lio_dev, CN23XX_SLI_IQ_PKT_CONTROL64(0));
-
-	lio_dev->pf_num = (reg_val >> CN23XX_PKT_INPUT_CTL_PF_NUM_POS) &
-				CN23XX_PKT_INPUT_CTL_PF_NUM_MASK;
-	lio_dev->vf_num = (reg_val >> CN23XX_PKT_INPUT_CTL_VF_NUM_POS) &
-				CN23XX_PKT_INPUT_CTL_VF_NUM_MASK;
-
-	reg_val = reg_val >> CN23XX_PKT_INPUT_CTL_RPVF_POS;
-
-	lio_dev->sriov_info.rings_per_vf =
-				reg_val & CN23XX_PKT_INPUT_CTL_RPVF_MASK;
-
-	lio_dev->default_config = lio_get_conf(lio_dev);
-	if (lio_dev->default_config == NULL)
-		return -1;
-
-	lio_dev->fn_list.setup_iq_regs		= cn23xx_vf_setup_iq_regs;
-	lio_dev->fn_list.setup_oq_regs		= cn23xx_vf_setup_oq_regs;
-	lio_dev->fn_list.setup_mbox		= cn23xx_vf_setup_mbox;
-	lio_dev->fn_list.free_mbox		= cn23xx_vf_free_mbox;
-
-	lio_dev->fn_list.setup_device_regs	= cn23xx_vf_setup_device_regs;
-
-	lio_dev->fn_list.enable_io_queues	= cn23xx_vf_enable_io_queues;
-	lio_dev->fn_list.disable_io_queues	= cn23xx_vf_disable_io_queues;
-
-	return 0;
-}
-
diff --git a/drivers/net/liquidio/base/lio_23xx_vf.h b/drivers/net/liquidio/base/lio_23xx_vf.h
deleted file mode 100644
index 8e5362db15..0000000000
--- a/drivers/net/liquidio/base/lio_23xx_vf.h
+++ /dev/null
@@ -1,63 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_23XX_VF_H_
-#define _LIO_23XX_VF_H_
-
-#include <stdio.h>
-
-#include "lio_struct.h"
-
-static const struct lio_config default_cn23xx_conf	= {
-	.card_type				= LIO_23XX,
-	.card_name				= LIO_23XX_NAME,
-	/** IQ attributes */
-	.iq					= {
-		.max_iqs			= CN23XX_CFG_IO_QUEUES,
-		.pending_list_size		=
-			(CN23XX_MAX_IQ_DESCRIPTORS * CN23XX_CFG_IO_QUEUES),
-		.instr_type			= OCTEON_64BYTE_INSTR,
-	},
-
-	/** OQ attributes */
-	.oq					= {
-		.max_oqs			= CN23XX_CFG_IO_QUEUES,
-		.info_ptr			= OCTEON_OQ_INFOPTR_MODE,
-		.refill_threshold		= CN23XX_OQ_REFIL_THRESHOLD,
-	},
-
-	.num_nic_ports				= CN23XX_DEFAULT_NUM_PORTS,
-	.num_def_rx_descs			= CN23XX_MAX_OQ_DESCRIPTORS,
-	.num_def_tx_descs			= CN23XX_MAX_IQ_DESCRIPTORS,
-	.def_rx_buf_size			= CN23XX_OQ_BUF_SIZE,
-};
-
-static inline const struct lio_config *
-lio_get_conf(struct lio_device *lio_dev)
-{
-	const struct lio_config *default_lio_conf = NULL;
-
-	/* check the LIO Device model & return the corresponding lio
-	 * configuration
-	 */
-	default_lio_conf = &default_cn23xx_conf;
-
-	if (default_lio_conf == NULL) {
-		lio_dev_err(lio_dev, "Configuration verification failed\n");
-		return NULL;
-	}
-
-	return default_lio_conf;
-}
-
-#define CN23XX_VF_BUSY_READING_REG_LOOP_COUNT	100000
-
-void cn23xx_vf_ask_pf_to_do_flr(struct lio_device *lio_dev);
-
-int cn23xx_pfvf_handshake(struct lio_device *lio_dev);
-
-int cn23xx_vf_setup_device(struct lio_device  *lio_dev);
-
-void cn23xx_vf_handle_mbox(struct lio_device *lio_dev);
-#endif /* _LIO_23XX_VF_H_  */
diff --git a/drivers/net/liquidio/base/lio_hw_defs.h b/drivers/net/liquidio/base/lio_hw_defs.h
deleted file mode 100644
index 5e119c1241..0000000000
--- a/drivers/net/liquidio/base/lio_hw_defs.h
+++ /dev/null
@@ -1,239 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_HW_DEFS_H_
-#define _LIO_HW_DEFS_H_
-
-#include <rte_io.h>
-
-#ifndef PCI_VENDOR_ID_CAVIUM
-#define PCI_VENDOR_ID_CAVIUM	0x177D
-#endif
-
-#define LIO_CN23XX_VF_VID	0x9712
-
-/* CN23xx subsystem device ids */
-#define PCI_SUBSYS_DEV_ID_CN2350_210		0x0004
-#define PCI_SUBSYS_DEV_ID_CN2360_210		0x0005
-#define PCI_SUBSYS_DEV_ID_CN2360_225		0x0006
-#define PCI_SUBSYS_DEV_ID_CN2350_225		0x0007
-#define PCI_SUBSYS_DEV_ID_CN2350_210SVPN3	0x0008
-#define PCI_SUBSYS_DEV_ID_CN2360_210SVPN3	0x0009
-#define PCI_SUBSYS_DEV_ID_CN2350_210SVPT	0x000a
-#define PCI_SUBSYS_DEV_ID_CN2360_210SVPT	0x000b
-
-/* --------------------------CONFIG VALUES------------------------ */
-
-/* CN23xx IQ configuration macros */
-#define CN23XX_MAX_RINGS_PER_PF			64
-#define CN23XX_MAX_RINGS_PER_VF			8
-
-#define CN23XX_MAX_INPUT_QUEUES			CN23XX_MAX_RINGS_PER_PF
-#define CN23XX_MAX_IQ_DESCRIPTORS		512
-#define CN23XX_MIN_IQ_DESCRIPTORS		128
-
-#define CN23XX_MAX_OUTPUT_QUEUES		CN23XX_MAX_RINGS_PER_PF
-#define CN23XX_MAX_OQ_DESCRIPTORS		512
-#define CN23XX_MIN_OQ_DESCRIPTORS		128
-#define CN23XX_OQ_BUF_SIZE			1536
-
-#define CN23XX_OQ_REFIL_THRESHOLD		16
-
-#define CN23XX_DEFAULT_NUM_PORTS		1
-
-#define CN23XX_CFG_IO_QUEUES			CN23XX_MAX_RINGS_PER_PF
-
-/* common OCTEON configuration macros */
-#define OCTEON_64BYTE_INSTR			64
-#define OCTEON_OQ_INFOPTR_MODE			1
-
-/* Max IOQs per LIO Link */
-#define LIO_MAX_IOQS_PER_IF			64
-
-/* Wait time in milliseconds for FLR */
-#define LIO_PCI_FLR_WAIT			100
-
-enum lio_card_type {
-	LIO_23XX /* 23xx */
-};
-
-#define LIO_23XX_NAME "23xx"
-
-#define LIO_DEV_RUNNING		0xc
-
-#define LIO_OQ_REFILL_THRESHOLD_CFG(cfg)				\
-		((cfg)->default_config->oq.refill_threshold)
-#define LIO_NUM_DEF_TX_DESCS_CFG(cfg)					\
-		((cfg)->default_config->num_def_tx_descs)
-
-#define LIO_IQ_INSTR_TYPE(cfg)		((cfg)->default_config->iq.instr_type)
-
-/* The following config values are fixed and should not be modified. */
-
-/* Maximum number of Instruction queues */
-#define LIO_MAX_INSTR_QUEUES(lio_dev)		CN23XX_MAX_RINGS_PER_VF
-
-#define LIO_MAX_POSSIBLE_INSTR_QUEUES		CN23XX_MAX_INPUT_QUEUES
-#define LIO_MAX_POSSIBLE_OUTPUT_QUEUES		CN23XX_MAX_OUTPUT_QUEUES
-
-#define LIO_DEVICE_NAME_LEN		32
-#define LIO_BASE_MAJOR_VERSION		1
-#define LIO_BASE_MINOR_VERSION		5
-#define LIO_BASE_MICRO_VERSION		1
-
-#define LIO_FW_VERSION_LENGTH		32
-
-#define LIO_Q_RECONF_MIN_VERSION	"1.7.0"
-#define LIO_VF_TRUST_MIN_VERSION	"1.7.1"
-
-/** Tag types used by Octeon cores in its work. */
-enum octeon_tag_type {
-	OCTEON_ORDERED_TAG	= 0,
-	OCTEON_ATOMIC_TAG	= 1,
-};
-
-/* pre-defined host->NIC tag values */
-#define LIO_CONTROL	(0x11111110)
-#define LIO_DATA(i)	(0x11111111 + (i))
-
-/* used for NIC operations */
-#define LIO_OPCODE	1
-
-/* Subcodes are used by host driver/apps to identify the sub-operation
- * for the core. They only need to by unique for a given subsystem.
- */
-#define LIO_OPCODE_SUBCODE(op, sub)		\
-		((((op) & 0x0f) << 8) | ((sub) & 0x7f))
-
-/** LIO_OPCODE subcodes */
-/* This subcode is sent by core PCI driver to indicate cores are ready. */
-#define LIO_OPCODE_NW_DATA		0x02 /* network packet data */
-#define LIO_OPCODE_CMD			0x03
-#define LIO_OPCODE_INFO			0x04
-#define LIO_OPCODE_PORT_STATS		0x05
-#define LIO_OPCODE_IF_CFG		0x09
-
-#define LIO_MIN_RX_BUF_SIZE		64
-#define LIO_MAX_RX_PKTLEN		(64 * 1024)
-
-/* NIC Command types */
-#define LIO_CMD_CHANGE_MTU		0x1
-#define LIO_CMD_CHANGE_DEVFLAGS		0x3
-#define LIO_CMD_RX_CTL			0x4
-#define LIO_CMD_CLEAR_STATS		0x6
-#define LIO_CMD_SET_RSS			0xD
-#define LIO_CMD_TNL_RX_CSUM_CTL		0x10
-#define LIO_CMD_TNL_TX_CSUM_CTL		0x11
-#define LIO_CMD_ADD_VLAN_FILTER		0x17
-#define LIO_CMD_DEL_VLAN_FILTER		0x18
-#define LIO_CMD_VXLAN_PORT_CONFIG	0x19
-#define LIO_CMD_QUEUE_COUNT_CTL		0x1f
-
-#define LIO_CMD_VXLAN_PORT_ADD		0x0
-#define LIO_CMD_VXLAN_PORT_DEL		0x1
-#define LIO_CMD_RXCSUM_ENABLE		0x0
-#define LIO_CMD_TXCSUM_ENABLE		0x0
-
-/* RX(packets coming from wire) Checksum verification flags */
-/* TCP/UDP csum */
-#define LIO_L4_CSUM_VERIFIED		0x1
-#define LIO_IP_CSUM_VERIFIED		0x2
-
-/* RSS */
-#define LIO_RSS_PARAM_DISABLE_RSS		0x10
-#define LIO_RSS_PARAM_HASH_KEY_UNCHANGED	0x08
-#define LIO_RSS_PARAM_ITABLE_UNCHANGED		0x04
-#define LIO_RSS_PARAM_HASH_INFO_UNCHANGED	0x02
-
-#define LIO_RSS_HASH_IPV4			0x100
-#define LIO_RSS_HASH_TCP_IPV4			0x200
-#define LIO_RSS_HASH_IPV6			0x400
-#define LIO_RSS_HASH_TCP_IPV6			0x1000
-#define LIO_RSS_HASH_IPV6_EX			0x800
-#define LIO_RSS_HASH_TCP_IPV6_EX		0x2000
-
-#define LIO_RSS_OFFLOAD_ALL (		\
-		LIO_RSS_HASH_IPV4 |	\
-		LIO_RSS_HASH_TCP_IPV4 |	\
-		LIO_RSS_HASH_IPV6 |	\
-		LIO_RSS_HASH_TCP_IPV6 |	\
-		LIO_RSS_HASH_IPV6_EX |	\
-		LIO_RSS_HASH_TCP_IPV6_EX)
-
-#define LIO_RSS_MAX_TABLE_SZ		128
-#define LIO_RSS_MAX_KEY_SZ		40
-#define LIO_RSS_PARAM_SIZE		16
-
-/* Interface flags communicated between host driver and core app. */
-enum lio_ifflags {
-	LIO_IFFLAG_PROMISC	= 0x01,
-	LIO_IFFLAG_ALLMULTI	= 0x02,
-	LIO_IFFLAG_UNICAST	= 0x10
-};
-
-/* Routines for reading and writing CSRs */
-#ifdef RTE_LIBRTE_LIO_DEBUG_REGS
-#define lio_write_csr(lio_dev, reg_off, value)				\
-	do {								\
-		typeof(lio_dev) _dev = lio_dev;				\
-		typeof(reg_off) _reg_off = reg_off;			\
-		typeof(value) _value = value;				\
-		PMD_REGS_LOG(_dev,					\
-			     "Write32: Reg: 0x%08lx Val: 0x%08lx\n",	\
-			     (unsigned long)_reg_off,			\
-			     (unsigned long)_value);			\
-		rte_write32(_value, _dev->hw_addr + _reg_off);		\
-	} while (0)
-
-#define lio_write_csr64(lio_dev, reg_off, val64)			\
-	do {								\
-		typeof(lio_dev) _dev = lio_dev;				\
-		typeof(reg_off) _reg_off = reg_off;			\
-		typeof(val64) _val64 = val64;				\
-		PMD_REGS_LOG(						\
-		    _dev,						\
-		    "Write64: Reg: 0x%08lx Val: 0x%016llx\n",		\
-		    (unsigned long)_reg_off,				\
-		    (unsigned long long)_val64);			\
-		rte_write64(_val64, _dev->hw_addr + _reg_off);		\
-	} while (0)
-
-#define lio_read_csr(lio_dev, reg_off)					\
-	({								\
-		typeof(lio_dev) _dev = lio_dev;				\
-		typeof(reg_off) _reg_off = reg_off;			\
-		uint32_t val = rte_read32(_dev->hw_addr + _reg_off);	\
-		PMD_REGS_LOG(_dev,					\
-			     "Read32: Reg: 0x%08lx Val: 0x%08lx\n",	\
-			     (unsigned long)_reg_off,			\
-			     (unsigned long)val);			\
-		val;							\
-	})
-
-#define lio_read_csr64(lio_dev, reg_off)				\
-	({								\
-		typeof(lio_dev) _dev = lio_dev;				\
-		typeof(reg_off) _reg_off = reg_off;			\
-		uint64_t val64 = rte_read64(_dev->hw_addr + _reg_off);	\
-		PMD_REGS_LOG(						\
-		    _dev,						\
-		    "Read64: Reg: 0x%08lx Val: 0x%016llx\n",		\
-		    (unsigned long)_reg_off,				\
-		    (unsigned long long)val64);				\
-		val64;							\
-	})
-#else
-#define lio_write_csr(lio_dev, reg_off, value)				\
-	rte_write32(value, (lio_dev)->hw_addr + (reg_off))
-
-#define lio_write_csr64(lio_dev, reg_off, val64)			\
-	rte_write64(val64, (lio_dev)->hw_addr + (reg_off))
-
-#define lio_read_csr(lio_dev, reg_off)					\
-	rte_read32((lio_dev)->hw_addr + (reg_off))
-
-#define lio_read_csr64(lio_dev, reg_off)				\
-	rte_read64((lio_dev)->hw_addr + (reg_off))
-#endif
-#endif /* _LIO_HW_DEFS_H_ */
diff --git a/drivers/net/liquidio/base/lio_mbox.c b/drivers/net/liquidio/base/lio_mbox.c
deleted file mode 100644
index 2ac2b1b334..0000000000
--- a/drivers/net/liquidio/base/lio_mbox.c
+++ /dev/null
@@ -1,246 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#include <ethdev_driver.h>
-#include <rte_cycles.h>
-
-#include "lio_logs.h"
-#include "lio_struct.h"
-#include "lio_mbox.h"
-
-/**
- * lio_mbox_read:
- * @mbox: Pointer mailbox
- *
- * Reads the 8-bytes of data from the mbox register
- * Writes back the acknowledgment indicating completion of read
- */
-int
-lio_mbox_read(struct lio_mbox *mbox)
-{
-	union lio_mbox_message msg;
-	int ret = 0;
-
-	msg.mbox_msg64 = rte_read64(mbox->mbox_read_reg);
-
-	if ((msg.mbox_msg64 == LIO_PFVFACK) || (msg.mbox_msg64 == LIO_PFVFSIG))
-		return 0;
-
-	if (mbox->state & LIO_MBOX_STATE_REQ_RECEIVING) {
-		mbox->mbox_req.data[mbox->mbox_req.recv_len - 1] =
-					msg.mbox_msg64;
-		mbox->mbox_req.recv_len++;
-	} else {
-		if (mbox->state & LIO_MBOX_STATE_RES_RECEIVING) {
-			mbox->mbox_resp.data[mbox->mbox_resp.recv_len - 1] =
-					msg.mbox_msg64;
-			mbox->mbox_resp.recv_len++;
-		} else {
-			if ((mbox->state & LIO_MBOX_STATE_IDLE) &&
-					(msg.s.type == LIO_MBOX_REQUEST)) {
-				mbox->state &= ~LIO_MBOX_STATE_IDLE;
-				mbox->state |= LIO_MBOX_STATE_REQ_RECEIVING;
-				mbox->mbox_req.msg.mbox_msg64 = msg.mbox_msg64;
-				mbox->mbox_req.q_no = mbox->q_no;
-				mbox->mbox_req.recv_len = 1;
-			} else {
-				if ((mbox->state &
-				     LIO_MBOX_STATE_RES_PENDING) &&
-				    (msg.s.type == LIO_MBOX_RESPONSE)) {
-					mbox->state &=
-						~LIO_MBOX_STATE_RES_PENDING;
-					mbox->state |=
-						LIO_MBOX_STATE_RES_RECEIVING;
-					mbox->mbox_resp.msg.mbox_msg64 =
-								msg.mbox_msg64;
-					mbox->mbox_resp.q_no = mbox->q_no;
-					mbox->mbox_resp.recv_len = 1;
-				} else {
-					rte_write64(LIO_PFVFERR,
-						    mbox->mbox_read_reg);
-					mbox->state |= LIO_MBOX_STATE_ERROR;
-					return -1;
-				}
-			}
-		}
-	}
-
-	if (mbox->state & LIO_MBOX_STATE_REQ_RECEIVING) {
-		if (mbox->mbox_req.recv_len < msg.s.len) {
-			ret = 0;
-		} else {
-			mbox->state &= ~LIO_MBOX_STATE_REQ_RECEIVING;
-			mbox->state |= LIO_MBOX_STATE_REQ_RECEIVED;
-			ret = 1;
-		}
-	} else {
-		if (mbox->state & LIO_MBOX_STATE_RES_RECEIVING) {
-			if (mbox->mbox_resp.recv_len < msg.s.len) {
-				ret = 0;
-			} else {
-				mbox->state &= ~LIO_MBOX_STATE_RES_RECEIVING;
-				mbox->state |= LIO_MBOX_STATE_RES_RECEIVED;
-				ret = 1;
-			}
-		} else {
-			RTE_ASSERT(0);
-		}
-	}
-
-	rte_write64(LIO_PFVFACK, mbox->mbox_read_reg);
-
-	return ret;
-}
-
-/**
- * lio_mbox_write:
- * @lio_dev: Pointer lio device
- * @mbox_cmd: Cmd to send to mailbox.
- *
- * Populates the queue specific mbox structure
- * with cmd information.
- * Write the cmd to mbox register
- */
-int
-lio_mbox_write(struct lio_device *lio_dev,
-	       struct lio_mbox_cmd *mbox_cmd)
-{
-	struct lio_mbox *mbox = lio_dev->mbox[mbox_cmd->q_no];
-	uint32_t count, i, ret = LIO_MBOX_STATUS_SUCCESS;
-
-	if ((mbox_cmd->msg.s.type == LIO_MBOX_RESPONSE) &&
-			!(mbox->state & LIO_MBOX_STATE_REQ_RECEIVED))
-		return LIO_MBOX_STATUS_FAILED;
-
-	if ((mbox_cmd->msg.s.type == LIO_MBOX_REQUEST) &&
-			!(mbox->state & LIO_MBOX_STATE_IDLE))
-		return LIO_MBOX_STATUS_BUSY;
-
-	if (mbox_cmd->msg.s.type == LIO_MBOX_REQUEST) {
-		rte_memcpy(&mbox->mbox_resp, mbox_cmd,
-			   sizeof(struct lio_mbox_cmd));
-		mbox->state = LIO_MBOX_STATE_RES_PENDING;
-	}
-
-	count = 0;
-
-	while (rte_read64(mbox->mbox_write_reg) != LIO_PFVFSIG) {
-		rte_delay_ms(1);
-		if (count++ == 1000) {
-			ret = LIO_MBOX_STATUS_FAILED;
-			break;
-		}
-	}
-
-	if (ret == LIO_MBOX_STATUS_SUCCESS) {
-		rte_write64(mbox_cmd->msg.mbox_msg64, mbox->mbox_write_reg);
-		for (i = 0; i < (uint32_t)(mbox_cmd->msg.s.len - 1); i++) {
-			count = 0;
-			while (rte_read64(mbox->mbox_write_reg) !=
-					LIO_PFVFACK) {
-				rte_delay_ms(1);
-				if (count++ == 1000) {
-					ret = LIO_MBOX_STATUS_FAILED;
-					break;
-				}
-			}
-			rte_write64(mbox_cmd->data[i], mbox->mbox_write_reg);
-		}
-	}
-
-	if (mbox_cmd->msg.s.type == LIO_MBOX_RESPONSE) {
-		mbox->state = LIO_MBOX_STATE_IDLE;
-		rte_write64(LIO_PFVFSIG, mbox->mbox_read_reg);
-	} else {
-		if ((!mbox_cmd->msg.s.resp_needed) ||
-				(ret == LIO_MBOX_STATUS_FAILED)) {
-			mbox->state &= ~LIO_MBOX_STATE_RES_PENDING;
-			if (!(mbox->state & (LIO_MBOX_STATE_REQ_RECEIVING |
-					     LIO_MBOX_STATE_REQ_RECEIVED)))
-				mbox->state = LIO_MBOX_STATE_IDLE;
-		}
-	}
-
-	return ret;
-}
-
-/**
- * lio_mbox_process_cmd:
- * @mbox: Pointer mailbox
- * @mbox_cmd: Pointer to command received
- *
- * Process the cmd received in mbox
- */
-static int
-lio_mbox_process_cmd(struct lio_mbox *mbox,
-		     struct lio_mbox_cmd *mbox_cmd)
-{
-	struct lio_device *lio_dev = mbox->lio_dev;
-
-	if (mbox_cmd->msg.s.cmd == LIO_CORES_CRASHED)
-		lio_dev_err(lio_dev, "Octeon core(s) crashed or got stuck!\n");
-
-	return 0;
-}
-
-/**
- * Process the received mbox message.
- */
-int
-lio_mbox_process_message(struct lio_mbox *mbox)
-{
-	struct lio_mbox_cmd mbox_cmd;
-
-	if (mbox->state & LIO_MBOX_STATE_ERROR) {
-		if (mbox->state & (LIO_MBOX_STATE_RES_PENDING |
-				   LIO_MBOX_STATE_RES_RECEIVING)) {
-			rte_memcpy(&mbox_cmd, &mbox->mbox_resp,
-				   sizeof(struct lio_mbox_cmd));
-			mbox->state = LIO_MBOX_STATE_IDLE;
-			rte_write64(LIO_PFVFSIG, mbox->mbox_read_reg);
-			mbox_cmd.recv_status = 1;
-			if (mbox_cmd.fn)
-				mbox_cmd.fn(mbox->lio_dev, &mbox_cmd,
-					    mbox_cmd.fn_arg);
-
-			return 0;
-		}
-
-		mbox->state = LIO_MBOX_STATE_IDLE;
-		rte_write64(LIO_PFVFSIG, mbox->mbox_read_reg);
-
-		return 0;
-	}
-
-	if (mbox->state & LIO_MBOX_STATE_RES_RECEIVED) {
-		rte_memcpy(&mbox_cmd, &mbox->mbox_resp,
-			   sizeof(struct lio_mbox_cmd));
-		mbox->state = LIO_MBOX_STATE_IDLE;
-		rte_write64(LIO_PFVFSIG, mbox->mbox_read_reg);
-		mbox_cmd.recv_status = 0;
-		if (mbox_cmd.fn)
-			mbox_cmd.fn(mbox->lio_dev, &mbox_cmd, mbox_cmd.fn_arg);
-
-		return 0;
-	}
-
-	if (mbox->state & LIO_MBOX_STATE_REQ_RECEIVED) {
-		rte_memcpy(&mbox_cmd, &mbox->mbox_req,
-			   sizeof(struct lio_mbox_cmd));
-		if (!mbox_cmd.msg.s.resp_needed) {
-			mbox->state &= ~LIO_MBOX_STATE_REQ_RECEIVED;
-			if (!(mbox->state & LIO_MBOX_STATE_RES_PENDING))
-				mbox->state = LIO_MBOX_STATE_IDLE;
-			rte_write64(LIO_PFVFSIG, mbox->mbox_read_reg);
-		}
-
-		lio_mbox_process_cmd(mbox, &mbox_cmd);
-
-		return 0;
-	}
-
-	RTE_ASSERT(0);
-
-	return 0;
-}
diff --git a/drivers/net/liquidio/base/lio_mbox.h b/drivers/net/liquidio/base/lio_mbox.h
deleted file mode 100644
index 457917e91f..0000000000
--- a/drivers/net/liquidio/base/lio_mbox.h
+++ /dev/null
@@ -1,102 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_MBOX_H_
-#define _LIO_MBOX_H_
-
-#include <stdint.h>
-
-#include <rte_spinlock.h>
-
-/* Macros for Mail Box Communication */
-
-#define LIO_MBOX_DATA_MAX			32
-
-#define LIO_VF_ACTIVE				0x1
-#define LIO_VF_FLR_REQUEST			0x2
-#define LIO_CORES_CRASHED			0x3
-
-/* Macro for Read acknowledgment */
-#define LIO_PFVFACK				0xffffffffffffffff
-#define LIO_PFVFSIG				0x1122334455667788
-#define LIO_PFVFERR				0xDEADDEADDEADDEAD
-
-enum lio_mbox_cmd_status {
-	LIO_MBOX_STATUS_SUCCESS		= 0,
-	LIO_MBOX_STATUS_FAILED		= 1,
-	LIO_MBOX_STATUS_BUSY		= 2
-};
-
-enum lio_mbox_message_type {
-	LIO_MBOX_REQUEST	= 0,
-	LIO_MBOX_RESPONSE	= 1
-};
-
-union lio_mbox_message {
-	uint64_t mbox_msg64;
-	struct {
-		uint16_t type : 1;
-		uint16_t resp_needed : 1;
-		uint16_t cmd : 6;
-		uint16_t len : 8;
-		uint8_t params[6];
-	} s;
-};
-
-typedef void (*lio_mbox_callback)(void *, void *, void *);
-
-struct lio_mbox_cmd {
-	union lio_mbox_message msg;
-	uint64_t data[LIO_MBOX_DATA_MAX];
-	uint32_t q_no;
-	uint32_t recv_len;
-	uint32_t recv_status;
-	lio_mbox_callback fn;
-	void *fn_arg;
-};
-
-enum lio_mbox_state {
-	LIO_MBOX_STATE_IDLE		= 1,
-	LIO_MBOX_STATE_REQ_RECEIVING	= 2,
-	LIO_MBOX_STATE_REQ_RECEIVED	= 4,
-	LIO_MBOX_STATE_RES_PENDING	= 8,
-	LIO_MBOX_STATE_RES_RECEIVING	= 16,
-	LIO_MBOX_STATE_RES_RECEIVED	= 16,
-	LIO_MBOX_STATE_ERROR		= 32
-};
-
-struct lio_mbox {
-	/* A spinlock to protect access to this q_mbox. */
-	rte_spinlock_t lock;
-
-	struct lio_device *lio_dev;
-
-	uint32_t q_no;
-
-	enum lio_mbox_state state;
-
-	/* SLI_MAC_PF_MBOX_INT for PF, SLI_PKT_MBOX_INT for VF. */
-	void *mbox_int_reg;
-
-	/* SLI_PKT_PF_VF_MBOX_SIG(0) for PF,
-	 * SLI_PKT_PF_VF_MBOX_SIG(1) for VF.
-	 */
-	void *mbox_write_reg;
-
-	/* SLI_PKT_PF_VF_MBOX_SIG(1) for PF,
-	 * SLI_PKT_PF_VF_MBOX_SIG(0) for VF.
-	 */
-	void *mbox_read_reg;
-
-	struct lio_mbox_cmd mbox_req;
-
-	struct lio_mbox_cmd mbox_resp;
-
-};
-
-int lio_mbox_read(struct lio_mbox *mbox);
-int lio_mbox_write(struct lio_device *lio_dev,
-		   struct lio_mbox_cmd *mbox_cmd);
-int lio_mbox_process_message(struct lio_mbox *mbox);
-#endif	/* _LIO_MBOX_H_ */
diff --git a/drivers/net/liquidio/lio_ethdev.c b/drivers/net/liquidio/lio_ethdev.c
deleted file mode 100644
index ebcfbb1a5c..0000000000
--- a/drivers/net/liquidio/lio_ethdev.c
+++ /dev/null
@@ -1,2147 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#include <rte_string_fns.h>
-#include <ethdev_driver.h>
-#include <ethdev_pci.h>
-#include <rte_cycles.h>
-#include <rte_malloc.h>
-#include <rte_alarm.h>
-#include <rte_ether.h>
-
-#include "lio_logs.h"
-#include "lio_23xx_vf.h"
-#include "lio_ethdev.h"
-#include "lio_rxtx.h"
-
-/* Default RSS key in use */
-static uint8_t lio_rss_key[40] = {
-	0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2,
-	0x41, 0x67, 0x25, 0x3D, 0x43, 0xA3, 0x8F, 0xB0,
-	0xD0, 0xCA, 0x2B, 0xCB, 0xAE, 0x7B, 0x30, 0xB4,
-	0x77, 0xCB, 0x2D, 0xA3, 0x80, 0x30, 0xF2, 0x0C,
-	0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
-};
-
-static const struct rte_eth_desc_lim lio_rx_desc_lim = {
-	.nb_max		= CN23XX_MAX_OQ_DESCRIPTORS,
-	.nb_min		= CN23XX_MIN_OQ_DESCRIPTORS,
-	.nb_align	= 1,
-};
-
-static const struct rte_eth_desc_lim lio_tx_desc_lim = {
-	.nb_max		= CN23XX_MAX_IQ_DESCRIPTORS,
-	.nb_min		= CN23XX_MIN_IQ_DESCRIPTORS,
-	.nb_align	= 1,
-};
-
-/* Wait for control command to reach nic. */
-static uint16_t
-lio_wait_for_ctrl_cmd(struct lio_device *lio_dev,
-		      struct lio_dev_ctrl_cmd *ctrl_cmd)
-{
-	uint16_t timeout = LIO_MAX_CMD_TIMEOUT;
-
-	while ((ctrl_cmd->cond == 0) && --timeout) {
-		lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-		rte_delay_ms(1);
-	}
-
-	return !timeout;
-}
-
-/**
- * \brief Send Rx control command
- * @param eth_dev Pointer to the structure rte_eth_dev
- * @param start_stop whether to start or stop
- */
-static int
-lio_send_rx_ctrl_cmd(struct rte_eth_dev *eth_dev, int start_stop)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_RX_CTL;
-	ctrl_pkt.ncmd.s.param1 = start_stop;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send RX Control message\n");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "RX Control command timed out\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-/* store statistics names and its offset in stats structure */
-struct rte_lio_xstats_name_off {
-	char name[RTE_ETH_XSTATS_NAME_SIZE];
-	unsigned int offset;
-};
-
-static const struct rte_lio_xstats_name_off rte_lio_stats_strings[] = {
-	{"rx_pkts", offsetof(struct octeon_rx_stats, total_rcvd)},
-	{"rx_bytes", offsetof(struct octeon_rx_stats, bytes_rcvd)},
-	{"rx_broadcast_pkts", offsetof(struct octeon_rx_stats, total_bcst)},
-	{"rx_multicast_pkts", offsetof(struct octeon_rx_stats, total_mcst)},
-	{"rx_flow_ctrl_pkts", offsetof(struct octeon_rx_stats, ctl_rcvd)},
-	{"rx_fifo_err", offsetof(struct octeon_rx_stats, fifo_err)},
-	{"rx_dmac_drop", offsetof(struct octeon_rx_stats, dmac_drop)},
-	{"rx_fcs_err", offsetof(struct octeon_rx_stats, fcs_err)},
-	{"rx_jabber_err", offsetof(struct octeon_rx_stats, jabber_err)},
-	{"rx_l2_err", offsetof(struct octeon_rx_stats, l2_err)},
-	{"rx_vxlan_pkts", offsetof(struct octeon_rx_stats, fw_rx_vxlan)},
-	{"rx_vxlan_err", offsetof(struct octeon_rx_stats, fw_rx_vxlan_err)},
-	{"rx_lro_pkts", offsetof(struct octeon_rx_stats, fw_lro_pkts)},
-	{"tx_pkts", (offsetof(struct octeon_tx_stats, total_pkts_sent)) +
-						sizeof(struct octeon_rx_stats)},
-	{"tx_bytes", (offsetof(struct octeon_tx_stats, total_bytes_sent)) +
-						sizeof(struct octeon_rx_stats)},
-	{"tx_broadcast_pkts",
-		(offsetof(struct octeon_tx_stats, bcast_pkts_sent)) +
-			sizeof(struct octeon_rx_stats)},
-	{"tx_multicast_pkts",
-		(offsetof(struct octeon_tx_stats, mcast_pkts_sent)) +
-			sizeof(struct octeon_rx_stats)},
-	{"tx_flow_ctrl_pkts", (offsetof(struct octeon_tx_stats, ctl_sent)) +
-						sizeof(struct octeon_rx_stats)},
-	{"tx_fifo_err", (offsetof(struct octeon_tx_stats, fifo_err)) +
-						sizeof(struct octeon_rx_stats)},
-	{"tx_total_collisions", (offsetof(struct octeon_tx_stats,
-					  total_collisions)) +
-						sizeof(struct octeon_rx_stats)},
-	{"tx_tso", (offsetof(struct octeon_tx_stats, fw_tso)) +
-						sizeof(struct octeon_rx_stats)},
-	{"tx_vxlan_pkts", (offsetof(struct octeon_tx_stats, fw_tx_vxlan)) +
-						sizeof(struct octeon_rx_stats)},
-};
-
-#define LIO_NB_XSTATS	RTE_DIM(rte_lio_stats_strings)
-
-/* Get hw stats of the port */
-static int
-lio_dev_xstats_get(struct rte_eth_dev *eth_dev, struct rte_eth_xstat *xstats,
-		   unsigned int n)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	uint16_t timeout = LIO_MAX_CMD_TIMEOUT;
-	struct octeon_link_stats *hw_stats;
-	struct lio_link_stats_resp *resp;
-	struct lio_soft_command *sc;
-	uint32_t resp_size;
-	unsigned int i;
-	int retval;
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down\n",
-			    lio_dev->port_id);
-		return -EINVAL;
-	}
-
-	if (n < LIO_NB_XSTATS)
-		return LIO_NB_XSTATS;
-
-	resp_size = sizeof(struct lio_link_stats_resp);
-	sc = lio_alloc_soft_command(lio_dev, 0, resp_size, 0);
-	if (sc == NULL)
-		return -ENOMEM;
-
-	resp = (struct lio_link_stats_resp *)sc->virtrptr;
-	lio_prepare_soft_command(lio_dev, sc, LIO_OPCODE,
-				 LIO_OPCODE_PORT_STATS, 0, 0, 0);
-
-	/* Setting wait time in seconds */
-	sc->wait_time = LIO_MAX_CMD_TIMEOUT / 1000;
-
-	retval = lio_send_soft_command(lio_dev, sc);
-	if (retval == LIO_IQ_SEND_FAILED) {
-		lio_dev_err(lio_dev, "failed to get port stats from firmware. status: %x\n",
-			    retval);
-		goto get_stats_fail;
-	}
-
-	while ((*sc->status_word == LIO_COMPLETION_WORD_INIT) && --timeout) {
-		lio_flush_iq(lio_dev, lio_dev->instr_queue[sc->iq_no]);
-		lio_process_ordered_list(lio_dev);
-		rte_delay_ms(1);
-	}
-
-	retval = resp->status;
-	if (retval) {
-		lio_dev_err(lio_dev, "failed to get port stats from firmware\n");
-		goto get_stats_fail;
-	}
-
-	lio_swap_8B_data((uint64_t *)(&resp->link_stats),
-			 sizeof(struct octeon_link_stats) >> 3);
-
-	hw_stats = &resp->link_stats;
-
-	for (i = 0; i < LIO_NB_XSTATS; i++) {
-		xstats[i].id = i;
-		xstats[i].value =
-		    *(uint64_t *)(((char *)hw_stats) +
-					rte_lio_stats_strings[i].offset);
-	}
-
-	lio_free_soft_command(sc);
-
-	return LIO_NB_XSTATS;
-
-get_stats_fail:
-	lio_free_soft_command(sc);
-
-	return -1;
-}
-
-static int
-lio_dev_xstats_get_names(struct rte_eth_dev *eth_dev,
-			 struct rte_eth_xstat_name *xstats_names,
-			 unsigned limit __rte_unused)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	unsigned int i;
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down\n",
-			    lio_dev->port_id);
-		return -EINVAL;
-	}
-
-	if (xstats_names == NULL)
-		return LIO_NB_XSTATS;
-
-	/* Note: limit checked in rte_eth_xstats_names() */
-
-	for (i = 0; i < LIO_NB_XSTATS; i++) {
-		snprintf(xstats_names[i].name, sizeof(xstats_names[i].name),
-			 "%s", rte_lio_stats_strings[i].name);
-	}
-
-	return LIO_NB_XSTATS;
-}
-
-/* Reset hw stats for the port */
-static int
-lio_dev_xstats_reset(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-	int ret;
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down\n",
-			    lio_dev->port_id);
-		return -EINVAL;
-	}
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_CLEAR_STATS;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	ret = lio_send_ctrl_pkt(lio_dev, &ctrl_pkt);
-	if (ret != 0) {
-		lio_dev_err(lio_dev, "Failed to send clear stats command\n");
-		return ret;
-	}
-
-	ret = lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd);
-	if (ret != 0) {
-		lio_dev_err(lio_dev, "Clear stats command timed out\n");
-		return ret;
-	}
-
-	/* clear stored per queue stats */
-	if (*eth_dev->dev_ops->stats_reset == NULL)
-		return 0;
-	return (*eth_dev->dev_ops->stats_reset)(eth_dev);
-}
-
-/* Retrieve the device statistics (# packets in/out, # bytes in/out, etc */
-static int
-lio_dev_stats_get(struct rte_eth_dev *eth_dev,
-		  struct rte_eth_stats *stats)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_droq_stats *oq_stats;
-	struct lio_iq_stats *iq_stats;
-	struct lio_instr_queue *txq;
-	struct lio_droq *droq;
-	int i, iq_no, oq_no;
-	uint64_t bytes = 0;
-	uint64_t pkts = 0;
-	uint64_t drop = 0;
-
-	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-		iq_no = lio_dev->linfo.txpciq[i].s.q_no;
-		txq = lio_dev->instr_queue[iq_no];
-		if (txq != NULL) {
-			iq_stats = &txq->stats;
-			pkts += iq_stats->tx_done;
-			drop += iq_stats->tx_dropped;
-			bytes += iq_stats->tx_tot_bytes;
-		}
-	}
-
-	stats->opackets = pkts;
-	stats->obytes = bytes;
-	stats->oerrors = drop;
-
-	pkts = 0;
-	drop = 0;
-	bytes = 0;
-
-	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-		oq_no = lio_dev->linfo.rxpciq[i].s.q_no;
-		droq = lio_dev->droq[oq_no];
-		if (droq != NULL) {
-			oq_stats = &droq->stats;
-			pkts += oq_stats->rx_pkts_received;
-			drop += (oq_stats->rx_dropped +
-					oq_stats->dropped_toomany +
-					oq_stats->dropped_nomem);
-			bytes += oq_stats->rx_bytes_received;
-		}
-	}
-	stats->ibytes = bytes;
-	stats->ipackets = pkts;
-	stats->ierrors = drop;
-
-	return 0;
-}
-
-static int
-lio_dev_stats_reset(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_droq_stats *oq_stats;
-	struct lio_iq_stats *iq_stats;
-	struct lio_instr_queue *txq;
-	struct lio_droq *droq;
-	int i, iq_no, oq_no;
-
-	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-		iq_no = lio_dev->linfo.txpciq[i].s.q_no;
-		txq = lio_dev->instr_queue[iq_no];
-		if (txq != NULL) {
-			iq_stats = &txq->stats;
-			memset(iq_stats, 0, sizeof(struct lio_iq_stats));
-		}
-	}
-
-	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-		oq_no = lio_dev->linfo.rxpciq[i].s.q_no;
-		droq = lio_dev->droq[oq_no];
-		if (droq != NULL) {
-			oq_stats = &droq->stats;
-			memset(oq_stats, 0, sizeof(struct lio_droq_stats));
-		}
-	}
-
-	return 0;
-}
-
-static int
-lio_dev_info_get(struct rte_eth_dev *eth_dev,
-		 struct rte_eth_dev_info *devinfo)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
-
-	switch (pci_dev->id.subsystem_device_id) {
-	/* CN23xx 10G cards */
-	case PCI_SUBSYS_DEV_ID_CN2350_210:
-	case PCI_SUBSYS_DEV_ID_CN2360_210:
-	case PCI_SUBSYS_DEV_ID_CN2350_210SVPN3:
-	case PCI_SUBSYS_DEV_ID_CN2360_210SVPN3:
-	case PCI_SUBSYS_DEV_ID_CN2350_210SVPT:
-	case PCI_SUBSYS_DEV_ID_CN2360_210SVPT:
-		devinfo->speed_capa = RTE_ETH_LINK_SPEED_10G;
-		break;
-	/* CN23xx 25G cards */
-	case PCI_SUBSYS_DEV_ID_CN2350_225:
-	case PCI_SUBSYS_DEV_ID_CN2360_225:
-		devinfo->speed_capa = RTE_ETH_LINK_SPEED_25G;
-		break;
-	default:
-		devinfo->speed_capa = RTE_ETH_LINK_SPEED_10G;
-		lio_dev_err(lio_dev,
-			    "Unknown CN23XX subsystem device id. Setting 10G as default link speed.\n");
-		return -EINVAL;
-	}
-
-	devinfo->max_rx_queues = lio_dev->max_rx_queues;
-	devinfo->max_tx_queues = lio_dev->max_tx_queues;
-
-	devinfo->min_rx_bufsize = LIO_MIN_RX_BUF_SIZE;
-	devinfo->max_rx_pktlen = LIO_MAX_RX_PKTLEN;
-
-	devinfo->max_mac_addrs = 1;
-
-	devinfo->rx_offload_capa = (RTE_ETH_RX_OFFLOAD_IPV4_CKSUM		|
-				    RTE_ETH_RX_OFFLOAD_UDP_CKSUM		|
-				    RTE_ETH_RX_OFFLOAD_TCP_CKSUM		|
-				    RTE_ETH_RX_OFFLOAD_VLAN_STRIP		|
-				    RTE_ETH_RX_OFFLOAD_RSS_HASH);
-	devinfo->tx_offload_capa = (RTE_ETH_TX_OFFLOAD_IPV4_CKSUM		|
-				    RTE_ETH_TX_OFFLOAD_UDP_CKSUM		|
-				    RTE_ETH_TX_OFFLOAD_TCP_CKSUM		|
-				    RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM);
-
-	devinfo->rx_desc_lim = lio_rx_desc_lim;
-	devinfo->tx_desc_lim = lio_tx_desc_lim;
-
-	devinfo->reta_size = LIO_RSS_MAX_TABLE_SZ;
-	devinfo->hash_key_size = LIO_RSS_MAX_KEY_SZ;
-	devinfo->flow_type_rss_offloads = (RTE_ETH_RSS_IPV4			|
-					   RTE_ETH_RSS_NONFRAG_IPV4_TCP	|
-					   RTE_ETH_RSS_IPV6			|
-					   RTE_ETH_RSS_NONFRAG_IPV6_TCP	|
-					   RTE_ETH_RSS_IPV6_EX		|
-					   RTE_ETH_RSS_IPV6_TCP_EX);
-	return 0;
-}
-
-static int
-lio_dev_mtu_set(struct rte_eth_dev *eth_dev, uint16_t mtu)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	PMD_INIT_FUNC_TRACE();
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down, can't set MTU\n",
-			    lio_dev->port_id);
-		return -EINVAL;
-	}
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_CHANGE_MTU;
-	ctrl_pkt.ncmd.s.param1 = mtu;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send command to change MTU\n");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "Command to change MTU timed out\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-lio_dev_rss_reta_update(struct rte_eth_dev *eth_dev,
-			struct rte_eth_rss_reta_entry64 *reta_conf,
-			uint16_t reta_size)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_rss_ctx *rss_state = &lio_dev->rss_state;
-	struct lio_rss_set *rss_param;
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-	int i, j, index;
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down, can't update reta\n",
-			    lio_dev->port_id);
-		return -EINVAL;
-	}
-
-	if (reta_size != LIO_RSS_MAX_TABLE_SZ) {
-		lio_dev_err(lio_dev,
-			    "The size of hash lookup table configured (%d) doesn't match the number hardware can supported (%d)\n",
-			    reta_size, LIO_RSS_MAX_TABLE_SZ);
-		return -EINVAL;
-	}
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	rss_param = (struct lio_rss_set *)&ctrl_pkt.udd[0];
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_SET_RSS;
-	ctrl_pkt.ncmd.s.more = sizeof(struct lio_rss_set) >> 3;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	rss_param->param.flags = 0xF;
-	rss_param->param.flags &= ~LIO_RSS_PARAM_ITABLE_UNCHANGED;
-	rss_param->param.itablesize = LIO_RSS_MAX_TABLE_SZ;
-
-	for (i = 0; i < (reta_size / RTE_ETH_RETA_GROUP_SIZE); i++) {
-		for (j = 0; j < RTE_ETH_RETA_GROUP_SIZE; j++) {
-			if ((reta_conf[i].mask) & ((uint64_t)1 << j)) {
-				index = (i * RTE_ETH_RETA_GROUP_SIZE) + j;
-				rss_state->itable[index] = reta_conf[i].reta[j];
-			}
-		}
-	}
-
-	rss_state->itable_size = LIO_RSS_MAX_TABLE_SZ;
-	memcpy(rss_param->itable, rss_state->itable, rss_state->itable_size);
-
-	lio_swap_8B_data((uint64_t *)rss_param, LIO_RSS_PARAM_SIZE >> 3);
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to set rss hash\n");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "Set rss hash timed out\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-lio_dev_rss_reta_query(struct rte_eth_dev *eth_dev,
-		       struct rte_eth_rss_reta_entry64 *reta_conf,
-		       uint16_t reta_size)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_rss_ctx *rss_state = &lio_dev->rss_state;
-	int i, num;
-
-	if (reta_size != LIO_RSS_MAX_TABLE_SZ) {
-		lio_dev_err(lio_dev,
-			    "The size of hash lookup table configured (%d) doesn't match the number hardware can supported (%d)\n",
-			    reta_size, LIO_RSS_MAX_TABLE_SZ);
-		return -EINVAL;
-	}
-
-	num = reta_size / RTE_ETH_RETA_GROUP_SIZE;
-
-	for (i = 0; i < num; i++) {
-		memcpy(reta_conf->reta,
-		       &rss_state->itable[i * RTE_ETH_RETA_GROUP_SIZE],
-		       RTE_ETH_RETA_GROUP_SIZE);
-		reta_conf++;
-	}
-
-	return 0;
-}
-
-static int
-lio_dev_rss_hash_conf_get(struct rte_eth_dev *eth_dev,
-			  struct rte_eth_rss_conf *rss_conf)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_rss_ctx *rss_state = &lio_dev->rss_state;
-	uint8_t *hash_key = NULL;
-	uint64_t rss_hf = 0;
-
-	if (rss_state->hash_disable) {
-		lio_dev_info(lio_dev, "RSS disabled in nic\n");
-		rss_conf->rss_hf = 0;
-		return 0;
-	}
-
-	/* Get key value */
-	hash_key = rss_conf->rss_key;
-	if (hash_key != NULL)
-		memcpy(hash_key, rss_state->hash_key, rss_state->hash_key_size);
-
-	if (rss_state->ip)
-		rss_hf |= RTE_ETH_RSS_IPV4;
-	if (rss_state->tcp_hash)
-		rss_hf |= RTE_ETH_RSS_NONFRAG_IPV4_TCP;
-	if (rss_state->ipv6)
-		rss_hf |= RTE_ETH_RSS_IPV6;
-	if (rss_state->ipv6_tcp_hash)
-		rss_hf |= RTE_ETH_RSS_NONFRAG_IPV6_TCP;
-	if (rss_state->ipv6_ex)
-		rss_hf |= RTE_ETH_RSS_IPV6_EX;
-	if (rss_state->ipv6_tcp_ex_hash)
-		rss_hf |= RTE_ETH_RSS_IPV6_TCP_EX;
-
-	rss_conf->rss_hf = rss_hf;
-
-	return 0;
-}
-
-static int
-lio_dev_rss_hash_update(struct rte_eth_dev *eth_dev,
-			struct rte_eth_rss_conf *rss_conf)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_rss_ctx *rss_state = &lio_dev->rss_state;
-	struct lio_rss_set *rss_param;
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down, can't update hash\n",
-			    lio_dev->port_id);
-		return -EINVAL;
-	}
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	rss_param = (struct lio_rss_set *)&ctrl_pkt.udd[0];
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_SET_RSS;
-	ctrl_pkt.ncmd.s.more = sizeof(struct lio_rss_set) >> 3;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	rss_param->param.flags = 0xF;
-
-	if (rss_conf->rss_key) {
-		rss_param->param.flags &= ~LIO_RSS_PARAM_HASH_KEY_UNCHANGED;
-		rss_state->hash_key_size = LIO_RSS_MAX_KEY_SZ;
-		rss_param->param.hashkeysize = LIO_RSS_MAX_KEY_SZ;
-		memcpy(rss_state->hash_key, rss_conf->rss_key,
-		       rss_state->hash_key_size);
-		memcpy(rss_param->key, rss_state->hash_key,
-		       rss_state->hash_key_size);
-	}
-
-	if ((rss_conf->rss_hf & LIO_RSS_OFFLOAD_ALL) == 0) {
-		/* Can't disable rss through hash flags,
-		 * if it is enabled by default during init
-		 */
-		if (!rss_state->hash_disable)
-			return -EINVAL;
-
-		/* This is for --disable-rss during testpmd launch */
-		rss_param->param.flags |= LIO_RSS_PARAM_DISABLE_RSS;
-	} else {
-		uint32_t hashinfo = 0;
-
-		/* Can't enable rss if disabled by default during init */
-		if (rss_state->hash_disable)
-			return -EINVAL;
-
-		if (rss_conf->rss_hf & RTE_ETH_RSS_IPV4) {
-			hashinfo |= LIO_RSS_HASH_IPV4;
-			rss_state->ip = 1;
-		} else {
-			rss_state->ip = 0;
-		}
-
-		if (rss_conf->rss_hf & RTE_ETH_RSS_NONFRAG_IPV4_TCP) {
-			hashinfo |= LIO_RSS_HASH_TCP_IPV4;
-			rss_state->tcp_hash = 1;
-		} else {
-			rss_state->tcp_hash = 0;
-		}
-
-		if (rss_conf->rss_hf & RTE_ETH_RSS_IPV6) {
-			hashinfo |= LIO_RSS_HASH_IPV6;
-			rss_state->ipv6 = 1;
-		} else {
-			rss_state->ipv6 = 0;
-		}
-
-		if (rss_conf->rss_hf & RTE_ETH_RSS_NONFRAG_IPV6_TCP) {
-			hashinfo |= LIO_RSS_HASH_TCP_IPV6;
-			rss_state->ipv6_tcp_hash = 1;
-		} else {
-			rss_state->ipv6_tcp_hash = 0;
-		}
-
-		if (rss_conf->rss_hf & RTE_ETH_RSS_IPV6_EX) {
-			hashinfo |= LIO_RSS_HASH_IPV6_EX;
-			rss_state->ipv6_ex = 1;
-		} else {
-			rss_state->ipv6_ex = 0;
-		}
-
-		if (rss_conf->rss_hf & RTE_ETH_RSS_IPV6_TCP_EX) {
-			hashinfo |= LIO_RSS_HASH_TCP_IPV6_EX;
-			rss_state->ipv6_tcp_ex_hash = 1;
-		} else {
-			rss_state->ipv6_tcp_ex_hash = 0;
-		}
-
-		rss_param->param.flags &= ~LIO_RSS_PARAM_HASH_INFO_UNCHANGED;
-		rss_param->param.hashinfo = hashinfo;
-	}
-
-	lio_swap_8B_data((uint64_t *)rss_param, LIO_RSS_PARAM_SIZE >> 3);
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to set rss hash\n");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "Set rss hash timed out\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-/**
- * Add vxlan dest udp port for an interface.
- *
- * @param eth_dev
- *  Pointer to the structure rte_eth_dev
- * @param udp_tnl
- *  udp tunnel conf
- *
- * @return
- *  On success return 0
- *  On failure return -1
- */
-static int
-lio_dev_udp_tunnel_add(struct rte_eth_dev *eth_dev,
-		       struct rte_eth_udp_tunnel *udp_tnl)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	if (udp_tnl == NULL)
-		return -EINVAL;
-
-	if (udp_tnl->prot_type != RTE_ETH_TUNNEL_TYPE_VXLAN) {
-		lio_dev_err(lio_dev, "Unsupported tunnel type\n");
-		return -1;
-	}
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_VXLAN_PORT_CONFIG;
-	ctrl_pkt.ncmd.s.param1 = udp_tnl->udp_port;
-	ctrl_pkt.ncmd.s.more = LIO_CMD_VXLAN_PORT_ADD;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send VXLAN_PORT_ADD command\n");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "VXLAN_PORT_ADD command timed out\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-/**
- * Remove vxlan dest udp port for an interface.
- *
- * @param eth_dev
- *  Pointer to the structure rte_eth_dev
- * @param udp_tnl
- *  udp tunnel conf
- *
- * @return
- *  On success return 0
- *  On failure return -1
- */
-static int
-lio_dev_udp_tunnel_del(struct rte_eth_dev *eth_dev,
-		       struct rte_eth_udp_tunnel *udp_tnl)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	if (udp_tnl == NULL)
-		return -EINVAL;
-
-	if (udp_tnl->prot_type != RTE_ETH_TUNNEL_TYPE_VXLAN) {
-		lio_dev_err(lio_dev, "Unsupported tunnel type\n");
-		return -1;
-	}
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_VXLAN_PORT_CONFIG;
-	ctrl_pkt.ncmd.s.param1 = udp_tnl->udp_port;
-	ctrl_pkt.ncmd.s.more = LIO_CMD_VXLAN_PORT_DEL;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send VXLAN_PORT_DEL command\n");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "VXLAN_PORT_DEL command timed out\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-lio_dev_vlan_filter_set(struct rte_eth_dev *eth_dev, uint16_t vlan_id, int on)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	if (lio_dev->linfo.vlan_is_admin_assigned)
-		return -EPERM;
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = on ?
-			LIO_CMD_ADD_VLAN_FILTER : LIO_CMD_DEL_VLAN_FILTER;
-	ctrl_pkt.ncmd.s.param1 = vlan_id;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to %s VLAN port\n",
-			    on ? "add" : "remove");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "Command to %s VLAN port timed out\n",
-			    on ? "add" : "remove");
-		return -1;
-	}
-
-	return 0;
-}
-
-static uint64_t
-lio_hweight64(uint64_t w)
-{
-	uint64_t res = w - ((w >> 1) & 0x5555555555555555ul);
-
-	res =
-	    (res & 0x3333333333333333ul) + ((res >> 2) & 0x3333333333333333ul);
-	res = (res + (res >> 4)) & 0x0F0F0F0F0F0F0F0Ful;
-	res = res + (res >> 8);
-	res = res + (res >> 16);
-
-	return (res + (res >> 32)) & 0x00000000000000FFul;
-}
-
-static int
-lio_dev_link_update(struct rte_eth_dev *eth_dev,
-		    int wait_to_complete __rte_unused)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct rte_eth_link link;
-
-	/* Initialize */
-	memset(&link, 0, sizeof(link));
-	link.link_status = RTE_ETH_LINK_DOWN;
-	link.link_speed = RTE_ETH_SPEED_NUM_NONE;
-	link.link_duplex = RTE_ETH_LINK_HALF_DUPLEX;
-	link.link_autoneg = RTE_ETH_LINK_AUTONEG;
-
-	/* Return what we found */
-	if (lio_dev->linfo.link.s.link_up == 0) {
-		/* Interface is down */
-		return rte_eth_linkstatus_set(eth_dev, &link);
-	}
-
-	link.link_status = RTE_ETH_LINK_UP; /* Interface is up */
-	link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX;
-	switch (lio_dev->linfo.link.s.speed) {
-	case LIO_LINK_SPEED_10000:
-		link.link_speed = RTE_ETH_SPEED_NUM_10G;
-		break;
-	case LIO_LINK_SPEED_25000:
-		link.link_speed = RTE_ETH_SPEED_NUM_25G;
-		break;
-	default:
-		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
-		link.link_duplex = RTE_ETH_LINK_HALF_DUPLEX;
-	}
-
-	return rte_eth_linkstatus_set(eth_dev, &link);
-}
-
-/**
- * \brief Net device enable, disable allmulticast
- * @param eth_dev Pointer to the structure rte_eth_dev
- *
- * @return
- *  On success return 0
- *  On failure return negative errno
- */
-static int
-lio_change_dev_flag(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	/* Create a ctrl pkt command to be sent to core app. */
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_CHANGE_DEVFLAGS;
-	ctrl_pkt.ncmd.s.param1 = lio_dev->ifflags;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send change flag message\n");
-		return -EAGAIN;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "Change dev flag command timed out\n");
-		return -ETIMEDOUT;
-	}
-
-	return 0;
-}
-
-static int
-lio_dev_promiscuous_enable(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	if (strcmp(lio_dev->firmware_version, LIO_VF_TRUST_MIN_VERSION) < 0) {
-		lio_dev_err(lio_dev, "Require firmware version >= %s\n",
-			    LIO_VF_TRUST_MIN_VERSION);
-		return -EAGAIN;
-	}
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down, can't enable promiscuous\n",
-			    lio_dev->port_id);
-		return -EAGAIN;
-	}
-
-	lio_dev->ifflags |= LIO_IFFLAG_PROMISC;
-	return lio_change_dev_flag(eth_dev);
-}
-
-static int
-lio_dev_promiscuous_disable(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	if (strcmp(lio_dev->firmware_version, LIO_VF_TRUST_MIN_VERSION) < 0) {
-		lio_dev_err(lio_dev, "Require firmware version >= %s\n",
-			    LIO_VF_TRUST_MIN_VERSION);
-		return -EAGAIN;
-	}
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down, can't disable promiscuous\n",
-			    lio_dev->port_id);
-		return -EAGAIN;
-	}
-
-	lio_dev->ifflags &= ~LIO_IFFLAG_PROMISC;
-	return lio_change_dev_flag(eth_dev);
-}
-
-static int
-lio_dev_allmulticast_enable(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down, can't enable multicast\n",
-			    lio_dev->port_id);
-		return -EAGAIN;
-	}
-
-	lio_dev->ifflags |= LIO_IFFLAG_ALLMULTI;
-	return lio_change_dev_flag(eth_dev);
-}
-
-static int
-lio_dev_allmulticast_disable(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	if (!lio_dev->intf_open) {
-		lio_dev_err(lio_dev, "Port %d down, can't disable multicast\n",
-			    lio_dev->port_id);
-		return -EAGAIN;
-	}
-
-	lio_dev->ifflags &= ~LIO_IFFLAG_ALLMULTI;
-	return lio_change_dev_flag(eth_dev);
-}
-
-static void
-lio_dev_rss_configure(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_rss_ctx *rss_state = &lio_dev->rss_state;
-	struct rte_eth_rss_reta_entry64 reta_conf[8];
-	struct rte_eth_rss_conf rss_conf;
-	uint16_t i;
-
-	/* Configure the RSS key and the RSS protocols used to compute
-	 * the RSS hash of input packets.
-	 */
-	rss_conf = eth_dev->data->dev_conf.rx_adv_conf.rss_conf;
-	if ((rss_conf.rss_hf & LIO_RSS_OFFLOAD_ALL) == 0) {
-		rss_state->hash_disable = 1;
-		lio_dev_rss_hash_update(eth_dev, &rss_conf);
-		return;
-	}
-
-	if (rss_conf.rss_key == NULL)
-		rss_conf.rss_key = lio_rss_key; /* Default hash key */
-
-	lio_dev_rss_hash_update(eth_dev, &rss_conf);
-
-	memset(reta_conf, 0, sizeof(reta_conf));
-	for (i = 0; i < LIO_RSS_MAX_TABLE_SZ; i++) {
-		uint8_t q_idx, conf_idx, reta_idx;
-
-		q_idx = (uint8_t)((eth_dev->data->nb_rx_queues > 1) ?
-				  i % eth_dev->data->nb_rx_queues : 0);
-		conf_idx = i / RTE_ETH_RETA_GROUP_SIZE;
-		reta_idx = i % RTE_ETH_RETA_GROUP_SIZE;
-		reta_conf[conf_idx].reta[reta_idx] = q_idx;
-		reta_conf[conf_idx].mask |= ((uint64_t)1 << reta_idx);
-	}
-
-	lio_dev_rss_reta_update(eth_dev, reta_conf, LIO_RSS_MAX_TABLE_SZ);
-}
-
-static void
-lio_dev_mq_rx_configure(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_rss_ctx *rss_state = &lio_dev->rss_state;
-	struct rte_eth_rss_conf rss_conf;
-
-	switch (eth_dev->data->dev_conf.rxmode.mq_mode) {
-	case RTE_ETH_MQ_RX_RSS:
-		lio_dev_rss_configure(eth_dev);
-		break;
-	case RTE_ETH_MQ_RX_NONE:
-	/* if mq_mode is none, disable rss mode. */
-	default:
-		memset(&rss_conf, 0, sizeof(rss_conf));
-		rss_state->hash_disable = 1;
-		lio_dev_rss_hash_update(eth_dev, &rss_conf);
-	}
-}
-
-/**
- * Setup our receive queue/ringbuffer. This is the
- * queue the Octeon uses to send us packets and
- * responses. We are given a memory pool for our
- * packet buffers that are used to populate the receive
- * queue.
- *
- * @param eth_dev
- *    Pointer to the structure rte_eth_dev
- * @param q_no
- *    Queue number
- * @param num_rx_descs
- *    Number of entries in the queue
- * @param socket_id
- *    Where to allocate memory
- * @param rx_conf
- *    Pointer to the struction rte_eth_rxconf
- * @param mp
- *    Pointer to the packet pool
- *
- * @return
- *    - On success, return 0
- *    - On failure, return -1
- */
-static int
-lio_dev_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t q_no,
-		       uint16_t num_rx_descs, unsigned int socket_id,
-		       const struct rte_eth_rxconf *rx_conf __rte_unused,
-		       struct rte_mempool *mp)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct rte_pktmbuf_pool_private *mbp_priv;
-	uint32_t fw_mapped_oq;
-	uint16_t buf_size;
-
-	if (q_no >= lio_dev->nb_rx_queues) {
-		lio_dev_err(lio_dev, "Invalid rx queue number %u\n", q_no);
-		return -EINVAL;
-	}
-
-	lio_dev_dbg(lio_dev, "setting up rx queue %u\n", q_no);
-
-	fw_mapped_oq = lio_dev->linfo.rxpciq[q_no].s.q_no;
-
-	/* Free previous allocation if any */
-	if (eth_dev->data->rx_queues[q_no] != NULL) {
-		lio_dev_rx_queue_release(eth_dev, q_no);
-		eth_dev->data->rx_queues[q_no] = NULL;
-	}
-
-	mbp_priv = rte_mempool_get_priv(mp);
-	buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
-
-	if (lio_setup_droq(lio_dev, fw_mapped_oq, num_rx_descs, buf_size, mp,
-			   socket_id)) {
-		lio_dev_err(lio_dev, "droq allocation failed\n");
-		return -1;
-	}
-
-	eth_dev->data->rx_queues[q_no] = lio_dev->droq[fw_mapped_oq];
-
-	return 0;
-}
-
-/**
- * Release the receive queue/ringbuffer. Called by
- * the upper layers.
- *
- * @param eth_dev
- *    Pointer to Ethernet device structure.
- * @param q_no
- *    Receive queue index.
- *
- * @return
- *    - nothing
- */
-void
-lio_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t q_no)
-{
-	struct lio_droq *droq = dev->data->rx_queues[q_no];
-	int oq_no;
-
-	if (droq) {
-		oq_no = droq->q_no;
-		lio_delete_droq_queue(droq->lio_dev, oq_no);
-	}
-}
-
-/**
- * Allocate and initialize SW ring. Initialize associated HW registers.
- *
- * @param eth_dev
- *   Pointer to structure rte_eth_dev
- *
- * @param q_no
- *   Queue number
- *
- * @param num_tx_descs
- *   Number of ringbuffer descriptors
- *
- * @param socket_id
- *   NUMA socket id, used for memory allocations
- *
- * @param tx_conf
- *   Pointer to the structure rte_eth_txconf
- *
- * @return
- *   - On success, return 0
- *   - On failure, return -errno value
- */
-static int
-lio_dev_tx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t q_no,
-		       uint16_t num_tx_descs, unsigned int socket_id,
-		       const struct rte_eth_txconf *tx_conf __rte_unused)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	int fw_mapped_iq = lio_dev->linfo.txpciq[q_no].s.q_no;
-	int retval;
-
-	if (q_no >= lio_dev->nb_tx_queues) {
-		lio_dev_err(lio_dev, "Invalid tx queue number %u\n", q_no);
-		return -EINVAL;
-	}
-
-	lio_dev_dbg(lio_dev, "setting up tx queue %u\n", q_no);
-
-	/* Free previous allocation if any */
-	if (eth_dev->data->tx_queues[q_no] != NULL) {
-		lio_dev_tx_queue_release(eth_dev, q_no);
-		eth_dev->data->tx_queues[q_no] = NULL;
-	}
-
-	retval = lio_setup_iq(lio_dev, q_no, lio_dev->linfo.txpciq[q_no],
-			      num_tx_descs, lio_dev, socket_id);
-
-	if (retval) {
-		lio_dev_err(lio_dev, "Runtime IQ(TxQ) creation failed.\n");
-		return retval;
-	}
-
-	retval = lio_setup_sglists(lio_dev, q_no, fw_mapped_iq,
-				lio_dev->instr_queue[fw_mapped_iq]->nb_desc,
-				socket_id);
-
-	if (retval) {
-		lio_delete_instruction_queue(lio_dev, fw_mapped_iq);
-		return retval;
-	}
-
-	eth_dev->data->tx_queues[q_no] = lio_dev->instr_queue[fw_mapped_iq];
-
-	return 0;
-}
-
-/**
- * Release the transmit queue/ringbuffer. Called by
- * the upper layers.
- *
- * @param eth_dev
- *    Pointer to Ethernet device structure.
- * @param q_no
- *   Transmit queue index.
- *
- * @return
- *    - nothing
- */
-void
-lio_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t q_no)
-{
-	struct lio_instr_queue *tq = dev->data->tx_queues[q_no];
-	uint32_t fw_mapped_iq_no;
-
-
-	if (tq) {
-		/* Free sg_list */
-		lio_delete_sglist(tq);
-
-		fw_mapped_iq_no = tq->txpciq.s.q_no;
-		lio_delete_instruction_queue(tq->lio_dev, fw_mapped_iq_no);
-	}
-}
-
-/**
- * Api to check link state.
- */
-static void
-lio_dev_get_link_status(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	uint16_t timeout = LIO_MAX_CMD_TIMEOUT;
-	struct lio_link_status_resp *resp;
-	union octeon_link_status *ls;
-	struct lio_soft_command *sc;
-	uint32_t resp_size;
-
-	if (!lio_dev->intf_open)
-		return;
-
-	resp_size = sizeof(struct lio_link_status_resp);
-	sc = lio_alloc_soft_command(lio_dev, 0, resp_size, 0);
-	if (sc == NULL)
-		return;
-
-	resp = (struct lio_link_status_resp *)sc->virtrptr;
-	lio_prepare_soft_command(lio_dev, sc, LIO_OPCODE,
-				 LIO_OPCODE_INFO, 0, 0, 0);
-
-	/* Setting wait time in seconds */
-	sc->wait_time = LIO_MAX_CMD_TIMEOUT / 1000;
-
-	if (lio_send_soft_command(lio_dev, sc) == LIO_IQ_SEND_FAILED)
-		goto get_status_fail;
-
-	while ((*sc->status_word == LIO_COMPLETION_WORD_INIT) && --timeout) {
-		lio_flush_iq(lio_dev, lio_dev->instr_queue[sc->iq_no]);
-		rte_delay_ms(1);
-	}
-
-	if (resp->status)
-		goto get_status_fail;
-
-	ls = &resp->link_info.link;
-
-	lio_swap_8B_data((uint64_t *)ls, sizeof(union octeon_link_status) >> 3);
-
-	if (lio_dev->linfo.link.link_status64 != ls->link_status64) {
-		if (ls->s.mtu < eth_dev->data->mtu) {
-			lio_dev_info(lio_dev, "Lowered VF MTU to %d as PF MTU dropped\n",
-				     ls->s.mtu);
-			eth_dev->data->mtu = ls->s.mtu;
-		}
-		lio_dev->linfo.link.link_status64 = ls->link_status64;
-		lio_dev_link_update(eth_dev, 0);
-	}
-
-	lio_free_soft_command(sc);
-
-	return;
-
-get_status_fail:
-	lio_free_soft_command(sc);
-}
-
-/* This function will be invoked every LSC_TIMEOUT ns (100ms)
- * and will update link state if it changes.
- */
-static void
-lio_sync_link_state_check(void *eth_dev)
-{
-	struct lio_device *lio_dev =
-		(((struct rte_eth_dev *)eth_dev)->data->dev_private);
-
-	if (lio_dev->port_configured)
-		lio_dev_get_link_status(eth_dev);
-
-	/* Schedule periodic link status check.
-	 * Stop check if interface is close and start again while opening.
-	 */
-	if (lio_dev->intf_open)
-		rte_eal_alarm_set(LIO_LSC_TIMEOUT, lio_sync_link_state_check,
-				  eth_dev);
-}
-
-static int
-lio_dev_start(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	uint16_t timeout = LIO_MAX_CMD_TIMEOUT;
-	int ret = 0;
-
-	lio_dev_info(lio_dev, "Starting port %d\n", eth_dev->data->port_id);
-
-	if (lio_dev->fn_list.enable_io_queues(lio_dev))
-		return -1;
-
-	if (lio_send_rx_ctrl_cmd(eth_dev, 1))
-		return -1;
-
-	/* Ready for link status updates */
-	lio_dev->intf_open = 1;
-	rte_mb();
-
-	/* Configure RSS if device configured with multiple RX queues. */
-	lio_dev_mq_rx_configure(eth_dev);
-
-	/* Before update the link info,
-	 * must set linfo.link.link_status64 to 0.
-	 */
-	lio_dev->linfo.link.link_status64 = 0;
-
-	/* start polling for lsc */
-	ret = rte_eal_alarm_set(LIO_LSC_TIMEOUT,
-				lio_sync_link_state_check,
-				eth_dev);
-	if (ret) {
-		lio_dev_err(lio_dev,
-			    "link state check handler creation failed\n");
-		goto dev_lsc_handle_error;
-	}
-
-	while ((lio_dev->linfo.link.link_status64 == 0) && (--timeout))
-		rte_delay_ms(1);
-
-	if (lio_dev->linfo.link.link_status64 == 0) {
-		ret = -1;
-		goto dev_mtu_set_error;
-	}
-
-	ret = lio_dev_mtu_set(eth_dev, eth_dev->data->mtu);
-	if (ret != 0)
-		goto dev_mtu_set_error;
-
-	return 0;
-
-dev_mtu_set_error:
-	rte_eal_alarm_cancel(lio_sync_link_state_check, eth_dev);
-
-dev_lsc_handle_error:
-	lio_dev->intf_open = 0;
-	lio_send_rx_ctrl_cmd(eth_dev, 0);
-
-	return ret;
-}
-
-/* Stop device and disable input/output functions */
-static int
-lio_dev_stop(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	lio_dev_info(lio_dev, "Stopping port %d\n", eth_dev->data->port_id);
-	eth_dev->data->dev_started = 0;
-	lio_dev->intf_open = 0;
-	rte_mb();
-
-	/* Cancel callback if still running. */
-	rte_eal_alarm_cancel(lio_sync_link_state_check, eth_dev);
-
-	lio_send_rx_ctrl_cmd(eth_dev, 0);
-
-	lio_wait_for_instr_fetch(lio_dev);
-
-	/* Clear recorded link status */
-	lio_dev->linfo.link.link_status64 = 0;
-
-	return 0;
-}
-
-static int
-lio_dev_set_link_up(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	if (!lio_dev->intf_open) {
-		lio_dev_info(lio_dev, "Port is stopped, Start the port first\n");
-		return 0;
-	}
-
-	if (lio_dev->linfo.link.s.link_up) {
-		lio_dev_info(lio_dev, "Link is already UP\n");
-		return 0;
-	}
-
-	if (lio_send_rx_ctrl_cmd(eth_dev, 1)) {
-		lio_dev_err(lio_dev, "Unable to set Link UP\n");
-		return -1;
-	}
-
-	lio_dev->linfo.link.s.link_up = 1;
-	eth_dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
-
-	return 0;
-}
-
-static int
-lio_dev_set_link_down(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	if (!lio_dev->intf_open) {
-		lio_dev_info(lio_dev, "Port is stopped, Start the port first\n");
-		return 0;
-	}
-
-	if (!lio_dev->linfo.link.s.link_up) {
-		lio_dev_info(lio_dev, "Link is already DOWN\n");
-		return 0;
-	}
-
-	lio_dev->linfo.link.s.link_up = 0;
-	eth_dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
-
-	if (lio_send_rx_ctrl_cmd(eth_dev, 0)) {
-		lio_dev->linfo.link.s.link_up = 1;
-		eth_dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
-		lio_dev_err(lio_dev, "Unable to set Link Down\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-/**
- * Reset and stop the device. This occurs on the first
- * call to this routine. Subsequent calls will simply
- * return. NB: This will require the NIC to be rebooted.
- *
- * @param eth_dev
- *    Pointer to the structure rte_eth_dev
- *
- * @return
- *    - nothing
- */
-static int
-lio_dev_close(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	int ret = 0;
-
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return 0;
-
-	lio_dev_info(lio_dev, "closing port %d\n", eth_dev->data->port_id);
-
-	if (lio_dev->intf_open)
-		ret = lio_dev_stop(eth_dev);
-
-	/* Reset ioq regs */
-	lio_dev->fn_list.setup_device_regs(lio_dev);
-
-	if (lio_dev->pci_dev->kdrv == RTE_PCI_KDRV_IGB_UIO) {
-		cn23xx_vf_ask_pf_to_do_flr(lio_dev);
-		rte_delay_ms(LIO_PCI_FLR_WAIT);
-	}
-
-	/* lio_free_mbox */
-	lio_dev->fn_list.free_mbox(lio_dev);
-
-	/* Free glist resources */
-	rte_free(lio_dev->glist_head);
-	rte_free(lio_dev->glist_lock);
-	lio_dev->glist_head = NULL;
-	lio_dev->glist_lock = NULL;
-
-	lio_dev->port_configured = 0;
-
-	 /* Delete all queues */
-	lio_dev_clear_queues(eth_dev);
-
-	return ret;
-}
-
-/**
- * Enable tunnel rx checksum verification from firmware.
- */
-static void
-lio_enable_hw_tunnel_rx_checksum(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_TNL_RX_CSUM_CTL;
-	ctrl_pkt.ncmd.s.param1 = LIO_CMD_RXCSUM_ENABLE;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send TNL_RX_CSUM command\n");
-		return;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd))
-		lio_dev_err(lio_dev, "TNL_RX_CSUM command timed out\n");
-}
-
-/**
- * Enable checksum calculation for inner packet in a tunnel.
- */
-static void
-lio_enable_hw_tunnel_tx_checksum(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_TNL_TX_CSUM_CTL;
-	ctrl_pkt.ncmd.s.param1 = LIO_CMD_TXCSUM_ENABLE;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send TNL_TX_CSUM command\n");
-		return;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd))
-		lio_dev_err(lio_dev, "TNL_TX_CSUM command timed out\n");
-}
-
-static int
-lio_send_queue_count_update(struct rte_eth_dev *eth_dev, int num_txq,
-			    int num_rxq)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	struct lio_dev_ctrl_cmd ctrl_cmd;
-	struct lio_ctrl_pkt ctrl_pkt;
-
-	if (strcmp(lio_dev->firmware_version, LIO_Q_RECONF_MIN_VERSION) < 0) {
-		lio_dev_err(lio_dev, "Require firmware version >= %s\n",
-			    LIO_Q_RECONF_MIN_VERSION);
-		return -ENOTSUP;
-	}
-
-	/* flush added to prevent cmd failure
-	 * incase the queue is full
-	 */
-	lio_flush_iq(lio_dev, lio_dev->instr_queue[0]);
-
-	memset(&ctrl_pkt, 0, sizeof(struct lio_ctrl_pkt));
-	memset(&ctrl_cmd, 0, sizeof(struct lio_dev_ctrl_cmd));
-
-	ctrl_cmd.eth_dev = eth_dev;
-	ctrl_cmd.cond = 0;
-
-	ctrl_pkt.ncmd.s.cmd = LIO_CMD_QUEUE_COUNT_CTL;
-	ctrl_pkt.ncmd.s.param1 = num_txq;
-	ctrl_pkt.ncmd.s.param2 = num_rxq;
-	ctrl_pkt.ctrl_cmd = &ctrl_cmd;
-
-	if (lio_send_ctrl_pkt(lio_dev, &ctrl_pkt)) {
-		lio_dev_err(lio_dev, "Failed to send queue count control command\n");
-		return -1;
-	}
-
-	if (lio_wait_for_ctrl_cmd(lio_dev, &ctrl_cmd)) {
-		lio_dev_err(lio_dev, "Queue count control command timed out\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-lio_reconf_queues(struct rte_eth_dev *eth_dev, int num_txq, int num_rxq)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	int ret;
-
-	if (lio_dev->nb_rx_queues != num_rxq ||
-	    lio_dev->nb_tx_queues != num_txq) {
-		if (lio_send_queue_count_update(eth_dev, num_txq, num_rxq))
-			return -1;
-		lio_dev->nb_rx_queues = num_rxq;
-		lio_dev->nb_tx_queues = num_txq;
-	}
-
-	if (lio_dev->intf_open) {
-		ret = lio_dev_stop(eth_dev);
-		if (ret != 0)
-			return ret;
-	}
-
-	/* Reset ioq registers */
-	if (lio_dev->fn_list.setup_device_regs(lio_dev)) {
-		lio_dev_err(lio_dev, "Failed to configure device registers\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int
-lio_dev_configure(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-	uint16_t timeout = LIO_MAX_CMD_TIMEOUT;
-	int retval, num_iqueues, num_oqueues;
-	uint8_t mac[RTE_ETHER_ADDR_LEN], i;
-	struct lio_if_cfg_resp *resp;
-	struct lio_soft_command *sc;
-	union lio_if_cfg if_cfg;
-	uint32_t resp_size;
-
-	PMD_INIT_FUNC_TRACE();
-
-	if (eth_dev->data->dev_conf.rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
-		eth_dev->data->dev_conf.rxmode.offloads |=
-			RTE_ETH_RX_OFFLOAD_RSS_HASH;
-
-	/* Inform firmware about change in number of queues to use.
-	 * Disable IO queues and reset registers for re-configuration.
-	 */
-	if (lio_dev->port_configured)
-		return lio_reconf_queues(eth_dev,
-					 eth_dev->data->nb_tx_queues,
-					 eth_dev->data->nb_rx_queues);
-
-	lio_dev->nb_rx_queues = eth_dev->data->nb_rx_queues;
-	lio_dev->nb_tx_queues = eth_dev->data->nb_tx_queues;
-
-	/* Set max number of queues which can be re-configured. */
-	lio_dev->max_rx_queues = eth_dev->data->nb_rx_queues;
-	lio_dev->max_tx_queues = eth_dev->data->nb_tx_queues;
-
-	resp_size = sizeof(struct lio_if_cfg_resp);
-	sc = lio_alloc_soft_command(lio_dev, 0, resp_size, 0);
-	if (sc == NULL)
-		return -ENOMEM;
-
-	resp = (struct lio_if_cfg_resp *)sc->virtrptr;
-
-	/* Firmware doesn't have capability to reconfigure the queues,
-	 * Claim all queues, and use as many required
-	 */
-	if_cfg.if_cfg64 = 0;
-	if_cfg.s.num_iqueues = lio_dev->nb_tx_queues;
-	if_cfg.s.num_oqueues = lio_dev->nb_rx_queues;
-	if_cfg.s.base_queue = 0;
-
-	if_cfg.s.gmx_port_id = lio_dev->pf_num;
-
-	lio_prepare_soft_command(lio_dev, sc, LIO_OPCODE,
-				 LIO_OPCODE_IF_CFG, 0,
-				 if_cfg.if_cfg64, 0);
-
-	/* Setting wait time in seconds */
-	sc->wait_time = LIO_MAX_CMD_TIMEOUT / 1000;
-
-	retval = lio_send_soft_command(lio_dev, sc);
-	if (retval == LIO_IQ_SEND_FAILED) {
-		lio_dev_err(lio_dev, "iq/oq config failed status: %x\n",
-			    retval);
-		/* Soft instr is freed by driver in case of failure. */
-		goto nic_config_fail;
-	}
-
-	/* Sleep on a wait queue till the cond flag indicates that the
-	 * response arrived or timed-out.
-	 */
-	while ((*sc->status_word == LIO_COMPLETION_WORD_INIT) && --timeout) {
-		lio_flush_iq(lio_dev, lio_dev->instr_queue[sc->iq_no]);
-		lio_process_ordered_list(lio_dev);
-		rte_delay_ms(1);
-	}
-
-	retval = resp->status;
-	if (retval) {
-		lio_dev_err(lio_dev, "iq/oq config failed\n");
-		goto nic_config_fail;
-	}
-
-	strlcpy(lio_dev->firmware_version,
-		resp->cfg_info.lio_firmware_version, LIO_FW_VERSION_LENGTH);
-
-	lio_swap_8B_data((uint64_t *)(&resp->cfg_info),
-			 sizeof(struct octeon_if_cfg_info) >> 3);
-
-	num_iqueues = lio_hweight64(resp->cfg_info.iqmask);
-	num_oqueues = lio_hweight64(resp->cfg_info.oqmask);
-
-	if (!(num_iqueues) || !(num_oqueues)) {
-		lio_dev_err(lio_dev,
-			    "Got bad iqueues (%016lx) or oqueues (%016lx) from firmware.\n",
-			    (unsigned long)resp->cfg_info.iqmask,
-			    (unsigned long)resp->cfg_info.oqmask);
-		goto nic_config_fail;
-	}
-
-	lio_dev_dbg(lio_dev,
-		    "interface %d, iqmask %016lx, oqmask %016lx, numiqueues %d, numoqueues %d\n",
-		    eth_dev->data->port_id,
-		    (unsigned long)resp->cfg_info.iqmask,
-		    (unsigned long)resp->cfg_info.oqmask,
-		    num_iqueues, num_oqueues);
-
-	lio_dev->linfo.num_rxpciq = num_oqueues;
-	lio_dev->linfo.num_txpciq = num_iqueues;
-
-	for (i = 0; i < num_oqueues; i++) {
-		lio_dev->linfo.rxpciq[i].rxpciq64 =
-		    resp->cfg_info.linfo.rxpciq[i].rxpciq64;
-		lio_dev_dbg(lio_dev, "index %d OQ %d\n",
-			    i, lio_dev->linfo.rxpciq[i].s.q_no);
-	}
-
-	for (i = 0; i < num_iqueues; i++) {
-		lio_dev->linfo.txpciq[i].txpciq64 =
-		    resp->cfg_info.linfo.txpciq[i].txpciq64;
-		lio_dev_dbg(lio_dev, "index %d IQ %d\n",
-			    i, lio_dev->linfo.txpciq[i].s.q_no);
-	}
-
-	lio_dev->linfo.hw_addr = resp->cfg_info.linfo.hw_addr;
-	lio_dev->linfo.gmxport = resp->cfg_info.linfo.gmxport;
-	lio_dev->linfo.link.link_status64 =
-			resp->cfg_info.linfo.link.link_status64;
-
-	/* 64-bit swap required on LE machines */
-	lio_swap_8B_data(&lio_dev->linfo.hw_addr, 1);
-	for (i = 0; i < RTE_ETHER_ADDR_LEN; i++)
-		mac[i] = *((uint8_t *)(((uint8_t *)&lio_dev->linfo.hw_addr) +
-				       2 + i));
-
-	/* Copy the permanent MAC address */
-	rte_ether_addr_copy((struct rte_ether_addr *)mac,
-			&eth_dev->data->mac_addrs[0]);
-
-	/* enable firmware checksum support for tunnel packets */
-	lio_enable_hw_tunnel_rx_checksum(eth_dev);
-	lio_enable_hw_tunnel_tx_checksum(eth_dev);
-
-	lio_dev->glist_lock =
-	    rte_zmalloc(NULL, sizeof(*lio_dev->glist_lock) * num_iqueues, 0);
-	if (lio_dev->glist_lock == NULL)
-		return -ENOMEM;
-
-	lio_dev->glist_head =
-		rte_zmalloc(NULL, sizeof(*lio_dev->glist_head) * num_iqueues,
-			    0);
-	if (lio_dev->glist_head == NULL) {
-		rte_free(lio_dev->glist_lock);
-		lio_dev->glist_lock = NULL;
-		return -ENOMEM;
-	}
-
-	lio_dev_link_update(eth_dev, 0);
-
-	lio_dev->port_configured = 1;
-
-	lio_free_soft_command(sc);
-
-	/* Reset ioq regs */
-	lio_dev->fn_list.setup_device_regs(lio_dev);
-
-	/* Free iq_0 used during init */
-	lio_free_instr_queue0(lio_dev);
-
-	return 0;
-
-nic_config_fail:
-	lio_dev_err(lio_dev, "Failed retval %d\n", retval);
-	lio_free_soft_command(sc);
-	lio_free_instr_queue0(lio_dev);
-
-	return -ENODEV;
-}
-
-/* Define our ethernet definitions */
-static const struct eth_dev_ops liovf_eth_dev_ops = {
-	.dev_configure		= lio_dev_configure,
-	.dev_start		= lio_dev_start,
-	.dev_stop		= lio_dev_stop,
-	.dev_set_link_up	= lio_dev_set_link_up,
-	.dev_set_link_down	= lio_dev_set_link_down,
-	.dev_close		= lio_dev_close,
-	.promiscuous_enable	= lio_dev_promiscuous_enable,
-	.promiscuous_disable	= lio_dev_promiscuous_disable,
-	.allmulticast_enable	= lio_dev_allmulticast_enable,
-	.allmulticast_disable	= lio_dev_allmulticast_disable,
-	.link_update		= lio_dev_link_update,
-	.stats_get		= lio_dev_stats_get,
-	.xstats_get		= lio_dev_xstats_get,
-	.xstats_get_names	= lio_dev_xstats_get_names,
-	.stats_reset		= lio_dev_stats_reset,
-	.xstats_reset		= lio_dev_xstats_reset,
-	.dev_infos_get		= lio_dev_info_get,
-	.vlan_filter_set	= lio_dev_vlan_filter_set,
-	.rx_queue_setup		= lio_dev_rx_queue_setup,
-	.rx_queue_release	= lio_dev_rx_queue_release,
-	.tx_queue_setup		= lio_dev_tx_queue_setup,
-	.tx_queue_release	= lio_dev_tx_queue_release,
-	.reta_update		= lio_dev_rss_reta_update,
-	.reta_query		= lio_dev_rss_reta_query,
-	.rss_hash_conf_get	= lio_dev_rss_hash_conf_get,
-	.rss_hash_update	= lio_dev_rss_hash_update,
-	.udp_tunnel_port_add	= lio_dev_udp_tunnel_add,
-	.udp_tunnel_port_del	= lio_dev_udp_tunnel_del,
-	.mtu_set		= lio_dev_mtu_set,
-};
-
-static void
-lio_check_pf_hs_response(void *lio_dev)
-{
-	struct lio_device *dev = lio_dev;
-
-	/* check till response arrives */
-	if (dev->pfvf_hsword.coproc_tics_per_us)
-		return;
-
-	cn23xx_vf_handle_mbox(dev);
-
-	rte_eal_alarm_set(1, lio_check_pf_hs_response, lio_dev);
-}
-
-/**
- * \brief Identify the LIO device and to map the BAR address space
- * @param lio_dev lio device
- */
-static int
-lio_chip_specific_setup(struct lio_device *lio_dev)
-{
-	struct rte_pci_device *pdev = lio_dev->pci_dev;
-	uint32_t dev_id = pdev->id.device_id;
-	const char *s;
-	int ret = 1;
-
-	switch (dev_id) {
-	case LIO_CN23XX_VF_VID:
-		lio_dev->chip_id = LIO_CN23XX_VF_VID;
-		ret = cn23xx_vf_setup_device(lio_dev);
-		s = "CN23XX VF";
-		break;
-	default:
-		s = "?";
-		lio_dev_err(lio_dev, "Unsupported Chip\n");
-	}
-
-	if (!ret)
-		lio_dev_info(lio_dev, "DEVICE : %s\n", s);
-
-	return ret;
-}
-
-static int
-lio_first_time_init(struct lio_device *lio_dev,
-		    struct rte_pci_device *pdev)
-{
-	int dpdk_queues;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* set dpdk specific pci device pointer */
-	lio_dev->pci_dev = pdev;
-
-	/* Identify the LIO type and set device ops */
-	if (lio_chip_specific_setup(lio_dev)) {
-		lio_dev_err(lio_dev, "Chip specific setup failed\n");
-		return -1;
-	}
-
-	/* Initialize soft command buffer pool */
-	if (lio_setup_sc_buffer_pool(lio_dev)) {
-		lio_dev_err(lio_dev, "sc buffer pool allocation failed\n");
-		return -1;
-	}
-
-	/* Initialize lists to manage the requests of different types that
-	 * arrive from applications for this lio device.
-	 */
-	lio_setup_response_list(lio_dev);
-
-	if (lio_dev->fn_list.setup_mbox(lio_dev)) {
-		lio_dev_err(lio_dev, "Mailbox setup failed\n");
-		goto error;
-	}
-
-	/* Check PF response */
-	lio_check_pf_hs_response((void *)lio_dev);
-
-	/* Do handshake and exit if incompatible PF driver */
-	if (cn23xx_pfvf_handshake(lio_dev))
-		goto error;
-
-	/* Request and wait for device reset. */
-	if (pdev->kdrv == RTE_PCI_KDRV_IGB_UIO) {
-		cn23xx_vf_ask_pf_to_do_flr(lio_dev);
-		/* FLR wait time doubled as a precaution. */
-		rte_delay_ms(LIO_PCI_FLR_WAIT * 2);
-	}
-
-	if (lio_dev->fn_list.setup_device_regs(lio_dev)) {
-		lio_dev_err(lio_dev, "Failed to configure device registers\n");
-		goto error;
-	}
-
-	if (lio_setup_instr_queue0(lio_dev)) {
-		lio_dev_err(lio_dev, "Failed to setup instruction queue 0\n");
-		goto error;
-	}
-
-	dpdk_queues = (int)lio_dev->sriov_info.rings_per_vf;
-
-	lio_dev->max_tx_queues = dpdk_queues;
-	lio_dev->max_rx_queues = dpdk_queues;
-
-	/* Enable input and output queues for this device */
-	if (lio_dev->fn_list.enable_io_queues(lio_dev))
-		goto error;
-
-	return 0;
-
-error:
-	lio_free_sc_buffer_pool(lio_dev);
-	if (lio_dev->mbox[0])
-		lio_dev->fn_list.free_mbox(lio_dev);
-	if (lio_dev->instr_queue[0])
-		lio_free_instr_queue0(lio_dev);
-
-	return -1;
-}
-
-static int
-lio_eth_dev_uninit(struct rte_eth_dev *eth_dev)
-{
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	PMD_INIT_FUNC_TRACE();
-
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return 0;
-
-	/* lio_free_sc_buffer_pool */
-	lio_free_sc_buffer_pool(lio_dev);
-
-	return 0;
-}
-
-static int
-lio_eth_dev_init(struct rte_eth_dev *eth_dev)
-{
-	struct rte_pci_device *pdev = RTE_ETH_DEV_TO_PCI(eth_dev);
-	struct lio_device *lio_dev = LIO_DEV(eth_dev);
-
-	PMD_INIT_FUNC_TRACE();
-
-	eth_dev->rx_pkt_burst = &lio_dev_recv_pkts;
-	eth_dev->tx_pkt_burst = &lio_dev_xmit_pkts;
-
-	/* Primary does the initialization. */
-	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
-		return 0;
-
-	rte_eth_copy_pci_info(eth_dev, pdev);
-
-	if (pdev->mem_resource[0].addr) {
-		lio_dev->hw_addr = pdev->mem_resource[0].addr;
-	} else {
-		PMD_INIT_LOG(ERR, "ERROR: Failed to map BAR0\n");
-		return -ENODEV;
-	}
-
-	lio_dev->eth_dev = eth_dev;
-	/* set lio device print string */
-	snprintf(lio_dev->dev_string, sizeof(lio_dev->dev_string),
-		 "%s[%02x:%02x.%x]", pdev->driver->driver.name,
-		 pdev->addr.bus, pdev->addr.devid, pdev->addr.function);
-
-	lio_dev->port_id = eth_dev->data->port_id;
-
-	if (lio_first_time_init(lio_dev, pdev)) {
-		lio_dev_err(lio_dev, "Device init failed\n");
-		return -EINVAL;
-	}
-
-	eth_dev->dev_ops = &liovf_eth_dev_ops;
-	eth_dev->data->mac_addrs = rte_zmalloc("lio", RTE_ETHER_ADDR_LEN, 0);
-	if (eth_dev->data->mac_addrs == NULL) {
-		lio_dev_err(lio_dev,
-			    "MAC addresses memory allocation failed\n");
-		eth_dev->dev_ops = NULL;
-		eth_dev->rx_pkt_burst = NULL;
-		eth_dev->tx_pkt_burst = NULL;
-		return -ENOMEM;
-	}
-
-	rte_atomic64_set(&lio_dev->status, LIO_DEV_RUNNING);
-	rte_wmb();
-
-	lio_dev->port_configured = 0;
-	/* Always allow unicast packets */
-	lio_dev->ifflags |= LIO_IFFLAG_UNICAST;
-
-	return 0;
-}
-
-static int
-lio_eth_dev_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
-		      struct rte_pci_device *pci_dev)
-{
-	return rte_eth_dev_pci_generic_probe(pci_dev, sizeof(struct lio_device),
-			lio_eth_dev_init);
-}
-
-static int
-lio_eth_dev_pci_remove(struct rte_pci_device *pci_dev)
-{
-	return rte_eth_dev_pci_generic_remove(pci_dev,
-					      lio_eth_dev_uninit);
-}
-
-/* Set of PCI devices this driver supports */
-static const struct rte_pci_id pci_id_liovf_map[] = {
-	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, LIO_CN23XX_VF_VID) },
-	{ .vendor_id = 0, /* sentinel */ }
-};
-
-static struct rte_pci_driver rte_liovf_pmd = {
-	.id_table	= pci_id_liovf_map,
-	.drv_flags      = RTE_PCI_DRV_NEED_MAPPING,
-	.probe		= lio_eth_dev_pci_probe,
-	.remove		= lio_eth_dev_pci_remove,
-};
-
-RTE_PMD_REGISTER_PCI(net_liovf, rte_liovf_pmd);
-RTE_PMD_REGISTER_PCI_TABLE(net_liovf, pci_id_liovf_map);
-RTE_PMD_REGISTER_KMOD_DEP(net_liovf, "* igb_uio | vfio-pci");
-RTE_LOG_REGISTER_SUFFIX(lio_logtype_init, init, NOTICE);
-RTE_LOG_REGISTER_SUFFIX(lio_logtype_driver, driver, NOTICE);
diff --git a/drivers/net/liquidio/lio_ethdev.h b/drivers/net/liquidio/lio_ethdev.h
deleted file mode 100644
index ece2b03858..0000000000
--- a/drivers/net/liquidio/lio_ethdev.h
+++ /dev/null
@@ -1,179 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_ETHDEV_H_
-#define _LIO_ETHDEV_H_
-
-#include <stdint.h>
-
-#include "lio_struct.h"
-
-/* timeout to check link state updates from firmware in us */
-#define LIO_LSC_TIMEOUT		100000 /* 100000us (100ms) */
-#define LIO_MAX_CMD_TIMEOUT     10000 /* 10000ms (10s) */
-
-/* The max frame size with default MTU */
-#define LIO_ETH_MAX_LEN (RTE_ETHER_MTU + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN)
-
-#define LIO_DEV(_eth_dev)		((_eth_dev)->data->dev_private)
-
-/* LIO Response condition variable */
-struct lio_dev_ctrl_cmd {
-	struct rte_eth_dev *eth_dev;
-	uint64_t cond;
-};
-
-enum lio_bus_speed {
-	LIO_LINK_SPEED_UNKNOWN  = 0,
-	LIO_LINK_SPEED_10000    = 10000,
-	LIO_LINK_SPEED_25000    = 25000
-};
-
-struct octeon_if_cfg_info {
-	uint64_t iqmask;	/** mask for IQs enabled for the port */
-	uint64_t oqmask;	/** mask for OQs enabled for the port */
-	struct octeon_link_info linfo; /** initial link information */
-	char lio_firmware_version[LIO_FW_VERSION_LENGTH];
-};
-
-/** Stats for each NIC port in RX direction. */
-struct octeon_rx_stats {
-	/* link-level stats */
-	uint64_t total_rcvd;
-	uint64_t bytes_rcvd;
-	uint64_t total_bcst;
-	uint64_t total_mcst;
-	uint64_t runts;
-	uint64_t ctl_rcvd;
-	uint64_t fifo_err; /* Accounts for over/under-run of buffers */
-	uint64_t dmac_drop;
-	uint64_t fcs_err;
-	uint64_t jabber_err;
-	uint64_t l2_err;
-	uint64_t frame_err;
-
-	/* firmware stats */
-	uint64_t fw_total_rcvd;
-	uint64_t fw_total_fwd;
-	uint64_t fw_total_fwd_bytes;
-	uint64_t fw_err_pko;
-	uint64_t fw_err_link;
-	uint64_t fw_err_drop;
-	uint64_t fw_rx_vxlan;
-	uint64_t fw_rx_vxlan_err;
-
-	/* LRO */
-	uint64_t fw_lro_pkts;   /* Number of packets that are LROed */
-	uint64_t fw_lro_octs;   /* Number of octets that are LROed */
-	uint64_t fw_total_lro;  /* Number of LRO packets formed */
-	uint64_t fw_lro_aborts; /* Number of times lRO of packet aborted */
-	uint64_t fw_lro_aborts_port;
-	uint64_t fw_lro_aborts_seq;
-	uint64_t fw_lro_aborts_tsval;
-	uint64_t fw_lro_aborts_timer;
-	/* intrmod: packet forward rate */
-	uint64_t fwd_rate;
-};
-
-/** Stats for each NIC port in RX direction. */
-struct octeon_tx_stats {
-	/* link-level stats */
-	uint64_t total_pkts_sent;
-	uint64_t total_bytes_sent;
-	uint64_t mcast_pkts_sent;
-	uint64_t bcast_pkts_sent;
-	uint64_t ctl_sent;
-	uint64_t one_collision_sent;	/* Packets sent after one collision */
-	/* Packets sent after multiple collision */
-	uint64_t multi_collision_sent;
-	/* Packets not sent due to max collisions */
-	uint64_t max_collision_fail;
-	/* Packets not sent due to max deferrals */
-	uint64_t max_deferral_fail;
-	/* Accounts for over/under-run of buffers */
-	uint64_t fifo_err;
-	uint64_t runts;
-	uint64_t total_collisions; /* Total number of collisions detected */
-
-	/* firmware stats */
-	uint64_t fw_total_sent;
-	uint64_t fw_total_fwd;
-	uint64_t fw_total_fwd_bytes;
-	uint64_t fw_err_pko;
-	uint64_t fw_err_link;
-	uint64_t fw_err_drop;
-	uint64_t fw_err_tso;
-	uint64_t fw_tso;     /* number of tso requests */
-	uint64_t fw_tso_fwd; /* number of packets segmented in tso */
-	uint64_t fw_tx_vxlan;
-};
-
-struct octeon_link_stats {
-	struct octeon_rx_stats fromwire;
-	struct octeon_tx_stats fromhost;
-};
-
-union lio_if_cfg {
-	uint64_t if_cfg64;
-	struct {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		uint64_t base_queue : 16;
-		uint64_t num_iqueues : 16;
-		uint64_t num_oqueues : 16;
-		uint64_t gmx_port_id : 8;
-		uint64_t vf_id : 8;
-#else
-		uint64_t vf_id : 8;
-		uint64_t gmx_port_id : 8;
-		uint64_t num_oqueues : 16;
-		uint64_t num_iqueues : 16;
-		uint64_t base_queue : 16;
-#endif
-	} s;
-};
-
-struct lio_if_cfg_resp {
-	uint64_t rh;
-	struct octeon_if_cfg_info cfg_info;
-	uint64_t status;
-};
-
-struct lio_link_stats_resp {
-	uint64_t rh;
-	struct octeon_link_stats link_stats;
-	uint64_t status;
-};
-
-struct lio_link_status_resp {
-	uint64_t rh;
-	struct octeon_link_info link_info;
-	uint64_t status;
-};
-
-struct lio_rss_set {
-	struct param {
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-		uint64_t flags : 16;
-		uint64_t hashinfo : 32;
-		uint64_t itablesize : 16;
-		uint64_t hashkeysize : 16;
-		uint64_t reserved : 48;
-#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		uint64_t itablesize : 16;
-		uint64_t hashinfo : 32;
-		uint64_t flags : 16;
-		uint64_t reserved : 48;
-		uint64_t hashkeysize : 16;
-#endif
-	} param;
-
-	uint8_t itable[LIO_RSS_MAX_TABLE_SZ];
-	uint8_t key[LIO_RSS_MAX_KEY_SZ];
-};
-
-void lio_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t q_no);
-
-void lio_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t q_no);
-
-#endif	/* _LIO_ETHDEV_H_ */
diff --git a/drivers/net/liquidio/lio_logs.h b/drivers/net/liquidio/lio_logs.h
deleted file mode 100644
index f227827081..0000000000
--- a/drivers/net/liquidio/lio_logs.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_LOGS_H_
-#define _LIO_LOGS_H_
-
-extern int lio_logtype_driver;
-#define lio_dev_printf(lio_dev, level, fmt, args...)		\
-	rte_log(RTE_LOG_ ## level, lio_logtype_driver,		\
-		"%s" fmt, (lio_dev)->dev_string, ##args)
-
-#define lio_dev_info(lio_dev, fmt, args...)				\
-	lio_dev_printf(lio_dev, INFO, "INFO: " fmt, ##args)
-
-#define lio_dev_err(lio_dev, fmt, args...)				\
-	lio_dev_printf(lio_dev, ERR, "ERROR: %s() " fmt, __func__, ##args)
-
-extern int lio_logtype_init;
-#define PMD_INIT_LOG(level, fmt, args...) \
-	rte_log(RTE_LOG_ ## level, lio_logtype_init, \
-		fmt, ## args)
-
-/* Enable these through config options */
-#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, "%s() >>\n", __func__)
-
-#define lio_dev_dbg(lio_dev, fmt, args...)				\
-	lio_dev_printf(lio_dev, DEBUG, "DEBUG: %s() " fmt, __func__, ##args)
-
-#ifdef RTE_LIBRTE_LIO_DEBUG_RX
-#define PMD_RX_LOG(lio_dev, level, fmt, args...)			\
-	lio_dev_printf(lio_dev, level, "RX: %s() " fmt, __func__, ##args)
-#else /* !RTE_LIBRTE_LIO_DEBUG_RX */
-#define PMD_RX_LOG(lio_dev, level, fmt, args...) do { } while (0)
-#endif /* RTE_LIBRTE_LIO_DEBUG_RX */
-
-#ifdef RTE_LIBRTE_LIO_DEBUG_TX
-#define PMD_TX_LOG(lio_dev, level, fmt, args...)			\
-	lio_dev_printf(lio_dev, level, "TX: %s() " fmt, __func__, ##args)
-#else /* !RTE_LIBRTE_LIO_DEBUG_TX */
-#define PMD_TX_LOG(lio_dev, level, fmt, args...) do { } while (0)
-#endif /* RTE_LIBRTE_LIO_DEBUG_TX */
-
-#ifdef RTE_LIBRTE_LIO_DEBUG_MBOX
-#define PMD_MBOX_LOG(lio_dev, level, fmt, args...)			\
-	lio_dev_printf(lio_dev, level, "MBOX: %s() " fmt, __func__, ##args)
-#else /* !RTE_LIBRTE_LIO_DEBUG_MBOX */
-#define PMD_MBOX_LOG(level, fmt, args...) do { } while (0)
-#endif /* RTE_LIBRTE_LIO_DEBUG_MBOX */
-
-#ifdef RTE_LIBRTE_LIO_DEBUG_REGS
-#define PMD_REGS_LOG(lio_dev, fmt, args...)				\
-	lio_dev_printf(lio_dev, DEBUG, "REGS: " fmt, ##args)
-#else /* !RTE_LIBRTE_LIO_DEBUG_REGS */
-#define PMD_REGS_LOG(level, fmt, args...) do { } while (0)
-#endif /* RTE_LIBRTE_LIO_DEBUG_REGS */
-
-#endif  /* _LIO_LOGS_H_ */
diff --git a/drivers/net/liquidio/lio_rxtx.c b/drivers/net/liquidio/lio_rxtx.c
deleted file mode 100644
index e09798ddd7..0000000000
--- a/drivers/net/liquidio/lio_rxtx.c
+++ /dev/null
@@ -1,1804 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#include <ethdev_driver.h>
-#include <rte_cycles.h>
-#include <rte_malloc.h>
-
-#include "lio_logs.h"
-#include "lio_struct.h"
-#include "lio_ethdev.h"
-#include "lio_rxtx.h"
-
-#define LIO_MAX_SG 12
-/* Flush iq if available tx_desc fall below LIO_FLUSH_WM */
-#define LIO_FLUSH_WM(_iq) ((_iq)->nb_desc / 2)
-#define LIO_PKT_IN_DONE_CNT_MASK 0x00000000FFFFFFFFULL
-
-static void
-lio_droq_compute_max_packet_bufs(struct lio_droq *droq)
-{
-	uint32_t count = 0;
-
-	do {
-		count += droq->buffer_size;
-	} while (count < LIO_MAX_RX_PKTLEN);
-}
-
-static void
-lio_droq_reset_indices(struct lio_droq *droq)
-{
-	droq->read_idx	= 0;
-	droq->write_idx	= 0;
-	droq->refill_idx = 0;
-	droq->refill_count = 0;
-	rte_atomic64_set(&droq->pkts_pending, 0);
-}
-
-static void
-lio_droq_destroy_ring_buffers(struct lio_droq *droq)
-{
-	uint32_t i;
-
-	for (i = 0; i < droq->nb_desc; i++) {
-		if (droq->recv_buf_list[i].buffer) {
-			rte_pktmbuf_free((struct rte_mbuf *)
-					 droq->recv_buf_list[i].buffer);
-			droq->recv_buf_list[i].buffer = NULL;
-		}
-	}
-
-	lio_droq_reset_indices(droq);
-}
-
-static int
-lio_droq_setup_ring_buffers(struct lio_device *lio_dev,
-			    struct lio_droq *droq)
-{
-	struct lio_droq_desc *desc_ring = droq->desc_ring;
-	uint32_t i;
-	void *buf;
-
-	for (i = 0; i < droq->nb_desc; i++) {
-		buf = rte_pktmbuf_alloc(droq->mpool);
-		if (buf == NULL) {
-			lio_dev_err(lio_dev, "buffer alloc failed\n");
-			droq->stats.rx_alloc_failure++;
-			lio_droq_destroy_ring_buffers(droq);
-			return -ENOMEM;
-		}
-
-		droq->recv_buf_list[i].buffer = buf;
-		droq->info_list[i].length = 0;
-
-		/* map ring buffers into memory */
-		desc_ring[i].info_ptr = lio_map_ring_info(droq, i);
-		desc_ring[i].buffer_ptr =
-			lio_map_ring(droq->recv_buf_list[i].buffer);
-	}
-
-	lio_droq_reset_indices(droq);
-
-	lio_droq_compute_max_packet_bufs(droq);
-
-	return 0;
-}
-
-static void
-lio_dma_zone_free(struct lio_device *lio_dev, const struct rte_memzone *mz)
-{
-	const struct rte_memzone *mz_tmp;
-	int ret = 0;
-
-	if (mz == NULL) {
-		lio_dev_err(lio_dev, "Memzone NULL\n");
-		return;
-	}
-
-	mz_tmp = rte_memzone_lookup(mz->name);
-	if (mz_tmp == NULL) {
-		lio_dev_err(lio_dev, "Memzone %s Not Found\n", mz->name);
-		return;
-	}
-
-	ret = rte_memzone_free(mz);
-	if (ret)
-		lio_dev_err(lio_dev, "Memzone free Failed ret %d\n", ret);
-}
-
-/**
- *  Frees the space for descriptor ring for the droq.
- *
- *  @param lio_dev	- pointer to the lio device structure
- *  @param q_no		- droq no.
- */
-static void
-lio_delete_droq(struct lio_device *lio_dev, uint32_t q_no)
-{
-	struct lio_droq *droq = lio_dev->droq[q_no];
-
-	lio_dev_dbg(lio_dev, "OQ[%d]\n", q_no);
-
-	lio_droq_destroy_ring_buffers(droq);
-	rte_free(droq->recv_buf_list);
-	droq->recv_buf_list = NULL;
-	lio_dma_zone_free(lio_dev, droq->info_mz);
-	lio_dma_zone_free(lio_dev, droq->desc_ring_mz);
-
-	memset(droq, 0, LIO_DROQ_SIZE);
-}
-
-static void *
-lio_alloc_info_buffer(struct lio_device *lio_dev,
-		      struct lio_droq *droq, unsigned int socket_id)
-{
-	droq->info_mz = rte_eth_dma_zone_reserve(lio_dev->eth_dev,
-						 "info_list", droq->q_no,
-						 (droq->nb_desc *
-							LIO_DROQ_INFO_SIZE),
-						 RTE_CACHE_LINE_SIZE,
-						 socket_id);
-
-	if (droq->info_mz == NULL)
-		return NULL;
-
-	droq->info_list_dma = droq->info_mz->iova;
-	droq->info_alloc_size = droq->info_mz->len;
-	droq->info_base_addr = (size_t)droq->info_mz->addr;
-
-	return droq->info_mz->addr;
-}
-
-/**
- *  Allocates space for the descriptor ring for the droq and
- *  sets the base addr, num desc etc in Octeon registers.
- *
- * @param lio_dev	- pointer to the lio device structure
- * @param q_no		- droq no.
- * @param app_ctx	- pointer to application context
- * @return Success: 0	Failure: -1
- */
-static int
-lio_init_droq(struct lio_device *lio_dev, uint32_t q_no,
-	      uint32_t num_descs, uint32_t desc_size,
-	      struct rte_mempool *mpool, unsigned int socket_id)
-{
-	uint32_t c_refill_threshold;
-	uint32_t desc_ring_size;
-	struct lio_droq *droq;
-
-	lio_dev_dbg(lio_dev, "OQ[%d]\n", q_no);
-
-	droq = lio_dev->droq[q_no];
-	droq->lio_dev = lio_dev;
-	droq->q_no = q_no;
-	droq->mpool = mpool;
-
-	c_refill_threshold = LIO_OQ_REFILL_THRESHOLD_CFG(lio_dev);
-
-	droq->nb_desc = num_descs;
-	droq->buffer_size = desc_size;
-
-	desc_ring_size = droq->nb_desc * LIO_DROQ_DESC_SIZE;
-	droq->desc_ring_mz = rte_eth_dma_zone_reserve(lio_dev->eth_dev,
-						      "droq", q_no,
-						      desc_ring_size,
-						      RTE_CACHE_LINE_SIZE,
-						      socket_id);
-
-	if (droq->desc_ring_mz == NULL) {
-		lio_dev_err(lio_dev,
-			    "Output queue %d ring alloc failed\n", q_no);
-		return -1;
-	}
-
-	droq->desc_ring_dma = droq->desc_ring_mz->iova;
-	droq->desc_ring = (struct lio_droq_desc *)droq->desc_ring_mz->addr;
-
-	lio_dev_dbg(lio_dev, "droq[%d]: desc_ring: virt: 0x%p, dma: %lx\n",
-		    q_no, droq->desc_ring, (unsigned long)droq->desc_ring_dma);
-	lio_dev_dbg(lio_dev, "droq[%d]: num_desc: %d\n", q_no,
-		    droq->nb_desc);
-
-	droq->info_list = lio_alloc_info_buffer(lio_dev, droq, socket_id);
-	if (droq->info_list == NULL) {
-		lio_dev_err(lio_dev, "Cannot allocate memory for info list.\n");
-		goto init_droq_fail;
-	}
-
-	droq->recv_buf_list = rte_zmalloc_socket("recv_buf_list",
-						 (droq->nb_desc *
-							LIO_DROQ_RECVBUF_SIZE),
-						 RTE_CACHE_LINE_SIZE,
-						 socket_id);
-	if (droq->recv_buf_list == NULL) {
-		lio_dev_err(lio_dev,
-			    "Output queue recv buf list alloc failed\n");
-		goto init_droq_fail;
-	}
-
-	if (lio_droq_setup_ring_buffers(lio_dev, droq))
-		goto init_droq_fail;
-
-	droq->refill_threshold = c_refill_threshold;
-
-	rte_spinlock_init(&droq->lock);
-
-	lio_dev->fn_list.setup_oq_regs(lio_dev, q_no);
-
-	lio_dev->io_qmask.oq |= (1ULL << q_no);
-
-	return 0;
-
-init_droq_fail:
-	lio_delete_droq(lio_dev, q_no);
-
-	return -1;
-}
-
-int
-lio_setup_droq(struct lio_device *lio_dev, int oq_no, int num_descs,
-	       int desc_size, struct rte_mempool *mpool, unsigned int socket_id)
-{
-	struct lio_droq *droq;
-
-	PMD_INIT_FUNC_TRACE();
-
-	/* Allocate the DS for the new droq. */
-	droq = rte_zmalloc_socket("ethdev RX queue", sizeof(*droq),
-				  RTE_CACHE_LINE_SIZE, socket_id);
-	if (droq == NULL)
-		return -ENOMEM;
-
-	lio_dev->droq[oq_no] = droq;
-
-	/* Initialize the Droq */
-	if (lio_init_droq(lio_dev, oq_no, num_descs, desc_size, mpool,
-			  socket_id)) {
-		lio_dev_err(lio_dev, "Droq[%u] Initialization Failed\n", oq_no);
-		rte_free(lio_dev->droq[oq_no]);
-		lio_dev->droq[oq_no] = NULL;
-		return -ENOMEM;
-	}
-
-	lio_dev->num_oqs++;
-
-	lio_dev_dbg(lio_dev, "Total number of OQ: %d\n", lio_dev->num_oqs);
-
-	/* Send credit for octeon output queues. credits are always
-	 * sent after the output queue is enabled.
-	 */
-	rte_write32(lio_dev->droq[oq_no]->nb_desc,
-		    lio_dev->droq[oq_no]->pkts_credit_reg);
-	rte_wmb();
-
-	return 0;
-}
-
-static inline uint32_t
-lio_droq_get_bufcount(uint32_t buf_size, uint32_t total_len)
-{
-	uint32_t buf_cnt = 0;
-
-	while (total_len > (buf_size * buf_cnt))
-		buf_cnt++;
-
-	return buf_cnt;
-}
-
-/* If we were not able to refill all buffers, try to move around
- * the buffers that were not dispatched.
- */
-static inline uint32_t
-lio_droq_refill_pullup_descs(struct lio_droq *droq,
-			     struct lio_droq_desc *desc_ring)
-{
-	uint32_t refill_index = droq->refill_idx;
-	uint32_t desc_refilled = 0;
-
-	while (refill_index != droq->read_idx) {
-		if (droq->recv_buf_list[refill_index].buffer) {
-			droq->recv_buf_list[droq->refill_idx].buffer =
-				droq->recv_buf_list[refill_index].buffer;
-			desc_ring[droq->refill_idx].buffer_ptr =
-				desc_ring[refill_index].buffer_ptr;
-			droq->recv_buf_list[refill_index].buffer = NULL;
-			desc_ring[refill_index].buffer_ptr = 0;
-			do {
-				droq->refill_idx = lio_incr_index(
-							droq->refill_idx, 1,
-							droq->nb_desc);
-				desc_refilled++;
-				droq->refill_count--;
-			} while (droq->recv_buf_list[droq->refill_idx].buffer);
-		}
-		refill_index = lio_incr_index(refill_index, 1,
-					      droq->nb_desc);
-	}	/* while */
-
-	return desc_refilled;
-}
-
-/* lio_droq_refill
- *
- * @param droq		- droq in which descriptors require new buffers.
- *
- * Description:
- *  Called during normal DROQ processing in interrupt mode or by the poll
- *  thread to refill the descriptors from which buffers were dispatched
- *  to upper layers. Attempts to allocate new buffers. If that fails, moves
- *  up buffers (that were not dispatched) to form a contiguous ring.
- *
- * Returns:
- *  No of descriptors refilled.
- *
- * Locks:
- * This routine is called with droq->lock held.
- */
-static uint32_t
-lio_droq_refill(struct lio_droq *droq)
-{
-	struct lio_droq_desc *desc_ring;
-	uint32_t desc_refilled = 0;
-	void *buf = NULL;
-
-	desc_ring = droq->desc_ring;
-
-	while (droq->refill_count && (desc_refilled < droq->nb_desc)) {
-		/* If a valid buffer exists (happens if there is no dispatch),
-		 * reuse the buffer, else allocate.
-		 */
-		if (droq->recv_buf_list[droq->refill_idx].buffer == NULL) {
-			buf = rte_pktmbuf_alloc(droq->mpool);
-			/* If a buffer could not be allocated, no point in
-			 * continuing
-			 */
-			if (buf == NULL) {
-				droq->stats.rx_alloc_failure++;
-				break;
-			}
-
-			droq->recv_buf_list[droq->refill_idx].buffer = buf;
-		}
-
-		desc_ring[droq->refill_idx].buffer_ptr =
-		    lio_map_ring(droq->recv_buf_list[droq->refill_idx].buffer);
-		/* Reset any previous values in the length field. */
-		droq->info_list[droq->refill_idx].length = 0;
-
-		droq->refill_idx = lio_incr_index(droq->refill_idx, 1,
-						  droq->nb_desc);
-		desc_refilled++;
-		droq->refill_count--;
-	}
-
-	if (droq->refill_count)
-		desc_refilled += lio_droq_refill_pullup_descs(droq, desc_ring);
-
-	/* if droq->refill_count
-	 * The refill count would not change in pass two. We only moved buffers
-	 * to close the gap in the ring, but we would still have the same no. of
-	 * buffers to refill.
-	 */
-	return desc_refilled;
-}
-
-static int
-lio_droq_fast_process_packet(struct lio_device *lio_dev,
-			     struct lio_droq *droq,
-			     struct rte_mbuf **rx_pkts)
-{
-	struct rte_mbuf *nicbuf = NULL;
-	struct lio_droq_info *info;
-	uint32_t total_len = 0;
-	int data_total_len = 0;
-	uint32_t pkt_len = 0;
-	union octeon_rh *rh;
-	int data_pkts = 0;
-
-	info = &droq->info_list[droq->read_idx];
-	lio_swap_8B_data((uint64_t *)info, 2);
-
-	if (!info->length)
-		return -1;
-
-	/* Len of resp hdr in included in the received data len. */
-	info->length -= OCTEON_RH_SIZE;
-	rh = &info->rh;
-
-	total_len += (uint32_t)info->length;
-
-	if (lio_opcode_slow_path(rh)) {
-		uint32_t buf_cnt;
-
-		buf_cnt = lio_droq_get_bufcount(droq->buffer_size,
-						(uint32_t)info->length);
-		droq->read_idx = lio_incr_index(droq->read_idx, buf_cnt,
-						droq->nb_desc);
-		droq->refill_count += buf_cnt;
-	} else {
-		if (info->length <= droq->buffer_size) {
-			if (rh->r_dh.has_hash)
-				pkt_len = (uint32_t)(info->length - 8);
-			else
-				pkt_len = (uint32_t)info->length;
-
-			nicbuf = droq->recv_buf_list[droq->read_idx].buffer;
-			droq->recv_buf_list[droq->read_idx].buffer = NULL;
-			droq->read_idx = lio_incr_index(
-						droq->read_idx, 1,
-						droq->nb_desc);
-			droq->refill_count++;
-
-			if (likely(nicbuf != NULL)) {
-				/* We don't have a way to pass flags yet */
-				nicbuf->ol_flags = 0;
-				if (rh->r_dh.has_hash) {
-					uint64_t *hash_ptr;
-
-					nicbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
-					hash_ptr = rte_pktmbuf_mtod(nicbuf,
-								    uint64_t *);
-					lio_swap_8B_data(hash_ptr, 1);
-					nicbuf->hash.rss = (uint32_t)*hash_ptr;
-					nicbuf->data_off += 8;
-				}
-
-				nicbuf->pkt_len = pkt_len;
-				nicbuf->data_len = pkt_len;
-				nicbuf->port = lio_dev->port_id;
-				/* Store the mbuf */
-				rx_pkts[data_pkts++] = nicbuf;
-				data_total_len += pkt_len;
-			}
-
-			/* Prefetch buffer pointers when on a cache line
-			 * boundary
-			 */
-			if ((droq->read_idx & 3) == 0) {
-				rte_prefetch0(
-				    &droq->recv_buf_list[droq->read_idx]);
-				rte_prefetch0(
-				    &droq->info_list[droq->read_idx]);
-			}
-		} else {
-			struct rte_mbuf *first_buf = NULL;
-			struct rte_mbuf *last_buf = NULL;
-
-			while (pkt_len < info->length) {
-				int cpy_len = 0;
-
-				cpy_len = ((pkt_len + droq->buffer_size) >
-						info->length)
-						? ((uint32_t)info->length -
-							pkt_len)
-						: droq->buffer_size;
-
-				nicbuf =
-				    droq->recv_buf_list[droq->read_idx].buffer;
-				droq->recv_buf_list[droq->read_idx].buffer =
-				    NULL;
-
-				if (likely(nicbuf != NULL)) {
-					/* Note the first seg */
-					if (!pkt_len)
-						first_buf = nicbuf;
-
-					nicbuf->port = lio_dev->port_id;
-					/* We don't have a way to pass
-					 * flags yet
-					 */
-					nicbuf->ol_flags = 0;
-					if ((!pkt_len) && (rh->r_dh.has_hash)) {
-						uint64_t *hash_ptr;
-
-						nicbuf->ol_flags |=
-						    RTE_MBUF_F_RX_RSS_HASH;
-						hash_ptr = rte_pktmbuf_mtod(
-						    nicbuf, uint64_t *);
-						lio_swap_8B_data(hash_ptr, 1);
-						nicbuf->hash.rss =
-						    (uint32_t)*hash_ptr;
-						nicbuf->data_off += 8;
-						nicbuf->pkt_len = cpy_len - 8;
-						nicbuf->data_len = cpy_len - 8;
-					} else {
-						nicbuf->pkt_len = cpy_len;
-						nicbuf->data_len = cpy_len;
-					}
-
-					if (pkt_len)
-						first_buf->nb_segs++;
-
-					if (last_buf)
-						last_buf->next = nicbuf;
-
-					last_buf = nicbuf;
-				} else {
-					PMD_RX_LOG(lio_dev, ERR, "no buf\n");
-				}
-
-				pkt_len += cpy_len;
-				droq->read_idx = lio_incr_index(
-							droq->read_idx,
-							1, droq->nb_desc);
-				droq->refill_count++;
-
-				/* Prefetch buffer pointers when on a
-				 * cache line boundary
-				 */
-				if ((droq->read_idx & 3) == 0) {
-					rte_prefetch0(&droq->recv_buf_list
-							      [droq->read_idx]);
-
-					rte_prefetch0(
-					    &droq->info_list[droq->read_idx]);
-				}
-			}
-			rx_pkts[data_pkts++] = first_buf;
-			if (rh->r_dh.has_hash)
-				data_total_len += (pkt_len - 8);
-			else
-				data_total_len += pkt_len;
-		}
-
-		/* Inform upper layer about packet checksum verification */
-		struct rte_mbuf *m = rx_pkts[data_pkts - 1];
-
-		if (rh->r_dh.csum_verified & LIO_IP_CSUM_VERIFIED)
-			m->ol_flags |= RTE_MBUF_F_RX_IP_CKSUM_GOOD;
-
-		if (rh->r_dh.csum_verified & LIO_L4_CSUM_VERIFIED)
-			m->ol_flags |= RTE_MBUF_F_RX_L4_CKSUM_GOOD;
-	}
-
-	if (droq->refill_count >= droq->refill_threshold) {
-		int desc_refilled = lio_droq_refill(droq);
-
-		/* Flush the droq descriptor data to memory to be sure
-		 * that when we update the credits the data in memory is
-		 * accurate.
-		 */
-		rte_wmb();
-		rte_write32(desc_refilled, droq->pkts_credit_reg);
-		/* make sure mmio write completes */
-		rte_wmb();
-	}
-
-	info->length = 0;
-	info->rh.rh64 = 0;
-
-	droq->stats.pkts_received++;
-	droq->stats.rx_pkts_received += data_pkts;
-	droq->stats.rx_bytes_received += data_total_len;
-	droq->stats.bytes_received += total_len;
-
-	return data_pkts;
-}
-
-static uint32_t
-lio_droq_fast_process_packets(struct lio_device *lio_dev,
-			      struct lio_droq *droq,
-			      struct rte_mbuf **rx_pkts,
-			      uint32_t pkts_to_process)
-{
-	int ret, data_pkts = 0;
-	uint32_t pkt;
-
-	for (pkt = 0; pkt < pkts_to_process; pkt++) {
-		ret = lio_droq_fast_process_packet(lio_dev, droq,
-						   &rx_pkts[data_pkts]);
-		if (ret < 0) {
-			lio_dev_err(lio_dev, "Port[%d] DROQ[%d] idx: %d len:0, pkt_cnt: %d\n",
-				    lio_dev->port_id, droq->q_no,
-				    droq->read_idx, pkts_to_process);
-			break;
-		}
-		data_pkts += ret;
-	}
-
-	rte_atomic64_sub(&droq->pkts_pending, pkt);
-
-	return data_pkts;
-}
-
-static inline uint32_t
-lio_droq_check_hw_for_pkts(struct lio_droq *droq)
-{
-	uint32_t last_count;
-	uint32_t pkt_count;
-
-	pkt_count = rte_read32(droq->pkts_sent_reg);
-
-	last_count = pkt_count - droq->pkt_count;
-	droq->pkt_count = pkt_count;
-
-	if (last_count)
-		rte_atomic64_add(&droq->pkts_pending, last_count);
-
-	return last_count;
-}
-
-uint16_t
-lio_dev_recv_pkts(void *rx_queue,
-		  struct rte_mbuf **rx_pkts,
-		  uint16_t budget)
-{
-	struct lio_droq *droq = rx_queue;
-	struct lio_device *lio_dev = droq->lio_dev;
-	uint32_t pkts_processed = 0;
-	uint32_t pkt_count = 0;
-
-	lio_droq_check_hw_for_pkts(droq);
-
-	pkt_count = rte_atomic64_read(&droq->pkts_pending);
-	if (!pkt_count)
-		return 0;
-
-	if (pkt_count > budget)
-		pkt_count = budget;
-
-	/* Grab the lock */
-	rte_spinlock_lock(&droq->lock);
-	pkts_processed = lio_droq_fast_process_packets(lio_dev,
-						       droq, rx_pkts,
-						       pkt_count);
-
-	if (droq->pkt_count) {
-		rte_write32(droq->pkt_count, droq->pkts_sent_reg);
-		droq->pkt_count = 0;
-	}
-
-	/* Release the spin lock */
-	rte_spinlock_unlock(&droq->lock);
-
-	return pkts_processed;
-}
-
-void
-lio_delete_droq_queue(struct lio_device *lio_dev,
-		      int oq_no)
-{
-	lio_delete_droq(lio_dev, oq_no);
-	lio_dev->num_oqs--;
-	rte_free(lio_dev->droq[oq_no]);
-	lio_dev->droq[oq_no] = NULL;
-}
-
-/**
- *  lio_init_instr_queue()
- *  @param lio_dev	- pointer to the lio device structure.
- *  @param txpciq	- queue to be initialized.
- *
- *  Called at driver init time for each input queue. iq_conf has the
- *  configuration parameters for the queue.
- *
- *  @return  Success: 0	Failure: -1
- */
-static int
-lio_init_instr_queue(struct lio_device *lio_dev,
-		     union octeon_txpciq txpciq,
-		     uint32_t num_descs, unsigned int socket_id)
-{
-	uint32_t iq_no = (uint32_t)txpciq.s.q_no;
-	struct lio_instr_queue *iq;
-	uint32_t instr_type;
-	uint32_t q_size;
-
-	instr_type = LIO_IQ_INSTR_TYPE(lio_dev);
-
-	q_size = instr_type * num_descs;
-	iq = lio_dev->instr_queue[iq_no];
-	iq->iq_mz = rte_eth_dma_zone_reserve(lio_dev->eth_dev,
-					     "instr_queue", iq_no, q_size,
-					     RTE_CACHE_LINE_SIZE,
-					     socket_id);
-	if (iq->iq_mz == NULL) {
-		lio_dev_err(lio_dev, "Cannot allocate memory for instr queue %d\n",
-			    iq_no);
-		return -1;
-	}
-
-	iq->base_addr_dma = iq->iq_mz->iova;
-	iq->base_addr = (uint8_t *)iq->iq_mz->addr;
-
-	iq->nb_desc = num_descs;
-
-	/* Initialize a list to holds requests that have been posted to Octeon
-	 * but has yet to be fetched by octeon
-	 */
-	iq->request_list = rte_zmalloc_socket("request_list",
-					      sizeof(*iq->request_list) *
-							num_descs,
-					      RTE_CACHE_LINE_SIZE,
-					      socket_id);
-	if (iq->request_list == NULL) {
-		lio_dev_err(lio_dev, "Alloc failed for IQ[%d] nr free list\n",
-			    iq_no);
-		lio_dma_zone_free(lio_dev, iq->iq_mz);
-		return -1;
-	}
-
-	lio_dev_dbg(lio_dev, "IQ[%d]: base: %p basedma: %lx count: %d\n",
-		    iq_no, iq->base_addr, (unsigned long)iq->base_addr_dma,
-		    iq->nb_desc);
-
-	iq->lio_dev = lio_dev;
-	iq->txpciq.txpciq64 = txpciq.txpciq64;
-	iq->fill_cnt = 0;
-	iq->host_write_index = 0;
-	iq->lio_read_index = 0;
-	iq->flush_index = 0;
-
-	rte_atomic64_set(&iq->instr_pending, 0);
-
-	/* Initialize the spinlock for this instruction queue */
-	rte_spinlock_init(&iq->lock);
-	rte_spinlock_init(&iq->post_lock);
-
-	rte_atomic64_clear(&iq->iq_flush_running);
-
-	lio_dev->io_qmask.iq |= (1ULL << iq_no);
-
-	/* Set the 32B/64B mode for each input queue */
-	lio_dev->io_qmask.iq64B |= ((instr_type == 64) << iq_no);
-	iq->iqcmd_64B = (instr_type == 64);
-
-	lio_dev->fn_list.setup_iq_regs(lio_dev, iq_no);
-
-	return 0;
-}
-
-int
-lio_setup_instr_queue0(struct lio_device *lio_dev)
-{
-	union octeon_txpciq txpciq;
-	uint32_t num_descs = 0;
-	uint32_t iq_no = 0;
-
-	num_descs = LIO_NUM_DEF_TX_DESCS_CFG(lio_dev);
-
-	lio_dev->num_iqs = 0;
-
-	lio_dev->instr_queue[0] = rte_zmalloc(NULL,
-					sizeof(struct lio_instr_queue), 0);
-	if (lio_dev->instr_queue[0] == NULL)
-		return -ENOMEM;
-
-	lio_dev->instr_queue[0]->q_index = 0;
-	lio_dev->instr_queue[0]->app_ctx = (void *)(size_t)0;
-	txpciq.txpciq64 = 0;
-	txpciq.s.q_no = iq_no;
-	txpciq.s.pkind = lio_dev->pfvf_hsword.pkind;
-	txpciq.s.use_qpg = 0;
-	txpciq.s.qpg = 0;
-	if (lio_init_instr_queue(lio_dev, txpciq, num_descs, SOCKET_ID_ANY)) {
-		rte_free(lio_dev->instr_queue[0]);
-		lio_dev->instr_queue[0] = NULL;
-		return -1;
-	}
-
-	lio_dev->num_iqs++;
-
-	return 0;
-}
-
-/**
- *  lio_delete_instr_queue()
- *  @param lio_dev	- pointer to the lio device structure.
- *  @param iq_no	- queue to be deleted.
- *
- *  Called at driver unload time for each input queue. Deletes all
- *  allocated resources for the input queue.
- */
-static void
-lio_delete_instr_queue(struct lio_device *lio_dev, uint32_t iq_no)
-{
-	struct lio_instr_queue *iq = lio_dev->instr_queue[iq_no];
-
-	rte_free(iq->request_list);
-	iq->request_list = NULL;
-	lio_dma_zone_free(lio_dev, iq->iq_mz);
-}
-
-void
-lio_free_instr_queue0(struct lio_device *lio_dev)
-{
-	lio_delete_instr_queue(lio_dev, 0);
-	rte_free(lio_dev->instr_queue[0]);
-	lio_dev->instr_queue[0] = NULL;
-	lio_dev->num_iqs--;
-}
-
-/* Return 0 on success, -1 on failure */
-int
-lio_setup_iq(struct lio_device *lio_dev, int q_index,
-	     union octeon_txpciq txpciq, uint32_t num_descs, void *app_ctx,
-	     unsigned int socket_id)
-{
-	uint32_t iq_no = (uint32_t)txpciq.s.q_no;
-
-	lio_dev->instr_queue[iq_no] = rte_zmalloc_socket("ethdev TX queue",
-						sizeof(struct lio_instr_queue),
-						RTE_CACHE_LINE_SIZE, socket_id);
-	if (lio_dev->instr_queue[iq_no] == NULL)
-		return -1;
-
-	lio_dev->instr_queue[iq_no]->q_index = q_index;
-	lio_dev->instr_queue[iq_no]->app_ctx = app_ctx;
-
-	if (lio_init_instr_queue(lio_dev, txpciq, num_descs, socket_id)) {
-		rte_free(lio_dev->instr_queue[iq_no]);
-		lio_dev->instr_queue[iq_no] = NULL;
-		return -1;
-	}
-
-	lio_dev->num_iqs++;
-
-	return 0;
-}
-
-int
-lio_wait_for_instr_fetch(struct lio_device *lio_dev)
-{
-	int pending, instr_cnt;
-	int i, retry = 1000;
-
-	do {
-		instr_cnt = 0;
-
-		for (i = 0; i < LIO_MAX_INSTR_QUEUES(lio_dev); i++) {
-			if (!(lio_dev->io_qmask.iq & (1ULL << i)))
-				continue;
-
-			if (lio_dev->instr_queue[i] == NULL)
-				break;
-
-			pending = rte_atomic64_read(
-			    &lio_dev->instr_queue[i]->instr_pending);
-			if (pending)
-				lio_flush_iq(lio_dev, lio_dev->instr_queue[i]);
-
-			instr_cnt += pending;
-		}
-
-		if (instr_cnt == 0)
-			break;
-
-		rte_delay_ms(1);
-
-	} while (retry-- && instr_cnt);
-
-	return instr_cnt;
-}
-
-static inline void
-lio_ring_doorbell(struct lio_device *lio_dev,
-		  struct lio_instr_queue *iq)
-{
-	if (rte_atomic64_read(&lio_dev->status) == LIO_DEV_RUNNING) {
-		rte_write32(iq->fill_cnt, iq->doorbell_reg);
-		/* make sure doorbell write goes through */
-		rte_wmb();
-		iq->fill_cnt = 0;
-	}
-}
-
-static inline void
-copy_cmd_into_iq(struct lio_instr_queue *iq, uint8_t *cmd)
-{
-	uint8_t *iqptr, cmdsize;
-
-	cmdsize = ((iq->iqcmd_64B) ? 64 : 32);
-	iqptr = iq->base_addr + (cmdsize * iq->host_write_index);
-
-	rte_memcpy(iqptr, cmd, cmdsize);
-}
-
-static inline struct lio_iq_post_status
-post_command2(struct lio_instr_queue *iq, uint8_t *cmd)
-{
-	struct lio_iq_post_status st;
-
-	st.status = LIO_IQ_SEND_OK;
-
-	/* This ensures that the read index does not wrap around to the same
-	 * position if queue gets full before Octeon could fetch any instr.
-	 */
-	if (rte_atomic64_read(&iq->instr_pending) >=
-			(int32_t)(iq->nb_desc - 1)) {
-		st.status = LIO_IQ_SEND_FAILED;
-		st.index = -1;
-		return st;
-	}
-
-	if (rte_atomic64_read(&iq->instr_pending) >=
-			(int32_t)(iq->nb_desc - 2))
-		st.status = LIO_IQ_SEND_STOP;
-
-	copy_cmd_into_iq(iq, cmd);
-
-	/* "index" is returned, host_write_index is modified. */
-	st.index = iq->host_write_index;
-	iq->host_write_index = lio_incr_index(iq->host_write_index, 1,
-					      iq->nb_desc);
-	iq->fill_cnt++;
-
-	/* Flush the command into memory. We need to be sure the data is in
-	 * memory before indicating that the instruction is pending.
-	 */
-	rte_wmb();
-
-	rte_atomic64_inc(&iq->instr_pending);
-
-	return st;
-}
-
-static inline void
-lio_add_to_request_list(struct lio_instr_queue *iq,
-			int idx, void *buf, int reqtype)
-{
-	iq->request_list[idx].buf = buf;
-	iq->request_list[idx].reqtype = reqtype;
-}
-
-static inline void
-lio_free_netsgbuf(void *buf)
-{
-	struct lio_buf_free_info *finfo = buf;
-	struct lio_device *lio_dev = finfo->lio_dev;
-	struct rte_mbuf *m = finfo->mbuf;
-	struct lio_gather *g = finfo->g;
-	uint8_t iq = finfo->iq_no;
-
-	/* This will take care of multiple segments also */
-	rte_pktmbuf_free(m);
-
-	rte_spinlock_lock(&lio_dev->glist_lock[iq]);
-	STAILQ_INSERT_TAIL(&lio_dev->glist_head[iq], &g->list, entries);
-	rte_spinlock_unlock(&lio_dev->glist_lock[iq]);
-	rte_free(finfo);
-}
-
-/* Can only run in process context */
-static int
-lio_process_iq_request_list(struct lio_device *lio_dev,
-			    struct lio_instr_queue *iq)
-{
-	struct octeon_instr_irh *irh = NULL;
-	uint32_t old = iq->flush_index;
-	struct lio_soft_command *sc;
-	uint32_t inst_count = 0;
-	int reqtype;
-	void *buf;
-
-	while (old != iq->lio_read_index) {
-		reqtype = iq->request_list[old].reqtype;
-		buf     = iq->request_list[old].buf;
-
-		if (reqtype == LIO_REQTYPE_NONE)
-			goto skip_this;
-
-		switch (reqtype) {
-		case LIO_REQTYPE_NORESP_NET:
-			rte_pktmbuf_free((struct rte_mbuf *)buf);
-			break;
-		case LIO_REQTYPE_NORESP_NET_SG:
-			lio_free_netsgbuf(buf);
-			break;
-		case LIO_REQTYPE_SOFT_COMMAND:
-			sc = buf;
-			irh = (struct octeon_instr_irh *)&sc->cmd.cmd3.irh;
-			if (irh->rflag) {
-				/* We're expecting a response from Octeon.
-				 * It's up to lio_process_ordered_list() to
-				 * process sc. Add sc to the ordered soft
-				 * command response list because we expect
-				 * a response from Octeon.
-				 */
-				rte_spinlock_lock(&lio_dev->response_list.lock);
-				rte_atomic64_inc(
-				    &lio_dev->response_list.pending_req_count);
-				STAILQ_INSERT_TAIL(
-					&lio_dev->response_list.head,
-					&sc->node, entries);
-				rte_spinlock_unlock(
-						&lio_dev->response_list.lock);
-			} else {
-				if (sc->callback) {
-					/* This callback must not sleep */
-					sc->callback(LIO_REQUEST_DONE,
-						     sc->callback_arg);
-				}
-			}
-			break;
-		default:
-			lio_dev_err(lio_dev,
-				    "Unknown reqtype: %d buf: %p at idx %d\n",
-				    reqtype, buf, old);
-		}
-
-		iq->request_list[old].buf = NULL;
-		iq->request_list[old].reqtype = 0;
-
-skip_this:
-		inst_count++;
-		old = lio_incr_index(old, 1, iq->nb_desc);
-	}
-
-	iq->flush_index = old;
-
-	return inst_count;
-}
-
-static void
-lio_update_read_index(struct lio_instr_queue *iq)
-{
-	uint32_t pkt_in_done = rte_read32(iq->inst_cnt_reg);
-	uint32_t last_done;
-
-	last_done = pkt_in_done - iq->pkt_in_done;
-	iq->pkt_in_done = pkt_in_done;
-
-	/* Add last_done and modulo with the IQ size to get new index */
-	iq->lio_read_index = (iq->lio_read_index +
-			(uint32_t)(last_done & LIO_PKT_IN_DONE_CNT_MASK)) %
-			iq->nb_desc;
-}
-
-int
-lio_flush_iq(struct lio_device *lio_dev, struct lio_instr_queue *iq)
-{
-	uint32_t inst_processed = 0;
-	int tx_done = 1;
-
-	if (rte_atomic64_test_and_set(&iq->iq_flush_running) == 0)
-		return tx_done;
-
-	rte_spinlock_lock(&iq->lock);
-
-	lio_update_read_index(iq);
-
-	do {
-		/* Process any outstanding IQ packets. */
-		if (iq->flush_index == iq->lio_read_index)
-			break;
-
-		inst_processed = lio_process_iq_request_list(lio_dev, iq);
-
-		if (inst_processed) {
-			rte_atomic64_sub(&iq->instr_pending, inst_processed);
-			iq->stats.instr_processed += inst_processed;
-		}
-
-		inst_processed = 0;
-
-	} while (1);
-
-	rte_spinlock_unlock(&iq->lock);
-
-	rte_atomic64_clear(&iq->iq_flush_running);
-
-	return tx_done;
-}
-
-static int
-lio_send_command(struct lio_device *lio_dev, uint32_t iq_no, void *cmd,
-		 void *buf, uint32_t datasize, uint32_t reqtype)
-{
-	struct lio_instr_queue *iq = lio_dev->instr_queue[iq_no];
-	struct lio_iq_post_status st;
-
-	rte_spinlock_lock(&iq->post_lock);
-
-	st = post_command2(iq, cmd);
-
-	if (st.status != LIO_IQ_SEND_FAILED) {
-		lio_add_to_request_list(iq, st.index, buf, reqtype);
-		LIO_INCR_INSTRQUEUE_PKT_COUNT(lio_dev, iq_no, bytes_sent,
-					      datasize);
-		LIO_INCR_INSTRQUEUE_PKT_COUNT(lio_dev, iq_no, instr_posted, 1);
-
-		lio_ring_doorbell(lio_dev, iq);
-	} else {
-		LIO_INCR_INSTRQUEUE_PKT_COUNT(lio_dev, iq_no, instr_dropped, 1);
-	}
-
-	rte_spinlock_unlock(&iq->post_lock);
-
-	return st.status;
-}
-
-void
-lio_prepare_soft_command(struct lio_device *lio_dev,
-			 struct lio_soft_command *sc, uint8_t opcode,
-			 uint8_t subcode, uint32_t irh_ossp, uint64_t ossp0,
-			 uint64_t ossp1)
-{
-	struct octeon_instr_pki_ih3 *pki_ih3;
-	struct octeon_instr_ih3 *ih3;
-	struct octeon_instr_irh *irh;
-	struct octeon_instr_rdp *rdp;
-
-	RTE_ASSERT(opcode <= 15);
-	RTE_ASSERT(subcode <= 127);
-
-	ih3	  = (struct octeon_instr_ih3 *)&sc->cmd.cmd3.ih3;
-
-	ih3->pkind = lio_dev->instr_queue[sc->iq_no]->txpciq.s.pkind;
-
-	pki_ih3 = (struct octeon_instr_pki_ih3 *)&sc->cmd.cmd3.pki_ih3;
-
-	pki_ih3->w	= 1;
-	pki_ih3->raw	= 1;
-	pki_ih3->utag	= 1;
-	pki_ih3->uqpg	= lio_dev->instr_queue[sc->iq_no]->txpciq.s.use_qpg;
-	pki_ih3->utt	= 1;
-
-	pki_ih3->tag	= LIO_CONTROL;
-	pki_ih3->tagtype = OCTEON_ATOMIC_TAG;
-	pki_ih3->qpg	= lio_dev->instr_queue[sc->iq_no]->txpciq.s.qpg;
-	pki_ih3->pm	= 0x7;
-	pki_ih3->sl	= 8;
-
-	if (sc->datasize)
-		ih3->dlengsz = sc->datasize;
-
-	irh		= (struct octeon_instr_irh *)&sc->cmd.cmd3.irh;
-	irh->opcode	= opcode;
-	irh->subcode	= subcode;
-
-	/* opcode/subcode specific parameters (ossp) */
-	irh->ossp = irh_ossp;
-	sc->cmd.cmd3.ossp[0] = ossp0;
-	sc->cmd.cmd3.ossp[1] = ossp1;
-
-	if (sc->rdatasize) {
-		rdp = (struct octeon_instr_rdp *)&sc->cmd.cmd3.rdp;
-		rdp->pcie_port = lio_dev->pcie_port;
-		rdp->rlen      = sc->rdatasize;
-		irh->rflag = 1;
-		/* PKI IH3 */
-		ih3->fsz    = OCTEON_SOFT_CMD_RESP_IH3;
-	} else {
-		irh->rflag = 0;
-		/* PKI IH3 */
-		ih3->fsz    = OCTEON_PCI_CMD_O3;
-	}
-}
-
-int
-lio_send_soft_command(struct lio_device *lio_dev,
-		      struct lio_soft_command *sc)
-{
-	struct octeon_instr_ih3 *ih3;
-	struct octeon_instr_irh *irh;
-	uint32_t len = 0;
-
-	ih3 = (struct octeon_instr_ih3 *)&sc->cmd.cmd3.ih3;
-	if (ih3->dlengsz) {
-		RTE_ASSERT(sc->dmadptr);
-		sc->cmd.cmd3.dptr = sc->dmadptr;
-	}
-
-	irh = (struct octeon_instr_irh *)&sc->cmd.cmd3.irh;
-	if (irh->rflag) {
-		RTE_ASSERT(sc->dmarptr);
-		RTE_ASSERT(sc->status_word != NULL);
-		*sc->status_word = LIO_COMPLETION_WORD_INIT;
-		sc->cmd.cmd3.rptr = sc->dmarptr;
-	}
-
-	len = (uint32_t)ih3->dlengsz;
-
-	if (sc->wait_time)
-		sc->timeout = lio_uptime + sc->wait_time;
-
-	return lio_send_command(lio_dev, sc->iq_no, &sc->cmd, sc, len,
-				LIO_REQTYPE_SOFT_COMMAND);
-}
-
-int
-lio_setup_sc_buffer_pool(struct lio_device *lio_dev)
-{
-	char sc_pool_name[RTE_MEMPOOL_NAMESIZE];
-	uint16_t buf_size;
-
-	buf_size = LIO_SOFT_COMMAND_BUFFER_SIZE + RTE_PKTMBUF_HEADROOM;
-	snprintf(sc_pool_name, sizeof(sc_pool_name),
-		 "lio_sc_pool_%u", lio_dev->port_id);
-	lio_dev->sc_buf_pool = rte_pktmbuf_pool_create(sc_pool_name,
-						LIO_MAX_SOFT_COMMAND_BUFFERS,
-						0, 0, buf_size, SOCKET_ID_ANY);
-	return 0;
-}
-
-void
-lio_free_sc_buffer_pool(struct lio_device *lio_dev)
-{
-	rte_mempool_free(lio_dev->sc_buf_pool);
-}
-
-struct lio_soft_command *
-lio_alloc_soft_command(struct lio_device *lio_dev, uint32_t datasize,
-		       uint32_t rdatasize, uint32_t ctxsize)
-{
-	uint32_t offset = sizeof(struct lio_soft_command);
-	struct lio_soft_command *sc;
-	struct rte_mbuf *m;
-	uint64_t dma_addr;
-
-	RTE_ASSERT((offset + datasize + rdatasize + ctxsize) <=
-		   LIO_SOFT_COMMAND_BUFFER_SIZE);
-
-	m = rte_pktmbuf_alloc(lio_dev->sc_buf_pool);
-	if (m == NULL) {
-		lio_dev_err(lio_dev, "Cannot allocate mbuf for sc\n");
-		return NULL;
-	}
-
-	/* set rte_mbuf data size and there is only 1 segment */
-	m->pkt_len = LIO_SOFT_COMMAND_BUFFER_SIZE;
-	m->data_len = LIO_SOFT_COMMAND_BUFFER_SIZE;
-
-	/* use rte_mbuf buffer for soft command */
-	sc = rte_pktmbuf_mtod(m, struct lio_soft_command *);
-	memset(sc, 0, LIO_SOFT_COMMAND_BUFFER_SIZE);
-	sc->size = LIO_SOFT_COMMAND_BUFFER_SIZE;
-	sc->dma_addr = rte_mbuf_data_iova(m);
-	sc->mbuf = m;
-
-	dma_addr = sc->dma_addr;
-
-	if (ctxsize) {
-		sc->ctxptr = (uint8_t *)sc + offset;
-		sc->ctxsize = ctxsize;
-	}
-
-	/* Start data at 128 byte boundary */
-	offset = (offset + ctxsize + 127) & 0xffffff80;
-
-	if (datasize) {
-		sc->virtdptr = (uint8_t *)sc + offset;
-		sc->dmadptr = dma_addr + offset;
-		sc->datasize = datasize;
-	}
-
-	/* Start rdata at 128 byte boundary */
-	offset = (offset + datasize + 127) & 0xffffff80;
-
-	if (rdatasize) {
-		RTE_ASSERT(rdatasize >= 16);
-		sc->virtrptr = (uint8_t *)sc + offset;
-		sc->dmarptr = dma_addr + offset;
-		sc->rdatasize = rdatasize;
-		sc->status_word = (uint64_t *)((uint8_t *)(sc->virtrptr) +
-					       rdatasize - 8);
-	}
-
-	return sc;
-}
-
-void
-lio_free_soft_command(struct lio_soft_command *sc)
-{
-	rte_pktmbuf_free(sc->mbuf);
-}
-
-void
-lio_setup_response_list(struct lio_device *lio_dev)
-{
-	STAILQ_INIT(&lio_dev->response_list.head);
-	rte_spinlock_init(&lio_dev->response_list.lock);
-	rte_atomic64_set(&lio_dev->response_list.pending_req_count, 0);
-}
-
-int
-lio_process_ordered_list(struct lio_device *lio_dev)
-{
-	int resp_to_process = LIO_MAX_ORD_REQS_TO_PROCESS;
-	struct lio_response_list *ordered_sc_list;
-	struct lio_soft_command *sc;
-	int request_complete = 0;
-	uint64_t status64;
-	uint32_t status;
-
-	ordered_sc_list = &lio_dev->response_list;
-
-	do {
-		rte_spinlock_lock(&ordered_sc_list->lock);
-
-		if (STAILQ_EMPTY(&ordered_sc_list->head)) {
-			/* ordered_sc_list is empty; there is
-			 * nothing to process
-			 */
-			rte_spinlock_unlock(&ordered_sc_list->lock);
-			return -1;
-		}
-
-		sc = LIO_STQUEUE_FIRST_ENTRY(&ordered_sc_list->head,
-					     struct lio_soft_command, node);
-
-		status = LIO_REQUEST_PENDING;
-
-		/* check if octeon has finished DMA'ing a response
-		 * to where rptr is pointing to
-		 */
-		status64 = *sc->status_word;
-
-		if (status64 != LIO_COMPLETION_WORD_INIT) {
-			/* This logic ensures that all 64b have been written.
-			 * 1. check byte 0 for non-FF
-			 * 2. if non-FF, then swap result from BE to host order
-			 * 3. check byte 7 (swapped to 0) for non-FF
-			 * 4. if non-FF, use the low 32-bit status code
-			 * 5. if either byte 0 or byte 7 is FF, don't use status
-			 */
-			if ((status64 & 0xff) != 0xff) {
-				lio_swap_8B_data(&status64, 1);
-				if (((status64 & 0xff) != 0xff)) {
-					/* retrieve 16-bit firmware status */
-					status = (uint32_t)(status64 &
-							    0xffffULL);
-					if (status) {
-						status =
-						LIO_FIRMWARE_STATUS_CODE(
-									status);
-					} else {
-						/* i.e. no error */
-						status = LIO_REQUEST_DONE;
-					}
-				}
-			}
-		} else if ((sc->timeout && lio_check_timeout(lio_uptime,
-							     sc->timeout))) {
-			lio_dev_err(lio_dev,
-				    "cmd failed, timeout (%ld, %ld)\n",
-				    (long)lio_uptime, (long)sc->timeout);
-			status = LIO_REQUEST_TIMEOUT;
-		}
-
-		if (status != LIO_REQUEST_PENDING) {
-			/* we have received a response or we have timed out.
-			 * remove node from linked list
-			 */
-			STAILQ_REMOVE(&ordered_sc_list->head,
-				      &sc->node, lio_stailq_node, entries);
-			rte_atomic64_dec(
-			    &lio_dev->response_list.pending_req_count);
-			rte_spinlock_unlock(&ordered_sc_list->lock);
-
-			if (sc->callback)
-				sc->callback(status, sc->callback_arg);
-
-			request_complete++;
-		} else {
-			/* no response yet */
-			request_complete = 0;
-			rte_spinlock_unlock(&ordered_sc_list->lock);
-		}
-
-		/* If we hit the Max Ordered requests to process every loop,
-		 * we quit and let this function be invoked the next time
-		 * the poll thread runs to process the remaining requests.
-		 * This function can take up the entire CPU if there is
-		 * no upper limit to the requests processed.
-		 */
-		if (request_complete >= resp_to_process)
-			break;
-	} while (request_complete);
-
-	return 0;
-}
-
-static inline struct lio_stailq_node *
-list_delete_first_node(struct lio_stailq_head *head)
-{
-	struct lio_stailq_node *node;
-
-	if (STAILQ_EMPTY(head))
-		node = NULL;
-	else
-		node = STAILQ_FIRST(head);
-
-	if (node)
-		STAILQ_REMOVE(head, node, lio_stailq_node, entries);
-
-	return node;
-}
-
-void
-lio_delete_sglist(struct lio_instr_queue *txq)
-{
-	struct lio_device *lio_dev = txq->lio_dev;
-	int iq_no = txq->q_index;
-	struct lio_gather *g;
-
-	if (lio_dev->glist_head == NULL)
-		return;
-
-	do {
-		g = (struct lio_gather *)list_delete_first_node(
-						&lio_dev->glist_head[iq_no]);
-		if (g) {
-			if (g->sg)
-				rte_free(
-				    (void *)((unsigned long)g->sg - g->adjust));
-			rte_free(g);
-		}
-	} while (g);
-}
-
-/**
- * \brief Setup gather lists
- * @param lio per-network private data
- */
-int
-lio_setup_sglists(struct lio_device *lio_dev, int iq_no,
-		  int fw_mapped_iq, int num_descs, unsigned int socket_id)
-{
-	struct lio_gather *g;
-	int i;
-
-	rte_spinlock_init(&lio_dev->glist_lock[iq_no]);
-
-	STAILQ_INIT(&lio_dev->glist_head[iq_no]);
-
-	for (i = 0; i < num_descs; i++) {
-		g = rte_zmalloc_socket(NULL, sizeof(*g), RTE_CACHE_LINE_SIZE,
-				       socket_id);
-		if (g == NULL) {
-			lio_dev_err(lio_dev,
-				    "lio_gather memory allocation failed for qno %d\n",
-				    iq_no);
-			break;
-		}
-
-		g->sg_size =
-		    ((ROUNDUP4(LIO_MAX_SG) >> 2) * LIO_SG_ENTRY_SIZE);
-
-		g->sg = rte_zmalloc_socket(NULL, g->sg_size + 8,
-					   RTE_CACHE_LINE_SIZE, socket_id);
-		if (g->sg == NULL) {
-			lio_dev_err(lio_dev,
-				    "sg list memory allocation failed for qno %d\n",
-				    iq_no);
-			rte_free(g);
-			break;
-		}
-
-		/* The gather component should be aligned on 64-bit boundary */
-		if (((unsigned long)g->sg) & 7) {
-			g->adjust = 8 - (((unsigned long)g->sg) & 7);
-			g->sg =
-			    (struct lio_sg_entry *)((unsigned long)g->sg +
-						       g->adjust);
-		}
-
-		STAILQ_INSERT_TAIL(&lio_dev->glist_head[iq_no], &g->list,
-				   entries);
-	}
-
-	if (i != num_descs) {
-		lio_delete_sglist(lio_dev->instr_queue[fw_mapped_iq]);
-		return -ENOMEM;
-	}
-
-	return 0;
-}
-
-void
-lio_delete_instruction_queue(struct lio_device *lio_dev, int iq_no)
-{
-	lio_delete_instr_queue(lio_dev, iq_no);
-	rte_free(lio_dev->instr_queue[iq_no]);
-	lio_dev->instr_queue[iq_no] = NULL;
-	lio_dev->num_iqs--;
-}
-
-static inline uint32_t
-lio_iq_get_available(struct lio_device *lio_dev, uint32_t q_no)
-{
-	return ((lio_dev->instr_queue[q_no]->nb_desc - 1) -
-		(uint32_t)rte_atomic64_read(
-				&lio_dev->instr_queue[q_no]->instr_pending));
-}
-
-static inline int
-lio_iq_is_full(struct lio_device *lio_dev, uint32_t q_no)
-{
-	return ((uint32_t)rte_atomic64_read(
-				&lio_dev->instr_queue[q_no]->instr_pending) >=
-				(lio_dev->instr_queue[q_no]->nb_desc - 2));
-}
-
-static int
-lio_dev_cleanup_iq(struct lio_device *lio_dev, int iq_no)
-{
-	struct lio_instr_queue *iq = lio_dev->instr_queue[iq_no];
-	uint32_t count = 10000;
-
-	while ((lio_iq_get_available(lio_dev, iq_no) < LIO_FLUSH_WM(iq)) &&
-			--count)
-		lio_flush_iq(lio_dev, iq);
-
-	return count ? 0 : 1;
-}
-
-static void
-lio_ctrl_cmd_callback(uint32_t status __rte_unused, void *sc_ptr)
-{
-	struct lio_soft_command *sc = sc_ptr;
-	struct lio_dev_ctrl_cmd *ctrl_cmd;
-	struct lio_ctrl_pkt *ctrl_pkt;
-
-	ctrl_pkt = (struct lio_ctrl_pkt *)sc->ctxptr;
-	ctrl_cmd = ctrl_pkt->ctrl_cmd;
-	ctrl_cmd->cond = 1;
-
-	lio_free_soft_command(sc);
-}
-
-static inline struct lio_soft_command *
-lio_alloc_ctrl_pkt_sc(struct lio_device *lio_dev,
-		      struct lio_ctrl_pkt *ctrl_pkt)
-{
-	struct lio_soft_command *sc = NULL;
-	uint32_t uddsize, datasize;
-	uint32_t rdatasize;
-	uint8_t *data;
-
-	uddsize = (uint32_t)(ctrl_pkt->ncmd.s.more * 8);
-
-	datasize = OCTEON_CMD_SIZE + uddsize;
-	rdatasize = (ctrl_pkt->wait_time) ? 16 : 0;
-
-	sc = lio_alloc_soft_command(lio_dev, datasize,
-				    rdatasize, sizeof(struct lio_ctrl_pkt));
-	if (sc == NULL)
-		return NULL;
-
-	rte_memcpy(sc->ctxptr, ctrl_pkt, sizeof(struct lio_ctrl_pkt));
-
-	data = (uint8_t *)sc->virtdptr;
-
-	rte_memcpy(data, &ctrl_pkt->ncmd, OCTEON_CMD_SIZE);
-
-	lio_swap_8B_data((uint64_t *)data, OCTEON_CMD_SIZE >> 3);
-
-	if (uddsize) {
-		/* Endian-Swap for UDD should have been done by caller. */
-		rte_memcpy(data + OCTEON_CMD_SIZE, ctrl_pkt->udd, uddsize);
-	}
-
-	sc->iq_no = (uint32_t)ctrl_pkt->iq_no;
-
-	lio_prepare_soft_command(lio_dev, sc,
-				 LIO_OPCODE, LIO_OPCODE_CMD,
-				 0, 0, 0);
-
-	sc->callback = lio_ctrl_cmd_callback;
-	sc->callback_arg = sc;
-	sc->wait_time = ctrl_pkt->wait_time;
-
-	return sc;
-}
-
-int
-lio_send_ctrl_pkt(struct lio_device *lio_dev, struct lio_ctrl_pkt *ctrl_pkt)
-{
-	struct lio_soft_command *sc = NULL;
-	int retval;
-
-	sc = lio_alloc_ctrl_pkt_sc(lio_dev, ctrl_pkt);
-	if (sc == NULL) {
-		lio_dev_err(lio_dev, "soft command allocation failed\n");
-		return -1;
-	}
-
-	retval = lio_send_soft_command(lio_dev, sc);
-	if (retval == LIO_IQ_SEND_FAILED) {
-		lio_free_soft_command(sc);
-		lio_dev_err(lio_dev, "Port: %d soft command: %d send failed status: %x\n",
-			    lio_dev->port_id, ctrl_pkt->ncmd.s.cmd, retval);
-		return -1;
-	}
-
-	return retval;
-}
-
-/** Send data packet to the device
- *  @param lio_dev - lio device pointer
- *  @param ndata   - control structure with queueing, and buffer information
- *
- *  @returns IQ_FAILED if it failed to add to the input queue. IQ_STOP if it the
- *  queue should be stopped, and LIO_IQ_SEND_OK if it sent okay.
- */
-static inline int
-lio_send_data_pkt(struct lio_device *lio_dev, struct lio_data_pkt *ndata)
-{
-	return lio_send_command(lio_dev, ndata->q_no, &ndata->cmd,
-				ndata->buf, ndata->datasize, ndata->reqtype);
-}
-
-uint16_t
-lio_dev_xmit_pkts(void *tx_queue, struct rte_mbuf **pkts, uint16_t nb_pkts)
-{
-	struct lio_instr_queue *txq = tx_queue;
-	union lio_cmd_setup cmdsetup;
-	struct lio_device *lio_dev;
-	struct lio_iq_stats *stats;
-	struct lio_data_pkt ndata;
-	int i, processed = 0;
-	struct rte_mbuf *m;
-	uint32_t tag = 0;
-	int status = 0;
-	int iq_no;
-
-	lio_dev = txq->lio_dev;
-	iq_no = txq->txpciq.s.q_no;
-	stats = &lio_dev->instr_queue[iq_no]->stats;
-
-	if (!lio_dev->intf_open || !lio_dev->linfo.link.s.link_up) {
-		PMD_TX_LOG(lio_dev, ERR, "Transmit failed link_status : %d\n",
-			   lio_dev->linfo.link.s.link_up);
-		goto xmit_failed;
-	}
-
-	lio_dev_cleanup_iq(lio_dev, iq_no);
-
-	for (i = 0; i < nb_pkts; i++) {
-		uint32_t pkt_len = 0;
-
-		m = pkts[i];
-
-		/* Prepare the attributes for the data to be passed to BASE. */
-		memset(&ndata, 0, sizeof(struct lio_data_pkt));
-
-		ndata.buf = m;
-
-		ndata.q_no = iq_no;
-		if (lio_iq_is_full(lio_dev, ndata.q_no)) {
-			stats->tx_iq_busy++;
-			if (lio_dev_cleanup_iq(lio_dev, iq_no)) {
-				PMD_TX_LOG(lio_dev, ERR,
-					   "Transmit failed iq:%d full\n",
-					   ndata.q_no);
-				break;
-			}
-		}
-
-		cmdsetup.cmd_setup64 = 0;
-		cmdsetup.s.iq_no = iq_no;
-
-		/* check checksum offload flags to form cmd */
-		if (m->ol_flags & RTE_MBUF_F_TX_IP_CKSUM)
-			cmdsetup.s.ip_csum = 1;
-
-		if (m->ol_flags & RTE_MBUF_F_TX_OUTER_IP_CKSUM)
-			cmdsetup.s.tnl_csum = 1;
-		else if ((m->ol_flags & RTE_MBUF_F_TX_TCP_CKSUM) ||
-				(m->ol_flags & RTE_MBUF_F_TX_UDP_CKSUM))
-			cmdsetup.s.transport_csum = 1;
-
-		if (m->nb_segs == 1) {
-			pkt_len = rte_pktmbuf_data_len(m);
-			cmdsetup.s.u.datasize = pkt_len;
-			lio_prepare_pci_cmd(lio_dev, &ndata.cmd,
-					    &cmdsetup, tag);
-			ndata.cmd.cmd3.dptr = rte_mbuf_data_iova(m);
-			ndata.reqtype = LIO_REQTYPE_NORESP_NET;
-		} else {
-			struct lio_buf_free_info *finfo;
-			struct lio_gather *g;
-			rte_iova_t phyaddr;
-			int i, frags;
-
-			finfo = (struct lio_buf_free_info *)rte_malloc(NULL,
-							sizeof(*finfo), 0);
-			if (finfo == NULL) {
-				PMD_TX_LOG(lio_dev, ERR,
-					   "free buffer alloc failed\n");
-				goto xmit_failed;
-			}
-
-			rte_spinlock_lock(&lio_dev->glist_lock[iq_no]);
-			g = (struct lio_gather *)list_delete_first_node(
-						&lio_dev->glist_head[iq_no]);
-			rte_spinlock_unlock(&lio_dev->glist_lock[iq_no]);
-			if (g == NULL) {
-				PMD_TX_LOG(lio_dev, ERR,
-					   "Transmit scatter gather: glist null!\n");
-				goto xmit_failed;
-			}
-
-			cmdsetup.s.gather = 1;
-			cmdsetup.s.u.gatherptrs = m->nb_segs;
-			lio_prepare_pci_cmd(lio_dev, &ndata.cmd,
-					    &cmdsetup, tag);
-
-			memset(g->sg, 0, g->sg_size);
-			g->sg[0].ptr[0] = rte_mbuf_data_iova(m);
-			lio_add_sg_size(&g->sg[0], m->data_len, 0);
-			pkt_len = m->data_len;
-			finfo->mbuf = m;
-
-			/* First seg taken care above */
-			frags = m->nb_segs - 1;
-			i = 1;
-			m = m->next;
-			while (frags--) {
-				g->sg[(i >> 2)].ptr[(i & 3)] =
-						rte_mbuf_data_iova(m);
-				lio_add_sg_size(&g->sg[(i >> 2)],
-						m->data_len, (i & 3));
-				pkt_len += m->data_len;
-				i++;
-				m = m->next;
-			}
-
-			phyaddr = rte_mem_virt2iova(g->sg);
-			if (phyaddr == RTE_BAD_IOVA) {
-				PMD_TX_LOG(lio_dev, ERR, "bad phys addr\n");
-				goto xmit_failed;
-			}
-
-			ndata.cmd.cmd3.dptr = phyaddr;
-			ndata.reqtype = LIO_REQTYPE_NORESP_NET_SG;
-
-			finfo->g = g;
-			finfo->lio_dev = lio_dev;
-			finfo->iq_no = (uint64_t)iq_no;
-			ndata.buf = finfo;
-		}
-
-		ndata.datasize = pkt_len;
-
-		status = lio_send_data_pkt(lio_dev, &ndata);
-
-		if (unlikely(status == LIO_IQ_SEND_FAILED)) {
-			PMD_TX_LOG(lio_dev, ERR, "send failed\n");
-			break;
-		}
-
-		if (unlikely(status == LIO_IQ_SEND_STOP)) {
-			PMD_TX_LOG(lio_dev, DEBUG, "iq full\n");
-			/* create space as iq is full */
-			lio_dev_cleanup_iq(lio_dev, iq_no);
-		}
-
-		stats->tx_done++;
-		stats->tx_tot_bytes += pkt_len;
-		processed++;
-	}
-
-xmit_failed:
-	stats->tx_dropped += (nb_pkts - processed);
-
-	return processed;
-}
-
-void
-lio_dev_clear_queues(struct rte_eth_dev *eth_dev)
-{
-	struct lio_instr_queue *txq;
-	struct lio_droq *rxq;
-	uint16_t i;
-
-	for (i = 0; i < eth_dev->data->nb_tx_queues; i++) {
-		txq = eth_dev->data->tx_queues[i];
-		if (txq != NULL) {
-			lio_dev_tx_queue_release(eth_dev, i);
-			eth_dev->data->tx_queues[i] = NULL;
-		}
-	}
-
-	for (i = 0; i < eth_dev->data->nb_rx_queues; i++) {
-		rxq = eth_dev->data->rx_queues[i];
-		if (rxq != NULL) {
-			lio_dev_rx_queue_release(eth_dev, i);
-			eth_dev->data->rx_queues[i] = NULL;
-		}
-	}
-}
diff --git a/drivers/net/liquidio/lio_rxtx.h b/drivers/net/liquidio/lio_rxtx.h
deleted file mode 100644
index d2a45104f0..0000000000
--- a/drivers/net/liquidio/lio_rxtx.h
+++ /dev/null
@@ -1,740 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_RXTX_H_
-#define _LIO_RXTX_H_
-
-#include <stdio.h>
-#include <stdint.h>
-
-#include <rte_spinlock.h>
-#include <rte_memory.h>
-
-#include "lio_struct.h"
-
-#ifndef ROUNDUP4
-#define ROUNDUP4(val) (((val) + 3) & 0xfffffffc)
-#endif
-
-#define LIO_STQUEUE_FIRST_ENTRY(ptr, type, elem)	\
-	(type *)((char *)((ptr)->stqh_first) - offsetof(type, elem))
-
-#define lio_check_timeout(cur_time, chk_time) ((cur_time) > (chk_time))
-
-#define lio_uptime		\
-	(size_t)(rte_get_timer_cycles() / rte_get_timer_hz())
-
-/** Descriptor format.
- *  The descriptor ring is made of descriptors which have 2 64-bit values:
- *  -# Physical (bus) address of the data buffer.
- *  -# Physical (bus) address of a lio_droq_info structure.
- *  The device DMA's incoming packets and its information at the address
- *  given by these descriptor fields.
- */
-struct lio_droq_desc {
-	/** The buffer pointer */
-	uint64_t buffer_ptr;
-
-	/** The Info pointer */
-	uint64_t info_ptr;
-};
-
-#define LIO_DROQ_DESC_SIZE	(sizeof(struct lio_droq_desc))
-
-/** Information about packet DMA'ed by Octeon.
- *  The format of the information available at Info Pointer after Octeon
- *  has posted a packet. Not all descriptors have valid information. Only
- *  the Info field of the first descriptor for a packet has information
- *  about the packet.
- */
-struct lio_droq_info {
-	/** The Output Receive Header. */
-	union octeon_rh rh;
-
-	/** The Length of the packet. */
-	uint64_t length;
-};
-
-#define LIO_DROQ_INFO_SIZE	(sizeof(struct lio_droq_info))
-
-/** Pointer to data buffer.
- *  Driver keeps a pointer to the data buffer that it made available to
- *  the Octeon device. Since the descriptor ring keeps physical (bus)
- *  addresses, this field is required for the driver to keep track of
- *  the virtual address pointers.
- */
-struct lio_recv_buffer {
-	/** Packet buffer, including meta data. */
-	void *buffer;
-
-	/** Data in the packet buffer. */
-	uint8_t *data;
-
-};
-
-#define LIO_DROQ_RECVBUF_SIZE	(sizeof(struct lio_recv_buffer))
-
-#define LIO_DROQ_SIZE		(sizeof(struct lio_droq))
-
-#define LIO_IQ_SEND_OK		0
-#define LIO_IQ_SEND_STOP	1
-#define LIO_IQ_SEND_FAILED	-1
-
-/* conditions */
-#define LIO_REQTYPE_NONE		0
-#define LIO_REQTYPE_NORESP_NET		1
-#define LIO_REQTYPE_NORESP_NET_SG	2
-#define LIO_REQTYPE_SOFT_COMMAND	3
-
-struct lio_request_list {
-	uint32_t reqtype;
-	void *buf;
-};
-
-/*----------------------  INSTRUCTION FORMAT ----------------------------*/
-
-struct lio_instr3_64B {
-	/** Pointer where the input data is available. */
-	uint64_t dptr;
-
-	/** Instruction Header. */
-	uint64_t ih3;
-
-	/** Instruction Header. */
-	uint64_t pki_ih3;
-
-	/** Input Request Header. */
-	uint64_t irh;
-
-	/** opcode/subcode specific parameters */
-	uint64_t ossp[2];
-
-	/** Return Data Parameters */
-	uint64_t rdp;
-
-	/** Pointer where the response for a RAW mode packet will be written
-	 *  by Octeon.
-	 */
-	uint64_t rptr;
-
-};
-
-union lio_instr_64B {
-	struct lio_instr3_64B cmd3;
-};
-
-/** The size of each buffer in soft command buffer pool */
-#define LIO_SOFT_COMMAND_BUFFER_SIZE	1536
-
-/** Maximum number of buffers to allocate into soft command buffer pool */
-#define LIO_MAX_SOFT_COMMAND_BUFFERS	255
-
-struct lio_soft_command {
-	/** Soft command buffer info. */
-	struct lio_stailq_node node;
-	uint64_t dma_addr;
-	uint32_t size;
-
-	/** Command and return status */
-	union lio_instr_64B cmd;
-
-#define LIO_COMPLETION_WORD_INIT	0xffffffffffffffffULL
-	uint64_t *status_word;
-
-	/** Data buffer info */
-	void *virtdptr;
-	uint64_t dmadptr;
-	uint32_t datasize;
-
-	/** Return buffer info */
-	void *virtrptr;
-	uint64_t dmarptr;
-	uint32_t rdatasize;
-
-	/** Context buffer info */
-	void *ctxptr;
-	uint32_t ctxsize;
-
-	/** Time out and callback */
-	size_t wait_time;
-	size_t timeout;
-	uint32_t iq_no;
-	void (*callback)(uint32_t, void *);
-	void *callback_arg;
-	struct rte_mbuf *mbuf;
-};
-
-struct lio_iq_post_status {
-	int status;
-	int index;
-};
-
-/*   wqe
- *  ---------------  0
- * |  wqe  word0-3 |
- *  ---------------  32
- * |    PCI IH     |
- *  ---------------  40
- * |     RPTR      |
- *  ---------------  48
- * |    PCI IRH    |
- *  ---------------  56
- * |    OCTEON_CMD |
- *  ---------------  64
- * | Addtl 8-BData |
- * |               |
- *  ---------------
- */
-
-union octeon_cmd {
-	uint64_t cmd64;
-
-	struct	{
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		uint64_t cmd : 5;
-
-		uint64_t more : 6; /* How many udd words follow the command */
-
-		uint64_t reserved : 29;
-
-		uint64_t param1 : 16;
-
-		uint64_t param2 : 8;
-
-#elif RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-
-		uint64_t param2 : 8;
-
-		uint64_t param1 : 16;
-
-		uint64_t reserved : 29;
-
-		uint64_t more : 6;
-
-		uint64_t cmd : 5;
-
-#endif
-	} s;
-};
-
-#define OCTEON_CMD_SIZE (sizeof(union octeon_cmd))
-
-/* Maximum number of 8-byte words can be
- * sent in a NIC control message.
- */
-#define LIO_MAX_NCTRL_UDD	32
-
-/* Structure of control information passed by driver to the BASE
- * layer when sending control commands to Octeon device software.
- */
-struct lio_ctrl_pkt {
-	/** Command to be passed to the Octeon device software. */
-	union octeon_cmd ncmd;
-
-	/** Send buffer */
-	void *data;
-	uint64_t dmadata;
-
-	/** Response buffer */
-	void *rdata;
-	uint64_t dmardata;
-
-	/** Additional data that may be needed by some commands. */
-	uint64_t udd[LIO_MAX_NCTRL_UDD];
-
-	/** Input queue to use to send this command. */
-	uint64_t iq_no;
-
-	/** Time to wait for Octeon software to respond to this control command.
-	 *  If wait_time is 0, BASE assumes no response is expected.
-	 */
-	size_t wait_time;
-
-	struct lio_dev_ctrl_cmd *ctrl_cmd;
-};
-
-/** Structure of data information passed by driver to the BASE
- *  layer when forwarding data to Octeon device software.
- */
-struct lio_data_pkt {
-	/** Pointer to information maintained by NIC module for this packet. The
-	 *  BASE layer passes this as-is to the driver.
-	 */
-	void *buf;
-
-	/** Type of buffer passed in "buf" above. */
-	uint32_t reqtype;
-
-	/** Total data bytes to be transferred in this command. */
-	uint32_t datasize;
-
-	/** Command to be passed to the Octeon device software. */
-	union lio_instr_64B cmd;
-
-	/** Input queue to use to send this command. */
-	uint32_t q_no;
-};
-
-/** Structure passed by driver to BASE layer to prepare a command to send
- *  network data to Octeon.
- */
-union lio_cmd_setup {
-	struct {
-		uint32_t iq_no : 8;
-		uint32_t gather : 1;
-		uint32_t timestamp : 1;
-		uint32_t ip_csum : 1;
-		uint32_t transport_csum : 1;
-		uint32_t tnl_csum : 1;
-		uint32_t rsvd : 19;
-
-		union {
-			uint32_t datasize;
-			uint32_t gatherptrs;
-		} u;
-	} s;
-
-	uint64_t cmd_setup64;
-};
-
-/* Instruction Header */
-struct octeon_instr_ih3 {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-
-	/** Reserved3 */
-	uint64_t reserved3 : 1;
-
-	/** Gather indicator 1=gather*/
-	uint64_t gather : 1;
-
-	/** Data length OR no. of entries in gather list */
-	uint64_t dlengsz : 14;
-
-	/** Front Data size */
-	uint64_t fsz : 6;
-
-	/** Reserved2 */
-	uint64_t reserved2 : 4;
-
-	/** PKI port kind - PKIND */
-	uint64_t pkind : 6;
-
-	/** Reserved1 */
-	uint64_t reserved1 : 32;
-
-#elif RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	/** Reserved1 */
-	uint64_t reserved1 : 32;
-
-	/** PKI port kind - PKIND */
-	uint64_t pkind : 6;
-
-	/** Reserved2 */
-	uint64_t reserved2 : 4;
-
-	/** Front Data size */
-	uint64_t fsz : 6;
-
-	/** Data length OR no. of entries in gather list */
-	uint64_t dlengsz : 14;
-
-	/** Gather indicator 1=gather*/
-	uint64_t gather : 1;
-
-	/** Reserved3 */
-	uint64_t reserved3 : 1;
-
-#endif
-};
-
-/* PKI Instruction Header(PKI IH) */
-struct octeon_instr_pki_ih3 {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-
-	/** Wider bit */
-	uint64_t w : 1;
-
-	/** Raw mode indicator 1 = RAW */
-	uint64_t raw : 1;
-
-	/** Use Tag */
-	uint64_t utag : 1;
-
-	/** Use QPG */
-	uint64_t uqpg : 1;
-
-	/** Reserved2 */
-	uint64_t reserved2 : 1;
-
-	/** Parse Mode */
-	uint64_t pm : 3;
-
-	/** Skip Length */
-	uint64_t sl : 8;
-
-	/** Use Tag Type */
-	uint64_t utt : 1;
-
-	/** Tag type */
-	uint64_t tagtype : 2;
-
-	/** Reserved1 */
-	uint64_t reserved1 : 2;
-
-	/** QPG Value */
-	uint64_t qpg : 11;
-
-	/** Tag Value */
-	uint64_t tag : 32;
-
-#elif RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-
-	/** Tag Value */
-	uint64_t tag : 32;
-
-	/** QPG Value */
-	uint64_t qpg : 11;
-
-	/** Reserved1 */
-	uint64_t reserved1 : 2;
-
-	/** Tag type */
-	uint64_t tagtype : 2;
-
-	/** Use Tag Type */
-	uint64_t utt : 1;
-
-	/** Skip Length */
-	uint64_t sl : 8;
-
-	/** Parse Mode */
-	uint64_t pm : 3;
-
-	/** Reserved2 */
-	uint64_t reserved2 : 1;
-
-	/** Use QPG */
-	uint64_t uqpg : 1;
-
-	/** Use Tag */
-	uint64_t utag : 1;
-
-	/** Raw mode indicator 1 = RAW */
-	uint64_t raw : 1;
-
-	/** Wider bit */
-	uint64_t w : 1;
-#endif
-};
-
-/** Input Request Header */
-struct octeon_instr_irh {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-	uint64_t opcode : 4;
-	uint64_t rflag : 1;
-	uint64_t subcode : 7;
-	uint64_t vlan : 12;
-	uint64_t priority : 3;
-	uint64_t reserved : 5;
-	uint64_t ossp : 32; /* opcode/subcode specific parameters */
-#elif RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	uint64_t ossp : 32; /* opcode/subcode specific parameters */
-	uint64_t reserved : 5;
-	uint64_t priority : 3;
-	uint64_t vlan : 12;
-	uint64_t subcode : 7;
-	uint64_t rflag : 1;
-	uint64_t opcode : 4;
-#endif
-};
-
-/* pkiih3 + irh + ossp[0] + ossp[1] + rdp + rptr = 40 bytes */
-#define OCTEON_SOFT_CMD_RESP_IH3	(40 + 8)
-/* pki_h3 + irh + ossp[0] + ossp[1] = 32 bytes */
-#define OCTEON_PCI_CMD_O3		(24 + 8)
-
-/** Return Data Parameters */
-struct octeon_instr_rdp {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-	uint64_t reserved : 49;
-	uint64_t pcie_port : 3;
-	uint64_t rlen : 12;
-#elif RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	uint64_t rlen : 12;
-	uint64_t pcie_port : 3;
-	uint64_t reserved : 49;
-#endif
-};
-
-union octeon_packet_params {
-	uint32_t pkt_params32;
-	struct {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		uint32_t reserved : 24;
-		uint32_t ip_csum : 1; /* Perform IP header checksum(s) */
-		/* Perform Outer transport header checksum */
-		uint32_t transport_csum : 1;
-		/* Find tunnel, and perform transport csum. */
-		uint32_t tnl_csum : 1;
-		uint32_t tsflag : 1;   /* Timestamp this packet */
-		uint32_t ipsec_ops : 4; /* IPsec operation */
-#else
-		uint32_t ipsec_ops : 4;
-		uint32_t tsflag : 1;
-		uint32_t tnl_csum : 1;
-		uint32_t transport_csum : 1;
-		uint32_t ip_csum : 1;
-		uint32_t reserved : 7;
-#endif
-	} s;
-};
-
-/** Utility function to prepare a 64B NIC instruction based on a setup command
- * @param cmd - pointer to instruction to be filled in.
- * @param setup - pointer to the setup structure
- * @param q_no - which queue for back pressure
- *
- * Assumes the cmd instruction is pre-allocated, but no fields are filled in.
- */
-static inline void
-lio_prepare_pci_cmd(struct lio_device *lio_dev,
-		    union lio_instr_64B *cmd,
-		    union lio_cmd_setup *setup,
-		    uint32_t tag)
-{
-	union octeon_packet_params packet_params;
-	struct octeon_instr_pki_ih3 *pki_ih3;
-	struct octeon_instr_irh *irh;
-	struct octeon_instr_ih3 *ih3;
-	int port;
-
-	memset(cmd, 0, sizeof(union lio_instr_64B));
-
-	ih3 = (struct octeon_instr_ih3 *)&cmd->cmd3.ih3;
-	pki_ih3 = (struct octeon_instr_pki_ih3 *)&cmd->cmd3.pki_ih3;
-
-	/* assume that rflag is cleared so therefore front data will only have
-	 * irh and ossp[1] and ossp[2] for a total of 24 bytes
-	 */
-	ih3->pkind = lio_dev->instr_queue[setup->s.iq_no]->txpciq.s.pkind;
-	/* PKI IH */
-	ih3->fsz = OCTEON_PCI_CMD_O3;
-
-	if (!setup->s.gather) {
-		ih3->dlengsz = setup->s.u.datasize;
-	} else {
-		ih3->gather = 1;
-		ih3->dlengsz = setup->s.u.gatherptrs;
-	}
-
-	pki_ih3->w = 1;
-	pki_ih3->raw = 0;
-	pki_ih3->utag = 0;
-	pki_ih3->utt = 1;
-	pki_ih3->uqpg = lio_dev->instr_queue[setup->s.iq_no]->txpciq.s.use_qpg;
-
-	port = (int)lio_dev->instr_queue[setup->s.iq_no]->txpciq.s.port;
-
-	if (tag)
-		pki_ih3->tag = tag;
-	else
-		pki_ih3->tag = LIO_DATA(port);
-
-	pki_ih3->tagtype = OCTEON_ORDERED_TAG;
-	pki_ih3->qpg = lio_dev->instr_queue[setup->s.iq_no]->txpciq.s.qpg;
-	pki_ih3->pm = 0x0; /* parse from L2 */
-	pki_ih3->sl = 32;  /* sl will be sizeof(pki_ih3) + irh + ossp0 + ossp1*/
-
-	irh = (struct octeon_instr_irh *)&cmd->cmd3.irh;
-
-	irh->opcode = LIO_OPCODE;
-	irh->subcode = LIO_OPCODE_NW_DATA;
-
-	packet_params.pkt_params32 = 0;
-	packet_params.s.ip_csum = setup->s.ip_csum;
-	packet_params.s.transport_csum = setup->s.transport_csum;
-	packet_params.s.tnl_csum = setup->s.tnl_csum;
-	packet_params.s.tsflag = setup->s.timestamp;
-
-	irh->ossp = packet_params.pkt_params32;
-}
-
-int lio_setup_sc_buffer_pool(struct lio_device *lio_dev);
-void lio_free_sc_buffer_pool(struct lio_device *lio_dev);
-
-struct lio_soft_command *
-lio_alloc_soft_command(struct lio_device *lio_dev,
-		       uint32_t datasize, uint32_t rdatasize,
-		       uint32_t ctxsize);
-void lio_prepare_soft_command(struct lio_device *lio_dev,
-			      struct lio_soft_command *sc,
-			      uint8_t opcode, uint8_t subcode,
-			      uint32_t irh_ossp, uint64_t ossp0,
-			      uint64_t ossp1);
-int lio_send_soft_command(struct lio_device *lio_dev,
-			  struct lio_soft_command *sc);
-void lio_free_soft_command(struct lio_soft_command *sc);
-
-/** Send control packet to the device
- *  @param lio_dev - lio device pointer
- *  @param nctrl   - control structure with command, timeout, and callback info
- *
- *  @returns IQ_FAILED if it failed to add to the input queue. IQ_STOP if it the
- *  queue should be stopped, and LIO_IQ_SEND_OK if it sent okay.
- */
-int lio_send_ctrl_pkt(struct lio_device *lio_dev,
-		      struct lio_ctrl_pkt *ctrl_pkt);
-
-/** Maximum ordered requests to process in every invocation of
- *  lio_process_ordered_list(). The function will continue to process requests
- *  as long as it can find one that has finished processing. If it keeps
- *  finding requests that have completed, the function can run for ever. The
- *  value defined here sets an upper limit on the number of requests it can
- *  process before it returns control to the poll thread.
- */
-#define LIO_MAX_ORD_REQS_TO_PROCESS	4096
-
-/** Error codes used in Octeon Host-Core communication.
- *
- *   31		16 15		0
- *   ----------------------------
- * |		|		|
- *   ----------------------------
- *   Error codes are 32-bit wide. The upper 16-bits, called Major Error Number,
- *   are reserved to identify the group to which the error code belongs. The
- *   lower 16-bits, called Minor Error Number, carry the actual code.
- *
- *   So error codes are (MAJOR NUMBER << 16)| MINOR_NUMBER.
- */
-/** Status for a request.
- *  If the request is successfully queued, the driver will return
- *  a LIO_REQUEST_PENDING status. LIO_REQUEST_TIMEOUT is only returned by
- *  the driver if the response for request failed to arrive before a
- *  time-out period or if the request processing * got interrupted due to
- *  a signal respectively.
- */
-enum {
-	/** A value of 0x00000000 indicates no error i.e. success */
-	LIO_REQUEST_DONE	= 0x00000000,
-	/** (Major number: 0x0000; Minor Number: 0x0001) */
-	LIO_REQUEST_PENDING	= 0x00000001,
-	LIO_REQUEST_TIMEOUT	= 0x00000003,
-
-};
-
-/*------ Error codes used by firmware (bits 15..0 set by firmware */
-#define LIO_FIRMWARE_MAJOR_ERROR_CODE	 0x0001
-#define LIO_FIRMWARE_STATUS_CODE(status) \
-	((LIO_FIRMWARE_MAJOR_ERROR_CODE << 16) | (status))
-
-/** Initialize the response lists. The number of response lists to create is
- *  given by count.
- *  @param lio_dev - the lio device structure.
- */
-void lio_setup_response_list(struct lio_device *lio_dev);
-
-/** Check the status of first entry in the ordered list. If the instruction at
- *  that entry finished processing or has timed-out, the entry is cleaned.
- *  @param lio_dev - the lio device structure.
- *  @return 1 if the ordered list is empty, 0 otherwise.
- */
-int lio_process_ordered_list(struct lio_device *lio_dev);
-
-#define LIO_INCR_INSTRQUEUE_PKT_COUNT(lio_dev, iq_no, field, count)	\
-	(((lio_dev)->instr_queue[iq_no]->stats.field) += count)
-
-static inline void
-lio_swap_8B_data(uint64_t *data, uint32_t blocks)
-{
-	while (blocks) {
-		*data = rte_cpu_to_be_64(*data);
-		blocks--;
-		data++;
-	}
-}
-
-static inline uint64_t
-lio_map_ring(void *buf)
-{
-	rte_iova_t dma_addr;
-
-	dma_addr = rte_mbuf_data_iova_default(((struct rte_mbuf *)buf));
-
-	return (uint64_t)dma_addr;
-}
-
-static inline uint64_t
-lio_map_ring_info(struct lio_droq *droq, uint32_t i)
-{
-	rte_iova_t dma_addr;
-
-	dma_addr = droq->info_list_dma + (i * LIO_DROQ_INFO_SIZE);
-
-	return (uint64_t)dma_addr;
-}
-
-static inline int
-lio_opcode_slow_path(union octeon_rh *rh)
-{
-	uint16_t subcode1, subcode2;
-
-	subcode1 = LIO_OPCODE_SUBCODE(rh->r.opcode, rh->r.subcode);
-	subcode2 = LIO_OPCODE_SUBCODE(LIO_OPCODE, LIO_OPCODE_NW_DATA);
-
-	return subcode2 != subcode1;
-}
-
-static inline void
-lio_add_sg_size(struct lio_sg_entry *sg_entry,
-		uint16_t size, uint32_t pos)
-{
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-	sg_entry->u.size[pos] = size;
-#elif RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	sg_entry->u.size[3 - pos] = size;
-#endif
-}
-
-/* Macro to increment index.
- * Index is incremented by count; if the sum exceeds
- * max, index is wrapped-around to the start.
- */
-static inline uint32_t
-lio_incr_index(uint32_t index, uint32_t count, uint32_t max)
-{
-	if ((index + count) >= max)
-		index = index + count - max;
-	else
-		index += count;
-
-	return index;
-}
-
-int lio_setup_droq(struct lio_device *lio_dev, int q_no, int num_descs,
-		   int desc_size, struct rte_mempool *mpool,
-		   unsigned int socket_id);
-uint16_t lio_dev_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
-			   uint16_t budget);
-void lio_delete_droq_queue(struct lio_device *lio_dev, int oq_no);
-
-void lio_delete_sglist(struct lio_instr_queue *txq);
-int lio_setup_sglists(struct lio_device *lio_dev, int iq_no,
-		      int fw_mapped_iq, int num_descs, unsigned int socket_id);
-uint16_t lio_dev_xmit_pkts(void *tx_queue, struct rte_mbuf **pkts,
-			   uint16_t nb_pkts);
-int lio_wait_for_instr_fetch(struct lio_device *lio_dev);
-int lio_setup_iq(struct lio_device *lio_dev, int q_index,
-		 union octeon_txpciq iq_no, uint32_t num_descs, void *app_ctx,
-		 unsigned int socket_id);
-int lio_flush_iq(struct lio_device *lio_dev, struct lio_instr_queue *iq);
-void lio_delete_instruction_queue(struct lio_device *lio_dev, int iq_no);
-/** Setup instruction queue zero for the device
- *  @param lio_dev which lio device to setup
- *
- *  @return 0 if success. -1 if fails
- */
-int lio_setup_instr_queue0(struct lio_device *lio_dev);
-void lio_free_instr_queue0(struct lio_device *lio_dev);
-void lio_dev_clear_queues(struct rte_eth_dev *eth_dev);
-#endif	/* _LIO_RXTX_H_ */
diff --git a/drivers/net/liquidio/lio_struct.h b/drivers/net/liquidio/lio_struct.h
deleted file mode 100644
index 10270c560e..0000000000
--- a/drivers/net/liquidio/lio_struct.h
+++ /dev/null
@@ -1,661 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Cavium, Inc
- */
-
-#ifndef _LIO_STRUCT_H_
-#define _LIO_STRUCT_H_
-
-#include <stdio.h>
-#include <stdint.h>
-#include <sys/queue.h>
-
-#include <rte_spinlock.h>
-#include <rte_atomic.h>
-
-#include "lio_hw_defs.h"
-
-struct lio_stailq_node {
-	STAILQ_ENTRY(lio_stailq_node) entries;
-};
-
-STAILQ_HEAD(lio_stailq_head, lio_stailq_node);
-
-struct lio_version {
-	uint16_t major;
-	uint16_t minor;
-	uint16_t micro;
-	uint16_t reserved;
-};
-
-/** Input Queue statistics. Each input queue has four stats fields. */
-struct lio_iq_stats {
-	uint64_t instr_posted; /**< Instructions posted to this queue. */
-	uint64_t instr_processed; /**< Instructions processed in this queue. */
-	uint64_t instr_dropped; /**< Instructions that could not be processed */
-	uint64_t bytes_sent; /**< Bytes sent through this queue. */
-	uint64_t tx_done; /**< Num of packets sent to network. */
-	uint64_t tx_iq_busy; /**< Num of times this iq was found to be full. */
-	uint64_t tx_dropped; /**< Num of pkts dropped due to xmitpath errors. */
-	uint64_t tx_tot_bytes; /**< Total count of bytes sent to network. */
-};
-
-/** Output Queue statistics. Each output queue has four stats fields. */
-struct lio_droq_stats {
-	/** Number of packets received in this queue. */
-	uint64_t pkts_received;
-
-	/** Bytes received by this queue. */
-	uint64_t bytes_received;
-
-	/** Packets dropped due to no memory available. */
-	uint64_t dropped_nomem;
-
-	/** Packets dropped due to large number of pkts to process. */
-	uint64_t dropped_toomany;
-
-	/** Number of packets  sent to stack from this queue. */
-	uint64_t rx_pkts_received;
-
-	/** Number of Bytes sent to stack from this queue. */
-	uint64_t rx_bytes_received;
-
-	/** Num of Packets dropped due to receive path failures. */
-	uint64_t rx_dropped;
-
-	/** Num of vxlan packets received; */
-	uint64_t rx_vxlan;
-
-	/** Num of failures of rte_pktmbuf_alloc() */
-	uint64_t rx_alloc_failure;
-
-};
-
-/** The Descriptor Ring Output Queue structure.
- *  This structure has all the information required to implement a
- *  DROQ.
- */
-struct lio_droq {
-	/** A spinlock to protect access to this ring. */
-	rte_spinlock_t lock;
-
-	uint32_t q_no;
-
-	uint32_t pkt_count;
-
-	struct lio_device *lio_dev;
-
-	/** The 8B aligned descriptor ring starts at this address. */
-	struct lio_droq_desc *desc_ring;
-
-	/** Index in the ring where the driver should read the next packet */
-	uint32_t read_idx;
-
-	/** Index in the ring where Octeon will write the next packet */
-	uint32_t write_idx;
-
-	/** Index in the ring where the driver will refill the descriptor's
-	 * buffer
-	 */
-	uint32_t refill_idx;
-
-	/** Packets pending to be processed */
-	rte_atomic64_t pkts_pending;
-
-	/** Number of  descriptors in this ring. */
-	uint32_t nb_desc;
-
-	/** The number of descriptors pending refill. */
-	uint32_t refill_count;
-
-	uint32_t refill_threshold;
-
-	/** The 8B aligned info ptrs begin from this address. */
-	struct lio_droq_info *info_list;
-
-	/** The receive buffer list. This list has the virtual addresses of the
-	 *  buffers.
-	 */
-	struct lio_recv_buffer *recv_buf_list;
-
-	/** The size of each buffer pointed by the buffer pointer. */
-	uint32_t buffer_size;
-
-	/** Pointer to the mapped packet credit register.
-	 *  Host writes number of info/buffer ptrs available to this register
-	 */
-	void *pkts_credit_reg;
-
-	/** Pointer to the mapped packet sent register.
-	 *  Octeon writes the number of packets DMA'ed to host memory
-	 *  in this register.
-	 */
-	void *pkts_sent_reg;
-
-	/** Statistics for this DROQ. */
-	struct lio_droq_stats stats;
-
-	/** DMA mapped address of the DROQ descriptor ring. */
-	size_t desc_ring_dma;
-
-	/** Info ptr list are allocated at this virtual address. */
-	size_t info_base_addr;
-
-	/** DMA mapped address of the info list */
-	size_t info_list_dma;
-
-	/** Allocated size of info list. */
-	uint32_t info_alloc_size;
-
-	/** Memory zone **/
-	const struct rte_memzone *desc_ring_mz;
-	const struct rte_memzone *info_mz;
-	struct rte_mempool *mpool;
-};
-
-/** Receive Header */
-union octeon_rh {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-	uint64_t rh64;
-	struct	{
-		uint64_t opcode : 4;
-		uint64_t subcode : 8;
-		uint64_t len : 3; /** additional 64-bit words */
-		uint64_t reserved : 17;
-		uint64_t ossp : 32; /** opcode/subcode specific parameters */
-	} r;
-	struct	{
-		uint64_t opcode : 4;
-		uint64_t subcode : 8;
-		uint64_t len : 3; /** additional 64-bit words */
-		uint64_t extra : 28;
-		uint64_t vlan : 12;
-		uint64_t priority : 3;
-		uint64_t csum_verified : 3; /** checksum verified. */
-		uint64_t has_hwtstamp : 1; /** Has hardware timestamp.1 = yes.*/
-		uint64_t encap_on : 1;
-		uint64_t has_hash : 1; /** Has hash (rth or rss). 1 = yes. */
-	} r_dh;
-	struct {
-		uint64_t opcode : 4;
-		uint64_t subcode : 8;
-		uint64_t len : 3; /** additional 64-bit words */
-		uint64_t reserved : 8;
-		uint64_t extra : 25;
-		uint64_t gmxport : 16;
-	} r_nic_info;
-#else
-	uint64_t rh64;
-	struct {
-		uint64_t ossp : 32; /** opcode/subcode specific parameters */
-		uint64_t reserved : 17;
-		uint64_t len : 3; /** additional 64-bit words */
-		uint64_t subcode : 8;
-		uint64_t opcode : 4;
-	} r;
-	struct {
-		uint64_t has_hash : 1; /** Has hash (rth or rss). 1 = yes. */
-		uint64_t encap_on : 1;
-		uint64_t has_hwtstamp : 1;  /** 1 = has hwtstamp */
-		uint64_t csum_verified : 3; /** checksum verified. */
-		uint64_t priority : 3;
-		uint64_t vlan : 12;
-		uint64_t extra : 28;
-		uint64_t len : 3; /** additional 64-bit words */
-		uint64_t subcode : 8;
-		uint64_t opcode : 4;
-	} r_dh;
-	struct {
-		uint64_t gmxport : 16;
-		uint64_t extra : 25;
-		uint64_t reserved : 8;
-		uint64_t len : 3; /** additional 64-bit words */
-		uint64_t subcode : 8;
-		uint64_t opcode : 4;
-	} r_nic_info;
-#endif
-};
-
-#define OCTEON_RH_SIZE (sizeof(union octeon_rh))
-
-/** The txpciq info passed to host from the firmware */
-union octeon_txpciq {
-	uint64_t txpciq64;
-
-	struct {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		uint64_t q_no : 8;
-		uint64_t port : 8;
-		uint64_t pkind : 6;
-		uint64_t use_qpg : 1;
-		uint64_t qpg : 11;
-		uint64_t aura_num : 10;
-		uint64_t reserved : 20;
-#else
-		uint64_t reserved : 20;
-		uint64_t aura_num : 10;
-		uint64_t qpg : 11;
-		uint64_t use_qpg : 1;
-		uint64_t pkind : 6;
-		uint64_t port : 8;
-		uint64_t q_no : 8;
-#endif
-	} s;
-};
-
-/** The instruction (input) queue.
- *  The input queue is used to post raw (instruction) mode data or packet
- *  data to Octeon device from the host. Each input queue for
- *  a LIO device has one such structure to represent it.
- */
-struct lio_instr_queue {
-	/** A spinlock to protect access to the input ring.  */
-	rte_spinlock_t lock;
-
-	rte_spinlock_t post_lock;
-
-	struct lio_device *lio_dev;
-
-	uint32_t pkt_in_done;
-
-	rte_atomic64_t iq_flush_running;
-
-	/** Flag that indicates if the queue uses 64 byte commands. */
-	uint32_t iqcmd_64B:1;
-
-	/** Queue info. */
-	union octeon_txpciq txpciq;
-
-	uint32_t rsvd:17;
-
-	uint32_t status:8;
-
-	/** Number of  descriptors in this ring. */
-	uint32_t nb_desc;
-
-	/** Index in input ring where the driver should write the next packet */
-	uint32_t host_write_index;
-
-	/** Index in input ring where Octeon is expected to read the next
-	 *  packet.
-	 */
-	uint32_t lio_read_index;
-
-	/** This index aids in finding the window in the queue where Octeon
-	 *  has read the commands.
-	 */
-	uint32_t flush_index;
-
-	/** This field keeps track of the instructions pending in this queue. */
-	rte_atomic64_t instr_pending;
-
-	/** Pointer to the Virtual Base addr of the input ring. */
-	uint8_t *base_addr;
-
-	struct lio_request_list *request_list;
-
-	/** Octeon doorbell register for the ring. */
-	void *doorbell_reg;
-
-	/** Octeon instruction count register for this ring. */
-	void *inst_cnt_reg;
-
-	/** Number of instructions pending to be posted to Octeon. */
-	uint32_t fill_cnt;
-
-	/** Statistics for this input queue. */
-	struct lio_iq_stats stats;
-
-	/** DMA mapped base address of the input descriptor ring. */
-	uint64_t base_addr_dma;
-
-	/** Application context */
-	void *app_ctx;
-
-	/* network stack queue index */
-	int q_index;
-
-	/* Memory zone */
-	const struct rte_memzone *iq_mz;
-};
-
-/** This structure is used by driver to store information required
- *  to free the mbuff when the packet has been fetched by Octeon.
- *  Bytes offset below assume worst-case of a 64-bit system.
- */
-struct lio_buf_free_info {
-	/** Bytes 1-8. Pointer to network device private structure. */
-	struct lio_device *lio_dev;
-
-	/** Bytes 9-16. Pointer to mbuff. */
-	struct rte_mbuf *mbuf;
-
-	/** Bytes 17-24. Pointer to gather list. */
-	struct lio_gather *g;
-
-	/** Bytes 25-32. Physical address of mbuf->data or gather list. */
-	uint64_t dptr;
-
-	/** Bytes 33-47. Piggybacked soft command, if any */
-	struct lio_soft_command *sc;
-
-	/** Bytes 48-63. iq no */
-	uint64_t iq_no;
-};
-
-/* The Scatter-Gather List Entry. The scatter or gather component used with
- * input instruction has this format.
- */
-struct lio_sg_entry {
-	/** The first 64 bit gives the size of data in each dptr. */
-	union {
-		uint16_t size[4];
-		uint64_t size64;
-	} u;
-
-	/** The 4 dptr pointers for this entry. */
-	uint64_t ptr[4];
-};
-
-#define LIO_SG_ENTRY_SIZE	(sizeof(struct lio_sg_entry))
-
-/** Structure of a node in list of gather components maintained by
- *  driver for each network device.
- */
-struct lio_gather {
-	/** List manipulation. Next and prev pointers. */
-	struct lio_stailq_node list;
-
-	/** Size of the gather component at sg in bytes. */
-	int sg_size;
-
-	/** Number of bytes that sg was adjusted to make it 8B-aligned. */
-	int adjust;
-
-	/** Gather component that can accommodate max sized fragment list
-	 *  received from the IP layer.
-	 */
-	struct lio_sg_entry *sg;
-};
-
-struct lio_rss_ctx {
-	uint16_t hash_key_size;
-	uint8_t  hash_key[LIO_RSS_MAX_KEY_SZ];
-	/* Ideally a factor of number of queues */
-	uint8_t  itable[LIO_RSS_MAX_TABLE_SZ];
-	uint8_t  itable_size;
-	uint8_t  ip;
-	uint8_t  tcp_hash;
-	uint8_t  ipv6;
-	uint8_t  ipv6_tcp_hash;
-	uint8_t  ipv6_ex;
-	uint8_t  ipv6_tcp_ex_hash;
-	uint8_t  hash_disable;
-};
-
-struct lio_io_enable {
-	uint64_t iq;
-	uint64_t oq;
-	uint64_t iq64B;
-};
-
-struct lio_fn_list {
-	void (*setup_iq_regs)(struct lio_device *, uint32_t);
-	void (*setup_oq_regs)(struct lio_device *, uint32_t);
-
-	int (*setup_mbox)(struct lio_device *);
-	void (*free_mbox)(struct lio_device *);
-
-	int (*setup_device_regs)(struct lio_device *);
-	int (*enable_io_queues)(struct lio_device *);
-	void (*disable_io_queues)(struct lio_device *);
-};
-
-struct lio_pf_vf_hs_word {
-#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
-	/** PKIND value assigned for the DPI interface */
-	uint64_t pkind : 8;
-
-	/** OCTEON core clock multiplier */
-	uint64_t core_tics_per_us : 16;
-
-	/** OCTEON coprocessor clock multiplier */
-	uint64_t coproc_tics_per_us : 16;
-
-	/** app that currently running on OCTEON */
-	uint64_t app_mode : 8;
-
-	/** RESERVED */
-	uint64_t reserved : 16;
-
-#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-
-	/** RESERVED */
-	uint64_t reserved : 16;
-
-	/** app that currently running on OCTEON */
-	uint64_t app_mode : 8;
-
-	/** OCTEON coprocessor clock multiplier */
-	uint64_t coproc_tics_per_us : 16;
-
-	/** OCTEON core clock multiplier */
-	uint64_t core_tics_per_us : 16;
-
-	/** PKIND value assigned for the DPI interface */
-	uint64_t pkind : 8;
-#endif
-};
-
-struct lio_sriov_info {
-	/** Number of rings assigned to VF */
-	uint32_t rings_per_vf;
-
-	/** Number of VF devices enabled */
-	uint32_t num_vfs;
-};
-
-/* Head of a response list */
-struct lio_response_list {
-	/** List structure to add delete pending entries to */
-	struct lio_stailq_head head;
-
-	/** A lock for this response list */
-	rte_spinlock_t lock;
-
-	rte_atomic64_t pending_req_count;
-};
-
-/* Structure to define the configuration attributes for each Input queue. */
-struct lio_iq_config {
-	/* Max number of IQs available */
-	uint8_t max_iqs;
-
-	/** Pending list size (usually set to the sum of the size of all Input
-	 *  queues)
-	 */
-	uint32_t pending_list_size;
-
-	/** Command size - 32 or 64 bytes */
-	uint32_t instr_type;
-};
-
-/* Structure to define the configuration attributes for each Output queue. */
-struct lio_oq_config {
-	/* Max number of OQs available */
-	uint8_t max_oqs;
-
-	/** If set, the Output queue uses info-pointer mode. (Default: 1 ) */
-	uint32_t info_ptr;
-
-	/** The number of buffers that were consumed during packet processing by
-	 *  the driver on this Output queue before the driver attempts to
-	 *  replenish the descriptor ring with new buffers.
-	 */
-	uint32_t refill_threshold;
-};
-
-/* Structure to define the configuration. */
-struct lio_config {
-	uint16_t card_type;
-	const char *card_name;
-
-	/** Input Queue attributes. */
-	struct lio_iq_config iq;
-
-	/** Output Queue attributes. */
-	struct lio_oq_config oq;
-
-	int num_nic_ports;
-
-	int num_def_tx_descs;
-
-	/* Num of desc for rx rings */
-	int num_def_rx_descs;
-
-	int def_rx_buf_size;
-};
-
-/** Status of a RGMII Link on Octeon as seen by core driver. */
-union octeon_link_status {
-	uint64_t link_status64;
-
-	struct {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		uint64_t duplex : 8;
-		uint64_t mtu : 16;
-		uint64_t speed : 16;
-		uint64_t link_up : 1;
-		uint64_t autoneg : 1;
-		uint64_t if_mode : 5;
-		uint64_t pause : 1;
-		uint64_t flashing : 1;
-		uint64_t reserved : 15;
-#else
-		uint64_t reserved : 15;
-		uint64_t flashing : 1;
-		uint64_t pause : 1;
-		uint64_t if_mode : 5;
-		uint64_t autoneg : 1;
-		uint64_t link_up : 1;
-		uint64_t speed : 16;
-		uint64_t mtu : 16;
-		uint64_t duplex : 8;
-#endif
-	} s;
-};
-
-/** The rxpciq info passed to host from the firmware */
-union octeon_rxpciq {
-	uint64_t rxpciq64;
-
-	struct {
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-		uint64_t q_no : 8;
-		uint64_t reserved : 56;
-#else
-		uint64_t reserved : 56;
-		uint64_t q_no : 8;
-#endif
-	} s;
-};
-
-/** Information for a OCTEON ethernet interface shared between core & host. */
-struct octeon_link_info {
-	union octeon_link_status link;
-	uint64_t hw_addr;
-
-#if RTE_BYTE_ORDER == RTE_BIG_ENDIAN
-	uint64_t gmxport : 16;
-	uint64_t macaddr_is_admin_assigned : 1;
-	uint64_t vlan_is_admin_assigned : 1;
-	uint64_t rsvd : 30;
-	uint64_t num_txpciq : 8;
-	uint64_t num_rxpciq : 8;
-#else
-	uint64_t num_rxpciq : 8;
-	uint64_t num_txpciq : 8;
-	uint64_t rsvd : 30;
-	uint64_t vlan_is_admin_assigned : 1;
-	uint64_t macaddr_is_admin_assigned : 1;
-	uint64_t gmxport : 16;
-#endif
-
-	union octeon_txpciq txpciq[LIO_MAX_IOQS_PER_IF];
-	union octeon_rxpciq rxpciq[LIO_MAX_IOQS_PER_IF];
-};
-
-/* -----------------------  THE LIO DEVICE  --------------------------- */
-/** The lio device.
- *  Each lio device has this structure to represent all its
- *  components.
- */
-struct lio_device {
-	/** PCI device pointer */
-	struct rte_pci_device *pci_dev;
-
-	/** Octeon Chip type */
-	uint16_t chip_id;
-	uint16_t pf_num;
-	uint16_t vf_num;
-
-	/** This device's PCIe port used for traffic. */
-	uint16_t pcie_port;
-
-	/** The state of this device */
-	rte_atomic64_t status;
-
-	uint8_t intf_open;
-
-	struct octeon_link_info linfo;
-
-	uint8_t *hw_addr;
-
-	struct lio_fn_list fn_list;
-
-	uint32_t num_iqs;
-
-	/** Guards each glist */
-	rte_spinlock_t *glist_lock;
-	/** Array of gather component linked lists */
-	struct lio_stailq_head *glist_head;
-
-	/* The pool containing pre allocated buffers used for soft commands */
-	struct rte_mempool *sc_buf_pool;
-
-	/** The input instruction queues */
-	struct lio_instr_queue *instr_queue[LIO_MAX_POSSIBLE_INSTR_QUEUES];
-
-	/** The singly-linked tail queues of instruction response */
-	struct lio_response_list response_list;
-
-	uint32_t num_oqs;
-
-	/** The DROQ output queues  */
-	struct lio_droq *droq[LIO_MAX_POSSIBLE_OUTPUT_QUEUES];
-
-	struct lio_io_enable io_qmask;
-
-	struct lio_sriov_info sriov_info;
-
-	struct lio_pf_vf_hs_word pfvf_hsword;
-
-	/** Mail Box details of each lio queue. */
-	struct lio_mbox **mbox;
-
-	char dev_string[LIO_DEVICE_NAME_LEN]; /* Device print string */
-
-	const struct lio_config *default_config;
-
-	struct rte_eth_dev      *eth_dev;
-
-	uint64_t ifflags;
-	uint8_t max_rx_queues;
-	uint8_t max_tx_queues;
-	uint8_t nb_rx_queues;
-	uint8_t nb_tx_queues;
-	uint8_t port_configured;
-	struct lio_rss_ctx rss_state;
-	uint16_t port_id;
-	char firmware_version[LIO_FW_VERSION_LENGTH];
-};
-#endif /* _LIO_STRUCT_H_ */
diff --git a/drivers/net/liquidio/meson.build b/drivers/net/liquidio/meson.build
deleted file mode 100644
index ebadbf3dea..0000000000
--- a/drivers/net/liquidio/meson.build
+++ /dev/null
@@ -1,16 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2018 Intel Corporation
-
-if is_windows
-    build = false
-    reason = 'not supported on Windows'
-    subdir_done()
-endif
-
-sources = files(
-        'base/lio_23xx_vf.c',
-        'base/lio_mbox.c',
-        'lio_ethdev.c',
-        'lio_rxtx.c',
-)
-includes += include_directories('base')
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index b1df17ce8c..f68bbc27a7 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -36,7 +36,6 @@ drivers = [
         'ipn3ke',
         'ixgbe',
         'kni',
-        'liquidio',
         'mana',
         'memif',
         'mlx4',
-- 
2.40.1


^ permalink raw reply	[relevance 1%]

* Minutes of Technical Board Meeting, 2023-01-11
       [not found]       ` <DS0PR11MB73090EC350B82E0730D0D9A197CE9@DS0PR11MB7309.namprd11.prod.outlook.com>
@ 2023-05-05 15:05  3%     ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-05-05 15:05 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon


NOTE: The technical board meetings are on every second Wednesday at
https://meet.jit.si/DPDK at 3 pm UTC. Meetings are public, and DPDK
community members are welcome to attend.

NOTE: Next meeting will be on Wednesday 2023-01-25 @ 3pm UTC, and will
be chaired by Bruce.

Agenda Items
============

1) C99 standard
---------------

Future support for C11 atomics raised the question of whether C99 should
be required for DPDK. Several places use C99 already, but it is not
project-wide. DPDK is using C11 now, but marked as an extension where used.

The open issues are:
  - do not want to require applications to use C99, but
    want to allow applications that use C99. This impacts inline in headers.
  - Need to announce. Should not cause API/ABI breakage.
  - the testing and infrastructure are impacted as well.
  - need to keep inline for performance reasons.

Bruce is adding build support for test and compatibility.
Investigating what the fallout is from project-wide enablement.

2) Technical Writer
-------------------

Possible candidate did not work out. Two candidates under
review.

3) MIT License
--------------

Original Governing Board wording for MIT license exception
became overly complicated. Linux Foundation legal expert
revised it. Governing Board is reviewing.

4) Governing Board
------------------

DPDK Technical Board member to Governing Board:
  - past Stephen; current Thomas; next Aaron

Recent vote on modification to charter to codify treasurer role.

Discussion on marketing. The existing Linux Foundation model
has DPDK project paying for things that are not necessary and
not getting the expected support.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] net/liquidio: removed LiquidIO ethdev driver
  @ 2023-05-02 14:18  5% ` Ferruh Yigit
  2023-05-08 13:44  1% ` [dpdk-dev] [PATCH v2] net/liquidio: remove " jerinj
  1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-05-02 14:18 UTC (permalink / raw)
  To: jerinj, dev, Thomas Monjalon, Shijith Thotton,
	Srisivasubramanian Srinivasan, Anatoly Burakov, David Marchand

On 4/28/2023 11:31 AM, jerinj@marvell.com wrote:
> From: Jerin Jacob <jerinj@marvell.com>
> 
> The LiquidIO product line has been substituted with CN9K/CN10K
> OCTEON product line smart NICs located at drivers/net/octeon_ep/.
> 
> DPDK 20.08 has categorized the LiquidIO driver as UNMAINTAINED
> because of the absence of updates in the driver.
> 
> Due to the above reasons, the driver removed from DPDK 23.07.
> 
> Also removed deprecation notice entry for the removal in
> doc/guides/rel_notes/deprecation.rst.
> 
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  MAINTAINERS                              |    8 -
>  doc/guides/nics/features/liquidio.ini    |   29 -
>  doc/guides/nics/index.rst                |    1 -
>  doc/guides/nics/liquidio.rst             |  169 --
>  doc/guides/rel_notes/deprecation.rst     |    7 -
>  doc/guides/rel_notes/release_23_07.rst   |    9 +-
>  drivers/net/liquidio/base/lio_23xx_reg.h |  165 --
>  drivers/net/liquidio/base/lio_23xx_vf.c  |  513 ------
>  drivers/net/liquidio/base/lio_23xx_vf.h  |   63 -
>  drivers/net/liquidio/base/lio_hw_defs.h  |  239 ---
>  drivers/net/liquidio/base/lio_mbox.c     |  246 ---
>  drivers/net/liquidio/base/lio_mbox.h     |  102 -
>  drivers/net/liquidio/lio_ethdev.c        | 2147 ----------------------
>  drivers/net/liquidio/lio_ethdev.h        |  179 --
>  drivers/net/liquidio/lio_logs.h          |   58 -
>  drivers/net/liquidio/lio_rxtx.c          | 1804 ------------------
>  drivers/net/liquidio/lio_rxtx.h          |  740 --------
>  drivers/net/liquidio/lio_struct.h        |  661 -------
>  drivers/net/liquidio/meson.build         |   16 -
>  drivers/net/meson.build                  |    1 -
>  20 files changed, 1 insertion(+), 7156 deletions(-)
>  delete mode 100644 doc/guides/nics/features/liquidio.ini
>  delete mode 100644 doc/guides/nics/liquidio.rst
>  delete mode 100644 drivers/net/liquidio/base/lio_23xx_reg.h
>  delete mode 100644 drivers/net/liquidio/base/lio_23xx_vf.c
>  delete mode 100644 drivers/net/liquidio/base/lio_23xx_vf.h
>  delete mode 100644 drivers/net/liquidio/base/lio_hw_defs.h
>  delete mode 100644 drivers/net/liquidio/base/lio_mbox.c
>  delete mode 100644 drivers/net/liquidio/base/lio_mbox.h
>  delete mode 100644 drivers/net/liquidio/lio_ethdev.c
>  delete mode 100644 drivers/net/liquidio/lio_ethdev.h
>  delete mode 100644 drivers/net/liquidio/lio_logs.h
>  delete mode 100644 drivers/net/liquidio/lio_rxtx.c
>  delete mode 100644 drivers/net/liquidio/lio_rxtx.h
>  delete mode 100644 drivers/net/liquidio/lio_struct.h
>  delete mode 100644 drivers/net/liquidio/meson.build
> 

This causes a warning in the ABI check script [1], not because there is an
ABI breakage, but because of how the script works; that needs to be fixed as
well.

[1]
Checking ABI compatibility of build-gcc-shared
.../dpdk-next-net/devtools/../devtools/check-abi.sh
/tmp/dpdk-abiref/v22.11.1/build-gcc-shared
.../dpdk-next-net/build-gcc-shared/install
Error: cannot find librte_net_liquidio.so.23.0 in
.../dpdk-next-net/build-gcc-shared/install

<...>

> --- a/doc/guides/rel_notes/release_23_07.rst
> +++ b/doc/guides/rel_notes/release_23_07.rst
> @@ -59,14 +59,7 @@ New Features
>  Removed Items
>  -------------
>  
> -.. This section should contain removed items in this release. Sample format:
> -
> -   * Add a short 1-2 sentence description of the removed item
> -     in the past tense.
> -
> -   This section is a comment. Do not overwrite or remove it.
> -   Also, make sure to start the actual text at the margin.
> -   =======================================================
> +* Removed LiquidIO ethdev driver located at ``drivers/net/liquidio/``
>  
>  

No need to remove the section comment.

Rest looks good to me.


^ permalink raw reply	[relevance 5%]

* [PATCH v8 10/14] eal: expand most macros to empty when using MSVC
  @ 2023-05-02  3:15  5%   ` Tyler Retzlaff
  2023-05-02  3:15  3%   ` [PATCH v8 12/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-05-02  3:15 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

For now, expand many common rte macros to empty. The catch here is that we
need to test that most of the macros do what they should, but at the same
time they are blocking work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for msvc and where not possible provide some alternate macros
to achieve the same outcome.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/eal/include/rte_branch_prediction.h |  8 +++++
 lib/eal/include/rte_common.h            | 54 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 ++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..1eff9f6 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(!!(x))
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(!!(x))
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..0c55a23 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -41,6 +41,10 @@
 #define RTE_STD_C11
 #endif
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __extension__
+#endif
+
 /*
  * RTE_TOOLCHAIN_GCC is defined if the target is built with GCC,
  * while a host application (like pmdinfogen) may have another compiler.
@@ -65,7 +69,11 @@
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -80,16 +88,29 @@
 /**
  * Force a structure to be packed
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_packed __attribute__((__packed__))
+#else
+#define __rte_packed
+#endif
 
 /**
  * Macro to mark a type that is not subject to type-based aliasing rules
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_may_alias __attribute__((__may_alias__))
+#else
+#define __rte_may_alias
+#endif
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -110,14 +131,22 @@
 /**
  * Force symbol to be generated even if it appears to be unused.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_used __attribute__((used))
+#else
+#define __rte_used
+#endif
 
 /*********** Macros to eliminate unused variable warnings ********/
 
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +170,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +178,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +255,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +284,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +482,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
@@ -812,12 +861,17 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  struct wrapper *w = container_of(x, struct wrapper, c);
  */
 #ifndef container_of
+#ifndef RTE_TOOLCHAIN_MSVC
 #define container_of(ptr, type, member)	__extension__ ({		\
 			const typeof(((type *)0)->member) *_ptr = (ptr); \
 			__rte_unused type *_target_ptr =	\
 				(type *)(ptr);				\
 			(type *)(((uintptr_t)_ptr) - offsetof(type, member)); \
 		})
+#else
+#define container_of(ptr, type, member) \
+			((type *)((uintptr_t)(ptr) - offsetof(type, member)))
+#endif
 #endif
 
 /** Swap two variables. */
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1
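
As a rough illustration of the fallback pattern used for container_of in
this patch (not the DPDK header itself), the sketch below keeps the
statement-expression form for GCC-compatible compilers and falls back to a
plain cast-and-subtract expression elsewhere. The names my_container_of,
struct wrapper and wrapper_from_c are hypothetical, and the compiler test
uses __GNUC__ instead of RTE_TOOLCHAIN_MSVC so that it builds outside a
DPDK tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro name; mirrors the two branches shown in the patch. */
#if defined(__GNUC__)
/* GCC/clang: statement expression, checks that ptr matches the member type. */
#define my_container_of(ptr, type, member) __extension__ ({ \
		const __typeof__(((type *)0)->member) *_ptr = (ptr); \
		(type *)(((uintptr_t)_ptr) - offsetof(type, member)); \
	})
#else
/* Other compilers: plain expression, no type check on ptr. */
#define my_container_of(ptr, type, member) \
		((type *)((uintptr_t)(ptr) - offsetof(type, member)))
#endif

/* Hypothetical enclosing structure used only for this example. */
struct wrapper {
	int a;
	int c;
};

/* Recover the enclosing struct wrapper from a pointer to its 'c' member. */
struct wrapper *
wrapper_from_c(int *c_ptr)
{
	assert(c_ptr != NULL);
	return my_container_of(c_ptr, struct wrapper, c);
}
```

The fallback branch gives up the type check on ptr that the typeof-based
branch performs, which is presumably one of the gaps the commit message
intends to revisit later.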


^ permalink raw reply	[relevance 5%]

* [PATCH v8 12/14] telemetry: avoid expanding versioned symbol macros on MSVC
    2023-05-02  3:15  5%   ` [PATCH v8 10/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
@ 2023-05-02  3:15  3%   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-05-02  3:15 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
functions.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1
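
The non-versioned branch of this patch boils down to a forwarding wrapper.
A minimal sketch of that idea, with hypothetical names (struct demo_data,
demo_add_int, demo_add_int_v24) standing in for the telemetry symbols and
with the symbol-versioning branch left out entirely:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container; stands in for struct rte_tel_data. */
struct demo_data {
	int64_t last;	/* most recent value added */
	int count;	/* number of values added */
};

/* Current implementation, named with its ABI version suffix. */
int
demo_add_int_v24(struct demo_data *d, int64_t x)
{
	if (d == NULL)
		return -1;
	d->last = x;
	d->count++;
	return 0;
}

/*
 * Fallback when the toolchain cannot bind the unsuffixed name to the
 * suffixed one through symbol versioning: export an ordinary function
 * with the public name that forwards to the current implementation.
 */
int
demo_add_int(struct demo_data *d, int64_t x)
{
	return demo_add_int_v24(d, x);
}
```

Callers keep using the unsuffixed name; only one implementation is
reachable through it, so no older version can be offered this way.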


^ permalink raw reply	[relevance 3%]

* Re: [PATCH 1/3] security: introduce out of place support for inline ingress
  2023-04-18  8:33  4%     ` Jerin Jacob
@ 2023-04-24 22:41  3%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-04-24 22:41 UTC (permalink / raw)
  To: Stephen Hemminger, Jerin Jacob
  Cc: Nithin Dabilpuram, Akhil Goyal, jerinj, dev, Morten Brørup,
	techboard

18/04/2023 10:33, Jerin Jacob:
> On Tue, Apr 11, 2023 at 11:36 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Tue, 11 Apr 2023 15:34:07 +0530
> > Nithin Dabilpuram <ndabilpuram@marvell.com> wrote:
> >
> > > diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> > > index 4bacf9fcd9..866cd4e8ee 100644
> > > --- a/lib/security/rte_security.h
> > > +++ b/lib/security/rte_security.h
> > > @@ -275,6 +275,17 @@ struct rte_security_ipsec_sa_options {
> > >        */
> > >       uint32_t ip_reassembly_en : 1;
> > >
> > > +     /** Enable out of place processing on inline inbound packets.
> > > +      *
> > > +      * * 1: Enable driver to perform Out-of-place(OOP) processing for this inline
> > > +      *      inbound SA if supported by driver. PMD need to register mbuf
> > > +      *      dynamic field using rte_security_oop_dynfield_register()
> > > +      *      and security session creation would fail if dynfield is not
> > > +      *      registered successfully.
> > > +      * * 0: Disable OOP processing for this session (default).
> > > +      */
> > > +     uint32_t ingress_oop : 1;
> > > +
> > >       /** Reserved bit fields for future extension
> > >        *
> > >        * User should ensure reserved_opts is cleared as it may change in
> > > @@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
> > >        *
> > >        * Note: Reduce number of bits in reserved_opts for every new option.
> > >        */
> > > -     uint32_t reserved_opts : 17;
> > > +     uint32_t reserved_opts : 16;
> > >  };
> >
> > NAK
> > Let me repeat the reserved bit rant. YAGNI
> >
> > Reserved space is not usable without ABI breakage unless the existing
> > code enforces that reserved space has to be zero.
> >
> > Just saying "User should ensure reserved_opts is cleared" is not enough.
> 
> Yes. I think, we need to enforce to have _init functions for the
> structures which is using reserved filed.
> 
> On the same note on YAGNI, I am wondering why NOT introduce
> RTE_NEXT_ABI marco kind of scheme to compile out ABI breaking changes.
> By keeping RTE_NEXT_ABI disable by default, enable explicitly if user
> wants it to avoid waiting for one year any ABI breaking changes.
> There are a lot of "fixed appliance" customers (not OS distribution
> driven customer) they are willing to recompile DPDK for new feature.
> What we are loosing with this scheme?

RTE_NEXT_ABI is described in the ABI policy.
We are not doing it currently, but I think we could
when it does not complicate the code too much.

The only problems I see are:
- more #ifdef clutter
- 2 binary versions to test
- CI and checks must handle RTE_NEXT_ABI version
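
To make the reserved-bits point from earlier in the thread concrete, here
is a minimal sketch of what "enforcing that reserved space is zero" could
look like, together with the _init helper idea. All names
(struct demo_sa_options, demo_sa_options_init, demo_session_create) and
the -EINVAL return are assumptions for illustration, not rte_security API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical options structure with spare bits kept for later use. */
struct demo_sa_options {
	uint32_t feature_a : 1;
	uint32_t feature_b : 1;
	uint32_t reserved_opts : 30;
};

/* Init helper: gives the caller a fully zeroed, valid starting state. */
void
demo_sa_options_init(struct demo_sa_options *opts)
{
	assert(opts != NULL);
	memset(opts, 0, sizeof(*opts));
}

/*
 * Reject any request with reserved bits set. Only with this check in
 * place can a reserved bit later be given a meaning without changing
 * behaviour for binaries built against the older definition.
 */
int
demo_session_create(const struct demo_sa_options *opts)
{
	if (opts == NULL || opts->reserved_opts != 0)
		return -EINVAL;
	return 0;
}
```

Without the check in demo_session_create(), an old application passing
uninitialized memory could silently enable whatever a reserved bit is
later assigned to.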




^ permalink raw reply	[relevance 3%]

* Re: [RFC] lib: set/get max memzone segments
  @ 2023-04-21  8:34  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-04-21  8:34 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Ophir Munk, dev, Bruce Richardson, Devendra Singh Rawat,
	Alok Prasad, Matan Azrad, Lior Margalit

20/04/2023 20:20, Tyler Retzlaff:
> On Thu, Apr 20, 2023 at 09:43:28AM +0200, Thomas Monjalon wrote:
> > 19/04/2023 16:51, Tyler Retzlaff:
> > > On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> > > > In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> > > > coded as 2560.  For applications requiring different values of this
> > > > parameter – it is more convenient to set the max value via an rte API -
> > > > rather than changing the dpdk source code per application.  In many
> > > > organizations, the possibility to compile a private DPDK library for a
> > > > particular application does not exist at all.  With this option there is
> > > > no need to recompile DPDK and it allows using an in-box packaged DPDK.
> > > > An example usage for updating the RTE_MAX_MEMZONE would be of an
> > > > application that uses the DPDK mempool library which is based on DPDK
> > > > memzone library.  The application may need to create a number of
> > > > steering tables, each of which will require its own mempool allocation.
> > > > This commit is not about how to optimize the application usage of
> > > > mempool nor about how to improve the mempool implementation based on
> > > > memzone.  It is about how to make the max memzone definition - run-time
> > > > customized.
> > > > This commit adds an API which must be called before rte_eal_init():
> > > > rte_memzone_max_set(int max).  If not called, the default memzone
> > > > (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> > > > max memzone: rte_memzone_max_get().
> > > > 
> > > > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> > > > ---
> > > 
> > > the use case of each application may want a different non-hard coded
> > > value makes sense.
> > > 
> > > it's less clear to me that requiring it be called before eal init makes
> > > sense over just providing it as configuration to eal init so that it is
> > > composed.
> > 
> > Why do you think it would be better as EAL init option?
> > From an API perspective, I think it is simpler to call a dedicated function.
> > And I don't think a user wants to deal with it when starting the application.
> 
> because a dedicated function that can be called detached from the eal
> state enables an opportunity for accidental and confusing use outside
> the correct context.
> 
> i know the above prescribes not to do this but.
> 
> now you can call set after eal init, but we protect about calling it
> after init by failing. what do we do sensibly with the failure?

It would be a developer mistake which could be fixed very easily during
the development stage. I don't see a problem here.

> > > can you elaborate further on why you need get if you have a one-shot
> > > set? why would the application not know the value if you can only ever
> > > call it once before init?
> > 
> > The "get" function is used in this patch by test and qede driver.
> > The application could use it as well, especially to query the default value.
> 
> this seems incoherent to me, why does the application not know if it has
> called set or not? if it called set it knows what the value is, if it didn't
> call set it knows what the default is.

No, the application doesn't know the default, it is an internal value.

> anyway, the use case is valid and i would like to see the ability to
> change it dynamically i'd prefer not to see an api like this be introduced
> as prescribed but that's for you folks to decide.
> 
> anyway, i own a lot of apis that operate just like the proposed and
> they're great source of support overhead. i prefer not to rely on
> documenting a contract when i can enforce the contract and implicit state
> machine mechanically with the api instead.
> 
> fwiw a nicer pattern for doing this one of framework influencing config
> might look something like this.
> 
> struct eal_config config;
> 
> eal_config_init(&config); // defaults are set entire state made valid
> eal_config_set_max_memzone(&config, 1024); // default is overridden
> 
> rte_eal_init(&config);

In general, we discovered that functions doing too much are bad
for usability and for ABI stability.
In the function eal_config_init() that you propose,
any change in the struct eal_config will be an ABI breakage.
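
For reference, the dedicated set/get approach discussed in this thread can
be sketched as below. This is only an illustration under assumptions: the
demo_* names, the demo_init() stand-in for rte_eal_init(), and the
-EBUSY/-EINVAL return codes are hypothetical; only the default of 2560 is
taken from the RFC text.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_DEFAULT_MAX_MEMZONE 2560	/* default quoted in the RFC */

static size_t demo_max_memzone = DEMO_DEFAULT_MAX_MEMZONE;
static bool demo_initialized;

/* Override the limit; only allowed before initialization. */
int
demo_memzone_max_set(size_t max)
{
	if (demo_initialized)
		return -EBUSY;	/* too late, init already ran */
	if (max == 0)
		return -EINVAL;
	demo_max_memzone = max;
	return 0;
}

/* Query the effective limit, whether default or overridden. */
size_t
demo_memzone_max_get(void)
{
	return demo_max_memzone;
}

/* Stand-in for the init call that freezes the configuration. */
void
demo_init(void)
{
	demo_initialized = true;
}
```

The setter takes a scalar and the getter returns one, so no structure
layout is exposed to the application; the cost is that the ordering
contract (set before init) is enforced only at run time.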



^ permalink raw reply	[relevance 4%]

* Re: [PATCH] eventdev: fix alignment padding
  2023-04-18 11:06  4% ` Morten Brørup
@ 2023-04-18 12:40  3%   ` Mattias Rönnblom
  0 siblings, 0 replies; 200+ results
From: Mattias Rönnblom @ 2023-04-18 12:40 UTC (permalink / raw)
  To: Morten Brørup, Sivaprasad Tummala, jerinj; +Cc: dev

On 2023-04-18 13:06, Morten Brørup wrote:
>> From: Sivaprasad Tummala [mailto:sivaprasad.tummala@amd.com]
>> Sent: Tuesday, 18 April 2023 12.46
>>
>> fixed the padding required to align to cacheline size.
>>
>> Fixes: 54f17843a887 ("eventdev: add port maintenance API")
>> Cc: mattias.ronnblom@ericsson.com
>>
>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>> ---
>>   lib/eventdev/rte_eventdev_core.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/eventdev/rte_eventdev_core.h
>> b/lib/eventdev/rte_eventdev_core.h
>> index c328bdbc82..c27a52ccc0 100644
>> --- a/lib/eventdev/rte_eventdev_core.h
>> +++ b/lib/eventdev/rte_eventdev_core.h
>> @@ -65,7 +65,7 @@ struct rte_event_fp_ops {
>>   	/**< PMD Tx adapter enqueue same destination function. */
>>   	event_crypto_adapter_enqueue_t ca_enqueue;
>>   	/**< PMD Crypto adapter enqueue function. */
>> -	uintptr_t reserved[6];
>> +	uintptr_t reserved[5];
>>   } __rte_cache_aligned;
> 
> This fix changes the size (reduces it by one cache line) of the elements in the public rte_event_fp_ops array, and thus breaks the ABI.
> 
> BTW, the patch it fixes, which was dated November 2021, also broke the ABI.

21.11 has a new ABI version, so that's not an issue.

> 
>>
>>   extern struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
>> --
>> 2.34.1


^ permalink raw reply	[relevance 3%]

* RE: [PATCH] eventdev: fix alignment padding
  @ 2023-04-18 11:06  4% ` Morten Brørup
  2023-04-18 12:40  3%   ` Mattias Rönnblom
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-04-18 11:06 UTC (permalink / raw)
  To: Sivaprasad Tummala, jerinj; +Cc: dev, mattias.ronnblom

> From: Sivaprasad Tummala [mailto:sivaprasad.tummala@amd.com]
> Sent: Tuesday, 18 April 2023 12.46
> 
> fixed the padding required to align to cacheline size.
> 
> Fixes: 54f17843a887 ("eventdev: add port maintenance API")
> Cc: mattias.ronnblom@ericsson.com
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>  lib/eventdev/rte_eventdev_core.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/eventdev/rte_eventdev_core.h
> b/lib/eventdev/rte_eventdev_core.h
> index c328bdbc82..c27a52ccc0 100644
> --- a/lib/eventdev/rte_eventdev_core.h
> +++ b/lib/eventdev/rte_eventdev_core.h
> @@ -65,7 +65,7 @@ struct rte_event_fp_ops {
>  	/**< PMD Tx adapter enqueue same destination function. */
>  	event_crypto_adapter_enqueue_t ca_enqueue;
>  	/**< PMD Crypto adapter enqueue function. */
> -	uintptr_t reserved[6];
> +	uintptr_t reserved[5];
>  } __rte_cache_aligned;

This fix changes the size (reduces it by one cache line) of the elements in the public rte_event_fp_ops array, and thus breaks the ABI.

BTW, the patch it fixes, which was dated November 2021, also broke the ABI.
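
A rough sketch of the size effect, under stated assumptions: a 64-byte cache
line, 8-byte slots, and 11 slots ahead of reserved[] standing in for the data
pointer and function pointers (that count is an assumption and is not visible
in the hunk above). The struct and function names are made up for the
illustration, and the aligned attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Stand-in for the struct before the fix: 11 + 6 = 17 slots, 136 bytes. */
struct fp_ops_before {
	uint64_t slots[11];
	uint64_t reserved[6];
} __attribute__((aligned(CACHE_LINE)));

/* Stand-in for the struct after the fix: 11 + 5 = 16 slots, 128 bytes. */
struct fp_ops_after {
	uint64_t slots[11];
	uint64_t reserved[5];
} __attribute__((aligned(CACHE_LINE)));

/* sizeof is rounded up to the alignment, so it is also the array stride. */
static size_t
array_stride_before(void)
{
	return sizeof(struct fp_ops_before);
}

static size_t
array_stride_after(void)
{
	return sizeof(struct fp_ops_after);
}
```

Under these assumptions the element size drops from three cache lines to two,
so an application built against the old header would index the array with the
wrong stride for every element after the first.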

> 
>  extern struct rte_event_fp_ops rte_event_fp_ops[RTE_EVENT_MAX_DEVS];
> --
> 2.34.1


^ permalink raw reply	[relevance 4%]

* RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  2023-04-18  7:46  3% ` David Marchand
  2023-04-18  9:27  0%   ` Xia, Chenbo
@ 2023-04-18  9:33  0%   ` Xia, Chenbo
  1 sibling, 0 replies; 200+ results
From: Xia, Chenbo @ 2023-04-18  9:33 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, skori, Cao, Yahui, Li, Miao

David,

Sorry that I missed one comment...

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, April 18, 2023 3:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com
> Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> Hello Chenbo,
> 
> On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> 
> Sorry, I did not take the time to look into the details.
> Could you summarize what would be the benefit of this series?
> 
> 
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that
> > driver could use them to access specific BAR using pread/pwrite
> > system calls when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question
> > is for all sparse mmap regions, should they be mapped to a
> > continuous virtual address region that follows device-specific
> > BAR layout or not. In theory, there could be three options to
> > support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one
> > and each region could be located anywhere in process address
> > space. But accessing the mmaped BAR will not be as easy as
> > 'bar_base_address + bar_offset', driver needs to check the
> > sparse mmap information to access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change
> > is introduced in bus_pci_driver.h. Corresponding changes in
> > all drivers are also done and currently I am assuming drivers
> > do not support this feature so they will not check the
> > 'is_sparse' flag but assumes it to be false. Note that it will
> > not break any driver and each vendor can add related logic when
> > they start to support this feature. This is only because I don't
> > want to introduce complexity to drivers that do not want to
> > support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> > mmaped. Region #1 will be mapped at 'base_addr' and region #2
> > will be mapped at 'base_addr + 0x3000'. The good thing is if
> > we implement like this, driver can still access all BAR registers
> > using 'bar_base_address + bar_offset' way and we don't need
> > to introduce any driver API change. But the address space
> > range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> > be reserved so it could result in waste of address space or memory
> > (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> > range). Meanwhile, driver needs to know which part of BAR is
> > mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > prefers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> 
> I think this first patch main motivation was to avoid ABI issues.
> Since v22.11, the rte_pci_device object is opaque to applications.
> 
> So, do we still need this patch?

I think it could be good for reducing unnecessary driver APIs.
Hiding this region information could be friendlier to driver developers?

Thanks,
Chenbo

> 
> 
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 0%]

* RE: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  2023-04-18  7:46  3% ` David Marchand
@ 2023-04-18  9:27  0%   ` Xia, Chenbo
  2023-04-18  9:33  0%   ` Xia, Chenbo
  1 sibling, 0 replies; 200+ results
From: Xia, Chenbo @ 2023-04-18  9:27 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, skori, Cao, Yahui, Li, Miao

Hi David,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, April 18, 2023 3:47 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>
> Cc: dev@dpdk.org; skori@marvell.com
> Subject: Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
> 
> Hello Chenbo,
> 
> On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >
> > This series introduces a VFIO standard capability, called sparse
> > mmap to PCI bus. In linux kernel, it's defined as
> > VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> > mmap whole BAR region into DPDK process, only mmap part of the
> > BAR region after getting sparse mmap information from kernel.
> > For the rest of BAR region that is not mmap-ed, DPDK process
> > can use pread/pwrite system calls to access. Sparse mmap is
> > useful when kernel does not want userspace to mmap whole BAR
> > region, or kernel wants to control over access to specific BAR
> > region. Vendors can choose to enable this feature or not for
> > their devices in their specific kernel modules.
> 
> Sorry, I did not take the time to look into the details.
> Could you summarize what would be the benefit of this series?

The benefit could differ from vendor to vendor. There was one discussion:
http://inbox.dpdk.org/dev/CO6PR18MB386016A2634AF375F5B4BA8CB4899@CO6PR18MB3860.namprd18.prod.outlook.com/

The problem raised there is that some devices have a very large BAR, and we
don't want DPDK to map the whole BAR.

For Intel devices, one benefit is that we want our kernel module to control
access to a specific BAR region, so we will make the DPDK process unable to
mmap that region (because after an mmap, the kernel cannot know whether
userspace is accessing the device BAR).

That's why I summarized it as 'Sparse mmap is useful when kernel does not want
userspace to mmap whole BAR region, or kernel wants to control over access to
specific BAR region'. There could be more uses for other vendors that I have
not realized.
Thanks,
Chenbo

> 
> 
> >
> > In this patchset:
> >
> > Patch 1-3 is mainly for introducing BAR access APIs so that
> > driver could use them to access specific BAR using pread/pwrite
> > system calls when part of the BAR is not mmap-able.
> >
> > Patch 4 adds the VFIO sparse mmap support finally. A question
> > is for all sparse mmap regions, should they be mapped to a
> > continuous virtual address region that follows device-specific
> > BAR layout or not. In theory, there could be three options to
> > support this feature.
> >
> > Option 1: Map sparse mmap regions independently
> > ======================================================
> > In this approach, we mmap each sparse mmap region one by one
> > and each region could be located anywhere in process address
> > space. But accessing the mmaped BAR will not be as easy as
> > 'bar_base_address + bar_offset', driver needs to check the
> > sparse mmap information to access specific BAR register.
> >
> > Patch 4 in this patchset adopts this option. Driver API change
> > is introduced in bus_pci_driver.h. Corresponding changes in
> > all drivers are also done and currently I am assuming drivers
> > do not support this feature so they will not check the
> > 'is_sparse' flag but assumes it to be false. Note that it will
> > not break any driver and each vendor can add related logic when
> > they start to support this feature. This is only because I don't
> > want to introduce complexity to drivers that do not want to
> > support this feature.
> >
> > Option 2: Map sparse mmap regions based on device-specific BAR layout
> > ======================================================================
> > In this approach, the sparse mmap regions are mapped to continuous
> > virtual address region that follows device-specific BAR layout.
> > For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> > region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> > mmaped. Region #1 will be mapped at 'base_addr' and region #2
> > will be mapped at 'base_addr + 0x3000'. The good thing is if
> > we implement like this, driver can still access all BAR registers
> > using 'bar_base_address + bar_offset' way and we don't need
> > to introduce any driver API change. But the address space
> > range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> > be reserved so it could result in waste of address space or memory
> > (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> > range). Meanwhile, driver needs to know which part of BAR is
> > mmaped (this is possible since the range is defined by vendor's
> > specific kernel module).
> >
> > Option 3: Support both option 1 & 2
> > ===================================
> > We could define a driver flag to let driver choose which way it
> > prefers since either option has its own Pros & Cons.
> >
> > Please share your comments, Thanks!
> >
> >
> > Chenbo Xia (4):
> >   bus/pci: introduce an internal representation of PCI device
> 
> I think this first patch main motivation was to avoid ABI issues.
> Since v22.11, the rte_pci_device object is opaque to applications.
> 
> So, do we still need this patch?
> 
> 
> >   bus/pci: avoid depending on private value in kernel source
> >   bus/pci: introduce helper for MMIO read and write
> >   bus/pci: add VFIO sparse mmap support
> >
> >  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
> >  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
> >  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
> >  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
> >  drivers/bus/pci/bsd/pci.c                     |  43 +-
> >  drivers/bus/pci/bus_pci_driver.h              |  24 +-
> >  drivers/bus/pci/linux/pci.c                   |  91 +++-
> >  drivers/bus/pci/linux/pci_init.h              |  14 +-
> >  drivers/bus/pci/linux/pci_uio.c               |  34 +-
> >  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
> >  drivers/bus/pci/pci_common.c                  |  57 ++-
> >  drivers/bus/pci/pci_common_uio.c              |  12 +-
> >  drivers/bus/pci/private.h                     |  25 +-
> >  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
> >  drivers/bus/pci/version.map                   |   3 +
> >  drivers/common/cnxk/roc_dev.c                 |   4 +-
> >  drivers/common/cnxk/roc_dpi.c                 |   2 +-
> >  drivers/common/cnxk/roc_ml.c                  |  22 +-
> >  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
> >  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
> >  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
> >  drivers/compress/octeontx/otx_zip.c           |   4 +-
> >  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
> >  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
> >  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
> >  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
> >  drivers/crypto/virtio/virtio_pci.c            |   6 +-
> >  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
> >  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
> >  drivers/dma/idxd/idxd_pci.c                   |   4 +-
> >  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
> >  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
> >  drivers/event/octeontx/ssovf_probe.c          |  38 +-
> >  drivers/event/octeontx/timvf_probe.c          |  18 +-
> >  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
> >  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
> >  drivers/net/ark/ark_ethdev.c                  |   4 +-
> >  drivers/net/atlantic/atl_ethdev.c             |   2 +-
> >  drivers/net/avp/avp_ethdev.c                  |  20 +-
> >  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
> >  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
> >  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
> >  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
> >  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
> >  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
> >  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
> >  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
> >  drivers/net/e1000/em_ethdev.c                 |   4 +-
> >  drivers/net/e1000/igb_ethdev.c                |   4 +-
> >  drivers/net/ena/ena_ethdev.c                  |   4 +-
> >  drivers/net/enetc/enetc_ethdev.c              |   2 +-
> >  drivers/net/enic/enic_main.c                  |   4 +-
> >  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
> >  drivers/net/gve/gve_ethdev.c                  |   4 +-
> >  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
> >  drivers/net/hns3/hns3_ethdev.c                |   2 +-
> >  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
> >  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
> >  drivers/net/i40e/i40e_ethdev.c                |   2 +-
> >  drivers/net/iavf/iavf_ethdev.c                |   2 +-
> >  drivers/net/ice/ice_dcf.c                     |   2 +-
> >  drivers/net/ice/ice_ethdev.c                  |   2 +-
> >  drivers/net/idpf/idpf_ethdev.c                |   4 +-
> >  drivers/net/igc/igc_ethdev.c                  |   2 +-
> >  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
> >  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
> >  drivers/net/liquidio/lio_ethdev.c             |   4 +-
> >  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
> >  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
> >  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
> >  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
> >  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
> >  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
> >  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
> >  drivers/net/qede/qede_main.c                  |   6 +-
> >  drivers/net/sfc/sfc.c                         |   2 +-
> >  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
> >  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
> >  drivers/net/virtio/virtio_pci.c               |   6 +-
> >  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
> >  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
> >  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
> >  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
> >  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
> >  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
> >  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
> >  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
> >  lib/eal/include/rte_vfio.h                    |   1 -
> >  90 files changed, 853 insertions(+), 352 deletions(-)
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 1/4] doc: announce new cpu flag added to rte_cpu_flag_t
  2023-04-18  8:52  3%         ` Ferruh Yigit
@ 2023-04-18  9:22  3%           ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2023-04-18  9:22 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Sivaprasad Tummala, david.hunt, dev, david.marchand, Thomas Monjalon

On Tue, Apr 18, 2023 at 09:52:49AM +0100, Ferruh Yigit wrote:
> On 4/18/2023 9:25 AM, Sivaprasad Tummala wrote:
> > A new flag RTE_CPUFLAG_MONITORX is added to rte_cpu_flag_t in
> > DPDK 23.07 release to support monitorx instruction on EPYC processors.
> > This results in ABI breakage for legacy apps.
> > 
> > Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index dcc1ca1696..831713983f 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -163,3 +163,6 @@ Deprecation Notices
> >    The new port library API (functions rte_swx_port_*)
> >    will gradually transition from experimental to stable status
> >    starting with DPDK 23.07 release.
> > +
> > +* eal/x86: The enum ``rte_cpu_flag_t`` will be extended with a new cpu flag
> > +  ``RTE_CPUFLAG_MONITORX`` to support monitorx instruction on EPYC processors.
> 
> 
> OK to add new CPU flag,
> Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
> 
> 
> But @David, @Bruce, is it OK to break ABI whenever a new CPU flag is
> added? Should we hide CPU flags better?
> 
> Another option would be to drop 'RTE_CPUFLAG_NUMFLAGS' and allow
> appending new flags to the end, although this may make the enum
> messier over time.

+1 to drop the NUMFLAGS value. We should not break ABI each time we need a
new flag.
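
A small sketch of why the sentinel is the problem, using a made-up two-flag
enum rather than the real rte_cpu_flag_t (all identifiers below are
hypothetical). An application compiled against the old header has the old
sentinel value baked in, and the new flag takes exactly that value.

```c
#include <assert.h>
#include <stdbool.h>

/* What an application compiled against the old header sees. */
enum cpu_flag_v1 {
	V1_CPUFLAG_SSE,
	V1_CPUFLAG_AVX,
	V1_CPUFLAG_NUMFLAGS /* sentinel, baked into the application as 2 */
};

/* What the upgraded library is built with. */
enum cpu_flag_v2 {
	V2_CPUFLAG_SSE,
	V2_CPUFLAG_AVX,
	V2_CPUFLAG_MONITORX, /* new flag takes the old sentinel's value */
	V2_CPUFLAG_NUMFLAGS
};

/* Range check as an old application might have written it. */
static bool
old_app_accepts(int flag)
{
	return flag >= 0 && flag < V1_CPUFLAG_NUMFLAGS;
}

/* Library-side check: the flag count stays private to the library. */
static bool
lib_flag_is_valid(int flag)
{
	return flag >= 0 && flag < V2_CPUFLAG_NUMFLAGS;
}
```

If the count is never exported, appending a flag only adds a new enumerator
value and existing values keep their meaning.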

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v4 1/4] doc: announce new cpu flag added to rte_cpu_flag_t
  2023-04-18  8:25  3%       ` [PATCH v4 1/4] doc: announce new cpu flag added to rte_cpu_flag_t Sivaprasad Tummala
@ 2023-04-18  8:52  3%         ` Ferruh Yigit
  2023-04-18  9:22  3%           ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-04-18  8:52 UTC (permalink / raw)
  To: Sivaprasad Tummala, david.hunt
  Cc: dev, david.marchand, Bruce Richardson, Thomas Monjalon

On 4/18/2023 9:25 AM, Sivaprasad Tummala wrote:
> A new flag RTE_CPUFLAG_MONITORX is added to rte_cpu_flag_t in
> DPDK 23.07 release to support monitorx instruction on EPYC processors.
> This results in ABI breakage for legacy apps.
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index dcc1ca1696..831713983f 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -163,3 +163,6 @@ Deprecation Notices
>    The new port library API (functions rte_swx_port_*)
>    will gradually transition from experimental to stable status
>    starting with DPDK 23.07 release.
> +
> +* eal/x86: The enum ``rte_cpu_flag_t`` will be extended with a new cpu flag
> +  ``RTE_CPUFLAG_MONITORX`` to support monitorx instruction on EPYC processors.


OK to add new CPU flag,
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>


But @David, @Bruce, is it OK to break ABI whenever a new CPU flag is
added? Should we hide CPU flags better?

Another option would be to drop 'RTE_CPUFLAG_NUMFLAGS' and allow
appending new flags to the end, although this may make the enum
messier over time.

^ permalink raw reply	[relevance 3%]

* Re: [PATCH 1/3] security: introduce out of place support for inline ingress
  2023-04-11 18:05  3%   ` Stephen Hemminger
@ 2023-04-18  8:33  4%     ` Jerin Jacob
  2023-04-24 22:41  3%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-04-18  8:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Nithin Dabilpuram, Thomas Monjalon, Akhil Goyal, jerinj, dev,
	Morten Brørup, techboard

On Tue, Apr 11, 2023 at 11:36 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 11 Apr 2023 15:34:07 +0530
> Nithin Dabilpuram <ndabilpuram@marvell.com> wrote:
>
> > diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> > index 4bacf9fcd9..866cd4e8ee 100644
> > --- a/lib/security/rte_security.h
> > +++ b/lib/security/rte_security.h
> > @@ -275,6 +275,17 @@ struct rte_security_ipsec_sa_options {
> >        */
> >       uint32_t ip_reassembly_en : 1;
> >
> > +     /** Enable out of place processing on inline inbound packets.
> > +      *
> > +      * * 1: Enable driver to perform Out-of-place(OOP) processing for this inline
> > +      *      inbound SA if supported by driver. PMD need to register mbuf
> > +      *      dynamic field using rte_security_oop_dynfield_register()
> > +      *      and security session creation would fail if dynfield is not
> > +      *      registered successfully.
> > +      * * 0: Disable OOP processing for this session (default).
> > +      */
> > +     uint32_t ingress_oop : 1;
> > +
> >       /** Reserved bit fields for future extension
> >        *
> >        * User should ensure reserved_opts is cleared as it may change in
> > @@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
> >        *
> >        * Note: Reduce number of bits in reserved_opts for every new option.
> >        */
> > -     uint32_t reserved_opts : 17;
> > +     uint32_t reserved_opts : 16;
> >  };
>
> NAK
> Let me repeat the reserved bit rant. YAGNI
>
> Reserved space is not usable without ABI breakage unless the existing
> code enforces that reserved space has to be zero.
>
> Just saying "User should ensure reserved_opts is cleared" is not enough.

Yes. I think we need to enforce having _init functions for the
structures that use a reserved field.
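
As a rough sketch of what that enforcement could look like (the struct is a
cut-down stand-in and sa_options_init/sa_options_check are hypothetical names,
not existing rte_security functions): the init helper zeroes everything, and
the library rejects any session whose reserved bits are set, so a later release
can give those bits a meaning without surprising old callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for an options struct with reserved bits. */
struct sa_options {
	uint32_t ip_reassembly_en : 1;
	uint32_t reserved_opts : 31;
};

/* Hypothetical init helper: every field, including reserved bits, is zero. */
static void
sa_options_init(struct sa_options *opts)
{
	memset(opts, 0, sizeof(*opts));
}

/* Hypothetical check run by the library at session creation time. */
static int
sa_options_check(const struct sa_options *opts)
{
	if (opts->reserved_opts != 0)
		return -EINVAL; /* garbage in reserved space is an error */
	return 0;
}
```

Without the check, an application that leaves garbage in reserved_opts works
today and silently changes behaviour once one of those bits is assigned.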

On the same note on YAGNI, I am wondering why NOT introduce
an RTE_NEXT_ABI macro kind of scheme to compile out ABI breaking changes.
Keep RTE_NEXT_ABI disabled by default, and let users enable it explicitly if
they want to avoid waiting up to one year for an ABI breaking change.
There are a lot of "fixed appliance" customers (not OS distribution
driven customers) who are willing to recompile DPDK for a new feature.
What are we losing with this scheme?




>
>

^ permalink raw reply	[relevance 4%]

* [PATCH v4 1/4] doc: announce new cpu flag added to rte_cpu_flag_t
  @ 2023-04-18  8:25  3%       ` Sivaprasad Tummala
  2023-04-18  8:52  3%         ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Sivaprasad Tummala @ 2023-04-18  8:25 UTC (permalink / raw)
  To: david.hunt; +Cc: dev, david.marchand, ferruh.yigit

A new flag RTE_CPUFLAG_MONITORX is added to rte_cpu_flag_t in
DPDK 23.07 release to support monitorx instruction on EPYC processors.
This results in ABI breakage for legacy apps.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 doc/guides/rel_notes/deprecation.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index dcc1ca1696..831713983f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -163,3 +163,6 @@ Deprecation Notices
   The new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status
   starting with DPDK 23.07 release.
+
+* eal/x86: The enum ``rte_cpu_flag_t`` will be extended with a new cpu flag
+  ``RTE_CPUFLAG_MONITORX`` to support monitorx instruction on EPYC processors.
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* Re: [RFC 0/4] Support VFIO sparse mmap in PCI bus
  @ 2023-04-18  7:46  3% ` David Marchand
  2023-04-18  9:27  0%   ` Xia, Chenbo
  2023-04-18  9:33  0%   ` Xia, Chenbo
  0 siblings, 2 replies; 200+ results
From: David Marchand @ 2023-04-18  7:46 UTC (permalink / raw)
  To: Chenbo Xia; +Cc: dev, skori

Hello Chenbo,

On Tue, Apr 18, 2023 at 7:49 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
>
> This series introduces a VFIO standard capability, called sparse
> mmap to PCI bus. In linux kernel, it's defined as
> VFIO_REGION_INFO_CAP_SPARSE_MMAP. Sparse mmap means instead of
> mmap whole BAR region into DPDK process, only mmap part of the
> BAR region after getting sparse mmap information from kernel.
> For the rest of BAR region that is not mmap-ed, DPDK process
> can use pread/pwrite system calls to access. Sparse mmap is
> useful when kernel does not want userspace to mmap whole BAR
> region, or kernel wants to control over access to specific BAR
> region. Vendors can choose to enable this feature or not for
> their devices in their specific kernel modules.

Sorry, I did not take the time to look into the details.
Could you summarize what would be the benefit of this series?


>
> In this patchset:
>
> Patch 1-3 is mainly for introducing BAR access APIs so that
> driver could use them to access specific BAR using pread/pwrite
> system calls when part of the BAR is not mmap-able.
>
> Patch 4 adds the VFIO sparse mmap support finally. A question
> is for all sparse mmap regions, should they be mapped to a
> continuous virtual address region that follows device-specific
> BAR layout or not. In theory, there could be three options to
> support this feature.
>
> Option 1: Map sparse mmap regions independently
> ======================================================
> In this approach, we mmap each sparse mmap region one by one
> and each region could be located anywhere in process address
> space. But accessing the mmaped BAR will not be as easy as
> 'bar_base_address + bar_offset', driver needs to check the
> sparse mmap information to access specific BAR register.
>
> Patch 4 in this patchset adopts this option. Driver API change
> is introduced in bus_pci_driver.h. Corresponding changes in
> all drivers are also done and currently I am assuming drivers
> do not support this feature so they will not check the
> 'is_sparse' flag but assumes it to be false. Note that it will
> not break any driver and each vendor can add related logic when
> they start to support this feature. This is only because I don't
> want to introduce complexity to drivers that do not want to
> support this feature.
>
> Option 2: Map sparse mmap regions based on device-specific BAR layout
> ======================================================================
> In this approach, the sparse mmap regions are mapped to continuous
> virtual address region that follows device-specific BAR layout.
> For example, the BAR size is 0x4000 and only 0-0x1000 (sparse mmap
> region #1) and 0x3000-0x4000 (sparse mmap region #2) could be
> mmaped. Region #1 will be mapped at 'base_addr' and region #2
> will be mapped at 'base_addr + 0x3000'. The good thing is if
> we implement like this, driver can still access all BAR registers
> using 'bar_base_address + bar_offset' way and we don't need
> to introduce any driver API change. But the address space
> range 'base_addr + 0x1000' to 'base_addr + 0x3000' may need to
> be reserved so it could result in waste of address space or memory
> (when we use MAP_ANONYMOUS and MAP_PRIVATE flag to reserve this
> range). Meanwhile, driver needs to know which part of BAR is
> mmaped (this is possible since the range is defined by vendor's
> specific kernel module).
>
> Option 3: Support both option 1 & 2
> ===================================
> We could define a driver flag to let driver choose which way it
> prefers since either option has its own Pros & Cons.
>
> Please share your comments, Thanks!
>
>
> Chenbo Xia (4):
>   bus/pci: introduce an internal representation of PCI device

I think the main motivation of this first patch was to avoid ABI issues.
Since v22.11, the rte_pci_device object is opaque to applications.

So, do we still need this patch?


>   bus/pci: avoid depending on private value in kernel source
>   bus/pci: introduce helper for MMIO read and write
>   bus/pci: add VFIO sparse mmap support
>
>  drivers/baseband/acc/rte_acc100_pmd.c         |   6 +-
>  drivers/baseband/acc/rte_vrb_pmd.c            |   6 +-
>  .../fpga_5gnr_fec/rte_fpga_5gnr_fec.c         |   6 +-
>  drivers/baseband/fpga_lte_fec/fpga_lte_fec.c  |   6 +-
>  drivers/bus/pci/bsd/pci.c                     |  43 +-
>  drivers/bus/pci/bus_pci_driver.h              |  24 +-
>  drivers/bus/pci/linux/pci.c                   |  91 +++-
>  drivers/bus/pci/linux/pci_init.h              |  14 +-
>  drivers/bus/pci/linux/pci_uio.c               |  34 +-
>  drivers/bus/pci/linux/pci_vfio.c              | 445 ++++++++++++++----
>  drivers/bus/pci/pci_common.c                  |  57 ++-
>  drivers/bus/pci/pci_common_uio.c              |  12 +-
>  drivers/bus/pci/private.h                     |  25 +-
>  drivers/bus/pci/rte_bus_pci.h                 |  48 ++
>  drivers/bus/pci/version.map                   |   3 +
>  drivers/common/cnxk/roc_dev.c                 |   4 +-
>  drivers/common/cnxk/roc_dpi.c                 |   2 +-
>  drivers/common/cnxk/roc_ml.c                  |  22 +-
>  drivers/common/qat/dev/qat_dev_gen1.c         |   2 +-
>  drivers/common/qat/dev/qat_dev_gen4.c         |   4 +-
>  drivers/common/sfc_efx/sfc_efx.c              |   2 +-
>  drivers/compress/octeontx/otx_zip.c           |   4 +-
>  drivers/crypto/ccp/ccp_dev.c                  |   4 +-
>  drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +-
>  drivers/crypto/nitrox/nitrox_device.c         |   4 +-
>  drivers/crypto/octeontx/otx_cryptodev_ops.c   |   6 +-
>  drivers/crypto/virtio/virtio_pci.c            |   6 +-
>  drivers/dma/cnxk/cnxk_dmadev.c                |   2 +-
>  drivers/dma/hisilicon/hisi_dmadev.c           |   6 +-
>  drivers/dma/idxd/idxd_pci.c                   |   4 +-
>  drivers/dma/ioat/ioat_dmadev.c                |   2 +-
>  drivers/event/dlb2/pf/dlb2_main.c             |  16 +-
>  drivers/event/octeontx/ssovf_probe.c          |  38 +-
>  drivers/event/octeontx/timvf_probe.c          |  18 +-
>  drivers/event/skeleton/skeleton_eventdev.c    |   2 +-
>  drivers/mempool/octeontx/octeontx_fpavf.c     |   6 +-
>  drivers/net/ark/ark_ethdev.c                  |   4 +-
>  drivers/net/atlantic/atl_ethdev.c             |   2 +-
>  drivers/net/avp/avp_ethdev.c                  |  20 +-
>  drivers/net/axgbe/axgbe_ethdev.c              |   4 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c              |   6 +-
>  drivers/net/bnxt/bnxt_ethdev.c                |   8 +-
>  drivers/net/cpfl/cpfl_ethdev.c                |   4 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c              |   2 +-
>  drivers/net/cxgbe/cxgbe_main.c                |   2 +-
>  drivers/net/cxgbe/cxgbevf_ethdev.c            |   2 +-
>  drivers/net/cxgbe/cxgbevf_main.c              |   2 +-
>  drivers/net/e1000/em_ethdev.c                 |   4 +-
>  drivers/net/e1000/igb_ethdev.c                |   4 +-
>  drivers/net/ena/ena_ethdev.c                  |   4 +-
>  drivers/net/enetc/enetc_ethdev.c              |   2 +-
>  drivers/net/enic/enic_main.c                  |   4 +-
>  drivers/net/fm10k/fm10k_ethdev.c              |   2 +-
>  drivers/net/gve/gve_ethdev.c                  |   4 +-
>  drivers/net/hinic/base/hinic_pmd_hwif.c       |  14 +-
>  drivers/net/hns3/hns3_ethdev.c                |   2 +-
>  drivers/net/hns3/hns3_ethdev_vf.c             |   2 +-
>  drivers/net/hns3/hns3_rxtx.c                  |   4 +-
>  drivers/net/i40e/i40e_ethdev.c                |   2 +-
>  drivers/net/iavf/iavf_ethdev.c                |   2 +-
>  drivers/net/ice/ice_dcf.c                     |   2 +-
>  drivers/net/ice/ice_ethdev.c                  |   2 +-
>  drivers/net/idpf/idpf_ethdev.c                |   4 +-
>  drivers/net/igc/igc_ethdev.c                  |   2 +-
>  drivers/net/ionic/ionic_dev_pci.c             |   2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c              |   4 +-
>  drivers/net/liquidio/lio_ethdev.c             |   4 +-
>  drivers/net/nfp/nfp_ethdev.c                  |   2 +-
>  drivers/net/nfp/nfp_ethdev_vf.c               |   6 +-
>  drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c    |   4 +-
>  drivers/net/ngbe/ngbe_ethdev.c                |   2 +-
>  drivers/net/octeon_ep/otx_ep_ethdev.c         |   2 +-
>  drivers/net/octeontx/base/octeontx_pkivf.c    |   6 +-
>  drivers/net/octeontx/base/octeontx_pkovf.c    |  12 +-
>  drivers/net/qede/qede_main.c                  |   6 +-
>  drivers/net/sfc/sfc.c                         |   2 +-
>  drivers/net/thunderx/nicvf_ethdev.c           |   2 +-
>  drivers/net/txgbe/txgbe_ethdev.c              |   2 +-
>  drivers/net/txgbe/txgbe_ethdev_vf.c           |   2 +-
>  drivers/net/virtio/virtio_pci.c               |   6 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c          |   4 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy.c             |  10 +-
>  drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |   6 +-
>  drivers/raw/ifpga/afu_pmd_n3000.c             |   4 +-
>  drivers/raw/ifpga/ifpga_rawdev.c              |   6 +-
>  drivers/raw/ntb/ntb_hw_intel.c                |   8 +-
>  drivers/vdpa/ifc/ifcvf_vdpa.c                 |   6 +-
>  drivers/vdpa/sfc/sfc_vdpa_hw.c                |   2 +-
>  drivers/vdpa/sfc/sfc_vdpa_ops.c               |   2 +-
>  lib/eal/include/rte_vfio.h                    |   1 -
>  90 files changed, 853 insertions(+), 352 deletions(-)


-- 
David Marchand
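
For readers following the Option 1 / Option 2 discussion quoted above, the
practical difference is how a driver turns a BAR offset into an address.
The sketch below is only an illustration under assumptions: the
sparse_region structure and both helper functions are hypothetical and are
not the API proposed in the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mmap-able window inside a BAR. */
struct sparse_region {
	uint64_t offset; /* start of the window, relative to the BAR start */
	uint64_t size;   /* window length in bytes */
	void *addr;      /* where this window was mmap-ed (Option 1) */
};

/*
 * Option 1: each window has its own mapping, so a BAR offset must be
 * translated through the region table. Returns NULL when no window
 * covers the offset; the caller would then fall back to pread/pwrite.
 */
static inline void *
bar_addr_option1(const struct sparse_region *regions, size_t nb_regions,
		uint64_t bar_offset)
{
	for (size_t i = 0; i < nb_regions; i++) {
		const struct sparse_region *r = &regions[i];

		if (bar_offset >= r->offset && bar_offset - r->offset < r->size)
			return (uint8_t *)r->addr + (bar_offset - r->offset);
	}
	return NULL;
}

/*
 * Option 2: the windows are placed so that the BAR layout is preserved,
 * and the usual 'bar_base_address + bar_offset' arithmetic keeps working.
 * The driver still has to know which offsets are actually backed.
 */
static inline void *
bar_addr_option2(void *bar_base, uint64_t bar_offset)
{
	return (uint8_t *)bar_base + bar_offset;
}
```

With the example from the cover letter (BAR size 0x4000, windows at
0-0x1000 and 0x3000-0x4000), Option 1 resolves offset 0x3004 to the second
window plus 4 and returns NULL for 0x2000, while Option 2 always returns
base plus offset and leaves the validity check to the driver.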


^ permalink raw reply	[relevance 3%]

* [PATCH v7 10/14] eal: expand most macros to empty when using MSVC
  @ 2023-04-17 16:10  5%   ` Tyler Retzlaff
  2023-04-17 16:10  3%   ` [PATCH v7 12/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-17 16:10 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

For now, expand many common rte macros to empty. The catch is that we
need to test that most of the macros do what they should, but at the same
time they are blocking the work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for MSVC and, where not possible, provide alternate macros
to achieve the same outcome.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/eal/include/rte_branch_prediction.h |  8 +++++
 lib/eal/include/rte_common.h            | 54 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 ++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..1eff9f6 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(!!(x))
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(!!(x))
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..0c55a23 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -41,6 +41,10 @@
 #define RTE_STD_C11
 #endif
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __extension__
+#endif
+
 /*
  * RTE_TOOLCHAIN_GCC is defined if the target is built with GCC,
  * while a host application (like pmdinfogen) may have another compiler.
@@ -65,7 +69,11 @@
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -80,16 +88,29 @@
 /**
  * Force a structure to be packed
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_packed __attribute__((__packed__))
+#else
+#define __rte_packed
+#endif
 
 /**
  * Macro to mark a type that is not subject to type-based aliasing rules
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_may_alias __attribute__((__may_alias__))
+#else
+#define __rte_may_alias
+#endif
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -110,14 +131,22 @@
 /**
  * Force symbol to be generated even if it appears to be unused.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_used __attribute__((used))
+#else
+#define __rte_used
+#endif
 
 /*********** Macros to eliminate unused variable warnings ********/
 
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +170,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +178,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +255,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +284,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +482,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
@@ -812,12 +861,17 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  struct wrapper *w = container_of(x, struct wrapper, c);
  */
 #ifndef container_of
+#ifndef RTE_TOOLCHAIN_MSVC
 #define container_of(ptr, type, member)	__extension__ ({		\
 			const typeof(((type *)0)->member) *_ptr = (ptr); \
 			__rte_unused type *_target_ptr =	\
 				(type *)(ptr);				\
 			(type *)(((uintptr_t)_ptr) - offsetof(type, member)); \
 		})
+#else
+#define container_of(ptr, type, member) \
+			((type *)((uintptr_t)(ptr) - offsetof(type, member)))
+#endif
 #endif
 
 /** Swap two variables. */
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1


^ permalink raw reply	[relevance 5%]

* [PATCH v7 12/14] telemetry: avoid expanding versioned symbol macros on MSVC
    2023-04-17 16:10  5%   ` [PATCH v7 10/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
@ 2023-04-17 16:10  3%   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-17 16:10 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
functions.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v3 1/4] doc: announce new cpu flag added to rte_cpu_flag_t
  2023-04-13 11:53  3% ` [PATCH v2 2/3] doc: announce new cpu flag added to rte_cpu_flag_t Sivaprasad Tummala
@ 2023-04-17  4:31  3%   ` Sivaprasad Tummala
    0 siblings, 1 reply; 200+ results
From: Sivaprasad Tummala @ 2023-04-17  4:31 UTC (permalink / raw)
  To: david.hunt; +Cc: dev

A new flag RTE_CPUFLAG_MONITORX is added to rte_cpu_flag_t in
DPDK 23.07 release to support monitorx instruction on EPYC processors.
This results in ABI breakage for legacy apps.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 doc/guides/rel_notes/deprecation.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index dcc1ca1696..831713983f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -163,3 +163,6 @@ Deprecation Notices
   The new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status
   starting with DPDK 23.07 release.
+
+* eal/x86: The enum ``rte_cpu_flag_t`` will be extended with a new cpu flag
+  ``RTE_CPUFLAG_MONITORX`` to support monitorx instruction on EPYC processors.
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* RE: [PATCH v5 11/14] eal: expand most macros to empty when using MSVC
  2023-04-15 20:52  4%           ` Tyler Retzlaff
@ 2023-04-15 22:41  4%             ` Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2023-04-15 22:41 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: bruce.richardson, david.marchand, thomas, konstantin.ananyev, dev

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Saturday, 15 April 2023 22.52
> 
> On Sat, Apr 15, 2023 at 09:16:21AM +0200, Morten Brørup wrote:
> > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > Sent: Friday, 14 April 2023 19.02
> > >
> > > On Fri, Apr 14, 2023 at 08:45:17AM +0200, Morten Brørup wrote:
> > > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > > Sent: Thursday, 13 April 2023 23.26
> > > > >
> > > > > For now expand a lot of common rte macros empty. The catch here is we
> > > > > need to test that most of the macros do what they should but at the same
> > > > > time they are blocking work needed to bootstrap of the unit tests.
> > > > >
> > > > > Later we will return and provide (where possible) expansions that work
> > > > > correctly for msvc and where not possible provide some alternate macros
> > > > > to achieve the same outcome.
> > > > >
> > > > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >
> > [...]
> >
> > > > >  /**
> > > > >   * Force alignment
> > > > >   */
> > > > > +#ifndef RTE_TOOLCHAIN_MSVC
> > > > >  #define __rte_aligned(a) __attribute__((__aligned__(a)))
> > > > > +#else
> > > > > +#define __rte_aligned(a)
> > > > > +#endif
> > > >
> > > > It should be reviewed that __rte_aligned() is only used for
> > > > optimization purposes, and is not required for DPDK to function
> > > > properly.
> > >
> > > so to expand on what i have in mind (and explain why i leave it expanded
> > > empty for now)
> > >
> > > while msvc has a __declspec for align there is a mismatch between
> > > where gcc and msvc want it placed to control alignment of objects.
> > >
> > > msvc support won't be functional in 23.07 because of atomics. so once
> > > we reach the 23.11 cycle (where we can merge c11 changes) it means we
> > > can also use standard _Alignas which can accomplish the same thing
> > > but portably.
> >
> > That (C11 standard _Alignas) should be the roadmap for solving the alignment requirements.
> >
> > This should be a general principle for DPDK... if the C standard offers something, don't reinvent our own. And as a consequence of the upgrade to C11, we should deprecate all our own now-obsolete substitutes for these.
> >
> > >
> > > full disclosure the catch is i still have to properly locate the <thing>
> > > that does the alignment and some small questions about the expansion and
> > > use of the existing macro.
> > >
> > > on the subject of DPDK requiring proper alignment, you're right it
> > > is generally for performance but also for pre-c11 atomics.
> > >
> > > one question i have been asking myself is would the community see value
> > > in more compile time assertions / testing of the size and alignment of
> > > structures and offset of structure fields? we have a few key
> > > RTE_BUILD_BUG_ON() assertions but i've discovered they don't offer
> > > comprehensive protection.
> >
> > Absolutely. Catching bugs at build time is much better than any alternative!
> 
> that's handy feedback. i am now encouraged to include more compile time
> checks in advance of or along with changes related to structure abi.

Sounds good.

Disclaimer: "Absolutely" was my personal response. But I seriously doubt that anyone in the DPDK community would object to more build time checks. Stability and code quality carry a lot of weight in DPDK community discussions.

With that said, please expect that maintainers might want you to split your patches, so the additional checks are separated from the MSVC changes.

> follow on question, once we do get to use c11 would something like
> _Static_assert be preferable over RTE_BUILD_BUG_ON? structures sensitive
> to layout could be co-located with the asserts right at the point of
> definition. or is there something extra RTE_BUILD_BUG_ON gives us?

People may have different opinions on RTE_BUILD_BUG_ON vs. _Static_assert or static_assert.

Personally, I prefer static_assert/_Static_assert. It also has the advantage that it can be used in the global scope, directly following the structure definitions (like you mention), whereas RTE_BUILD_BUG_ON must be inside a code block (which can probably be worked around by making a dummy static inline function only containing the RTE_BUILD_BUG_ON).

And in the spirit of my proposal of not using home-grown macros as alternatives to what the C standard provides, I think we should deprecate and get rid of RTE_BUILD_BUG_ON in favor of static_assert/_Static_assert introduced by the C11 standard. (My personal opinion; no such decision in principle has been made!)

If we want to keep RTE_BUILD_BUG_ON for some reason, we could change its implementation to use static_assert/_Static_assert instead of creating an invalid pointer to make the compilation fail.
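
As an illustration of the points above, here is a minimal C11 sketch that
places layout checks at global scope, directly after the structure
definitions. The types and the EXAMPLE_BUILD_BUG_ON macro are hypothetical;
the macro only shows one way such a wrapper could be built on static_assert
and is not the current RTE_BUILD_BUG_ON implementation. The sizes assume a
typical ABI where uint32_t is 4-byte aligned.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical wrapper: fails the build when 'cond' is true. */
#define EXAMPLE_BUILD_BUG_ON(cond) static_assert(!(cond), #cond)

/* Hypothetical layout-sensitive structure; not a real DPDK type. */
struct example_hdr {
	uint16_t type;
	uint16_t len;
	uint32_t seq;
};

/* Checks sit right next to the definition, outside any function. */
static_assert(sizeof(struct example_hdr) == 8, "example_hdr size changed");
static_assert(offsetof(struct example_hdr, seq) == 4, "example_hdr.seq moved");
EXAMPLE_BUILD_BUG_ON(sizeof(struct example_hdr) != 8);

/* Standard alignas instead of a compiler-specific alignment attribute. */
struct example_ring {
	alignas(64) uint64_t head;
	alignas(64) uint64_t tail;
};

static_assert(alignof(struct example_ring) == 64, "example_ring alignment");
static_assert(offsetof(struct example_ring, tail) == 64,
	"head and tail must not share a 64-byte line");
```

No dummy static inline function is needed to host the checks, which is the
advantage over a block-scope-only macro mentioned above.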

> 
> >
> > > > >  /**
> > > > >   * Force a structure to be packed
> > > > >   */
> > > > > +#ifndef RTE_TOOLCHAIN_MSVC
> > > > >  #define __rte_packed __attribute__((__packed__))
> > > > > +#else
> > > > > +#define __rte_packed
> > > > > +#endif
> > > >
> > > > Similar comment as for __rte_aligned(); however, I consider it more
> > > > likely that structure packing is a functional requirement, and not just
> > > > used for optimization. Based on my experience, it may be used for
> > > > packing network structures; perhaps not in DPDK itself but maybe in DPDK
> > > > applications.
> > >
> > > so interestingly i've discovered this is kind of a mess and as you note
> > > some places we can't just "fix" it for abi compatibility reasons.
> > >
> > > in some instances the packing is being applied to structures where it is
> > > essentially a noop. i.e. natural alignment gets you the same thing so it
> > > is superfluous.
> > >
> > > in some instances the packing is being applied to structures that are
> > > private and it appears to be completely unnecessary e.g. some structure
> > > that isn't nested into something else and sizeof() or offsetof() fields
> > > don't matter in the context of their use.
> > >
> > > in some instances it is completely necessary usually when type punning
> > > buffers containing network framing etc...
> > >
> > > unfortunately the standard doesn't offer me an out here as there is an
> > > issue of placement of the pragma/attributes that do the packing.
> > >
> > > for places it isn't needed it, whatever i just expand empty. for places
> > > it is superfluous again because msvc has no stable abi (we're not
> > > established yet) again i just expand empty. finally for the places where
> > > it is needed i'll probably need to expand conditionally but i think the
> > > instances are far fewer than current use.
> >
> > Optimally, we will have a common macro (or other solution) to support both GCC/CLANG and MSVC to replace or supplement __rte_packed. However, the cost of this may be an API break if we replace __rte_packed.
> >
> > >
> > > >
> > > > The same risk applies to __rte_aligned(), but with lower probability.
> > >
> > > so that's the long winded story of why they are both expanded empty for
> > > now for msvc. but when the time comes i want to submit patch series that
> > > focus on each specifically to generate robust discussion.
> >
> > Sounds like the right path to take.
> >
> > Now, I'm thinking ahead here...
> >
> > We should be prepared to accept a major ABI/API break at one point in time, to replace our home-grown macros with C11 standard solutions and to fully support MSVC. This is not happening anytime soon, but the Techboard should acknowledge that this is going to happen (with an unspecified release), so it can be formally announced. The sooner it is announced, the more time developers will have to prepare for it.
> 
> so, just to avoid any confusion i want to make it clear that i am not
> planning to submit changes that would change abi as a part of supporting
> msvc (aside from changing to standard atomics which we agreed on).

Thank you for clarifying.

> 
> in general there are some cleanups we could make in the area of code
> maintainability and portability and we may want to discuss the
> advantages or disadvantages of making those changes. but i think those
> changes are a topic unrelated to windows or msvc specifically.

This was the point I was trying to make, when I proposed accepting a major ABI/API break. Sorry about my unclear wording.

If we collect a wish list of breaking changes, I would personally prefer a "big bang" major ABI/API break, rather than a series of incremental API/ABI breaks over multiple DPDK releases. In this regard, we could mix both changes driven by the migration to pure C11 (e.g. getting rid of now-obsolete macros, such as RTE_BUILD_BUG_ON, and compiler intrinsics, such as __rte_aligned) and MSVC portability changes (e.g. an improved macro to support structure packing).
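
To illustrate what "an improved macro to support structure packing" might
look like, here is one possible shape, offered only as a sketch. The
EXAMPLE_PACKED_* names are hypothetical and nothing like this has been
agreed; the sketch relies on the GCC/clang packed attribute and on MSVC's
__pragma(pack(...)) form, which is why the packing has to bracket the
definition rather than trail it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical begin/end pair: MSVC controls packing with a pragma
 * placed around the definition, GCC/clang with a trailing attribute.
 */
#ifdef _MSC_VER
#define EXAMPLE_PACKED_BEGIN __pragma(pack(push, 1))
#define EXAMPLE_PACKED_END   __pragma(pack(pop))
#define EXAMPLE_PACKED_ATTR
#else
#define EXAMPLE_PACKED_BEGIN
#define EXAMPLE_PACKED_END
#define EXAMPLE_PACKED_ATTR  __attribute__((__packed__))
#endif

/* Hypothetical on-wire header whose layout must not contain padding. */
EXAMPLE_PACKED_BEGIN
struct example_wire_hdr {
	uint8_t  version;
	uint32_t length;
	uint16_t flags;
} EXAMPLE_PACKED_ATTR;
EXAMPLE_PACKED_END
```

Existing code that writes a single trailing __rte_packed would need to be
touched to adopt a pair like this, which is the kind of API break being
discussed.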

> 
> >
> > All the details do not need to be known at the time of the announcement; they can be added along the way, based on the discussions from your future patches.
> 
> >
> > >
> > > ty

^ permalink raw reply	[relevance 4%]

* Re: [PATCH v5 11/14] eal: expand most macros to empty when using MSVC
  2023-04-15  7:16  3%         ` Morten Brørup
@ 2023-04-15 20:52  4%           ` Tyler Retzlaff
  2023-04-15 22:41  4%             ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-04-15 20:52 UTC (permalink / raw)
  To: Morten Brørup
  Cc: bruce.richardson, david.marchand, thomas, konstantin.ananyev, dev

On Sat, Apr 15, 2023 at 09:16:21AM +0200, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Friday, 14 April 2023 19.02
> > 
> > On Fri, Apr 14, 2023 at 08:45:17AM +0200, Morten Brørup wrote:
> > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > Sent: Thursday, 13 April 2023 23.26
> > > >
> > > > For now expand a lot of common rte macros empty. The catch here is we
> > > > need to test that most of the macros do what they should but at the same
> > > > time they are blocking work needed to bootstrap of the unit tests.
> > > >
> > > > Later we will return and provide (where possible) expansions that work
> > > > correctly for msvc and where not possible provide some alternate macros
> > > > to achieve the same outcome.
> > > >
> > > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
> [...]
> 
> > > >  /**
> > > >   * Force alignment
> > > >   */
> > > > +#ifndef RTE_TOOLCHAIN_MSVC
> > > >  #define __rte_aligned(a) __attribute__((__aligned__(a)))
> > > > +#else
> > > > +#define __rte_aligned(a)
> > > > +#endif
> > >
> > > It should be reviewed that __rte_aligned() is only used for
> > optimization purposes, and is not required for DPDK to function
> > properly.
> > 
> > so to expand on what i have in mind (and explain why i leave it expanded
> > empty for now)
> > 
> > while msvc has a __declspec for align there is a mismatch between
> > where gcc and msvc want it placed to control alignment of objects.
> > 
> > msvc support won't be functional in 23.07 because of atomics. so once
> > we reach the 23.11 cycle (where we can merge c11 changes) it means we
> > can also use standard _Alignas which can accomplish the same thing
> > but portably.
> 
> That (C11 standard _Alignas) should be the roadmap for solving the alignment requirements.
> 
> This should be a general principle for DPDK... if the C standard offers something, don't reinvent our own. And as a consequence of the upgrade to C11, we should deprecate all our own now-obsolete substitutes for these.
> 
> > 
> > full disclosure the catch is i still have to properly locate the <thing>
> > that does the alignment and some small questions about the expansion and
> > use of the existing macro.
> > 
> > on the subject of DPDK requiring proper alignment, you're right it
> > is generally for performance but also for pre-c11 atomics.
> > 
> > one question i have been asking myself is would the community see value
> > in more compile time assertions / testing of the size and alignment of
> > structures and offset of structure fields? we have a few key
> > RTE_BUILD_BUG_ON() assertions but i've discovered they don't offer
> > comprehensive protection.
> 
> Absolutely. Catching bugs at build time is much better than any alternative!

that's handy feedback. i am now encouraged to include more compile time
checks in advance of or along with changes related to structure abi.
follow on question, once we do get to use c11 would something like
_Static_assert be preferable over RTE_BUILD_BUG_ON? structures sensitive
to layout could be co-located with the asserts right at the point of
definition. or is there something extra RTE_BUILD_BUG_ON gives us?
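
to make the question concrete, here is a minimal sketch of layout asserts
co-located with a definition. struct example_hdr and its fields are made up
for illustration and are not taken from DPDK. as far as i know
RTE_BUILD_BUG_ON() expands to an expression, so it has to live inside a
function body, while static_assert is a declaration and can sit at file
scope right under the struct.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* hypothetical layout-sensitive structure, not a DPDK type */
struct example_hdr {
	uint16_t type;
	uint16_t flags;
	uint32_t length;
	uint64_t cookie;
};

/* layout checks live at file scope, right next to the definition */
static_assert(sizeof(struct example_hdr) == 16,
	"example_hdr size changed");
static_assert(offsetof(struct example_hdr, length) == 4,
	"example_hdr.length moved");
static_assert(offsetof(struct example_hdr, cookie) == 8,
	"example_hdr.cookie moved");
/* the widest member decides the alignment of the whole struct */
static_assert(_Alignof(struct example_hdr) == _Alignof(uint64_t),
	"example_hdr alignment changed");
```

if a later change reorders or resizes a field, the build stops with the
message attached to the failing assert.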

> 
> > > >  /**
> > > >   * Force a structure to be packed
> > > >   */
> > > > +#ifndef RTE_TOOLCHAIN_MSVC
> > > >  #define __rte_packed __attribute__((__packed__))
> > > > +#else
> > > > +#define __rte_packed
> > > > +#endif
> > >
> > > Similar comment as for __rte_aligned(); however, I consider it more
> > likely that structure packing is a functional requirement, and not just
> > used for optimization. Based on my experience, it may be used for
> > packing network structures; perhaps not in DPDK itself but maybe in DPDK
> > applications.
> > 
> > so interestingly i've discovered this is kind of a mess and as you note
> > some places we can't just "fix" it for abi compatibility reasons.
> > 
> > in some instances the packing is being applied to structures where it is
> > essentially a noop. i.e. natural alignment gets you the same thing so it
> > is superfluous.
> > 
> > in some instances the packing is being applied to structures that are
> > private and it appears to be completely unnecessary e.g. some structure
> > that isn't nested into something else and sizeof() or offsetof() fields
> > don't matter in the context of their use.
> > 
> > in some instances it is completely necessary usually when type punning
> > buffers containing network framing etc...
> > 
> > unfortunately the standard doesn't offer me an out here as there is an
> > issue of placement of the pragma/attributes that do the packing.
> > 
> > for places it isn't needed it, whatever i just expand empty. for places
> > it is superfluous again because msvc has no stable abi (we're not
> > established yet) again i just expand empty. finally for the places where
> > it is needed i'll probably need to expand conditionally but i think the
> > instances are far fewer than current use.
> 
> Optimally, we will have a common macro (or other solution) to support both GCC/CLANG and MSVC to replace or supplement __rte_packed. However, the cost of this may be an API break if we replace __rte_packed.
> 
> > 
> > >
> > > The same risk applies to __rte_aligned(), but with lower probability.
> > 
> > so that's the long winded story of why they are both expanded empty for
> > now for msvc. but when the time comes i want to submit patch series that
> > focus on each specifically to generate robust discussion.
> 
> Sounds like the right path to take.
> 
> Now, I'm thinking ahead here...
> 
> We should be prepared to accept a major ABI/API break at one point in time, to replace our home-grown macros with C11 standard solutions and to fully support MSVC. This is not happening anytime soon, but the Techboard should acknowledge that this is going to happen (with an unspecified release), so it can be formally announced. The sooner it is announced, the more time developers will have to prepare for it.

so, just to avoid any confusion i want to make it clear that i am not
planning to submit changes that would change abi as a part of supporting
msvc (aside from changing to standard atomics which we agreed on).

in general there are some cleanups we could make in the area of code
maintainability and portability and we may want to discuss the
advantages or disadvantages of making those changes. but i think those
changes are a topic unrelated to windows or msvc specifically.

> 
> All the details do not need to be known at the time of the announcement; they can be added along the way, based on the discussions from your future patches.

> 
> > 
> > ty

* RE: [PATCH v5 11/14] eal: expand most macros to empty when using MSVC
  2023-04-14 17:02  4%       ` Tyler Retzlaff
@ 2023-04-15  7:16  3%         ` Morten Brørup
  2023-04-15 20:52  4%           ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-04-15  7:16 UTC (permalink / raw)
  To: Tyler Retzlaff, bruce.richardson, david.marchand, thomas,
	konstantin.ananyev
  Cc: dev

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Friday, 14 April 2023 19.02
> 
> On Fri, Apr 14, 2023 at 08:45:17AM +0200, Morten Brørup wrote:
> > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > Sent: Thursday, 13 April 2023 23.26
> > >
> > > For now expand a lot of common rte macros empty. The catch here is
> we
> > > need to test that most of the macros do what they should but at the
> same
> > > time they are blocking work needed to bootstrap of the unit tests.
> > >
> > > Later we will return and provide (where possible) expansions that
> work
> > > correctly for msvc and where not possible provide some alternate
> macros
> > > to achieve the same outcome.
> > >
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

[...]

> > >  /**
> > >   * Force alignment
> > >   */
> > > +#ifndef RTE_TOOLCHAIN_MSVC
> > >  #define __rte_aligned(a) __attribute__((__aligned__(a)))
> > > +#else
> > > +#define __rte_aligned(a)
> > > +#endif
> >
> > It should be reviewed that __rte_aligned() is only used for
> optimization purposes, and is not required for DPDK to function
> properly.
> 
> so to expand on what i have in mind (and explain why i leave it expanded
> empty for now)
> 
> while msvc has a __declspec for align there is a mismatch between
> where gcc and msvc want it placed to control alignment of objects.
> 
> msvc support won't be functional in 23.07 because of atomics. so once
> we reach the 23.11 cycle (where we can merge c11 changes) it means we
> can also use standard _Alignas which can accomplish the same thing
> but portably.

That (C11 standard _Alignas) should be the roadmap for solving the alignment requirements.

This should be a general principle for DPDK... if the C standard offers something, don't reinvent our own. And as a consequence of the upgrade to C11, we should deprecate all our own now-obsolete substitutes for these.

> 
> full disclosure the catch is i still have to properly locate the <thing>
> that does the alignment and some small questions about the expansion and
> use of the existing macro.
> 
> on the subject of DPDK requiring proper alignment, you're right it
> is generally for performance but also for pre-c11 atomics.
> 
> one question i have been asking myself is would the community see value
> in more compile time assertions / testing of the size and alignment of
> structures and offset of structure fields? we have a few key
> RTE_BUILD_BUG_ON() assertions but i've discovered they don't offer
> comprehensive protection.

Absolutely. Catching bugs at build time is much better than any alternative!

> > >  /**
> > >   * Force a structure to be packed
> > >   */
> > > +#ifndef RTE_TOOLCHAIN_MSVC
> > >  #define __rte_packed __attribute__((__packed__))
> > > +#else
> > > +#define __rte_packed
> > > +#endif
> >
> > Similar comment as for __rte_aligned(); however, I consider it more
> likely that structure packing is a functional requirement, and not just
> used for optimization. Based on my experience, it may be used for
> packing network structures; perhaps not in DPDK itself but maybe in DPDK
> applications.
> 
> so interestingly i've discovered this is kind of a mess and as you note
> some places we can't just "fix" it for abi compatibility reasons.
> 
> in some instances the packing is being applied to structures where it is
> essentially a noop. i.e. natural alignment gets you the same thing so it
> is superfluous.
> 
> in some instances the packing is being applied to structures that are
> private and it appears to be completely unnecessary e.g. some structure
> that isn't nested into something else and sizeof() or offsetof() fields
> don't matter in the context of their use.
> 
> in some instances it is completely necessary usually when type punning
> buffers containing network framing etc...
> 
> unfortunately the standard doesn't offer me an out here as there is an
> issue of placement of the pragma/attributes that do the packing.
> 
> for places it isn't needed it, whatever i just expand empty. for places
> it is superfluous again because msvc has no stable abi (we're not
> established yet) again i just expand empty. finally for the places where
> it is needed i'll probably need to expand conditionally but i think the
> instances are far fewer than current use.

Optimally, we will have a common macro (or other solution) to support both GCC/CLANG and MSVC to replace or supplement __rte_packed. However, the cost of this may be an API break if we replace __rte_packed.
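
As a rough sketch of what such a common macro could look like, assuming a begin/end pair is acceptable: the names EXAMPLE_PACKED_BEGIN, EXAMPLE_PACKED_END and struct example_tlv below are hypothetical and not an existing DPDK API. The GCC/Clang branch uses the same attribute that __rte_packed expands to today; the MSVC branch relies on the __pragma keyword and is untested here.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical macro pair, illustration only */
#ifdef _MSC_VER
/* MSVC: packing is a pragma that brackets the definition */
#define EXAMPLE_PACKED_BEGIN __pragma(pack(push, 1))
#define EXAMPLE_PACKED_END   __pragma(pack(pop))
#else
/* GCC/Clang: packing is an attribute after the closing brace */
#define EXAMPLE_PACKED_BEGIN
#define EXAMPLE_PACKED_END   __attribute__((__packed__))
#endif

EXAMPLE_PACKED_BEGIN
struct example_tlv {
	uint8_t  type;
	uint16_t length;
	uint32_t value;
} EXAMPLE_PACKED_END;

/* 1 + 2 + 4 bytes, no padding between or after the fields */
static_assert(sizeof(struct example_tlv) == 7, "example_tlv is not packed");
```

The pair is needed because the two toolchains want the packing directive in different places, which a single trailing macro cannot express.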

> 
> >
> > The same risk applies to __rte_aligned(), but with lower probability.
> 
> so that's the long winded story of why they are both expanded empty for
> now for msvc. but when the time comes i want to submit patch series that
> focus on each specifically to generate robust discussion.

Sounds like the right path to take.

Now, I'm thinking ahead here...

We should be prepared to accept a major ABI/API break at one point in time, to replace our home-grown macros with C11 standard solutions and to fully support MSVC. This is not happening anytime soon, but the Techboard should acknowledge that this is going to happen (with an unspecified release), so it can be formally announced. The sooner it is announced, the more time developers will have to prepare for it.

All the details do not need to be known at the time of the announcement; they can be added along the way, based on the discussions from your future patches.

> 
> ty

* [PATCH v6 11/15] eal: expand most macros to empty when using MSVC
  @ 2023-04-15  1:15  5%   ` Tyler Retzlaff
  2023-04-15  1:15  3%   ` [PATCH v6 13/15] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-15  1:15 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

For now, expand many common rte macros to empty. The catch is that we
need to test that most of the macros do what they should, but at the same
time they are blocking the work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for msvc and where not possible provide some alternate macros
to achieve the same outcome.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_branch_prediction.h |  8 +++++
 lib/eal/include/rte_common.h            | 54 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 ++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..1eff9f6 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(!!(x))
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(!!(x))
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..5417f68 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -62,10 +62,18 @@
 		__GNUC_PATCHLEVEL__)
 #endif
 
+#ifdef RTE_TOOLCHAIN_MSVC
+#define __extension__
+#endif
+
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -80,16 +88,29 @@
 /**
  * Force a structure to be packed
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_packed __attribute__((__packed__))
+#else
+#define __rte_packed
+#endif
 
 /**
  * Macro to mark a type that is not subject to type-based aliasing rules
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_may_alias __attribute__((__may_alias__))
+#else
+#define __rte_may_alias
+#endif
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -110,14 +131,22 @@
 /**
  * Force symbol to be generated even if it appears to be unused.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_used __attribute__((used))
+#else
+#define __rte_used
+#endif
 
 /*********** Macros to eliminate unused variable warnings ********/
 
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +170,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +178,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +255,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +284,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +482,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
@@ -812,12 +861,17 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  struct wrapper *w = container_of(x, struct wrapper, c);
  */
 #ifndef container_of
+#ifndef RTE_TOOLCHAIN_MSVC
 #define container_of(ptr, type, member)	__extension__ ({		\
 			const typeof(((type *)0)->member) *_ptr = (ptr); \
 			__rte_unused type *_target_ptr =	\
 				(type *)(ptr);				\
 			(type *)(((uintptr_t)_ptr) - offsetof(type, member)); \
 		})
+#else
+#define container_of(ptr, type, member) \
+			((type *)((uintptr_t)(ptr) - offsetof(type, member)))
+#endif
 #endif
 
 /** Swap two variables. */
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1


* [PATCH v6 13/15] telemetry: avoid expanding versioned symbol macros on MSVC
    2023-04-15  1:15  5%   ` [PATCH v6 11/15] eal: expand most macros to empty when using MSVC Tyler Retzlaff
@ 2023-04-15  1:15  3%   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-15  1:15 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
functions.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1


* Re: [PATCH v5 11/14] eal: expand most macros to empty when using MSVC
  @ 2023-04-14 17:02  4%       ` Tyler Retzlaff
  2023-04-15  7:16  3%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-04-14 17:02 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, bruce.richardson, david.marchand, thomas, konstantin.ananyev

On Fri, Apr 14, 2023 at 08:45:17AM +0200, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Thursday, 13 April 2023 23.26
> > 
> > For now expand a lot of common rte macros empty. The catch here is we
> > need to test that most of the macros do what they should but at the same
> > time they are blocking work needed to bootstrap of the unit tests.
> > 
> > Later we will return and provide (where possible) expansions that work
> > correctly for msvc and where not possible provide some alternate macros
> > to achieve the same outcome.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> >  lib/eal/include/rte_branch_prediction.h |  8 ++++++
> >  lib/eal/include/rte_common.h            | 45
> > +++++++++++++++++++++++++++++++++
> >  lib/eal/include/rte_compat.h            | 20 +++++++++++++++
> >  3 files changed, 73 insertions(+)
> > 
> > diff --git a/lib/eal/include/rte_branch_prediction.h
> > b/lib/eal/include/rte_branch_prediction.h
> > index 0256a9d..d9a0224 100644
> > --- a/lib/eal/include/rte_branch_prediction.h
> > +++ b/lib/eal/include/rte_branch_prediction.h
> > @@ -25,7 +25,11 @@
> >   *
> >   */
> >  #ifndef likely
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  #define likely(x)	__builtin_expect(!!(x), 1)
> > +#else
> > +#define likely(x)	(x)
> 
> This must be (!!(x)), because x may be non-Boolean, e.g. likely(n & 0x10), and likely() must return Boolean (0 or 1).

yes, you're right. will fix.
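
a small sketch of the difference, using made-up macro names rather than the
real likely() so it does not clash with anything: the raw form passes the
masked value through, the !!(x) form collapses it to 0 or 1.

```c
#include <assert.h>

/* hypothetical stand-ins for the two fallback expansions discussed above */
#define example_likely_raw(x)  (x)
#define example_likely_bool(x) (!!(x))

/* return what each expansion yields for a single-bit flag test */
static inline int
example_flag_raw(unsigned int n)
{
	return example_likely_raw(n & 0x10);
}

static inline int
example_flag_bool(unsigned int n)
{
	return example_likely_bool(n & 0x10);
}
```

with n = 0x10 the raw form returns 16 and the boolean form returns 1. inside
a plain if () both behave the same, but code that stores the result or
compares it against 1 only works with the boolean form.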

> 
> > +#endif
> >  #endif /* likely */
> > 
> >  /**
> > @@ -39,7 +43,11 @@
> >   *
> >   */
> >  #ifndef unlikely
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  #define unlikely(x)	__builtin_expect(!!(x), 0)
> > +#else
> > +#define unlikely(x)	(x)
> 
> This must also be (!!(x)), for the same reason as above.

ack

> 
> > +#endif
> >  #endif /* unlikely */
> > 
> >  #ifdef __cplusplus
> > diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
> > index 2f464e3..1bdaa2d 100644
> > --- a/lib/eal/include/rte_common.h
> > +++ b/lib/eal/include/rte_common.h
> > @@ -65,7 +65,11 @@
> >  /**
> >   * Force alignment
> >   */
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  #define __rte_aligned(a) __attribute__((__aligned__(a)))
> > +#else
> > +#define __rte_aligned(a)
> > +#endif
> 
> It should be reviewed that __rte_aligned() is only used for optimization purposes, and is not required for DPDK to function properly.

so to expand on what i have in mind (and explain why i leave it expanded
empty for now)

while msvc has a __declspec for align there is a mismatch between
where gcc and msvc want it placed to control alignment of objects.

msvc support won't be functional in 23.07 because of atomics. so once
we reach the 23.11 cycle (where we can merge c11 changes) it means we
can also use standard _Alignas which can accomplish the same thing
but portably.
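
a minimal sketch of the c11 spelling, assuming a 64 byte cache line; the
names EXAMPLE_CACHE_LINE, struct example_counters and example_buf are made
up for illustration. note that c11 alignas goes on a member or on an object
declaration, not after the closing brace of a struct the way
__attribute__((__aligned__(a))) is used today, so call sites would have to
move the annotation.

```c
#include <assert.h>
#include <stdalign.h> /* alignas / alignof (C11) */
#include <stdint.h>

#define EXAMPLE_CACHE_LINE 64 /* assumed line size for this sketch */

struct example_counters {
	/* aligning the first member raises the alignment of the whole struct */
	alignas(EXAMPLE_CACHE_LINE) uint64_t rx;
	uint64_t tx;
};

/* alignas applied to an object declaration */
static alignas(EXAMPLE_CACHE_LINE) uint8_t example_buf[128];

/* helper so the buffer address can be inspected */
static inline uintptr_t
example_buf_addr(void)
{
	return (uintptr_t)example_buf;
}

static_assert(alignof(struct example_counters) == EXAMPLE_CACHE_LINE,
	"example_counters is not cache line aligned");
```

the struct is also padded up to 64 bytes, since its size has to be a
multiple of its alignment.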

full disclosure the catch is i still have to properly locate the <thing>
that does the alignment and some small questions about the expansion and
use of the existing macro.

on the subject of DPDK requiring proper alignment, you're right it
is generally for performance but also for pre-c11 atomics.

one question i have been asking myself is would the community see value
in more compile time assertions / testing of the size and alignment of
structures and offset of structure fields? we have a few key
RTE_BUILD_BUG_ON() assertions but i've discovered they don't offer
comprehensive protection.

> 
> > 
> >  #ifdef RTE_ARCH_STRICT_ALIGN
> >  typedef uint64_t unaligned_uint64_t __rte_aligned(1);
> > @@ -80,16 +84,29 @@
> >  /**
> >   * Force a structure to be packed
> >   */
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  #define __rte_packed __attribute__((__packed__))
> > +#else
> > +#define __rte_packed
> > +#endif
> 
> Similar comment as for __rte_aligned(); however, I consider it more likely that structure packing is a functional requirement, and not just used for optimization. Based on my experience, it may be used for packing network structures; perhaps not in DPDK itself but maybe in DPDK applications.

so interestingly i've discovered this is kind of a mess and as you note
some places we can't just "fix" it for abi compatibility reasons.

in some instances the packing is being applied to structures where it is
essentially a noop. i.e. natural alignment gets you the same thing so it
is superfluous.

in some instances the packing is being applied to structures that are
private and it appears to be completely unnecessary e.g. some structure
that isn't nested into something else and sizeof() or offsetof() fields
don't matter in the context of their use.

in some instances it is completely necessary usually when type punning
buffers containing network framing etc...

unfortunately the standard doesn't offer me an out here as there is an
issue of placement of the pragma/attributes that do the packing.

for places it isn't needed, i just expand empty. for places it is
superfluous, again because msvc has no stable abi (we're not
established yet), i just expand empty. finally, for the places where
it is needed i'll probably need to expand conditionally, but i think the
instances are far fewer than current use.
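
to illustrate the superfluous and the necessary cases, a sketch with made-up
structures (none of them are DPDK types). it uses #pragma pack, which gcc,
clang and msvc all accept, purely so the example compiles everywhere; it is
not a proposal for how __rte_packed should expand.

```c
#include <assert.h>
#include <stdint.h>

/* natural layout already has no padding */
struct example_natural {
	uint32_t a;
	uint32_t b;
};

/* same fields, laid out without packing, for comparison below */
struct example_wire_unpacked {
	uint8_t  type;
	uint32_t length;
	uint16_t port;
};

#pragma pack(push, 1)
/* superfluous: packing does not change size or offsets here */
struct example_natural_packed {
	uint32_t a;
	uint32_t b;
};

/* necessary: overlays a wire format, so padding must not be inserted */
struct example_wire_packed {
	uint8_t  type;
	uint32_t length;
	uint16_t port;
};
#pragma pack(pop)

static_assert(sizeof(struct example_natural) ==
	sizeof(struct example_natural_packed), "packing was not a no-op");
static_assert(sizeof(struct example_wire_packed) == 7,
	"wire header has padding");
```

on common abis the unpacked wire struct is 12 bytes (3 bytes of padding
after type, 2 at the end), so expanding the macro empty would silently
change that layout while leaving the first struct untouched.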

> 
> The same risk applies to __rte_aligned(), but with lower probability.

so that's the long winded story of why they are both expanded empty for
now for msvc. but when the time comes i want to submit patch series that
focus on each specifically to generate robust discussion.

ty

* Re: [PATCH] reorder: improve buffer structure layout
  2023-04-14 14:54  3%   ` Bruce Richardson
@ 2023-04-14 15:30  0%     ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-04-14 15:30 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Volodymyr Fialko, dev, Reshma Pattan, jerinj, anoobj

On Fri, 14 Apr 2023 15:54:13 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Fri, Apr 14, 2023 at 07:52:30AM -0700, Stephen Hemminger wrote:
> > On Fri, 14 Apr 2023 10:43:43 +0200
> > Volodymyr Fialko <vfialko@marvell.com> wrote:
> >   
> > > diff --git a/lib/reorder/rte_reorder.c b/lib/reorder/rte_reorder.c
> > > index f55f383700..7418202b04 100644
> > > --- a/lib/reorder/rte_reorder.c
> > > +++ b/lib/reorder/rte_reorder.c
> > > @@ -46,9 +46,10 @@ struct rte_reorder_buffer {
> > >  	char name[RTE_REORDER_NAMESIZE];
> > >  	uint32_t min_seqn;  /**< Lowest seq. number that can be in the buffer */
> > >  	unsigned int memsize; /**< memory area size of reorder buffer */
> > > +	int is_initialized; /**< flag indicates that buffer was initialized */
> > > +
> > >  	struct cir_buffer ready_buf; /**< temp buffer for dequeued entries */
> > >  	struct cir_buffer order_buf; /**< buffer used to reorder entries */
> > > -	int is_initialized;
> > >  } __rte_cache_aligned;
> > >  
> > >  static void  
> > 
> > Since this is ABI change it will have to wait for 23.11 release  
> 
> It shouldn't be an ABI change. This struct is defined in a C file, rather
> than a header, so is not exposed to end applications.
> 
> /Bruce

Sorry, Bruce is right. 
You might want to use uint8_t or bool for a simple flag.

* Re: [PATCH] reorder: improve buffer structure layout
  2023-04-14 14:52  3% ` Stephen Hemminger
@ 2023-04-14 14:54  3%   ` Bruce Richardson
  2023-04-14 15:30  0%     ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-04-14 14:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Volodymyr Fialko, dev, Reshma Pattan, jerinj, anoobj

On Fri, Apr 14, 2023 at 07:52:30AM -0700, Stephen Hemminger wrote:
> On Fri, 14 Apr 2023 10:43:43 +0200
> Volodymyr Fialko <vfialko@marvell.com> wrote:
> 
> > diff --git a/lib/reorder/rte_reorder.c b/lib/reorder/rte_reorder.c
> > index f55f383700..7418202b04 100644
> > --- a/lib/reorder/rte_reorder.c
> > +++ b/lib/reorder/rte_reorder.c
> > @@ -46,9 +46,10 @@ struct rte_reorder_buffer {
> >  	char name[RTE_REORDER_NAMESIZE];
> >  	uint32_t min_seqn;  /**< Lowest seq. number that can be in the buffer */
> >  	unsigned int memsize; /**< memory area size of reorder buffer */
> > +	int is_initialized; /**< flag indicates that buffer was initialized */
> > +
> >  	struct cir_buffer ready_buf; /**< temp buffer for dequeued entries */
> >  	struct cir_buffer order_buf; /**< buffer used to reorder entries */
> > -	int is_initialized;
> >  } __rte_cache_aligned;
> >  
> >  static void
> 
> Since this is ABI change it will have to wait for 23.11 release

It shouldn't be an ABI change. This struct is defined in a C file, rather
than a header, so is not exposed to end applications.
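
A minimal single-file sketch of the pattern, assuming callers only ever hold
a pointer and go through accessor functions. All names (example_buffer and
its functions) are hypothetical and only mimic the shape of the reorder
library; this is not its real API.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header would expose: opaque type plus accessors --- */
struct example_buffer;
struct example_buffer *example_buffer_create(unsigned int memsize);
int example_buffer_is_initialized(const struct example_buffer *b);
void example_buffer_free(struct example_buffer *b);

/* --- what stays in the .c file: field order is a private detail --- */
struct example_buffer {
	unsigned int memsize;
	int is_initialized; /* can be moved without callers noticing */
	char payload[64];
};

struct example_buffer *
example_buffer_create(unsigned int memsize)
{
	struct example_buffer *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return NULL;
	b->memsize = memsize;
	b->is_initialized = 1;
	return b;
}

int
example_buffer_is_initialized(const struct example_buffer *b)
{
	return b->is_initialized;
}

void
example_buffer_free(struct example_buffer *b)
{
	free(b);
}
```

Because applications never see the field offsets or sizeof() of the struct,
reordering its members only requires rebuilding the library itself.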

/Bruce

* Re: [PATCH] reorder: improve buffer structure layout
  @ 2023-04-14 14:52  3% ` Stephen Hemminger
  2023-04-14 14:54  3%   ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-04-14 14:52 UTC (permalink / raw)
  To: Volodymyr Fialko; +Cc: dev, Reshma Pattan, jerinj, anoobj

On Fri, 14 Apr 2023 10:43:43 +0200
Volodymyr Fialko <vfialko@marvell.com> wrote:

> diff --git a/lib/reorder/rte_reorder.c b/lib/reorder/rte_reorder.c
> index f55f383700..7418202b04 100644
> --- a/lib/reorder/rte_reorder.c
> +++ b/lib/reorder/rte_reorder.c
> @@ -46,9 +46,10 @@ struct rte_reorder_buffer {
>  	char name[RTE_REORDER_NAMESIZE];
>  	uint32_t min_seqn;  /**< Lowest seq. number that can be in the buffer */
>  	unsigned int memsize; /**< memory area size of reorder buffer */
> +	int is_initialized; /**< flag indicates that buffer was initialized */
> +
>  	struct cir_buffer ready_buf; /**< temp buffer for dequeued entries */
>  	struct cir_buffer order_buf; /**< buffer used to reorder entries */
> -	int is_initialized;
>  } __rte_cache_aligned;
>  
>  static void

Since this is an ABI change, it will have to wait for the 23.11 release

^ permalink raw reply	[relevance 3%]

* [PATCH v5 13/14] telemetry: avoid expanding versioned symbol macros on MSVC
    2023-04-13 21:26  6%   ` [PATCH v5 11/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
@ 2023-04-13 21:26  3%   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-13 21:26 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
functions.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v5 11/14] eal: expand most macros to empty when using MSVC
  @ 2023-04-13 21:26  6%   ` Tyler Retzlaff
    2023-04-13 21:26  3%   ` [PATCH v5 13/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
  1 sibling, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-04-13 21:26 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

For now, expand a lot of common rte macros to empty. The catch here is
that we need to test that most of the macros do what they should, but at
the same time they are blocking work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for MSVC and, where not possible, provide some alternate macros
to achieve the same outcome.
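
As a rough sketch of the pattern (hypothetical demo_ names, and compiler
detection via __GNUC__ instead of the RTE_TOOLCHAIN_MSVC define used in
the patch): the GCC/clang branch keeps the builtin or attribute, and the
fallback branch expands to the bare expression or to nothing. Note that
the fallback for the branch hint yields the original value of x rather
than a value normalized to 0 or 1 by !!(x); inside an if condition the two
behave the same.

```c
#if defined(__GNUC__) || defined(__clang__)
#define demo_likely(x)	__builtin_expect(!!(x), 1)
#define demo_unused	__attribute__((__unused__))
#else
/* Fallback: no branch hint, no attribute; the code still compiles. */
#define demo_likely(x)	(x)
#define demo_unused
#endif

/* Returns v when it is positive, 0 otherwise; ctx is deliberately unused. */
static int
demo_clamp_positive(int v, int ctx demo_unused)
{
	if (demo_likely(v > 0))
		return v;
	return 0;
}
```

Either expansion gives the same result for demo_clamp_positive(); only
the optimization hint and the unused-parameter warning suppression are
lost in the fallback.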

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_branch_prediction.h |  8 ++++++
 lib/eal/include/rte_common.h            | 45 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 +++++++++++++++
 3 files changed, 73 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..d9a0224 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(x)
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(x)
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..1bdaa2d 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -65,7 +65,11 @@
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -80,16 +84,29 @@
 /**
  * Force a structure to be packed
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_packed __attribute__((__packed__))
+#else
+#define __rte_packed
+#endif
 
 /**
  * Macro to mark a type that is not subject to type-based aliasing rules
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_may_alias __attribute__((__may_alias__))
+#else
+#define __rte_may_alias
+#endif
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -110,14 +127,22 @@
 /**
  * Force symbol to be generated even if it appears to be unused.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_used __attribute__((used))
+#else
+#define __rte_used
+#endif
 
 /*********** Macros to eliminate unused variable warnings ********/
 
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +166,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +174,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +251,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +280,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +478,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1


^ permalink raw reply	[relevance 6%]

* [PATCH v2 2/3] doc: announce new cpu flag added to rte_cpu_flag_t
  @ 2023-04-13 11:53  3% ` Sivaprasad Tummala
  2023-04-17  4:31  3%   ` [PATCH v3 1/4] " Sivaprasad Tummala
  0 siblings, 1 reply; 200+ results
From: Sivaprasad Tummala @ 2023-04-13 11:53 UTC (permalink / raw)
  To: david.hunt; +Cc: dev

A new flag RTE_CPUFLAG_MONITORX is added to rte_cpu_flag_t in the
DPDK 23.07 release to support the monitorx instruction on EPYC processors.
This results in ABI breakage for legacy apps.

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 doc/guides/rel_notes/deprecation.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index dcc1ca1696..65e849616d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -163,3 +163,6 @@ Deprecation Notices
   The new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status
   starting with DPDK 23.07 release.
+
+* eal/x86: The enum ``rte_cpu_flag_t`` will be extended with a new cpu flag
+  ``RTE_CPUFLAG_MONITORX`` to support monitorx instruction on Epyc processors.
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc
  2023-04-11 20:34  0%       ` Tyler Retzlaff
@ 2023-04-12  8:50  0%         ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2023-04-12  8:50 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, david.marchand, thomas, mb, konstantin.ananyev

On Tue, Apr 11, 2023 at 01:34:14PM -0700, Tyler Retzlaff wrote:
> On Tue, Apr 11, 2023 at 11:24:07AM +0100, Bruce Richardson wrote:
> > On Wed, Apr 05, 2023 at 05:45:19PM -0700, Tyler Retzlaff wrote:
> > > Windows does not support versioned symbols. Fortunately Windows also
> > > doesn't have an exported stable ABI.
> > > 
> > > Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
> > > and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
> > > functions.
> > > 
> > > Windows does have a way to achieve similar versioning for symbols but it
> > > is not a simple #define so it will be done as a work package later.
> > > 
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > ---
> > >  lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > > 
> > > diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
> > > index 2bac2de..284c16e 100644
> > > --- a/lib/telemetry/telemetry_data.c
> > > +++ b/lib/telemetry/telemetry_data.c
> > > @@ -82,8 +82,16 @@
> > >  /* mark the v23 function as the older version, and v24 as the default version */
> > >  VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
> > >  BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
> > > +#ifndef RTE_TOOLCHAIN_MSVC
> > >  MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
> > >  		int64_t x), rte_tel_data_add_array_int_v24);
> > > +#else
> > > +int
> > > +rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
> > > +{
> > > +	return rte_tel_data_add_array_int_v24(d, x);
> > > +}
> > > +#endif
> > >  
> > 
> > Can't see any general way to do this from the versioning header file, so
> > agree that we need some changes here. Rather than defining a public
> > function, we could keep the diff reduced by just using a macro alias here,
> > right? For example:
> > 
> > #ifdef RTE_TOOLCHAIN_MSVC
> > #define rte_tel_data_add_array_int rte_tel_data_add_array_int_v24
> > #else
> > MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
> > 		int64_t x), rte_tel_data_add_array_int_v24);
> > #endif
> > 
> > If this is a temporary measure, I'd tend towards the shortest solution that
> > can work. However, no strong opinions, so, either using functions as you
> > have it, or macros:
> 
> so i have to leave it as it is, the reason being that the version.map ->
> exports.def generation does not handle this. the .def only contains the
> rte_tel_data_add_array_int symbol. if we expand it away to the _v24 name
> the link will fail.
> 

Ah, thanks for clarifying

> let's consume the change as-is for now and i will work on the
> generalized solution when changes are integrated that actually make the
> windows dso/dll functional.
> 

Sure, good for now. Keep my ack on any future versions.
> > 
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* [PATCH v4 13/14] telemetry: avoid expanding versioned symbol macros on MSVC
    2023-04-11 21:12  6%   ` [PATCH v4 11/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
@ 2023-04-11 21:12  3%   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-11 21:12 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
functions.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v4 11/14] eal: expand most macros to empty when using MSVC
  @ 2023-04-11 21:12  6%   ` Tyler Retzlaff
  2023-04-11 21:12  3%   ` [PATCH v4 13/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-11 21:12 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

For now, expand a lot of common rte macros to empty. The catch here is
that we need to test that most of the macros do what they should, but at
the same time they are blocking work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for MSVC and, where not possible, provide some alternate macros
to achieve the same outcome.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_branch_prediction.h |  8 +++++++
 lib/eal/include/rte_common.h            | 41 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 ++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..d9a0224 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(x)
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(x)
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..dd41315 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -65,7 +65,11 @@
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -85,11 +89,20 @@
 /**
  * Macro to mark a type that is not subject to type-based aliasing rules
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_may_alias __attribute__((__may_alias__))
+#else
+#define __rte_may_alias
+#endif
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -110,14 +123,22 @@
 /**
  * Force symbol to be generated even if it appears to be unused.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_used __attribute__((used))
+#else
+#define __rte_used
+#endif
 
 /*********** Macros to eliminate unused variable warnings ********/
 
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +162,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +170,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +247,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +276,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +474,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1


^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc
  2023-04-11 10:24  0%     ` Bruce Richardson
@ 2023-04-11 20:34  0%       ` Tyler Retzlaff
  2023-04-12  8:50  0%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-04-11 20:34 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, david.marchand, thomas, mb, konstantin.ananyev

On Tue, Apr 11, 2023 at 11:24:07AM +0100, Bruce Richardson wrote:
> On Wed, Apr 05, 2023 at 05:45:19PM -0700, Tyler Retzlaff wrote:
> > Windows does not support versioned symbols. Fortunately Windows also
> > doesn't have an exported stable ABI.
> > 
> > Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
> > and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
> > functions.
> > 
> > Windows does have a way to achieve similar versioning for symbols but it
> > is not a simple #define so it will be done as a work package later.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> >  lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
> > index 2bac2de..284c16e 100644
> > --- a/lib/telemetry/telemetry_data.c
> > +++ b/lib/telemetry/telemetry_data.c
> > @@ -82,8 +82,16 @@
> >  /* mark the v23 function as the older version, and v24 as the default version */
> >  VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
> >  BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
> >  		int64_t x), rte_tel_data_add_array_int_v24);
> > +#else
> > +int
> > +rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
> > +{
> > +	return rte_tel_data_add_array_int_v24(d, x);
> > +}
> > +#endif
> >  
> 
> Can't see any general way to do this from the versioning header file, so
> agree that we need some changes here. Rather than defining a public
> function, we could keep the diff reduced by just using a macro alias here,
> right? For example:
> 
> #ifdef RTE_TOOLCHAIN_MSVC
> #define rte_tel_data_add_array_int rte_tel_data_add_array_int_v24
> #else
> MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
> 		int64_t x), rte_tel_data_add_array_int_v24);
> #endif
> 
> If this is a temporary measure, I'd tend towards the shortest solution that
> can work. However, no strong opinions, so, either using functions as you
> have it, or macros:

so i have to leave it as it is, the reason being that the version.map ->
exports.def generation does not handle this. the .def only contains the
rte_tel_data_add_array_int symbol. if we expand it away to the _v24 name
the link will fail.

let's consume the change as-is for now and i will work on the
generalized solution when changes are integrated that actually make the
windows dso/dll functional.
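
a small sketch of the difference, using hypothetical demo_ names rather
than the real telemetry symbols. with a real wrapper function the object
file contains a symbol with the unversioned name, so an export list that
names it can be satisfied. with a "#define demo_add_array_int
demo_add_array_int_v24" alias, the preprocessor rewrites every use and no
symbol with the unversioned name is ever emitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_tel_data. */
struct demo_tel_data {
	int64_t last;
	int count;
};

/* Stand-in for the versioned implementation. */
int
demo_add_array_int_v24(struct demo_tel_data *d, int64_t x)
{
	if (d == NULL)
		return -1;
	d->last = x;
	d->count++;
	return 0;
}

/*
 * Real function carrying the unversioned name: this is the symbol an
 * export list would reference. It only forwards to the _v24 variant.
 */
int
demo_add_array_int(struct demo_tel_data *d, int64_t x)
{
	return demo_add_array_int_v24(d, x);
}
```

the cost is one extra call, which seems acceptable for a temporary
measure.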

> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 1/3] security: introduce out of place support for inline ingress
  2023-04-11 10:04  4% ` [PATCH 1/3] " Nithin Dabilpuram
@ 2023-04-11 18:05  3%   ` Stephen Hemminger
  2023-04-18  8:33  4%     ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-04-11 18:05 UTC (permalink / raw)
  To: Nithin Dabilpuram; +Cc: Thomas Monjalon, Akhil Goyal, jerinj, dev

On Tue, 11 Apr 2023 15:34:07 +0530
Nithin Dabilpuram <ndabilpuram@marvell.com> wrote:

> diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
> index 4bacf9fcd9..866cd4e8ee 100644
> --- a/lib/security/rte_security.h
> +++ b/lib/security/rte_security.h
> @@ -275,6 +275,17 @@ struct rte_security_ipsec_sa_options {
>  	 */
>  	uint32_t ip_reassembly_en : 1;
>  
> +	/** Enable out of place processing on inline inbound packets.
> +	 *
> +	 * * 1: Enable driver to perform Out-of-place(OOP) processing for this inline
> +	 *      inbound SA if supported by driver. PMD need to register mbuf
> +	 *      dynamic field using rte_security_oop_dynfield_register()
> +	 *      and security session creation would fail if dynfield is not
> +	 *      registered successfully.
> +	 * * 0: Disable OOP processing for this session (default).
> +	 */
> +	uint32_t ingress_oop : 1;
> +
>  	/** Reserved bit fields for future extension
>  	 *
>  	 * User should ensure reserved_opts is cleared as it may change in
> @@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
>  	 *
>  	 * Note: Reduce number of bits in reserved_opts for every new option.
>  	 */
> -	uint32_t reserved_opts : 17;
> +	uint32_t reserved_opts : 16;
>  };

NAK
Let me repeat the reserved bit rant. YAGNI

Reserved space is not usable without ABI breakage unless the existing
code enforces that reserved space has to be zero.

Just saying "User should ensure reserved_opts is cleared" is not enough.
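
A minimal sketch of what enforcing could look like, assuming a
hypothetical options struct and check function (neither exists in
rte_security): the library rejects any request whose reserved bits are
non-zero, so a later release can give one of those bits a meaning without
changing behaviour for applications that passed validation before.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical options struct, loosely modelled on the one in the patch. */
struct demo_sa_options {
	uint32_t ip_reassembly_en : 1;
	uint32_t reserved_opts : 31;
};

/* Return 0 when the options are acceptable, -EINVAL otherwise. */
static int
demo_sa_options_check(const struct demo_sa_options *opts)
{
	if (opts == NULL)
		return -EINVAL;
	/* Reserved bits must be zero today so they can be assigned later. */
	if (opts->reserved_opts != 0)
		return -EINVAL;
	return 0;
}
```

Without a check like this, existing callers may already be passing
uninitialized values in those bits, and assigning a meaning to one of
them later changes what those callers get.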



^ permalink raw reply	[relevance 3%]

* Re: [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc
  2023-04-06  0:45  3%   ` [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
@ 2023-04-11 10:24  0%     ` Bruce Richardson
  2023-04-11 20:34  0%       ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-04-11 10:24 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, david.marchand, thomas, mb, konstantin.ananyev

On Wed, Apr 05, 2023 at 05:45:19PM -0700, Tyler Retzlaff wrote:
> Windows does not support versioned symbols. Fortunately Windows also
> doesn't have an exported stable ABI.
> 
> Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
> and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
> functions.
> 
> Windows does have a way to achieve similar versioning for symbols but it
> is not a simple #define so it will be done as a work package later.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
> index 2bac2de..284c16e 100644
> --- a/lib/telemetry/telemetry_data.c
> +++ b/lib/telemetry/telemetry_data.c
> @@ -82,8 +82,16 @@
>  /* mark the v23 function as the older version, and v24 as the default version */
>  VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
>  BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
> +#ifndef RTE_TOOLCHAIN_MSVC
>  MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
>  		int64_t x), rte_tel_data_add_array_int_v24);
> +#else
> +int
> +rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
> +{
> +	return rte_tel_data_add_array_int_v24(d, x);
> +}
> +#endif
>  

Can't see any general way to do this from the versioning header file, so
agree that we need some changes here. Rather than defining a public
function, we could keep the diff reduced by just using a macro alias here,
right? For example:

#ifdef RTE_TOOLCHAIN_MSVC
#define rte_tel_data_add_array_int rte_tel_data_add_array_int_v24
#else
MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
		int64_t x), rte_tel_data_add_array_int_v24);
#endif

If this is a temporary measure, I'd tend towards the shortest solution that
can work. However, no strong opinions, so, either using functions as you
have it, or macros:

Acked-by: Bruce Richardson <bruce.richardson@intel.com>


^ permalink raw reply	[relevance 0%]

* [PATCH 1/3] security: introduce out of place support for inline ingress
  2023-03-09  8:56  4% [RFC 1/2] security: introduce out of place support for inline ingress Nithin Dabilpuram
@ 2023-04-11 10:04  4% ` Nithin Dabilpuram
  2023-04-11 18:05  3%   ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Nithin Dabilpuram @ 2023-04-11 10:04 UTC (permalink / raw)
  To: Thomas Monjalon, Akhil Goyal; +Cc: jerinj, dev, Nithin Dabilpuram

Similar to the out-of-place (OOP) processing support that exists for
lookaside crypto/security sessions, inline ingress security
sessions may also need out-of-place processing in use cases
where the original encrypted packet needs to be retained for post
processing. So for NICs which have such HW support,
a new SA option is provided to indicate whether OOP needs to
be enabled on that inline ingress security session or not.

Since for inline ingress sessions the packet is not received by the
CPU until the processing is done, we can only have a per-SA
option and not a per-packet option as with lookaside sessions.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 devtools/libabigail.abignore       |  4 +++
 lib/security/rte_security.c        | 17 +++++++++++++
 lib/security/rte_security.h        | 39 +++++++++++++++++++++++++++++-
 lib/security/rte_security_driver.h |  8 ++++++
 lib/security/version.map           |  2 ++
 5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 3ff51509de..414baac060 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -40,3 +40,7 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; Ignore change to reserved opts for new SA option
+[suppress_type]
+       name = rte_security_ipsec_sa_options
diff --git a/lib/security/rte_security.c b/lib/security/rte_security.c
index e102c55e55..c2199dd8db 100644
--- a/lib/security/rte_security.c
+++ b/lib/security/rte_security.c
@@ -27,7 +27,10 @@
 } while (0)
 
 #define RTE_SECURITY_DYNFIELD_NAME "rte_security_dynfield_metadata"
+#define RTE_SECURITY_OOP_DYNFIELD_NAME "rte_security_oop_dynfield_metadata"
+
 int rte_security_dynfield_offset = -1;
+int rte_security_oop_dynfield_offset = -1;
 
 int
 rte_security_dynfield_register(void)
@@ -42,6 +45,20 @@ rte_security_dynfield_register(void)
 	return rte_security_dynfield_offset;
 }
 
+int
+rte_security_oop_dynfield_register(void)
+{
+	static const struct rte_mbuf_dynfield dynfield_desc = {
+		.name = RTE_SECURITY_OOP_DYNFIELD_NAME,
+		.size = sizeof(rte_security_oop_dynfield_t),
+		.align = __alignof__(rte_security_oop_dynfield_t),
+	};
+
+	rte_security_oop_dynfield_offset =
+		rte_mbuf_dynfield_register(&dynfield_desc);
+	return rte_security_oop_dynfield_offset;
+}
+
 void *
 rte_security_session_create(struct rte_security_ctx *instance,
 			    struct rte_security_session_conf *conf,
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 4bacf9fcd9..866cd4e8ee 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -275,6 +275,17 @@ struct rte_security_ipsec_sa_options {
 	 */
 	uint32_t ip_reassembly_en : 1;
 
+	/** Enable out of place processing on inline inbound packets.
+	 *
+	 * * 1: Enable driver to perform Out-of-place(OOP) processing for this inline
+	 *      inbound SA if supported by driver. PMD need to register mbuf
+	 *      dynamic field using rte_security_oop_dynfield_register()
+	 *      and security session creation would fail if dynfield is not
+	 *      registered successfully.
+	 * * 0: Disable OOP processing for this session (default).
+	 */
+	uint32_t ingress_oop : 1;
+
 	/** Reserved bit fields for future extension
 	 *
 	 * User should ensure reserved_opts is cleared as it may change in
@@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
 	 *
 	 * Note: Reduce number of bits in reserved_opts for every new option.
 	 */
-	uint32_t reserved_opts : 17;
+	uint32_t reserved_opts : 16;
 };
 
 /** IPSec security association direction */
@@ -812,6 +823,13 @@ typedef uint64_t rte_security_dynfield_t;
 /** Dynamic mbuf field for device-specific metadata */
 extern int rte_security_dynfield_offset;
 
+/** Out-of-Place(OOP) processing field type */
+typedef struct rte_mbuf *rte_security_oop_dynfield_t;
+/** Dynamic mbuf field for pointer to original mbuf for
+ * OOP processing session.
+ */
+extern int rte_security_oop_dynfield_offset;
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
@@ -834,6 +852,25 @@ rte_security_dynfield(struct rte_mbuf *mbuf)
 		rte_security_dynfield_t *);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get pointer to mbuf field for original mbuf pointer when
+ * Out-Of-Place(OOP) processing is enabled in security session.
+ *
+ * @param       mbuf    packet to access
+ * @return pointer to mbuf field
+ */
+__rte_experimental
+static inline rte_security_oop_dynfield_t *
+rte_security_oop_dynfield(struct rte_mbuf *mbuf)
+{
+	return RTE_MBUF_DYNFIELD(mbuf,
+			rte_security_oop_dynfield_offset,
+			rte_security_oop_dynfield_t *);
+}
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
diff --git a/lib/security/rte_security_driver.h b/lib/security/rte_security_driver.h
index 421e6f7780..91e7786ab7 100644
--- a/lib/security/rte_security_driver.h
+++ b/lib/security/rte_security_driver.h
@@ -190,6 +190,14 @@ typedef int (*security_macsec_sa_stats_get_t)(void *device, uint16_t sa_id,
 __rte_internal
 int rte_security_dynfield_register(void);
 
+/**
+ * @internal
+ * Register mbuf dynamic field for Security inline ingress Out-of-Place(OOP)
+ * processing.
+ */
+__rte_internal
+int rte_security_oop_dynfield_register(void);
+
 /**
  * Update the mbuf with provided metadata.
  *
diff --git a/lib/security/version.map b/lib/security/version.map
index 07dcce9ffb..59a95f40bd 100644
--- a/lib/security/version.map
+++ b/lib/security/version.map
@@ -23,10 +23,12 @@ EXPERIMENTAL {
 	rte_security_macsec_sc_stats_get;
 	rte_security_session_stats_get;
 	rte_security_session_update;
+	rte_security_oop_dynfield_offset;
 };
 
 INTERNAL {
 	global:
 
 	rte_security_dynfield_register;
+	rte_security_oop_dynfield_register;
 };
-- 
2.25.1


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v2] version: 23.07-rc0
  2023-04-03  9:37 10% ` [PATCH v2] " David Marchand
@ 2023-04-06  7:44  0%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-04-06  7:44 UTC (permalink / raw)
  To: dev; +Cc: thomas

On Mon, Apr 3, 2023 at 11:45 AM David Marchand
<david.marchand@redhat.com> wrote:
>
> Start a new release cycle with empty release notes.
> Bump version and ABI minor.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Applied!


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* [PATCH v3 08/11] eal: expand most macros to empty when using msvc
  @ 2023-04-06  0:45  6%   ` Tyler Retzlaff
  2023-04-06  0:45  3%   ` [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-06  0:45 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

For now expand a lot of common rte macros to empty. The catch here is we
need to test that most of the macros do what they should, but at the same
time they are blocking work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for msvc and where not possible provide some alternate macros
to achieve the same outcome.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
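As a rough sketch of what non-empty MSVC expansions could look like later,
under the assumption that __declspec(noreturn) and __forceinline are
acceptable substitutes at the use sites. The SKETCH_* and sketch_* names are
hypothetical, and _MSC_VER is used here only as a stand-in for
RTE_TOOLCHAIN_MSVC. None of this is part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef _MSC_VER
/* No branch hint here; only the truth value of x is preserved. */
#define SKETCH_LIKELY(x)     (!!(x))
#define SKETCH_UNLIKELY(x)   (!!(x))
#define SKETCH_NORETURN      __declspec(noreturn)
#define SKETCH_ALWAYS_INLINE __forceinline
#else
#define SKETCH_LIKELY(x)     __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x)   __builtin_expect(!!(x), 0)
#define SKETCH_NORETURN      __attribute__((noreturn))
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#endif

/* Never returns; marked so with whichever spelling the toolchain accepts. */
static SKETCH_NORETURN void
sketch_fatal(void)
{
	abort();
}

/* Clamp negative values to zero; the negative case is hinted as rare. */
static SKETCH_ALWAYS_INLINE int
sketch_clamp_nonneg(int v)
{
	if (SKETCH_UNLIKELY(v < 0))
		return 0;
	return v;
}

/* Divide, aborting on a zero divisor instead of invoking undefined behavior. */
static SKETCH_ALWAYS_INLINE int
sketch_checked_div(int a, int b)
{
	if (SKETCH_UNLIKELY(b == 0))
		sketch_fatal();
	return a / b;
}
```

Both spellings are placed before the return type, which is a position each
toolchain accepts. Alignment attributes are left out on purpose because
their placement rules differ between the two.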
 lib/eal/include/rte_branch_prediction.h |  8 ++++++++
 lib/eal/include/rte_common.h            | 33 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 ++++++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..d9a0224 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(x)
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(x)
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..a724e22 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -65,7 +65,11 @@
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -88,8 +92,13 @@
 #define __rte_may_alias __attribute__((__may_alias__))
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -117,7 +126,11 @@
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +154,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +162,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +239,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +268,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +466,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1


^ permalink raw reply	[relevance 6%]

* [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc
    2023-04-06  0:45  6%   ` [PATCH v3 08/11] eal: expand most macros to empty when using msvc Tyler Retzlaff
@ 2023-04-06  0:45  3%   ` Tyler Retzlaff
  2023-04-11 10:24  0%     ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-04-06  0:45 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
functions.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
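For comparison, a small standalone sketch of the two stop-gap shapes that
work without symbol versioning: a forwarding function under the unversioned
name (the approach in this patch) and a plain macro alias. The demo_* names
are hypothetical and are not telemetry APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical versioned implementation, standing in for a *_v24 symbol. */
static int
demo_add_int_v24(int64_t *total, int64_t x)
{
	*total += x;
	return 0;
}

/* Option 1: a real function under the unversioned name. It costs one extra
 * call, but the name exists as a symbol and its address can be taken. */
int demo_add_int(int64_t *total, int64_t x);

int
demo_add_int(int64_t *total, int64_t x)
{
	return demo_add_int_v24(total, x);
}

/* Option 2: a textual alias. It is shorter, but no symbol with the aliased
 * name exists for the linker to export. */
#define demo_add_int_alias demo_add_int_v24
```

The macro only helps callers that include the header defining it, while the
function keeps the unversioned name visible to the linker.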
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH] MAINTAINERS: sort file entries
@ 2023-04-05 23:12 17% Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-04-05 23:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Thomas Monjalon

The list of file paths (F:) is only partially sorted
in some cases.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e50999f..5fa432b00aac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -83,26 +83,26 @@ Developers and Maintainers Tools
 M: Thomas Monjalon <thomas@monjalon.net>
 F: MAINTAINERS
 F: devtools/build-dict.sh
-F: devtools/check-abi.sh
 F: devtools/check-abi-version.sh
+F: devtools/check-abi.sh
 F: devtools/check-doc-vs-code.sh
 F: devtools/check-dup-includes.sh
-F: devtools/check-maintainers.sh
 F: devtools/check-forbidden-tokens.awk
 F: devtools/check-git-log.sh
+F: devtools/check-maintainers.sh
 F: devtools/check-spdx-tag.sh
 F: devtools/check-symbol-change.sh
 F: devtools/check-symbol-maps.sh
 F: devtools/checkpatches.sh
 F: devtools/get-maintainer.sh
 F: devtools/git-log-fixes.sh
+F: devtools/libabigail.abignore
 F: devtools/load-devel-config
 F: devtools/parse-flow-support.sh
 F: devtools/process-iwyu.py
 F: devtools/update-abi.sh
 F: devtools/update-patches.py
 F: devtools/update_version_map_abi.py
-F: devtools/libabigail.abignore
 F: devtools/words-case.txt
 F: license/
 F: .editorconfig
@@ -114,16 +114,16 @@ F: Makefile
 F: meson.build
 F: meson_options.txt
 F: config/
+F: buildtools/call-sphinx-build.py
 F: buildtools/check-symbols.sh
 F: buildtools/chkincs/
-F: buildtools/call-sphinx-build.py
 F: buildtools/get-cpu-count.py
 F: buildtools/get-numa-count.py
 F: buildtools/list-dir-globs.py
 F: buildtools/map-list-symbol.sh
 F: buildtools/pkg-config/
-F: buildtools/symlink-drivers-solibs.sh
 F: buildtools/symlink-drivers-solibs.py
+F: buildtools/symlink-drivers-solibs.sh
 F: devtools/test-meson-builds.sh
 F: devtools/check-meson.py
 
-- 
2.39.2


^ permalink raw reply	[relevance 17%]

* Re: [PATCH v2 9/9] telemetry: avoid expanding versioned symbol macros on msvc
  2023-04-05 16:02  0%       ` Tyler Retzlaff
@ 2023-04-05 16:17  0%         ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2023-04-05 16:17 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, david.marchand, thomas, mb, konstantin.ananyev

On Wed, Apr 05, 2023 at 09:02:10AM -0700, Tyler Retzlaff wrote:
> On Wed, Apr 05, 2023 at 11:56:05AM +0100, Bruce Richardson wrote:
> > On Tue, Apr 04, 2023 at 01:07:27PM -0700, Tyler Retzlaff wrote:
> > > Windows does not support versioned symbols. Fortunately Windows also
> > > doesn't have an exported stable ABI.
> > > 
> > > Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
> > > and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
> > > functions.
> > > 
> > > Windows does have a way to achieve similar versioning for symbols but it
> > > is not a simple #define so it will be done as a work package later.
> > > 
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > 
> > Does this require a change in telemetry itself? Can it be done via the
> > header file with the versioning macros in it, so it would apply to any
> > other versioned functions we have in DPDK?
> 
> i didn't spend a lot of time thinking if the existing macros could be
> made to expand in the way needed. there is a way of doing versioning on
> windows but it is foreign to how this symbol versioning scheme works so
> i plan to investigate it separately after i get unit tests running.
> 
> for now i know what i'm doing is ugly but i need to get protection of
> unit tests so i'm doing minimal changes to get to that point. if you're
> not comfortable with this going in on a temporary basis i can remove it
> from this series and we can work on it as a separated patch set.
> 
> my bar is pretty low here, as long as it doesn't break any existing
> linux/gcc/clang etc ok, if msvc is not right i'll take a second pass
> and design each stop-gap properly. it already doesn't work so things
> aren't made worse.
> 
> let me know if i need to carve this out of the series.
> 
It's not that ugly. :-) If no other clear solution is apparent, I can certainly
live with this.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 9/9] telemetry: avoid expanding versioned symbol macros on msvc
  2023-04-05 10:56  0%     ` Bruce Richardson
@ 2023-04-05 16:02  0%       ` Tyler Retzlaff
  2023-04-05 16:17  0%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-04-05 16:02 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, david.marchand, thomas, mb, konstantin.ananyev

On Wed, Apr 05, 2023 at 11:56:05AM +0100, Bruce Richardson wrote:
> On Tue, Apr 04, 2023 at 01:07:27PM -0700, Tyler Retzlaff wrote:
> > Windows does not support versioned symbols. Fortunately Windows also
> > doesn't have an exported stable ABI.
> > 
> > Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
> > and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
> > functions.
> > 
> > Windows does have a way to achieve similar versioning for symbols but it
> > is not a simple #define so it will be done as a work package later.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
> Does this require a change in telemetry itself? Can it be done via the
> header file with the versioning macros in it, so it would apply to any
> other versioned functions we have in DPDK?

i didn't spend a lot of time thinking if the existing macros could be
made to expand in the way needed. there is a way of doing versioning on
windows but it is foreign to how this symbol versioning scheme works so
i plan to investigate it separately after i get unit tests running.

for now i know what i'm doing is ugly but i need to get protection of
unit tests so i'm doing minimal changes to get to that point. if you're
not comfortable with this going in on a temporary basis i can remove it
from this series and we can work on it as a separated patch set.

my bar is pretty low here, as long as it doesn't break any existing
linux/gcc/clang etc ok, if msvc is not right i'll take a second pass
and design each stop-gap properly. it already doesn't work so things
aren't made worse.

let me know if i need to carve this out of the series.

ty

> 
> /Bruce
> 
> > ---
> >  lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
> > index 2bac2de..284c16e 100644
> > --- a/lib/telemetry/telemetry_data.c
> > +++ b/lib/telemetry/telemetry_data.c
> > @@ -82,8 +82,16 @@
> >  /* mark the v23 function as the older version, and v24 as the default version */
> >  VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
> >  BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
> >  		int64_t x), rte_tel_data_add_array_int_v24);
> > +#else
> > +int
> > +rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
> > +{
> > +	return rte_tel_data_add_array_int_v24(d, x);
> > +}
> > +#endif
> >  
> >  int
> >  rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
> > @@ -220,8 +228,16 @@
> >  /* mark the v23 function as the older version, and v24 as the default version */
> >  VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
> >  BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
> > +#ifndef RTE_TOOLCHAIN_MSVC
> >  MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
> >  		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
> > +#else
> > +int
> > +rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
> > +{
> > +	return rte_tel_data_add_dict_int_v24(d, name, val);
> > +}
> > +#endif
> >  
> >  int
> >  rte_tel_data_add_dict_uint(struct rte_tel_data *d,
> > -- 
> > 1.8.3.1
> > 

^ permalink raw reply	[relevance 0%]

* [PATCH v2 0/3] vhost: add device op to offload the interrupt kick
@ 2023-04-05 12:40  3% Eelco Chaudron
    2023-05-08 13:58  0% ` [PATCH v2 0/3] " Eelco Chaudron
  0 siblings, 2 replies; 200+ results
From: Eelco Chaudron @ 2023-04-05 12:40 UTC (permalink / raw)
  To: maxime.coquelin, chenbo.xia; +Cc: dev

This series adds an operation callback which gets called every time the
library wants to call eventfd_write(). This eventfd_write() call could
result in a system call, which could potentially block the PMD thread.

The callback function can decide whether it's ok to handle the
eventfd_write() now or have the newly introduced function,
rte_vhost_notify_guest(), called at a later time.

This can be used by 3rd party applications, like OVS, to avoid making
system calls from the PMD threads.
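A minimal standalone sketch of the deferral pattern described above. The
demo_* names are hypothetical, and the boolean return convention (true
meaning the application took over the kick) is an assumption made for
illustration. demo_kick_guest() stands in for eventfd_write(), and the flush
step stands in for a later call to rte_vhost_notify_guest().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PENDING 8

struct demo_kick {
	int vid;
	uint16_t queue_id;
};

static struct demo_kick demo_pending[DEMO_MAX_PENDING];
static unsigned int demo_npending;
static unsigned int demo_kicks_done;

/* Stand-in for the eventfd_write() that may end up in a system call. */
static void
demo_kick_guest(int vid, uint16_t queue_id)
{
	(void)vid;
	(void)queue_id;
	demo_kicks_done++;
}

/* Application callback: queue the kick and return true, or return false
 * when the queue is full so that the library kicks immediately. */
static bool
demo_guest_notify(int vid, uint16_t queue_id)
{
	if (demo_npending == DEMO_MAX_PENDING)
		return false;
	demo_pending[demo_npending++] = (struct demo_kick){ vid, queue_id };
	return true;
}

/* Library side, on the datapath: offer the kick to the callback first. */
static void
demo_lib_notify(bool (*cb)(int, uint16_t), int vid, uint16_t queue_id)
{
	if (cb != NULL && cb(vid, queue_id))
		return;
	demo_kick_guest(vid, queue_id);
}

/* Run later, outside the PMD thread, to deliver the deferred kicks. */
static unsigned int
demo_flush_pending(void)
{
	unsigned int i, n = demo_npending;

	for (i = 0; i < n; i++)
		demo_kick_guest(demo_pending[i].vid, demo_pending[i].queue_id);
	demo_npending = 0;
	return n;
}
```

With no callback registered the library kicks inline as before, so the
offload stays opt-in.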

v2: - Used vhost_virtqueue->index to find index for operation.
    - Aligned function name to VDUSE RFC patchset.
    - Added error and offload statistics counter.
    - Mark new API as experimental.
    - Change the virtual queue spin lock to read/write spin lock.
    - Made shared counters atomic.
    - Add versioned rte_vhost_driver_callback_register() for
      ABI compliance.

Eelco Chaudron (3):
      vhost: Change vhost_virtqueue access lock to a read/write one.
      vhost: make the guest_notifications statistic counter atomic.
      vhost: add device op to offload the interrupt kick


 lib/eal/include/generic/rte_rwlock.h | 17 +++++
 lib/vhost/meson.build                |  2 +
 lib/vhost/rte_vhost.h                | 23 ++++++-
 lib/vhost/socket.c                   | 72 ++++++++++++++++++++--
 lib/vhost/version.map                |  9 +++
 lib/vhost/vhost.c                    | 92 +++++++++++++++++++++-------
 lib/vhost/vhost.h                    | 70 ++++++++++++++-------
 lib/vhost/vhost_user.c               | 14 ++---
 lib/vhost/virtio_net.c               | 90 +++++++++++++--------------
 9 files changed, 288 insertions(+), 101 deletions(-)


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 9/9] telemetry: avoid expanding versioned symbol macros on msvc
  2023-04-04 20:07  3%   ` [PATCH v2 9/9] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
@ 2023-04-05 10:56  0%     ` Bruce Richardson
  2023-04-05 16:02  0%       ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-04-05 10:56 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, david.marchand, thomas, mb, konstantin.ananyev

On Tue, Apr 04, 2023 at 01:07:27PM -0700, Tyler Retzlaff wrote:
> Windows does not support versioned symbols. Fortunately Windows also
> doesn't have an exported stable ABI.
> 
> Export rte_tel_data_add_array_int -> rte_tel_data_add_array_int_v24
> and rte_tel_data_add_dict_int -> rte_tel_data_add_dict_int_v24
> functions.
> 
> Windows does have a way to achieve similar versioning for symbols but it
> is not a simple #define so it will be done as a work package later.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

Does this require a change in telemetry itself? Can it be done via the
header file with the versioning macros in it, so it would apply to any
other versioned functions we have in DPDK?

/Bruce

> ---
>  lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
> index 2bac2de..284c16e 100644
> --- a/lib/telemetry/telemetry_data.c
> +++ b/lib/telemetry/telemetry_data.c
> @@ -82,8 +82,16 @@
>  /* mark the v23 function as the older version, and v24 as the default version */
>  VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
>  BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
> +#ifndef RTE_TOOLCHAIN_MSVC
>  MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
>  		int64_t x), rte_tel_data_add_array_int_v24);
> +#else
> +int
> +rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
> +{
> +	return rte_tel_data_add_array_int_v24(d, x);
> +}
> +#endif
>  
>  int
>  rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
> @@ -220,8 +228,16 @@
>  /* mark the v23 function as the older version, and v24 as the default version */
>  VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
>  BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
> +#ifndef RTE_TOOLCHAIN_MSVC
>  MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
>  		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
> +#else
> +int
> +rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
> +{
> +	return rte_tel_data_add_dict_int_v24(d, name, val);
> +}
> +#endif
>  
>  int
>  rte_tel_data_add_dict_uint(struct rte_tel_data *d,
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[relevance 0%]

* [PATCH v2 6/9] eal: expand most macros to empty when using msvc
  @ 2023-04-04 20:07  6%   ` Tyler Retzlaff
  2023-04-04 20:07  3%   ` [PATCH v2 9/9] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-04 20:07 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

For now expand a lot of common rte macros to empty. The catch here is we
need to test that most of the macros do what they should, but at the same
time they are blocking work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for msvc and where not possible provide some alternate macros
to achieve the same outcome.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_branch_prediction.h |  8 ++++++++
 lib/eal/include/rte_common.h            | 33 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 ++++++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..3589c97 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(!!(x) == 1)
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(!!(x) == 0)
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..a724e22 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -65,7 +65,11 @@
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -88,8 +92,13 @@
 #define __rte_may_alias __attribute__((__may_alias__))
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -117,7 +126,11 @@
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +154,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +162,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +239,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +268,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +466,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1


^ permalink raw reply	[relevance 6%]
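
Note on the empty expansions in the patch above: several of these
attributes have MSVC spellings that could eventually replace the empty
definitions. What follows is only a rough sketch and not what this
series does. The EX_* names are hypothetical, and it assumes
__declspec(align), __declspec(noreturn) and __forceinline as the MSVC
counterparts of the GCC attributes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names; the real DPDK macros are the __rte_* ones above. */
#ifdef _MSC_VER
#define EX_ALIGNED(a)    __declspec(align(a))
#define EX_NORETURN      __declspec(noreturn)
#define EX_ALWAYS_INLINE __forceinline
#else
#define EX_ALIGNED(a)    __attribute__((__aligned__(a)))
#define EX_NORETURN      __attribute__((noreturn))
#define EX_ALWAYS_INLINE inline __attribute__((always_inline))
#endif

#define EX_CACHE_LINE 64

/* Both spellings are accepted between the struct keyword and the tag. */
struct EX_ALIGNED(EX_CACHE_LINE) ex_counter {
	unsigned long hits;
};

/* Forced-inline helper: increments the counter and returns the new value. */
static EX_ALWAYS_INLINE unsigned long
ex_bump(struct ex_counter *c)
{
	return ++c->hits;
}

/* Never returns: prints the message and terminates the process. */
static EX_NORETURN void
ex_die(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	exit(EXIT_FAILURE);
}
```

The alignment case matters most: with an empty expansion the structure
silently loses its cache-line alignment, whereas an empty noreturn or
always_inline only drops a hint.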

* [PATCH v2 9/9] telemetry: avoid expanding versioned symbol macros on msvc
    2023-04-04 20:07  6%   ` [PATCH v2 6/9] eal: expand most macros to empty when using msvc Tyler Retzlaff
@ 2023-04-04 20:07  3%   ` Tyler Retzlaff
  2023-04-05 10:56  0%     ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-04-04 20:07 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, david.marchand, thomas, mb, konstantin.ananyev,
	Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int and rte_tel_data_add_dict_int as plain
functions that forward to rte_tel_data_add_array_int_v24 and
rte_tel_data_add_dict_int_v24 respectively.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]
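
The forwarding pattern used in the MSVC branch of the patch above can
be shown in isolation. This is a minimal sketch with hypothetical names
(struct ex_data, ex_add_int, ex_add_int_v24); it stands in for the
telemetry functions and does not use the real rte_tel_data layout.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_tel_data. */
struct ex_data {
	int64_t last;
	int count;
};

/* Current implementation, carrying the version suffix. */
int
ex_add_int_v24(struct ex_data *d, int64_t x)
{
	d->last = x;
	d->count++;
	return 0;
}

/*
 * Unversioned public name. Without symbol versioning, a plain wrapper
 * forwards to the current implementation.
 */
int
ex_add_int(struct ex_data *d, int64_t x)
{
	return ex_add_int_v24(d, x);
}
```

The wrapper costs one extra call (which the compiler may well inline)
and exposes only the newest behaviour under the public name. Older
versions cannot be selected this way.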

* [PATCH 6/9] eal: expand most macros to empty when using msvc
  @ 2023-04-03 21:52  6% ` Tyler Retzlaff
  2023-04-03 21:52  3% ` [PATCH 9/9] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-03 21:52 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, david.marchand, thomas, mb, Tyler Retzlaff

For now, expand many common rte macros to empty. The catch is that we
still need to test that most of these macros do what they should, but at
the same time they are blocking the work needed to bootstrap the unit tests.

Later we will return and provide (where possible) expansions that work
correctly for msvc and, where that is not possible, provide alternate
macros to achieve the same outcome.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_branch_prediction.h |  8 ++++++++
 lib/eal/include/rte_common.h            | 33 +++++++++++++++++++++++++++++++++
 lib/eal/include/rte_compat.h            | 20 ++++++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/lib/eal/include/rte_branch_prediction.h b/lib/eal/include/rte_branch_prediction.h
index 0256a9d..3589c97 100644
--- a/lib/eal/include/rte_branch_prediction.h
+++ b/lib/eal/include/rte_branch_prediction.h
@@ -25,7 +25,11 @@
  *
  */
 #ifndef likely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define likely(x)	__builtin_expect(!!(x), 1)
+#else
+#define likely(x)	(!!(x) == 1)
+#endif
 #endif /* likely */
 
 /**
@@ -39,7 +43,11 @@
  *
  */
 #ifndef unlikely
+#ifndef RTE_TOOLCHAIN_MSVC
 #define unlikely(x)	__builtin_expect(!!(x), 0)
+#else
+#define unlikely(x)	(!!(x))
+#endif
 #endif /* unlikely */
 
 #ifdef __cplusplus
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 2f464e3..a724e22 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -65,7 +65,11 @@
 /**
  * Force alignment
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_aligned(a) __attribute__((__aligned__(a)))
+#else
+#define __rte_aligned(a)
+#endif
 
 #ifdef RTE_ARCH_STRICT_ALIGN
 typedef uint64_t unaligned_uint64_t __rte_aligned(1);
@@ -88,8 +92,13 @@
 #define __rte_may_alias __attribute__((__may_alias__))
 
 /******* Macro to mark functions and fields scheduled for removal *****/
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
+#else
+#define __rte_deprecated
+#define __rte_deprecated_msg(msg)
+#endif
 
 /**
  *  Macro to mark macros and defines scheduled for removal
@@ -117,7 +126,11 @@
 /**
  * short definition to mark a function parameter unused
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_unused __attribute__((__unused__))
+#else
+#define __rte_unused
+#endif
 
 /**
  * Mark pointer as restricted with regard to pointer aliasing.
@@ -141,6 +154,7 @@
  * even if the underlying stdio implementation is ANSI-compliant,
  * so this must be overridden.
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #if RTE_CC_IS_GNU
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(gnu_printf, format_index, first_arg)))
@@ -148,6 +162,9 @@
 #define __rte_format_printf(format_index, first_arg) \
 	__attribute__((format(printf, format_index, first_arg)))
 #endif
+#else
+#define __rte_format_printf(format_index, first_arg)
+#endif
 
 /**
  * Tells compiler that the function returns a value that points to
@@ -222,7 +239,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /**
  * Hint never returning function
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_noreturn __attribute__((noreturn))
+#else
+#define __rte_noreturn
+#endif
 
 /**
  * Issue a warning in case the function's return value is ignored.
@@ -247,12 +268,20 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
  *  }
  * @endcode
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_warn_unused_result __attribute__((warn_unused_result))
+#else
+#define __rte_warn_unused_result
+#endif
 
 /**
  * Force a function to be inlined
  */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_always_inline inline __attribute__((always_inline))
+#else
+#define __rte_always_inline
+#endif
 
 /**
  * Force a function to be noinlined
@@ -437,7 +466,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 #define RTE_CACHE_LINE_MIN_SIZE 64
 
 /** Force alignment to cache line. */
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_cache_aligned __rte_aligned(RTE_CACHE_LINE_SIZE)
+#else
+#define __rte_cache_aligned
+#endif
 
 /** Force minimum cache line alignment. */
 #define __rte_cache_min_aligned __rte_aligned(RTE_CACHE_LINE_MIN_SIZE)
diff --git a/lib/eal/include/rte_compat.h b/lib/eal/include/rte_compat.h
index fc9fbaa..6a4b5ee 100644
--- a/lib/eal/include/rte_compat.h
+++ b/lib/eal/include/rte_compat.h
@@ -12,14 +12,22 @@
 
 #ifndef ALLOW_EXPERIMENTAL_API
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((deprecated("Symbol is not yet part of stable ABI"), \
 section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_experimental \
 __attribute__((section(".text.experimental")))
+#else
+#define __rte_experimental
+#endif
 
 #endif
 
@@ -30,23 +38,35 @@
 
 #if !defined ALLOW_INTERNAL_API && __has_attribute(error) /* For GCC */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((error("Symbol is not public ABI"), \
 section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #elif !defined ALLOW_INTERNAL_API && __has_attribute(diagnose_if) /* For clang */
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 _Pragma("GCC diagnostic push") \
 _Pragma("GCC diagnostic ignored \"-Wgcc-compat\"") \
 __attribute__((diagnose_if(1, "Symbol is not public ABI", "error"), \
 section(".text.internal"))) \
 _Pragma("GCC diagnostic pop")
+#else
+#define __rte_internal
+#endif
 
 #else
 
+#ifndef RTE_TOOLCHAIN_MSVC
 #define __rte_internal \
 __attribute__((section(".text.internal")))
+#else
+#define __rte_internal
+#endif
 
 #endif
 
-- 
1.8.3.1


^ permalink raw reply	[relevance 6%]
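
One property worth checking in any non-GCC fallback for likely() and
unlikely(): __builtin_expect(!!(x), N) evaluates to !!(x), and N is only
the value the compiler is told to expect. A fallback therefore has to
keep the truth value of x for both macros. The sketch below uses
hypothetical ex_* names to show this.

```c
#if defined(__GNUC__) || defined(__clang__)
#define ex_likely(x)   __builtin_expect(!!(x), 1)
#define ex_unlikely(x) __builtin_expect(!!(x), 0)
#else
/* No hint available: keep only the normalised truth value. */
#define ex_likely(x)   (!!(x))
#define ex_unlikely(x) (!!(x))
#endif

/* Returns -1 on the (rare) NULL-pointer path, 0 otherwise. */
int
ex_check(const void *p)
{
	if (ex_unlikely(p == 0))
		return -1;
	return 0;
}
```

With either branch of the #if, ex_unlikely(cond) is true exactly when
cond is true, so "if (ex_unlikely(err))" still takes the error path only
when there is an error.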

* [PATCH 9/9] telemetry: avoid expanding versioned symbol macros on msvc
    2023-04-03 21:52  6% ` [PATCH 6/9] eal: expand most macros to empty when using msvc Tyler Retzlaff
@ 2023-04-03 21:52  3% ` Tyler Retzlaff
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-04-03 21:52 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, david.marchand, thomas, mb, Tyler Retzlaff

Windows does not support versioned symbols. Fortunately Windows also
doesn't have an exported stable ABI.

Export rte_tel_data_add_array_int and rte_tel_data_add_dict_int as plain
functions that forward to rte_tel_data_add_array_int_v24 and
rte_tel_data_add_dict_int_v24 respectively.

Windows does have a way to achieve similar versioning for symbols but it
is not a simple #define so it will be done as a work package later.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/telemetry/telemetry_data.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/telemetry/telemetry_data.c b/lib/telemetry/telemetry_data.c
index 2bac2de..284c16e 100644
--- a/lib/telemetry/telemetry_data.c
+++ b/lib/telemetry/telemetry_data.c
@@ -82,8 +82,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_array_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_array_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_array_int(struct rte_tel_data *d,
 		int64_t x), rte_tel_data_add_array_int_v24);
+#else
+int
+rte_tel_data_add_array_int(struct rte_tel_data *d, int64_t x)
+{
+	return rte_tel_data_add_array_int_v24(d, x);
+}
+#endif
 
 int
 rte_tel_data_add_array_uint(struct rte_tel_data *d, uint64_t x)
@@ -220,8 +228,16 @@
 /* mark the v23 function as the older version, and v24 as the default version */
 VERSION_SYMBOL(rte_tel_data_add_dict_int, _v23, 23);
 BIND_DEFAULT_SYMBOL(rte_tel_data_add_dict_int, _v24, 24);
+#ifndef RTE_TOOLCHAIN_MSVC
 MAP_STATIC_SYMBOL(int rte_tel_data_add_dict_int(struct rte_tel_data *d,
 		const char *name, int64_t val), rte_tel_data_add_dict_int_v24);
+#else
+int
+rte_tel_data_add_dict_int(struct rte_tel_data *d, const char *name, int64_t val)
+{
+	return rte_tel_data_add_dict_int_v24(d, name, val);
+}
+#endif
 
 int
 rte_tel_data_add_dict_uint(struct rte_tel_data *d,
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v2] devtools: add script to check for non inclusive naming
  @ 2023-04-03 14:47 14% ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-04-03 14:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Shell script to find uses of words that should not be used.
By default it prints the names of matching files. The -q (quiet)
option is used to just count. There is also a -l option, which
lists the matching lines instead of only the file names.

Uses the word lists from Inclusive Naming Initiative
see https://inclusivenaming.org/word-lists/

Examples:
 $ ./devtools/check-naming-policy.sh -q
 Total files: 37 errors, 90 warnings, 2 suggestions

 $ ./devtools/check-naming-policy.sh -q -l lib/eal
 Total lines: 32 errors, 8 warnings, 0 suggestions

Add a MAINTAINERS file entry for the new tool and re-sort
the file list back into alphabetic order.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 - fix typo in words
   - add subtree (pathspec) option
   - update maintainers file (and fix alphabetic order)

 MAINTAINERS                     |   8 ++-
 devtools/check-naming-policy.sh | 107 ++++++++++++++++++++++++++++++++
 devtools/naming/tier1.txt       |   8 +++
 devtools/naming/tier2.txt       |   1 +
 devtools/naming/tier3.txt       |   4 ++
 5 files changed, 125 insertions(+), 3 deletions(-)
 create mode 100755 devtools/check-naming-policy.sh
 create mode 100644 devtools/naming/tier1.txt
 create mode 100644 devtools/naming/tier2.txt
 create mode 100644 devtools/naming/tier3.txt

diff --git a/MAINTAINERS b/MAINTAINERS
index 8df23e50999f..b5881113ba85 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -83,26 +83,28 @@ Developers and Maintainers Tools
 M: Thomas Monjalon <thomas@monjalon.net>
 F: MAINTAINERS
 F: devtools/build-dict.sh
-F: devtools/check-abi.sh
 F: devtools/check-abi-version.sh
+F: devtools/check-abi.sh
 F: devtools/check-doc-vs-code.sh
 F: devtools/check-dup-includes.sh
-F: devtools/check-maintainers.sh
 F: devtools/check-forbidden-tokens.awk
 F: devtools/check-git-log.sh
+F: devtools/check-maintainers.sh
+F: devtools/check-naming-policy.sh
 F: devtools/check-spdx-tag.sh
 F: devtools/check-symbol-change.sh
 F: devtools/check-symbol-maps.sh
 F: devtools/checkpatches.sh
 F: devtools/get-maintainer.sh
 F: devtools/git-log-fixes.sh
+F: devtools/libabigail.abignore
 F: devtools/load-devel-config
+F: devtools/naming/
 F: devtools/parse-flow-support.sh
 F: devtools/process-iwyu.py
 F: devtools/update-abi.sh
 F: devtools/update-patches.py
 F: devtools/update_version_map_abi.py
-F: devtools/libabigail.abignore
 F: devtools/words-case.txt
 F: license/
 F: .editorconfig
diff --git a/devtools/check-naming-policy.sh b/devtools/check-naming-policy.sh
new file mode 100755
index 000000000000..90347b415652
--- /dev/null
+++ b/devtools/check-naming-policy.sh
@@ -0,0 +1,107 @@
+#! /bin/bash
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2023 Stephen Hemminger
+#
+# This script scans the source tree and creates a list of files
+# containing words that the Inclusive Naming Initiative
+# recommends avoiding.
+# See: https://inclusivenaming.org/word-lists/
+#
+# The options are:
+#   -q = quiet mode, produces summary count only
+#   -l = show lines instead of files with recommendations
+#   -v = verbose, show a header between each tier
+#
+# Default is to scan all of the DPDK source and documentation.
+# An optional pathspec can be used to limit the scan to a specific tree.
+#
+#  Example:
+#    check-naming-policy.sh -q doc/*
+#
+
+errors=0
+warnings=0
+suggestions=0
+quiet=false
+verbose=false
+lines='-l'
+
+print_usage () {
+    echo "usage: $(basename $0) [-l] [-q] [-v] [<pathspec>]"
+    exit 1
+}
+
+# Locate word list files
+selfdir=$(dirname $(readlink -f $0))
+words=$selfdir/naming
+
+# These give false positives
+skipfiles=( ':^devtools/naming/' \
+	    ':^doc/guides/rel_notes/' \
+	    ':^doc/guides/contributing/coding_style.rst' \
+	    ':^doc/guides/prog_guide/glossary.rst' \
+)
+# These are obsolete
+skipfiles+=( \
+	    ':^drivers/net/liquidio/' \
+	    ':^drivers/net/bnx2x/' \
+	    ':^lib/table/' \
+	    ':^lib/port/' \
+	    ':^lib/pipeline/' \
+	    ':^examples/pipeline/' \
+)
+
+#
+# check_wordlist wordfile description
+check_wordlist() {
+    local list=$words/$1
+    local description=$2
+
+    git grep -i $lines -f $list -- ${skipfiles[@]} $pathspec > $tmpfile
+    count=$(wc -l < $tmpfile)
+    if ! $quiet; then
+	if [ $count -gt 0 ]; then
+	    if $verbose; then
+   		    echo $description
+		    echo $description | tr '[:print:]' '-'
+	    fi
+   	    cat $tmpfile
+	    echo
+	fi
+    fi
+    return $count
+}
+
+while getopts lqvh ARG ; do
+	case $ARG in
+		l ) lines= ;;
+		q ) quiet=true ;;
+		v ) verbose=true ;;
+		h ) print_usage ; exit 0 ;;
+		? ) print_usage ; exit 1 ;;
+	esac
+done
+shift $(($OPTIND - 1))
+
+tmpfile=$(mktemp -t dpdk.checknames.XXXXXX)
+trap 'rm -f -- "$tmpfile"' INT TERM HUP EXIT
+
+pathspec=$*
+
+check_wordlist tier1.txt "Tier 1: Replace immediately"
+errors=$?
+
+check_wordlist tier2.txt "Tier 2: Strongly consider replacing"
+warnings=$?
+
+check_wordlist tier3.txt "Tier 3: Recommend to replace"
+suggestions=$?
+
+if [ -z "$lines" ] ; then
+    echo -n "Total lines: "
+else
+    echo -n "Total files: "
+fi
+
+echo $errors "errors," $warnings "warnings," $suggestions "suggestions"
+exit $errors
diff --git a/devtools/naming/tier1.txt b/devtools/naming/tier1.txt
new file mode 100644
index 000000000000..a0e9b549c218
--- /dev/null
+++ b/devtools/naming/tier1.txt
@@ -0,0 +1,8 @@
+abort
+blackhat
+blacklist
+cripple
+master
+slave
+whitehat
+whitelist
diff --git a/devtools/naming/tier2.txt b/devtools/naming/tier2.txt
new file mode 100644
index 000000000000..cd4280d1625c
--- /dev/null
+++ b/devtools/naming/tier2.txt
@@ -0,0 +1 @@
+sanity
diff --git a/devtools/naming/tier3.txt b/devtools/naming/tier3.txt
new file mode 100644
index 000000000000..072f6468ea47
--- /dev/null
+++ b/devtools/naming/tier3.txt
@@ -0,0 +1,4 @@
+man.in.the.middle
+segregate
+segregation
+tribe
-- 
2.39.2


^ permalink raw reply	[relevance 14%]
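
A caveat when reading the summary counts: check_wordlist hands each
count back with "return $count", and a shell return status only has 8
bits. In bash, as far as I know, the value is reduced to its low 8 bits,
so a count of 256 would be reported as 0. The arithmetic is modelled
below in C with hypothetical helper names; it is an illustration of the
truncation, not code taken from the script.

```c
/* Models how an 8-bit exit status reports a match count. */
unsigned int
ex_status_from_count(unsigned int count)
{
	return count & 0xffu;
}

/* Saturating alternative: large counts never collapse to "no matches". */
unsigned int
ex_status_saturated(unsigned int count)
{
	return count > 255u ? 255u : count;
}
```

The counts in the examples of the commit message (37, 90, 2, 32, 8) all
fit in 8 bits, so they are unaffected. Printing the count from the
function and capturing it in the caller would avoid the limit
altogether.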

* [PATCH v2] version: 23.07-rc0
  2023-04-03  6:59  9% [PATCH] version: 23.07-rc0 David Marchand
@ 2023-04-03  9:37 10% ` David Marchand
  2023-04-06  7:44  0%   ` David Marchand
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2023-04-03  9:37 UTC (permalink / raw)
  To: dev; +Cc: thomas

Start a new release cycle with empty release notes.
Bump version and ABI minor.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- fix ABI reference git repository,

---
 .github/workflows/build.yml            |   3 +-
 ABI_VERSION                            |   2 +-
 VERSION                                |   2 +-
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_23_07.rst | 138 +++++++++++++++++++++++++
 5 files changed, 142 insertions(+), 4 deletions(-)
 create mode 100644 doc/guides/rel_notes/release_23_07.rst

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index e24e47a216..edd39cbd62 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -26,8 +26,7 @@ jobs:
       MINGW: ${{ matrix.config.cross == 'mingw' }}
       MINI: ${{ matrix.config.mini != '' }}
       PPC64LE: ${{ matrix.config.cross == 'ppc64le' }}
-      REF_GIT_REPO: https://dpdk.org/git/dpdk-stable
-      REF_GIT_TAG: v22.11.1
+      REF_GIT_TAG: v23.03
       RISCV64: ${{ matrix.config.cross == 'riscv64' }}
       RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
 
diff --git a/ABI_VERSION b/ABI_VERSION
index a12b18e437..3c8ce91a46 100644
--- a/ABI_VERSION
+++ b/ABI_VERSION
@@ -1 +1 @@
-23.1
+23.2
diff --git a/VERSION b/VERSION
index 533bf9aa13..d3c78a13bf 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-23.03.0
+23.07.0-rc0
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 57475a8158..d8dfa621ec 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,6 +8,7 @@ Release Notes
     :maxdepth: 1
     :numbered:
 
+    release_23_07
     release_23_03
     release_22_11
     release_22_07
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
new file mode 100644
index 0000000000..a9b1293689
--- /dev/null
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2023 The DPDK contributors
+
+.. include:: <isonum.txt>
+
+DPDK Release 23.07
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      ninja -C build doc
+      xdg-open build/doc/guides/html/rel_notes/release_23_07.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense.
+     The description should be enough to allow someone scanning
+     the release notes to understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list
+     like this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     Suggested order in release notes items:
+     * Core libs (EAL, mempool, ring, mbuf, buses)
+     * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
+       - ethdev (lib, PMDs)
+       - cryptodev (lib, PMDs)
+       - eventdev (lib, PMDs)
+       - etc
+     * Other libs
+     * Apps, Examples, Tools (if significant)
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =======================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+     in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the API change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the ABI change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+* No ABI change that would break compatibility with 22.11.
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue
+     in the present tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested
+   with this release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
-- 
2.39.2


^ permalink raw reply	[relevance 10%]

* [PATCH] version: 23.07-rc0
@ 2023-04-03  6:59  9% David Marchand
  2023-04-03  9:37 10% ` [PATCH v2] " David Marchand
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2023-04-03  6:59 UTC (permalink / raw)
  To: dev; +Cc: thomas

Start a new release cycle with empty release notes.
Bump version and ABI minor.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 .github/workflows/build.yml            |   2 +-
 ABI_VERSION                            |   2 +-
 VERSION                                |   2 +-
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_23_07.rst | 138 +++++++++++++++++++++++++
 5 files changed, 142 insertions(+), 3 deletions(-)
 create mode 100644 doc/guides/rel_notes/release_23_07.rst

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index e24e47a216..e824f8841c 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -27,7 +27,7 @@ jobs:
       MINI: ${{ matrix.config.mini != '' }}
       PPC64LE: ${{ matrix.config.cross == 'ppc64le' }}
       REF_GIT_REPO: https://dpdk.org/git/dpdk-stable
-      REF_GIT_TAG: v22.11.1
+      REF_GIT_TAG: v23.03
       RISCV64: ${{ matrix.config.cross == 'riscv64' }}
       RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
 
diff --git a/ABI_VERSION b/ABI_VERSION
index a12b18e437..3c8ce91a46 100644
--- a/ABI_VERSION
+++ b/ABI_VERSION
@@ -1 +1 @@
-23.1
+23.2
diff --git a/VERSION b/VERSION
index 533bf9aa13..d3c78a13bf 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-23.03.0
+23.07.0-rc0
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 57475a8158..d8dfa621ec 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,6 +8,7 @@ Release Notes
     :maxdepth: 1
     :numbered:
 
+    release_23_07
     release_23_03
     release_22_11
     release_22_07
diff --git a/doc/guides/rel_notes/release_23_07.rst b/doc/guides/rel_notes/release_23_07.rst
new file mode 100644
index 0000000000..a9b1293689
--- /dev/null
+++ b/doc/guides/rel_notes/release_23_07.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2023 The DPDK contributors
+
+.. include:: <isonum.txt>
+
+DPDK Release 23.07
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      ninja -C build doc
+      xdg-open build/doc/guides/html/rel_notes/release_23_07.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense.
+     The description should be enough to allow someone scanning
+     the release notes to understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list
+     like this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     Suggested order in release notes items:
+     * Core libs (EAL, mempool, ring, mbuf, buses)
+     * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
+       - ethdev (lib, PMDs)
+       - cryptodev (lib, PMDs)
+       - eventdev (lib, PMDs)
+       - etc
+     * Other libs
+     * Apps, Examples, Tools (if significant)
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =======================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+     in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the API change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the ABI change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+* No ABI change that would break compatibility with 22.11.
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue
+     in the present tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested
+   with this release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
-- 
2.39.2


^ permalink raw reply	[relevance 9%]

* DPDK 23.03 released
@ 2023-03-31 17:17  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-03-31 17:17 UTC (permalink / raw)
  To: announce

A new major release is available:
	https://fast.dpdk.org/rel/dpdk-23.03.tar.xz

Winter release numbers are quite small as usual:
	1048 commits from 161 authors
	1379 files changed, 85721 insertions(+), 25814 deletions(-)

It is not planned to start a maintenance branch for 23.03.
This version is ABI-compatible with 22.11.

Below are some new features:
	- lock annotations
	- ARM power management monitor/wakeup
	- machine learning inference device API and test application
	- platform bus
	- 400G link speed
	- queue mapping of aggregated ports
	- flow quota
	- more flow matching (ICMPv6, IPv6 routing extension)
	- more flow actions (flex modify, congestion management)
	- Intel cpfl IPU driver
	- Marvell CNXK machine learning inference
	- SHAKE hash algorithm for crypto
	- LZ4 algorithm for compression
	- more telemetry endpoints
	- more tracepoints
	- DTS hello world

More details in the release notes:
	https://doc.dpdk.org/guides/rel_notes/release_23_03.html

The test framework DTS is being improved and migrated into the mainline.
Please join the DTS effort for contributing, reviewing or testing.


There are 34 new contributors (including authors, reviewers and testers).
Welcome to Alok Prasad, Alvaro Karsz, Anup Prabhu, Boleslav Stankevich,
Boris Ouretskey, Chenyu Huang, Edwin Brossette, Fengnan Chang,
Francesco Mancino, Haijun Chu, Hiral Shah, Isaac Boukris, J.J. Martzki,
Jesna K E, Joshua Washington, Kamalakshitha Aligeri, Krzysztof Karas,
Leo Xu, Maayan Kashani, Michal Schmidt, Mohammad Iqbal Ahmad,
Nathan Brown, Patrick Robb, Prince Takkar, Rushil Gupta,
Saoirse O'Donovan, Shivah Shankar S, Shiyang He, Song Jiale,
Vikash Poddar, Visa Hankala, Yevgeny Kliteynik, Zerun Fu,
and Zhuobin Huang.

Below is the number of commits per employer (with authors count):
	265     Marvell (33)
	256     Intel (49)
	175     NVIDIA (20)
	 98     Red Hat (6)
	 68     Huawei (3)
	 55     Corigine (9)
	 49     Microsoft (3)
	 13     Arm (5)
	 10     PANTHEON.tech (1)
	  9     Trustnet (1)
	  9     AMD (2)
	  8     Ark Networks (2)
	        ...

A big thank you to all the courageous people who took on the unrewarding task
of reviewing others' work.
Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
	 48     Maxime Coquelin <maxime.coquelin@redhat.com>
	 46     Ferruh Yigit <ferruh.yigit@amd.com>
	 44     Morten Brørup <mb@smartsharesystems.com>
	 25     Ori Kam <orika@nvidia.com>
	 24     Tyler Retzlaff <roretzla@linux.microsoft.com>
	 23     Chengwen Feng <fengchengwen@huawei.com>
	 21     David Marchand <david.marchand@redhat.com>
	 21     Akhil Goyal <gakhil@marvell.com>


The next version will be 23.07 in July.
The new features for 23.07 can be submitted during the next 3 weeks:
        http://core.dpdk.org/roadmap#dates
Please share your roadmap.

One last request: please fill in this quick survey before April 7th
to help plan the next DPDK Summit:
https://docs.google.com/forms/d/1104swKV4-_nNT6GimkRBNVac1uAqX7o2P936bcGsgMc

Thanks everyone



^ permalink raw reply	[relevance 3%]

* [PATCH v12 18/22] hash: move rte_hash_set_alg out header
  2023-03-29 23:40  2% [PATCH v12 00/22] Convert static log types in libraries to dynamic Stephen Hemminger
@ 2023-03-29 23:40  2% ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-03-29 23:40 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Ruifeng Wang, Yipeng Wang, Sameh Gobriel,
	Bruce Richardson, Vladimir Medvedkin

The code for setting the hash CRC algorithm is not at all performance
sensitive, and doing it inline has a couple of problems. First, if
multiple files include the header, the initialization gets done
multiple times. Second, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/hash/meson.build     |  1 +
 lib/hash/rte_crc_arm64.h |  8 ++---
 lib/hash/rte_crc_x86.h   | 10 +++---
 lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h  | 48 ++--------------------------
 lib/hash/version.map     |  7 +++++
 6 files changed, 88 insertions(+), 54 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_crc_arm64.h b/lib/hash/rte_crc_arm64.h
index c9f52510871b..414fe065caa8 100644
--- a/lib/hash/rte_crc_arm64.h
+++ b/lib/hash/rte_crc_arm64.h
@@ -53,7 +53,7 @@ crc32c_arm64_u64(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -67,7 +67,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u64(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_crc_x86.h b/lib/hash/rte_crc_x86.h
index 205bc182be77..3b865e251db2 100644
--- a/lib/hash/rte_crc_x86.h
+++ b/lib/hash/rte_crc_x86.h
@@ -67,7 +67,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -110,11 +110,11 @@ static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
 #ifdef RTE_ARCH_X86_64
-	if (likely(crc32_alg == CRC32_SSE42_x64))
+	if (likely(rte_hash_crc32_alg == CRC32_SSE42_x64))
 		return crc32c_sse42_u64(data, init_val);
 #endif
 
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u64_mimic(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..1439d8a71f6a
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
+#define RTE_LOGTYPE_HASH_CRC hash_crc_logtype
+
+uint8_t rte_hash_crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	rte_hash_crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		rte_hash_crc32_alg = CRC32_SSE42;
+	else
+		rte_hash_crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		rte_hash_crc32_alg = CRC32_ARM64;
+#endif
+
+	if (rte_hash_crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e8145ee44204 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -31,7 +29,7 @@ extern "C" {
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
 
-static uint8_t crc32_alg = CRC32_SW;
+extern uint8_t rte_hash_crc32_alg;
 
 #if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 #include "rte_crc_arm64.h"
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..8b22aad5626b 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
@@ -56,3 +57,9 @@ EXPERIMENTAL {
 	rte_thash_gfni;
 	rte_thash_gfni_bulk;
 };
+
+INTERNAL {
+	global:
+
+	rte_hash_crc32_alg;
+};
-- 
2.39.2


^ permalink raw reply	[relevance 2%]

* [PATCH v12 00/22] Convert static log types in libraries to dynamic
@ 2023-03-29 23:40  2% Stephen Hemminger
  2023-03-29 23:40  2% ` [PATCH v12 18/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-03-29 23:40 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
leave them there, mark as deprecated, or remove them.
This version removes them since there is no guarantee in current
DPDK policies that says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v12 - rebase and add table and pipeline libraries

v11 - fix include check on arm cross build

v10 - add necessary rte_compat.h in thash_gfni stub for arm

v9 - fix handling of crc32 alg in lib/hash.
     make it an internal global variable.
     fix gfni stubs for case where they are not used.

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: convert RTE_LOGTYPE_EFD to dynamic type
  mbuf: convert RTE_LOGTYPE_MBUF to dynamic type
  acl: convert RTE_LOGTYPE_ACL to dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: convert RTE_LOGTYPE_POWER to dynamic type
  ring: convert RTE_LOGTYPE_RING to dynamic type
  mempool: convert RTE_LOGTYPE_MEMPOOL to dynamic type
  lpm: convert RTE_LOGTYPE_LPM to dynamic types
  kni: convert RTE_LOGTYPE_KNI to dynamic type
  sched: convert RTE_LOGTYPE_SCHED to dynamic type
  examples/ipsec-secgw: replace RTE_LOGTYPE_PORT
  port: convert RTE_LOGTYPE_PORT to dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic type
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: convert RTE_LOGTYPE_PIPELINE to dynamic type

 app/test/test_acl.c             |  3 +-
 app/test/test_table_acl.c       | 50 +++++++++++-------------
 app/test/test_table_pipeline.c  | 40 +++++++++----------
 examples/distributor/main.c     |  2 +-
 examples/ipsec-secgw/sa.c       |  6 +--
 examples/l3fwd-power/main.c     | 17 +++++----
 lib/acl/acl_bld.c               |  1 +
 lib/acl/acl_gen.c               |  1 +
 lib/acl/acl_log.h               |  4 ++
 lib/acl/rte_acl.c               |  4 ++
 lib/acl/tb_mem.c                |  3 +-
 lib/eal/common/eal_common_log.c | 17 ---------
 lib/eal/include/rte_log.h       | 34 ++++++++---------
 lib/efd/rte_efd.c               |  4 ++
 lib/fib/fib_log.h               |  4 ++
 lib/fib/rte_fib.c               |  3 ++
 lib/fib/rte_fib6.c              |  2 +
 lib/gso/rte_gso.c               |  4 +-
 lib/gso/rte_gso.h               |  1 +
 lib/hash/meson.build            |  9 ++++-
 lib/hash/rte_crc_arm64.h        |  8 ++--
 lib/hash/rte_crc_x86.h          | 10 ++---
 lib/hash/rte_cuckoo_hash.c      |  5 +++
 lib/hash/rte_fbk_hash.c         |  5 +++
 lib/hash/rte_hash_crc.c         | 68 +++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h         | 48 ++---------------------
 lib/hash/rte_thash.c            |  3 ++
 lib/hash/rte_thash_gfni.c       | 50 ++++++++++++++++++++++++
 lib/hash/rte_thash_gfni.h       | 30 +++++----------
 lib/hash/version.map            | 11 ++++++
 lib/kni/rte_kni.c               |  3 ++
 lib/lpm/lpm_log.h               |  4 ++
 lib/lpm/rte_lpm.c               |  3 ++
 lib/lpm/rte_lpm6.c              |  1 +
 lib/mbuf/mbuf_log.h             |  4 ++
 lib/mbuf/rte_mbuf.c             |  4 ++
 lib/mbuf/rte_mbuf_dyn.c         |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c    |  2 +
 lib/mempool/rte_mempool.c       |  2 +
 lib/mempool/rte_mempool.h       |  8 ++++
 lib/mempool/version.map         |  3 ++
 lib/pipeline/rte_pipeline.c     |  2 +
 lib/pipeline/rte_pipeline.h     |  5 +++
 lib/port/rte_port_ethdev.c      |  3 ++
 lib/port/rte_port_eventdev.c    |  4 ++
 lib/port/rte_port_fd.c          |  3 ++
 lib/port/rte_port_frag.c        |  3 ++
 lib/port/rte_port_kni.c         |  3 ++
 lib/port/rte_port_ras.c         |  3 ++
 lib/port/rte_port_ring.c        |  3 ++
 lib/port/rte_port_sched.c       |  3 ++
 lib/port/rte_port_source_sink.c |  3 ++
 lib/port/rte_port_sym_crypto.c  |  3 ++
 lib/power/guest_channel.c       |  3 +-
 lib/power/power_common.c        |  2 +
 lib/power/power_common.h        |  3 +-
 lib/power/power_kvm_vm.c        |  1 +
 lib/power/rte_power.c           |  1 +
 lib/rib/rib_log.h               |  4 ++
 lib/rib/rte_rib.c               |  3 ++
 lib/rib/rte_rib6.c              |  3 ++
 lib/ring/rte_ring.c             |  3 ++
 lib/sched/rte_pie.c             |  1 +
 lib/sched/rte_sched.c           |  5 +++
 lib/sched/rte_sched_log.h       |  4 ++
 lib/table/meson.build           |  1 +
 lib/table/rte_table.c           |  8 ++++
 lib/table/rte_table.h           |  4 ++
 68 files changed, 391 insertions(+), 176 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h
 create mode 100644 lib/table/rte_table.c

-- 
2.39.2


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v3 03/15] graph: move node process into inline function
  2023-03-29 15:34  3%     ` Stephen Hemminger
@ 2023-03-29 15:41  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2023-03-29 15:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Zhirun Yan, dev, jerinj, kirankumark, ndabilpuram, cunming.liang,
	haiyue.wang

On Wed, Mar 29, 2023 at 9:04 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Wed, 29 Mar 2023 15:43:28 +0900
> Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> > +/**
> > + * @internal
> > + *
> > + * Enqueue a given node to the tail of the graph reel.
> > + *
> > + * @param graph
> > + *   Pointer Graph object.
> > + * @param node
> > + *   Pointer to node object to be enqueued.
> > + */
> > +static __rte_always_inline void
> > +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> > +{
> > +     uint64_t start;
> > +     uint16_t rc;
> > +     void **objs;
> > +
> > +     RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> > +     objs = node->objs;
> > +     rte_prefetch0(objs);
> > +
> > +     if (rte_graph_has_stats_feature()) {
> > +             start = rte_rdtsc();
> > +             rc = node->process(graph, node, objs, node->idx);
> > +             node->total_cycles += rte_rdtsc() - start;
> > +             node->total_calls++;
> > +             node->total_objs += rc;
> > +     } else {
> > +             node->process(graph, node, objs, node->idx);
> > +     }
> > +     node->idx = 0;
> > +}
> > +
>
> Why inline? Doing everything as inlines has long term ABI
> impacts. And this is not a super critical performance path.

This is one of the real fast path routines.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 03/15] graph: move node process into inline function
  @ 2023-03-29 15:34  3%     ` Stephen Hemminger
  2023-03-29 15:41  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-03-29 15:34 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Wed, 29 Mar 2023 15:43:28 +0900
Zhirun Yan <zhirun.yan@intel.com> wrote:

> +/**
> + * @internal
> + *
> + * Enqueue a given node to the tail of the graph reel.
> + *
> + * @param graph
> + *   Pointer Graph object.
> + * @param node
> + *   Pointer to node object to be enqueued.
> + */
> +static __rte_always_inline void
> +__rte_node_process(struct rte_graph *graph, struct rte_node *node)
> +{
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +	objs = node->objs;
> +	rte_prefetch0(objs);
> +
> +	if (rte_graph_has_stats_feature()) {
> +		start = rte_rdtsc();
> +		rc = node->process(graph, node, objs, node->idx);
> +		node->total_cycles += rte_rdtsc() - start;
> +		node->total_calls++;
> +		node->total_objs += rc;
> +	} else {
> +		node->process(graph, node, objs, node->idx);
> +	}
> +	node->idx = 0;
> +}
> +

Why inline? Doing everything as inlines has long term ABI
impacts. And this is not a super critical performance path.

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 0/2] ABI check updates
  2023-03-23 17:15  9% ` [PATCH v2 " David Marchand
  2023-03-23 17:15 21%   ` [PATCH v2 1/2] devtools: unify configuration for ABI check David Marchand
  2023-03-23 17:15 41%   ` [PATCH v2 2/2] devtools: stop depending on libabigail xml format David Marchand
@ 2023-03-28 18:38  4%   ` Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-03-28 18:38 UTC (permalink / raw)
  To: David Marchand; +Cc: dev

23/03/2023 18:15, David Marchand:
> This series moves ABI exceptions into a single configuration file and
> simplifies the ABI check so that no artefact depending on libabigail
> version is stored in the CI.

Applied, thanks.



^ permalink raw reply	[relevance 4%]

* [PATCH v2 2/2] devtools: stop depending on libabigail xml format
  2023-03-23 17:15  9% ` [PATCH v2 " David Marchand
  2023-03-23 17:15 21%   ` [PATCH v2 1/2] devtools: unify configuration for ABI check David Marchand
@ 2023-03-23 17:15 41%   ` David Marchand
  2023-03-28 18:38  4%   ` [PATCH v2 0/2] ABI check updates Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-03-23 17:15 UTC (permalink / raw)
  To: dev; +Cc: Aaron Conole, Michael Santana, Thomas Monjalon, Bruce Richardson

An ABI reference depends on:
- DPDK build options,
- toolchain compiler and versions,
- libabigail version.

The reason for the latter point is that, when the ABI reference was
generated, ABI xml files were dumped in a format depending on the
libabigail version.
Those xml files were then later used to compare against modified
code.

There are a few disadvantages with this method:
- since the xml files are dependent on the libabigail version, when
  updating CI environments, a change in the libabigail package requires
  regenerating the ABI references,
- comparing xml files with abidiff is not well tested, as we (DPDK)
  uncovered bugs in libabigail that were not hit when comparing .so files.

Switch to comparing .so files directly, remove this dependency, and update
the GHA script.
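
As a rough illustration of the name matching that comparing .so files
relies on (a sketch only; the library file name below is made up for the
example and is not taken from a real build), POSIX parameter expansion can
derive both the bare library name and the major-version prefix from an
installed file name:

```shell
#!/bin/sh
# Hypothetical file name, used only for illustration.
name=librte_example.so.23.1

# Strip the trailing ".so.<version>" part: gives the bare library name,
# which is what a SKIP_LIBRARY entry is compared against.
base=${name%.so.*}

# Strip only the last ".<component>": keeps the major ABI version, so a
# build with a different minor version can still be paired with it.
major=${name%.*}

echo "$base"     # prints librte_example
echo "$major"    # prints librte_example.so.23
```

The second form is why a find pattern such as "${name%.*}.*" can pair a
reference library with a new build that only differs in its minor version.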

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 .ci/linux-build.sh            |  4 ----
 .github/workflows/build.yml   |  2 +-
 MAINTAINERS                   |  1 -
 devtools/check-abi.sh         | 17 +++++++++--------
 devtools/gen-abi.sh           | 27 ---------------------------
 devtools/test-meson-builds.sh |  5 -----
 6 files changed, 10 insertions(+), 46 deletions(-)
 delete mode 100755 devtools/gen-abi.sh

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 150b38bd7a..9631e342b5 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -130,8 +130,6 @@ fi
 if [ "$ABI_CHECKS" = "true" ]; then
     if [ "$(cat libabigail/VERSION 2>/dev/null)" != "$LIBABIGAIL_VERSION" ]; then
         rm -rf libabigail
-        # if we change libabigail, invalidate existing abi cache
-        rm -rf reference
     fi
 
     if [ ! -d libabigail ]; then
@@ -153,7 +151,6 @@ if [ "$ABI_CHECKS" = "true" ]; then
         meson setup $OPTS -Dexamples= $refsrcdir $refsrcdir/build
         ninja -C $refsrcdir/build
         DESTDIR=$(pwd)/reference ninja -C $refsrcdir/build install
-        devtools/gen-abi.sh reference
         find reference/usr/local -name '*.a' -delete
         rm -rf reference/usr/local/bin
         rm -rf reference/usr/local/share
@@ -161,7 +158,6 @@ if [ "$ABI_CHECKS" = "true" ]; then
     fi
 
     DESTDIR=$(pwd)/install ninja -C build install
-    devtools/gen-abi.sh install
     devtools/check-abi.sh reference install ${ABI_CHECKS_WARN_ONLY:-}
 fi
 
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index bbcb535afb..e24e47a216 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -70,7 +70,7 @@ jobs:
       run: |
         echo 'ccache=ccache-${{ matrix.config.os }}-${{ matrix.config.compiler }}-${{ matrix.config.cross }}-'$(date -u +%Y-w%W) >> $GITHUB_OUTPUT
         echo 'libabigail=libabigail-${{ matrix.config.os }}' >> $GITHUB_OUTPUT
-        echo 'abi=abi-${{ matrix.config.os }}-${{ matrix.config.compiler }}-${{ matrix.config.cross }}-${{ env.LIBABIGAIL_VERSION }}-${{ env.REF_GIT_TAG }}' >> $GITHUB_OUTPUT
+        echo 'abi=abi-${{ matrix.config.os }}-${{ matrix.config.compiler }}-${{ matrix.config.cross }}-${{ env.REF_GIT_TAG }}' >> $GITHUB_OUTPUT
     - name: Retrieve ccache cache
       uses: actions/cache@v3
       with:
diff --git a/MAINTAINERS b/MAINTAINERS
index 1a33ad8592..280058adfc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -94,7 +94,6 @@ F: devtools/check-spdx-tag.sh
 F: devtools/check-symbol-change.sh
 F: devtools/check-symbol-maps.sh
 F: devtools/checkpatches.sh
-F: devtools/gen-abi.sh
 F: devtools/get-maintainer.sh
 F: devtools/git-log-fixes.sh
 F: devtools/load-devel-config
diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
index f74432be5d..39e3798931 100755
--- a/devtools/check-abi.sh
+++ b/devtools/check-abi.sh
@@ -37,20 +37,21 @@ fi
 
 export newdir ABIDIFF_OPTIONS ABIDIFF_SUPPRESSIONS
 export diff_func='run_diff() {
-	dump=$1
-	name=$(basename $dump)
-	if grep -q "; SKIP_LIBRARY=${name%.dump}\>" $ABIDIFF_SUPPRESSIONS; then
+	lib=$1
+	name=$(basename $lib)
+	if grep -q "; SKIP_LIBRARY=${name%.so.*}\>" $ABIDIFF_SUPPRESSIONS; then
 		echo "Skipped $name" >&2
 		return 0
 	fi
-	dump2=$(find $newdir -name $name)
-	if [ -z "$dump2" ] || [ ! -e "$dump2" ]; then
+	# Look for a library with the same major ABI version
+	lib2=$(find $newdir -name "${name%.*}.*" -a ! -type l)
+	if [ -z "$lib2" ] || [ ! -e "$lib2" ]; then
 		echo "Error: cannot find $name in $newdir" >&2
 		return 1
 	fi
-	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
+	abidiff $ABIDIFF_OPTIONS $lib $lib2 || {
 		abiret=$?
-		echo "Error: ABI issue reported for abidiff $ABIDIFF_OPTIONS $dump $dump2" >&2
+		echo "Error: ABI issue reported for abidiff $ABIDIFF_OPTIONS $lib $lib2" >&2
 		if [ $(($abiret & 3)) -ne 0 ]; then
 			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, this could be a script or environment issue." >&2
 		fi
@@ -65,7 +66,7 @@ export diff_func='run_diff() {
 }'
 
 error=
-find $refdir -name "*.dump" |
+find $refdir -name "*.so.*" -a ! -type l |
 xargs -n1 -P0 sh -c 'eval "$diff_func"; run_diff $0' ||
 error=1
 
diff --git a/devtools/gen-abi.sh b/devtools/gen-abi.sh
deleted file mode 100755
index 61f7510ea1..0000000000
--- a/devtools/gen-abi.sh
+++ /dev/null
@@ -1,27 +0,0 @@
-#!/bin/sh -e
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2019 Red Hat, Inc.
-
-if [ $# != 1 ]; then
-	echo "Usage: $0 installdir" >&2
-	exit 1
-fi
-
-installdir=$1
-if [ ! -d $installdir ]; then
-	echo "Error: install directory '$installdir' does not exist." >&2
-	exit 1
-fi
-
-dumpdir=$installdir/dump
-rm -rf $dumpdir
-mkdir -p $dumpdir
-for f in $(find $installdir -name "*.so.*"); do
-	if test -L $f; then
-		continue
-	fi
-
-	libname=$(basename $f)
-	echo $dumpdir/${libname%.so*}.dump $f
-done |
-xargs -n2 -P0 abidw --out-file
diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index 48f4e52df3..9131088c9d 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -204,7 +204,6 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
 				-Dexamples= $*
 			compile $abirefdir/build
 			install_target $abirefdir/build $abirefdir/$targetdir
-			$srcdir/devtools/gen-abi.sh $abirefdir/$targetdir
 
 			# save disk space by removing static libs and apps
 			find $abirefdir/$targetdir/usr/local -name '*.a' -delete
@@ -215,10 +214,6 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
 		install_target $builds_dir/$targetdir \
 			$(readlink -f $builds_dir/$targetdir/install)
 		echo "Checking ABI compatibility of $targetdir" >&$verbose
-		echo $srcdir/devtools/gen-abi.sh \
-			$(readlink -f $builds_dir/$targetdir/install) >&$veryverbose
-		$srcdir/devtools/gen-abi.sh \
-			$(readlink -f $builds_dir/$targetdir/install) >&$veryverbose
 		echo $srcdir/devtools/check-abi.sh $abirefdir/$targetdir \
 			$(readlink -f $builds_dir/$targetdir/install) >&$veryverbose
 		$srcdir/devtools/check-abi.sh $abirefdir/$targetdir \
-- 
2.39.2


^ permalink raw reply	[relevance 41%]

* [PATCH v2 1/2] devtools: unify configuration for ABI check
  2023-03-23 17:15  9% ` [PATCH v2 " David Marchand
@ 2023-03-23 17:15 21%   ` David Marchand
  2023-03-23 17:15 41%   ` [PATCH v2 2/2] devtools: stop depending on libabigail xml format David Marchand
  2023-03-28 18:38  4%   ` [PATCH v2 0/2] ABI check updates Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-03-23 17:15 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

We have been skipping removed libraries in the ABI check by updating the
check-abi.sh script itself.
See, for example, commit 33584c19ddc2 ("raw/dpaa2_qdma: remove driver").

Having two places for exceptions is a bit confusing, and those exceptions
are best placed in a single configuration file outside the check script.

Besides, the next patch will switch the check from comparing ABI xml files
to directly comparing .so files. In this mode, libabigail does not
support the soname_regexp syntax used for the mlx glue libraries.

Let's handle these special cases in libabigail.abignore using comments.

Taking the raw/dpaa2_qdma driver as an example, it would be possible to
skip it by adding:

 ; SKIP_LIBRARY=librte_net_mlx4_glue
+; SKIP_LIBRARY=librte_raw_dpaa2_qdma
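
A minimal sketch of how such comment-based entries can be looked up from a
shell script is shown here. It is an illustration under assumptions, not
the check-abi.sh code: the temporary file and the is_skipped helper are
hypothetical, and it uses a whole-line fixed-string match (grep -x -F)
rather than the word-boundary pattern used in the patch.

```shell
#!/bin/sh
# Hypothetical suppression file, created in a temporary location.
suppr=$(mktemp)
trap 'rm -f "$suppr"' EXIT
cat > "$suppr" <<'EOF'
; SKIP_LIBRARY=librte_net_mlx4_glue
; SKIP_LIBRARY=librte_raw_dpaa2_qdma
EOF

# Return 0 when the library is listed in a SKIP_LIBRARY comment.
# -x matches the whole line, -F treats the pattern as a fixed string,
# so a name that is only a prefix of a listed library does not match.
is_skipped() {
	grep -qxF "; SKIP_LIBRARY=$1" "$suppr"
}

is_skipped librte_raw_dpaa2_qdma && echo "Skipped librte_raw_dpaa2_qdma"
is_skipped librte_raw_dpaa2 || echo "Checked librte_raw_dpaa2"
```

Since the entries are plain comments, libabigail ignores them and only the
calling script gives them a meaning.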

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 devtools/check-abi.sh        |  9 +++++++--
 devtools/libabigail.abignore | 12 +++++++++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
index d253a12768..f74432be5d 100755
--- a/devtools/check-abi.sh
+++ b/devtools/check-abi.sh
@@ -10,7 +10,8 @@ fi
 refdir=$1
 newdir=$2
 warnonly=${3:-}
-ABIDIFF_OPTIONS="--suppr $(dirname $0)/libabigail.abignore --no-added-syms"
+ABIDIFF_SUPPRESSIONS=$(dirname $(readlink -f $0))/libabigail.abignore
+ABIDIFF_OPTIONS="--suppr $ABIDIFF_SUPPRESSIONS --no-added-syms"
 
 if [ ! -d $refdir ]; then
 	echo "Error: reference directory '$refdir' does not exist." >&2
@@ -34,10 +35,14 @@ else
 	ABIDIFF_OPTIONS="$ABIDIFF_OPTIONS --headers-dir2 $incdir2"
 fi
 
-export newdir ABIDIFF_OPTIONS
+export newdir ABIDIFF_OPTIONS ABIDIFF_SUPPRESSIONS
 export diff_func='run_diff() {
 	dump=$1
 	name=$(basename $dump)
+	if grep -q "; SKIP_LIBRARY=${name%.dump}\>" $ABIDIFF_SUPPRESSIONS; then
+		echo "Skipped $name" >&2
+		return 0
+	fi
 	dump2=$(find $newdir -name $name)
 	if [ -z "$dump2" ] || [ ! -e "$dump2" ]; then
 		echo "Error: cannot find $name in $newdir" >&2
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 7a93de3ba1..3ff51509de 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -16,9 +16,15 @@
 [suppress_variable]
         name_regexp = _pmd_info$
 
-; Ignore changes on soname for mlx glue internal drivers
-[suppress_file]
-        soname_regexp = ^librte_.*mlx.*glue\.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; Special rules to skip libraries ;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;
+; This is not a libabigail rule (see check-abi.sh).
+; This is used for driver removal and other special cases like mlx glue libs.
+;
+; SKIP_LIBRARY=librte_common_mlx5_glue
+; SKIP_LIBRARY=librte_net_mlx4_glue
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Experimental APIs exceptions ;
-- 
2.39.2



* [PATCH v2 0/2] ABI check updates
  @ 2023-03-23 17:15  9% ` David Marchand
  2023-03-23 17:15 21%   ` [PATCH v2 1/2] devtools: unify configuration for ABI check David Marchand
                     ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: David Marchand @ 2023-03-23 17:15 UTC (permalink / raw)
  To: dev

This series moves the ABI exceptions into a single configuration file and
simplifies the ABI check so that no artefact depending on the libabigail
version is stored in the CI.

-- 
David Marchand

Changes since v1:
- rebased after the ABI check parallelisation rework.


David Marchand (2):
  devtools: unify configuration for ABI check
  devtools: stop depending on libabigail xml format

 .ci/linux-build.sh            |  4 ----
 .github/workflows/build.yml   |  2 +-
 MAINTAINERS                   |  1 -
 devtools/check-abi.sh         | 24 +++++++++++++++---------
 devtools/gen-abi.sh           | 27 ---------------------------
 devtools/libabigail.abignore  | 12 +++++++++---
 devtools/test-meson-builds.sh |  5 -----
 7 files changed, 25 insertions(+), 50 deletions(-)
 delete mode 100755 devtools/gen-abi.sh

-- 
2.39.2



* Re: [dpdk-dev] [RFC] ethdev: improve link speed to string
  2023-02-10 14:41  3%               ` Ferruh Yigit
@ 2023-03-23 14:40  3%                 ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-03-23 14:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Min Hu (Connor), Andrew Rybchenko, thomas, dev

On 2/10/2023 2:41 PM, Ferruh Yigit wrote:
> On 1/19/2023 4:45 PM, Stephen Hemminger wrote:
>> On Thu, 19 Jan 2023 11:41:12 +0000
>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>>>>>>> Nothing good will happen if you try to use the function to
>>>>>>> print two different link speeds in one log message.  
>>>>>> You are right.
>>>>>> And use malloc for "name" will result in memory leakage, which is also
>>>>>> not a good option.
>>>>>>
>>>>>> BTW, do you think if we need to modify the function
>>>>>> "rte_eth_link_speed_to_str"?  
>>>>>
>>>>> IMHO it would be more pain than gain in this case.
>>>>>
>>>>> .
>>>>>  
>>>> Agree with you. Thanks Andrew
>>>>  
>>>
>>> It can be option to update the API as following in next ABI break release:
>>>
>>> const char *
>>> rte_eth_link_speed_to_str(uint32_t link_speed, char *buf, size_t buf_size);
>>>
>>> For this a deprecation notice needs to be sent and approved, not sure
>>> though if it worth.
>>>
>>>
>>> Meanwhile, what do you think to update string 'Invalid' to something
>>> like 'Irregular' or 'Erratic', does this help to convey the right message?
>>
>>
>> API versioning is possible here.
> 
> 
> Agree, ABI versioning can be used here.
> 
> @Connor, what do you think?

Updating the patch status to rejected. If you still want to pursue the
feature, please send a separate patch that updates the API via ABI versioning.

Thanks,
ferruh
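
For anyone picking this up, here is a minimal sketch of the buffer-based
variant quoted above. It is an illustration only: link_speed_to_str_buf is
a hypothetical name, and the "<n> Mbps" formatting is an assumption, not
what ethdev prints. The point is that each caller owns its buffer, so two
speeds can be printed in one log message without sharing static storage.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the proposed
 *   const char *rte_eth_link_speed_to_str(uint32_t, char *, size_t);
 * The result is written into the caller's buffer, which is also returned.
 */
static const char *
link_speed_to_str_buf(uint32_t link_speed, char *buf, size_t buf_size)
{
	/* Nothing can be written without a usable buffer. */
	if (buf == NULL || buf_size == 0)
		return NULL;

	/* snprintf() truncates and NUL-terminates if the buffer is short. */
	snprintf(buf, buf_size, "%" PRIu32 " Mbps", link_speed);
	return buf;
}
```

A caller would pass one buffer per speed it wants to print in the same
message.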


* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-23 11:58  3%             ` fengchengwen
@ 2023-03-23 12:51  3%               ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-03-23 12:51 UTC (permalink / raw)
  To: Olivier Matz, Ferruh Yigit, fengchengwen; +Cc: dev, David Marchand

23/03/2023 12:58, fengchengwen:
> On 2023/3/22 21:49, Thomas Monjalon wrote:
> > 22/03/2023 09:53, Ferruh Yigit:
> >> On 3/22/2023 1:15 AM, fengchengwen wrote:
> >>> On 2023/3/21 21:50, Ferruh Yigit wrote:
> >>>> On 3/17/2023 2:43 AM, fengchengwen wrote:
> >>>>> On 2023/3/17 2:18, Ferruh Yigit wrote:
> >>>>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
> >>>>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
> >>>>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
> >>>>>>> parameter 'value' is NULL when parsed 'only keys'.
> >>>>>>>
> >>>>>>> It may leads to segment fault when parse args with 'only key', this 
> >>>>>>> patchset fixes rest of them.
> >>>>>>>
> >>>>>>> Chengwen Feng (5):
> >>>>>>>   app/pdump: fix segment fault when parse args
> >>>>>>>   net/memif: fix segment fault when parse devargs
> >>>>>>>   net/pcap: fix segment fault when parse devargs
> >>>>>>>   net/ring: fix segment fault when parse devargs
> >>>>>>>   net/sfc: fix segment fault when parse devargs
> >>>>>>
> >>>>>> Hi Chengwen,
> >>>>>>
> >>>>>> Did you scan all `rte_kvargs_process()` instances?
> >>>>>
> >>>>> No, I was just looking at the modules I was concerned about.
> >>>>> I looked at it briefly, and some modules had the same problem.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> And if there would be a way to tell kvargs that a value is expected (or
> >>>>>> not) this checks could be done in kvargs layer, I think this also can be
> >>>>>> to look at.
> >>>>>
> >>>>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
> >>>>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
> >>>>> But it also break the API's behavior.
> >>>>>
> >>>>
> >>>> What about having a new API, like `rte_kvargs_process_extended()`,
> >>>>
> >>>> That gets an additional flag as parameter, which may have values like
> >>>> following to indicate if key expects a value or not:
> >>>> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
> >>>> ARG_WITH_VALUE      --> "key=value"
> >>>> ARG_NO_VALUE        --> 'key'
> >>>>
> >>>> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
> >>>> `rte_kvargs_process()`.
> >>>>
> >>>> This way instead of adding checks, relevant usage can be replaced by
> >>>> `rte_kvargs_process_extended()`, this requires similar amount of change
> >>>> but code will be more clean I think.
> >>>>
> >>>> Do you think does this work?
> >>>
> >>> Yes, it can work.
> >>>
> >>> But I think the introduction of new API adds some complexity.
> >>> And a good API definition could more simpler.
> >>>
> >>
> >> Other option is changing existing API, but that may be widely used and
> >> changing it impacts applications, I don't think it worth.
> > 
> > I've planned a change in kvargs API 5 years ago and never did it:
> > From doc/guides/rel_notes/deprecation.rst:
> > "
> > * kvargs: The function ``rte_kvargs_process`` will get a new parameter
> >   for returning key match count. It will ease handling of no-match case.
> > "
> 
> I think it's okay to add extra parameter for rte_kvargs_process. But it will
> break ABI.
> Also I notice patchset was deferred in patchwork.
> 
> Does it mean that the new version can't accept until the 23.11 release cycle ?

It is a bit too late to make a decision in the 23.03 cycle.
Let's continue this discussion.
We can either have some fixes in 23.07 or an ABI-breaking change in 23.11.




* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-22 13:49  0%           ` Thomas Monjalon
@ 2023-03-23 11:58  3%             ` fengchengwen
  2023-03-23 12:51  3%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-23 11:58 UTC (permalink / raw)
  To: Thomas Monjalon, Olivier Matz, Ferruh Yigit; +Cc: dev, David Marchand

On 2023/3/22 21:49, Thomas Monjalon wrote:
> 22/03/2023 09:53, Ferruh Yigit:
>> On 3/22/2023 1:15 AM, fengchengwen wrote:
>>> On 2023/3/21 21:50, Ferruh Yigit wrote:
>>>> On 3/17/2023 2:43 AM, fengchengwen wrote:
>>>>> On 2023/3/17 2:18, Ferruh Yigit wrote:
>>>>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>>>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>>>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>>>>>> parameter 'value' is NULL when parsed 'only keys'.
>>>>>>>
>>>>>>> It may leads to segment fault when parse args with 'only key', this 
>>>>>>> patchset fixes rest of them.
>>>>>>>
>>>>>>> Chengwen Feng (5):
>>>>>>>   app/pdump: fix segment fault when parse args
>>>>>>>   net/memif: fix segment fault when parse devargs
>>>>>>>   net/pcap: fix segment fault when parse devargs
>>>>>>>   net/ring: fix segment fault when parse devargs
>>>>>>>   net/sfc: fix segment fault when parse devargs
>>>>>>
>>>>>> Hi Chengwen,
>>>>>>
>>>>>> Did you scan all `rte_kvargs_process()` instances?
>>>>>
>>>>> No, I was just looking at the modules I was concerned about.
>>>>> I looked at it briefly, and some modules had the same problem.
>>>>>
>>>>>>
>>>>>>
>>>>>> And if there would be a way to tell kvargs that a value is expected (or
>>>>>> not) this checks could be done in kvargs layer, I think this also can be
>>>>>> to look at.
>>>>>
>>>>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
>>>>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
>>>>> But it also break the API's behavior.
>>>>>
>>>>
>>>> What about having a new API, like `rte_kvargs_process_extended()`,
>>>>
>>>> That gets an additional flag as parameter, which may have values like
>>>> following to indicate if key expects a value or not:
>>>> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
>>>> ARG_WITH_VALUE      --> "key=value"
>>>> ARG_NO_VALUE        --> 'key'
>>>>
>>>> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
>>>> `rte_kvargs_process()`.
>>>>
>>>> This way instead of adding checks, relevant usage can be replaced by
>>>> `rte_kvargs_process_extended()`, this requires similar amount of change
>>>> but code will be more clean I think.
>>>>
>>>> Do you think does this work?
>>>
>>> Yes, it can work.
>>>
>>> But I think the introduction of new API adds some complexity.
>>> And a good API definition could more simpler.
>>>
>>
>> Other option is changing existing API, but that may be widely used and
>> changing it impacts applications, I don't think it worth.
> 
> I've planned a change in kvargs API 5 years ago and never did it:
> From doc/guides/rel_notes/deprecation.rst:
> "
> * kvargs: The function ``rte_kvargs_process`` will get a new parameter
>   for returning key match count. It will ease handling of no-match case.
> "

I think it's okay to add an extra parameter to rte_kvargs_process, but it will
break the ABI.
I also notice the patchset was deferred in patchwork.

Does that mean a new version can't be accepted until the 23.11 release cycle?

> 
>> Of course we can live with as it is and add checks to the callback
>> functions, although I still believe a new 'process()' API is better idea.
> 
> 
> 
> .
> 


* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-22  8:53  0%         ` Ferruh Yigit
@ 2023-03-22 13:49  0%           ` Thomas Monjalon
  2023-03-23 11:58  3%             ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-22 13:49 UTC (permalink / raw)
  To: fengchengwen, Olivier Matz, Ferruh Yigit; +Cc: dev, David Marchand

22/03/2023 09:53, Ferruh Yigit:
> On 3/22/2023 1:15 AM, fengchengwen wrote:
> > On 2023/3/21 21:50, Ferruh Yigit wrote:
> >> On 3/17/2023 2:43 AM, fengchengwen wrote:
> >>> On 2023/3/17 2:18, Ferruh Yigit wrote:
> >>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
> >>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
> >>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
> >>>>> parameter 'value' is NULL when parsed 'only keys'.
> >>>>>
> >>>>> It may leads to segment fault when parse args with 'only key', this 
> >>>>> patchset fixes rest of them.
> >>>>>
> >>>>> Chengwen Feng (5):
> >>>>>   app/pdump: fix segment fault when parse args
> >>>>>   net/memif: fix segment fault when parse devargs
> >>>>>   net/pcap: fix segment fault when parse devargs
> >>>>>   net/ring: fix segment fault when parse devargs
> >>>>>   net/sfc: fix segment fault when parse devargs
> >>>>
> >>>> Hi Chengwen,
> >>>>
> >>>> Did you scan all `rte_kvargs_process()` instances?
> >>>
> >>> No, I was just looking at the modules I was concerned about.
> >>> I looked at it briefly, and some modules had the same problem.
> >>>
> >>>>
> >>>>
> >>>> And if there would be a way to tell kvargs that a value is expected (or
> >>>> not) this checks could be done in kvargs layer, I think this also can be
> >>>> to look at.
> >>>
> >>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
> >>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
> >>> But it also break the API's behavior.
> >>>
> >>
> >> What about having a new API, like `rte_kvargs_process_extended()`,
> >>
> >> That gets an additional flag as parameter, which may have values like
> >> following to indicate if key expects a value or not:
> >> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
> >> ARG_WITH_VALUE      --> "key=value"
> >> ARG_NO_VALUE        --> 'key'
> >>
> >> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
> >> `rte_kvargs_process()`.
> >>
> >> This way instead of adding checks, relevant usage can be replaced by
> >> `rte_kvargs_process_extended()`, this requires similar amount of change
> >> but code will be more clean I think.
> >>
> >> Do you think does this work?
> > 
> > Yes, it can work.
> > 
> > But I think the introduction of new API adds some complexity.
> > And a good API definition could more simpler.
> > 
> 
> Other option is changing existing API, but that may be widely used and
> changing it impacts applications, I don't think it worth.

I planned a change to the kvargs API 5 years ago and never did it:
From doc/guides/rel_notes/deprecation.rst:
"
* kvargs: The function ``rte_kvargs_process`` will get a new parameter
  for returning key match count. It will ease handling of no-match case.
"

> Of course we can live with as it is and add checks to the callback
> functions, although I still believe a new 'process()' API is better idea.





* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-22  1:15  0%       ` fengchengwen
@ 2023-03-22  8:53  0%         ` Ferruh Yigit
  2023-03-22 13:49  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-03-22  8:53 UTC (permalink / raw)
  To: fengchengwen, thomas, Olivier Matz; +Cc: dev, David Marchand

On 3/22/2023 1:15 AM, fengchengwen wrote:
> On 2023/3/21 21:50, Ferruh Yigit wrote:
>> On 3/17/2023 2:43 AM, fengchengwen wrote:
>>> On 2023/3/17 2:18, Ferruh Yigit wrote:
>>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>>>> parameter 'value' is NULL when parsed 'only keys'.
>>>>>
>>>>> It may leads to segment fault when parse args with 'only key', this 
>>>>> patchset fixes rest of them.
>>>>>
>>>>> Chengwen Feng (5):
>>>>>   app/pdump: fix segment fault when parse args
>>>>>   net/memif: fix segment fault when parse devargs
>>>>>   net/pcap: fix segment fault when parse devargs
>>>>>   net/ring: fix segment fault when parse devargs
>>>>>   net/sfc: fix segment fault when parse devargs
>>>>
>>>> Hi Chengwen,
>>>>
>>>> Did you scan all `rte_kvargs_process()` instances?
>>>
>>> No, I was just looking at the modules I was concerned about.
>>> I looked at it briefly, and some modules had the same problem.
>>>
>>>>
>>>>
>>>> And if there would be a way to tell kvargs that a value is expected (or
>>>> not) this checks could be done in kvargs layer, I think this also can be
>>>> to look at.
>>>
>>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
>>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
>>> But it also break the API's behavior.
>>>
>>
>> What about having a new API, like `rte_kvargs_process_extended()`,
>>
>> That gets an additional flag as parameter, which may have values like
>> following to indicate if key expects a value or not:
>> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
>> ARG_WITH_VALUE      --> "key=value"
>> ARG_NO_VALUE        --> 'key'
>>
>> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
>> `rte_kvargs_process()`.
>>
>> This way instead of adding checks, relevant usage can be replaced by
>> `rte_kvargs_process_extended()`, this requires similar amount of change
>> but code will be more clean I think.
>>
>> Do you think does this work?
> 
> Yes, it can work.
> 
> But I think the introduction of new API adds some complexity.
> And a good API definition could more simpler.
> 

The other option is changing the existing API, but it may be widely used and
changing it impacts applications, so I don't think it is worth it.

Of course we can live with it as it is and add checks to the callback
functions, although I still believe a new 'process()' API is a better idea.

>>
>>
>>>
>>> Or continue fix the exist code (about 10+ place more),
>>> for new invoking, because the 'arg_handler_t' already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
>>> they'll take the initiative to prevent this.
>>>
>>>
>>> Hope for more advise for the next.
>>>
>>>> .
>>>>
>>
>> .
>>



* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-21 13:50  0%     ` Ferruh Yigit
@ 2023-03-22  1:15  0%       ` fengchengwen
  2023-03-22  8:53  0%         ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-22  1:15 UTC (permalink / raw)
  To: Ferruh Yigit, thomas, Olivier Matz; +Cc: dev, David Marchand

On 2023/3/21 21:50, Ferruh Yigit wrote:
> On 3/17/2023 2:43 AM, fengchengwen wrote:
>> On 2023/3/17 2:18, Ferruh Yigit wrote:
>>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>>> parameter 'value' is NULL when parsed 'only keys'.
>>>>
>>>> It may leads to segment fault when parse args with 'only key', this 
>>>> patchset fixes rest of them.
>>>>
>>>> Chengwen Feng (5):
>>>>   app/pdump: fix segment fault when parse args
>>>>   net/memif: fix segment fault when parse devargs
>>>>   net/pcap: fix segment fault when parse devargs
>>>>   net/ring: fix segment fault when parse devargs
>>>>   net/sfc: fix segment fault when parse devargs
>>>
>>> Hi Chengwen,
>>>
>>> Did you scan all `rte_kvargs_process()` instances?
>>
>> No, I was just looking at the modules I was concerned about.
>> I looked at it briefly, and some modules had the same problem.
>>
>>>
>>>
>>> And if there would be a way to tell kvargs that a value is expected (or
>>> not) this checks could be done in kvargs layer, I think this also can be
>>> to look at.
>>
>> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
>> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
>> But it also break the API's behavior.
>>
> 
> What about having a new API, like `rte_kvargs_process_extended()`,
> 
> That gets an additional flag as parameter, which may have values like
> following to indicate if key expects a value or not:
> ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
> ARG_WITH_VALUE      --> "key=value"
> ARG_NO_VALUE        --> 'key'
> 
> Default flag can be 'ARG_MAY_HAVE_VALUE' and it becomes same as
> `rte_kvargs_process()`.
> 
> This way instead of adding checks, relevant usage can be replaced by
> `rte_kvargs_process_extended()`, this requires similar amount of change
> but code will be more clean I think.
> 
> Do you think does this work?

Yes, it can work.

But I think introducing a new API adds some complexity,
and a good API definition could be simpler.

> 
> 
>>
>> Or continue fix the exist code (about 10+ place more),
>> for new invoking, because the 'arg_handler_t' already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
>> they'll take the initiative to prevent this.
>>
>>
>> Hope for more advise for the next.
>>
>>> .
>>>
> 
> .
> 


* Re: [PATCH 0/5] fix segment fault when parse args
  2023-03-17  2:43  3%   ` fengchengwen
@ 2023-03-21 13:50  0%     ` Ferruh Yigit
  2023-03-22  1:15  0%       ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-03-21 13:50 UTC (permalink / raw)
  To: fengchengwen, thomas, Olivier Matz; +Cc: dev, David Marchand

On 3/17/2023 2:43 AM, fengchengwen wrote:
> On 2023/3/17 2:18, Ferruh Yigit wrote:
>> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>>> parameter 'value' is NULL when parsed 'only keys'.
>>>
>>> It may leads to segment fault when parse args with 'only key', this 
>>> patchset fixes rest of them.
>>>
>>> Chengwen Feng (5):
>>>   app/pdump: fix segment fault when parse args
>>>   net/memif: fix segment fault when parse devargs
>>>   net/pcap: fix segment fault when parse devargs
>>>   net/ring: fix segment fault when parse devargs
>>>   net/sfc: fix segment fault when parse devargs
>>
>> Hi Chengwen,
>>
>> Did you scan all `rte_kvargs_process()` instances?
> 
> No, I was just looking at the modules I was concerned about.
> I looked at it briefly, and some modules had the same problem.
> 
>>
>>
>> And if there would be a way to tell kvargs that a value is expected (or
>> not) this checks could be done in kvargs layer, I think this also can be
>> to look at.
> 
> Yes, the way to tell kvargs may lead to a lot of modifys and also break ABI.
> I also think about just set value = "" when only exist key, It could perfectly solve the above segment scene.
> But it also break the API's behavior.
> 

What about having a new API, like `rte_kvargs_process_extended()`?

It would take an additional flag as a parameter, with values like the
following to indicate whether the key expects a value:
ARG_MAY_HAVE_VALUE  --> "key=value" OR 'key'
ARG_WITH_VALUE      --> "key=value"
ARG_NO_VALUE        --> 'key'

The default flag can be 'ARG_MAY_HAVE_VALUE', which makes it behave the
same as `rte_kvargs_process()`.

This way, instead of adding checks, the relevant calls can be replaced by
`rte_kvargs_process_extended()`. This requires a similar amount of change,
but I think the code will be cleaner.

Do you think this would work?
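
To make the idea concrete, here is a rough sketch of the check such an API
could run before calling the handler. The three flag names are the ones
listed above. Everything else (the enum name arg_value_policy, the helper
check_value_policy, returning -EINVAL) is an assumption for illustration and
does not exist in kvargs.

```c
#include <errno.h>
#include <stddef.h>

/* Flag names come from the proposal; the enum itself is hypothetical. */
enum arg_value_policy {
	ARG_MAY_HAVE_VALUE, /* "key=value" or 'key' */
	ARG_WITH_VALUE,     /* "key=value" only */
	ARG_NO_VALUE,       /* 'key' only */
};

/*
 * Hypothetical helper: decide whether a parsed pair satisfies the policy.
 * value is NULL when the argument was given as a key only.
 * Returns 0 if the handler may be called, -EINVAL otherwise.
 */
static int
check_value_policy(enum arg_value_policy policy, const char *value)
{
	switch (policy) {
	case ARG_WITH_VALUE:
		return value != NULL ? 0 : -EINVAL;
	case ARG_NO_VALUE:
		return value == NULL ? 0 : -EINVAL;
	case ARG_MAY_HAVE_VALUE:
		return 0;
	}
	/* Unknown policy value. */
	return -EINVAL;
}
```

With a check like this in the kvargs layer, a handler registered with
ARG_WITH_VALUE would never see a NULL value.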


> 
> Or continue fix the exist code (about 10+ place more),
> for new invoking, because the 'arg_handler_t' already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
> they'll take the initiative to prevent this.
> 
> 
> Hope for more advise for the next.
> 
>> .
>>



* [PATCH v2 2/2] ci: test compilation with debug in GHA
  @ 2023-03-20 12:18 19%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-03-20 12:18 UTC (permalink / raw)
  To: dev; +Cc: Aaron Conole, Michael Santana

We often miss compilation issues with -O0 -g.
Switch to debug in GHA for the gcc job.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v1:
- rather than introduce a new job, updated the ABI check job
  to build with debug.

---
 .ci/linux-build.sh          | 8 +++++++-
 .github/workflows/build.yml | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index ab0994388a..150b38bd7a 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -65,6 +65,12 @@ if [ "$RISCV64" = "true" ]; then
     cross_file=config/riscv/riscv64_linux_gcc
 fi
 
+buildtype=debugoptimized
+
+if [ "$BUILD_DEBUG" = "true" ]; then
+    buildtype=debug
+fi
+
 if [ "$BUILD_DOCS" = "true" ]; then
     OPTS="$OPTS -Denable_docs=true"
 fi
@@ -85,7 +91,7 @@ fi
 
 OPTS="$OPTS -Dplatform=generic"
 OPTS="$OPTS -Ddefault_library=$DEF_LIB"
-OPTS="$OPTS -Dbuildtype=debugoptimized"
+OPTS="$OPTS -Dbuildtype=$buildtype"
 OPTS="$OPTS -Dcheck_includes=true"
 if [ "$MINI" = "true" ]; then
     OPTS="$OPTS -Denable_drivers=net/null"
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 154be70cc1..bbcb535afb 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -18,6 +18,7 @@ jobs:
       ABI_CHECKS: ${{ contains(matrix.config.checks, 'abi') }}
       ASAN: ${{ contains(matrix.config.checks, 'asan') }}
       BUILD_32BIT: ${{ matrix.config.cross == 'i386' }}
+      BUILD_DEBUG: ${{ contains(matrix.config.checks, 'debug') }}
       BUILD_DOCS: ${{ contains(matrix.config.checks, 'doc') }}
       CC: ccache ${{ matrix.config.compiler }}
       DEF_LIB: ${{ matrix.config.library }}
@@ -39,7 +40,7 @@ jobs:
             mini: mini
           - os: ubuntu-20.04
             compiler: gcc
-            checks: abi+doc+tests
+            checks: abi+debug+doc+tests
           - os: ubuntu-20.04
             compiler: clang
             checks: asan+doc+tests
-- 
2.39.2



* [PATCH 2/2] ci: test compilation with debug
  @ 2023-03-20 10:26  5% ` David Marchand
    1 sibling, 0 replies; 200+ results
From: David Marchand @ 2023-03-20 10:26 UTC (permalink / raw)
  To: dev; +Cc: Aaron Conole, Michael Santana

We often miss compilation issues with -O0 -g.
Add a test in GHA.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 .ci/linux-build.sh          | 8 +++++++-
 .github/workflows/build.yml | 4 ++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index ab0994388a..150b38bd7a 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -65,6 +65,12 @@ if [ "$RISCV64" = "true" ]; then
     cross_file=config/riscv/riscv64_linux_gcc
 fi
 
+buildtype=debugoptimized
+
+if [ "$BUILD_DEBUG" = "true" ]; then
+    buildtype=debug
+fi
+
 if [ "$BUILD_DOCS" = "true" ]; then
     OPTS="$OPTS -Denable_docs=true"
 fi
@@ -85,7 +91,7 @@ fi
 
 OPTS="$OPTS -Dplatform=generic"
 OPTS="$OPTS -Ddefault_library=$DEF_LIB"
-OPTS="$OPTS -Dbuildtype=debugoptimized"
+OPTS="$OPTS -Dbuildtype=$buildtype"
 OPTS="$OPTS -Dcheck_includes=true"
 if [ "$MINI" = "true" ]; then
     OPTS="$OPTS -Denable_drivers=net/null"
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 154be70cc1..d90ecfc6f0 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -18,6 +18,7 @@ jobs:
       ABI_CHECKS: ${{ contains(matrix.config.checks, 'abi') }}
       ASAN: ${{ contains(matrix.config.checks, 'asan') }}
       BUILD_32BIT: ${{ matrix.config.cross == 'i386' }}
+      BUILD_DEBUG: ${{ contains(matrix.config.checks, 'debug') }}
       BUILD_DOCS: ${{ contains(matrix.config.checks, 'doc') }}
       CC: ccache ${{ matrix.config.compiler }}
       DEF_LIB: ${{ matrix.config.library }}
@@ -37,6 +38,9 @@ jobs:
           - os: ubuntu-20.04
             compiler: gcc
             mini: mini
+          - os: ubuntu-20.04
+            compiler: gcc
+            checks: debug
           - os: ubuntu-20.04
             compiler: gcc
             checks: abi+doc+tests
-- 
2.39.2



* Re: [PATCH 0/5] fix segment fault when parse args
  @ 2023-03-17  2:43  3%   ` fengchengwen
  2023-03-21 13:50  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-17  2:43 UTC (permalink / raw)
  To: Ferruh Yigit, thomas; +Cc: dev, David Marchand

On 2023/3/17 2:18, Ferruh Yigit wrote:
> On 3/14/2023 12:48 PM, Chengwen Feng wrote:
>> The rte_kvargs_process() was used to parse KV pairs, it also supports
>> to parse 'only keys' (e.g. socket_id) type. And the callback function 
>> parameter 'value' is NULL when parsed 'only keys'.
>>
>> It may leads to segment fault when parse args with 'only key', this 
>> patchset fixes rest of them.
>>
>> Chengwen Feng (5):
>>   app/pdump: fix segment fault when parse args
>>   net/memif: fix segment fault when parse devargs
>>   net/pcap: fix segment fault when parse devargs
>>   net/ring: fix segment fault when parse devargs
>>   net/sfc: fix segment fault when parse devargs
> 
> Hi Chengwen,
> 
> Did you scan all `rte_kvargs_process()` instances?

No, I only looked at the modules I was concerned about.
I took a brief look, and some other modules have the same problem.

> 
> 
> And if there would be a way to tell kvargs that a value is expected (or
> not) this checks could be done in kvargs layer, I think this also can be
> to look at.

Yes, but adding a way to tell kvargs may require a lot of modifications and also break the ABI.
I also thought about just setting value = "" when only the key exists; it would neatly solve the segfault case above.
But it would also change the API's behavior.


Or we continue to fix the existing code (10+ more places).
For new callers, because the 'arg_handler_t' is already well documented (52ab17efdecf935792ee1d0cb749c0dbd536c083),
they'll take the initiative to prevent this.


Hoping for more advice on the next steps.
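
For reference, the per-callback fix amounts to a NULL check at the top of
each handler. A minimal sketch follows. The function has the same shape as
the kvargs arg_handler_t callback (key, value, opaque), but parse_socket_id
and its parsing rules are hypothetical and not taken from the patchset.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical handler with the arg_handler_t shape:
 *   int handler(const char *key, const char *value, void *opaque);
 * It stores a non-negative decimal integer into the int behind opaque.
 */
static int
parse_socket_id(const char *key, const char *value, void *opaque)
{
	int *socket_id = opaque;
	char *end = NULL;
	long v;

	(void)key;

	/* A key-only argument ("socket_id" without "=...") arrives as NULL. */
	if (value == NULL || opaque == NULL)
		return -EINVAL;

	errno = 0;
	v = strtol(value, &end, 10);
	/* Reject empty input, trailing garbage and out-of-range numbers. */
	if (errno != 0 || end == value || *end != '\0' || v < 0 || v > INT_MAX)
		return -EINVAL;

	*socket_id = (int)v;
	return 0;
}
```

Without the first check, strtol() would dereference NULL for a key-only
argument, which is the crash this series addresses.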

> .
> 


* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-16 13:10  3%     ` Dongdong Liu
@ 2023-03-16 14:31  0%       ` Ivan Malov
  0 siblings, 0 replies; 200+ results
From: Ivan Malov @ 2023-03-16 14:31 UTC (permalink / raw)
  To: Dongdong Liu
  Cc: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan,
	stable, yisen.zhuang, Jie Hai

Hi,

Thanks for responding and PSB.

On Thu, 16 Mar 2023, Dongdong Liu wrote:

> Hi Ivan
>
> Many thanks for your review.
>
> On 2023/3/15 19:28, Ivan Malov wrote:
>> Hi,
>> 
>> On Wed, 15 Mar 2023, Dongdong Liu wrote:
>> 
>>> From: Jie Hai <haijie1@huawei.com>
>>> 
>>> Currently, rte_eth_rss_conf supports configuring rss hash
>>> functions, rss key and it's length, but not rss hash algorithm.
>>> 
>>> The structure ``rte_eth_rss_conf`` is extended by adding a new field,
>>> "func". This represents the RSS algorithms to apply. The following
>>> API is affected:
>>>     - rte_eth_dev_configure
>>>     - rte_eth_dev_rss_hash_update
>>>     - rte_eth_dev_rss_hash_conf_get
>>> 
>>> To prevent configuration failures caused by incorrect func input, check
>>> this parameter in advance. If it's incorrect, a warning is generated
>>> and the default value is set. Do the same for rte_eth_dev_rss_hash_update
>>> and rte_eth_dev_configure.
>>> 
>>> To check whether the drivers report the func field, it is set to default
>>> value before querying.
>>> 
>>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>>> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
>>> ---
>>> doc/guides/rel_notes/release_23_03.rst |  4 ++--
>>> lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
>>> lib/ethdev/rte_ethdev.h                |  5 +++++
>>> 3 files changed, 25 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/doc/guides/rel_notes/release_23_03.rst
>>> b/doc/guides/rel_notes/release_23_03.rst
>>> index af6f37389c..7879567427 100644
>>> --- a/doc/guides/rel_notes/release_23_03.rst
>>> +++ b/doc/guides/rel_notes/release_23_03.rst
>>> @@ -284,8 +284,8 @@ ABI Changes
>>>    Also, make sure to start the actual text at the margin.
>>>    =======================================================
>>> 
>>> -* No ABI change that would break compatibility with 22.11.
>>> -
>>> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for
>>> RSS hash
>>> +  algorithm.
>>> 
>>> Known Issues
>>> ------------
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>> index 4d03255683..db561026bd 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>         goto rollback;
>>>     }
>>> 
>>> +    if (dev_conf->rx_adv_conf.rss_conf.func >=
>>> RTE_ETH_HASH_FUNCTION_MAX) {
>>> +        RTE_ETHDEV_LOG(WARNING,
>>> +            "Ethdev port_id=%u invalid rss hash function (%u),
>>> modified to default value (%u)\n",
>>> +            port_id, dev_conf->rx_adv_conf.rss_conf.func,
>>> +            RTE_ETH_HASH_FUNCTION_DEFAULT);
>>> +        dev->data->dev_conf.rx_adv_conf.rss_conf.func =
>>> +            RTE_ETH_HASH_FUNCTION_DEFAULT;
>> 
>> I have no strong opinion, but, to me, this behaviour conceals
>> programming errors. For example, if an application intends
>> to enable hash algorithm A but, due to a programming error,
>> passes a gibberish value here, chances are the error will
>> end up unnoticed. Especially in case the application
>> sets the log level such that warnings are omitted.
> Good point, will fix.
>> 
>> Why not just return the error the standard way?
>
> Aha, the original intention was to avoid breaking the ABI,
> but I don't think it achieves that.
>> 
>>> +    }
>>> +
>>>     /* Check if Rx RSS distribution is disabled but RSS hash is enabled. */
>>>     if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
>>>         (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
>>> @@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
>>>         return -ENOTSUP;
>>>     }
>>> 
>>> +    if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
>>> +        RTE_ETHDEV_LOG(NOTICE,
>>> +            "Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
>>> +            port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
>>> +        rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>>> +    }
>>> +
>>>     if (*dev->dev_ops->rss_hash_update == NULL)
>>>         return -ENOTSUP;
>>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
>>> @@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
>>>         return -EINVAL;
>>>     }
>>> 
>>> +    rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>>> +
>>>     if (*dev->dev_ops->rss_hash_conf_get == NULL)
>>>         return -ENOTSUP;
>>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>> index 99fe9e238b..5abe2cb36d 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -174,6 +174,7 @@ extern "C" {
>>> 
>>> #include "rte_ethdev_trace_fp.h"
>>> #include "rte_dev_info.h"
>>> +#include "rte_flow.h"
>>> 
>>> extern int rte_eth_dev_logtype;
>>> 
>>> @@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
>>>  * The *rss_hf* field of the *rss_conf* structure indicates the different
>>>  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
>>>  * Supplying an *rss_hf* equal to zero disables the RSS feature.
>>> + *
>>> + * The *func* field of the *rss_conf* structure indicates the different
>>> + * types of hash algorithms applied by the RSS hashing.
>> 
>> Consider:
>> 
>> The *func* field of the *rss_conf* structure indicates the algorithm to
>> use when computing hash. Passing RTE_ETH_HASH_FUNCTION_DEFAULT allows
>> the PMD to use its best-effort algorithm rather than a specific one.
>
> Looking at some PMD drivers (i40e, hns3, etc.), it seems that
> RTE_ETH_HASH_FUNCTION_DEFAULT is treated as no RSS algorithm being set.

This does not seem to contradict the suggested description.

If they, however, treat this as "no RSS at all", then
perhaps it is a mistake, because if the user requests
Rx MQ mode "RSS" and selects algorithm DEFAULT, this
is clearly not the same as "no RSS". Not by a long
shot. Because for "no RSS" the user would have
passed MQ mode choice "NONE", I take it.

>
> Thanks,
> Dongdong
>>
>>>  */
>>> struct rte_eth_rss_conf {
>>>     uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
>>>     uint8_t rss_key_len; /**< hash key length in bytes. */
>>>     uint64_t rss_hf;     /**< Hash functions to apply - see below. */
>>> +    enum rte_eth_hash_function func;    /**< Hash algorithm to apply. */
>>> };
>>> 
>>> /*
>>> --
>>> 2.22.0
>>> 
>>> 
>> 
>> Thank you.
>> 
>> .
>> 
>

Thank you.


* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 13:43  3%   ` Thomas Monjalon
@ 2023-03-16 13:16  3%     ` Dongdong Liu
  0 siblings, 0 replies; 200+ results
From: Dongdong Liu @ 2023-03-16 13:16 UTC (permalink / raw)
  To: Thomas Monjalon, Jie Hai
  Cc: dev, ferruh.yigit, andrew.rybchenko, reshma.pattan, stable,
	yisen.zhuang, david.marchand

Hi Thomas
On 2023/3/15 21:43, Thomas Monjalon wrote:
> 15/03/2023 12:00, Dongdong Liu:
>> From: Jie Hai <haijie1@huawei.com>
>> --- a/doc/guides/rel_notes/release_23_03.rst
>> +++ b/doc/guides/rel_notes/release_23_03.rst
>> -* No ABI change that would break compatibility with 22.11.
>> -
>> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
>> +  algorithm.
>
> We cannot break ABI compatibility until 23.11.
Got it. Thank you for the reminder.

[PATCH 3/5] and [PATCH 4/5] are not related to this ABI compatibility issue.
I will send them separately.

Thanks,
Dongdong
>
>
>
> .
>


* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 11:28  0%   ` Ivan Malov
@ 2023-03-16 13:10  3%     ` Dongdong Liu
  2023-03-16 14:31  0%       ` Ivan Malov
  0 siblings, 1 reply; 200+ results
From: Dongdong Liu @ 2023-03-16 13:10 UTC (permalink / raw)
  To: Ivan Malov
  Cc: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan,
	stable, yisen.zhuang, Jie Hai

Hi Ivan

Many thanks for your review.

On 2023/3/15 19:28, Ivan Malov wrote:
> Hi,
>
> On Wed, 15 Mar 2023, Dongdong Liu wrote:
>
>> From: Jie Hai <haijie1@huawei.com>
>>
>> Currently, rte_eth_rss_conf supports configuring rss hash
>> functions, rss key and its length, but not the rss hash algorithm.
>>
>> The structure ``rte_eth_rss_conf`` is extended by adding a new field,
>> "func". This represents the RSS algorithms to apply. The following
>> API is affected:
>>     - rte_eth_dev_configure
>>     - rte_eth_dev_rss_hash_update
>>     - rte_eth_dev_rss_hash_conf_get
>>
>> To prevent configuration failures caused by incorrect func input, check
>> this parameter in advance. If it's incorrect, a warning is generated
>> and the default value is set. Do the same for rte_eth_dev_rss_hash_update
>> and rte_eth_dev_configure.
>>
>> To check whether the drivers report the func field, it is set to default
>> value before querying.
>>
>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
>> ---
>> doc/guides/rel_notes/release_23_03.rst |  4 ++--
>> lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
>> lib/ethdev/rte_ethdev.h                |  5 +++++
>> 3 files changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
>> index af6f37389c..7879567427 100644
>> --- a/doc/guides/rel_notes/release_23_03.rst
>> +++ b/doc/guides/rel_notes/release_23_03.rst
>> @@ -284,8 +284,8 @@ ABI Changes
>>    Also, make sure to start the actual text at the margin.
>>    =======================================================
>>
>> -* No ABI change that would break compatibility with 22.11.
>> -
>> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
>> +  algorithm.
>>
>> Known Issues
>> ------------
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index 4d03255683..db561026bd 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>         goto rollback;
>>     }
>>
>> +    if (dev_conf->rx_adv_conf.rss_conf.func >= RTE_ETH_HASH_FUNCTION_MAX) {
>> +        RTE_ETHDEV_LOG(WARNING,
>> +            "Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
>> +            port_id, dev_conf->rx_adv_conf.rss_conf.func,
>> +            RTE_ETH_HASH_FUNCTION_DEFAULT);
>> +        dev->data->dev_conf.rx_adv_conf.rss_conf.func =
>> +            RTE_ETH_HASH_FUNCTION_DEFAULT;
>
> I have no strong opinion, but, to me, this behaviour conceals
> programming errors. For example, if an application intends
> to enable hash algorithm A but, due to a programming error,
> passes a gibberish value here, chances are the error will
> end up unnoticed. Especially in case the application
> sets the log level such that warnings are omitted.
Good point, will fix.
>
> Why not just return the error the standard way?

Aha, the original intention was to avoid breaking the ABI,
but I don't think it achieves that.
>
>> +    }
>> +
>>     /* Check if Rx RSS distribution is disabled but RSS hash is enabled. */
>>     if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
>>         (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
>> @@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
>>         return -ENOTSUP;
>>     }
>>
>> +    if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
>> +        RTE_ETHDEV_LOG(NOTICE,
>> +            "Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
>> +            port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
>> +        rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>> +    }
>> +
>>     if (*dev->dev_ops->rss_hash_update == NULL)
>>         return -ENOTSUP;
>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
>> @@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
>>         return -EINVAL;
>>     }
>>
>> +    rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
>> +
>>     if (*dev->dev_ops->rss_hash_conf_get == NULL)
>>         return -ENOTSUP;
>>     ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 99fe9e238b..5abe2cb36d 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -174,6 +174,7 @@ extern "C" {
>>
>> #include "rte_ethdev_trace_fp.h"
>> #include "rte_dev_info.h"
>> +#include "rte_flow.h"
>>
>> extern int rte_eth_dev_logtype;
>>
>> @@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
>>  * The *rss_hf* field of the *rss_conf* structure indicates the different
>>  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
>>  * Supplying an *rss_hf* equal to zero disables the RSS feature.
>> + *
>> + * The *func* field of the *rss_conf* structure indicates the different
>> + * types of hash algorithms applied by the RSS hashing.
>
> Consider:
>
> The *func* field of the *rss_conf* structure indicates the algorithm to
> use when computing hash. Passing RTE_ETH_HASH_FUNCTION_DEFAULT allows
> the PMD to use its best-effort algorithm rather than a specific one.

Looking at some PMD drivers (i40e, hns3, etc.), it seems that
RTE_ETH_HASH_FUNCTION_DEFAULT is treated as no RSS algorithm being set.

Thanks,
Dongdong
>
>>  */
>> struct rte_eth_rss_conf {
>>     uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
>>     uint8_t rss_key_len; /**< hash key length in bytes. */
>>     uint64_t rss_hf;     /**< Hash functions to apply - see below. */
>> +    enum rte_eth_hash_function func;    /**< Hash algorithm to apply. */
>> };
>>
>> /*
>> --
>> 2.22.0
>>
>>
>
> Thank you.
>
> .
>


* [RFC v2 0/2] Add high-performance timer facility
  2023-02-28  9:39  3% [RFC 0/2] Add high-performance timer facility Mattias Rönnblom
  2023-02-28 16:01  0% ` Morten Brørup
@ 2023-03-15 17:03  3% ` Mattias Rönnblom
  1 sibling, 0 replies; 200+ results
From: Mattias Rönnblom @ 2023-03-15 17:03 UTC (permalink / raw)
  To: dev
  Cc: Erik Gabriel Carrillo, David Marchand, maria.lingemark,
	Stefan Sundkvist, Stephen Hemminger, Morten Brørup,
	Tyler Retzlaff, Mattias Rönnblom

This patchset is an attempt to introduce a high-performance, highly
scalable timer facility into DPDK.

More specifically, the goals for the htimer library are:

* Efficient handling of a handful up to hundreds of thousands of
  concurrent timers.
* Make adding and canceling timers low-overhead, constant-time
  operations.
* Provide a service functionally equivalent to that of
  <rte_timer.h>. API/ABI backward compatibility is secondary.

In the author's opinion, there are two main shortcomings with the
current DPDK timer library (i.e., rte_timer.[ch]).

One is the synchronization overhead, where heavy-weight full-barrier
type synchronization is used. rte_timer.c uses per-EAL/lcore skip
lists, but any thread may add or cancel (or otherwise access) timers
managed by another lcore (and thus residing in its timer skip list).

The other is an algorithmic shortcoming, with rte_timer.c's reliance
on a skip list, which is less efficient than certain alternatives.

This patchset implements a hierarchical timer wheel (HWT, in
rte_htw.c), as per the Varghese and Lauck paper "Hashed and
Hierarchical Timing Wheels: Data Structures for the Efficient
Implementation of a Timer Facility". A HWT is a data structure
purposely designed for this task, and used by many operating system
kernel timer facilities.

To further improve the solution described by Varghese and Lauck, a
bitset is placed in front of each of the timer wheels in the HWT,
reducing the overhead of rte_htimer_mgr_manage() (i.e., progressing time
and expiry processing).

Cycle-efficient scanning and manipulation of these bitsets are crucial
for the HWT's performance.
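
As an illustration of the idea (not the code in this patchset), a
minimal sketch of a single 64-slot wheel guarded by a one-word bitset
might look as follows. All demo_* names are made up for this example,
and __builtin_ctzll() is the GCC/Clang count-trailing-zeros builtin,
used here in place of the proposed rte_bitset API:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WHEEL_SLOTS 64

/* One wheel level: a per-slot timer count plus a summary bitset. */
struct demo_wheel {
	uint64_t occupied;                    /* bit i set => slot i is non-empty */
	unsigned int count[DEMO_WHEEL_SLOTS]; /* number of timers in each slot */
};

/* Record one more timer in the slot and mark the slot as occupied. */
static void
demo_wheel_add(struct demo_wheel *w, unsigned int slot)
{
	assert(slot < DEMO_WHEEL_SLOTS);
	w->count[slot]++;
	w->occupied |= UINT64_C(1) << slot;
}

/* Remove one timer; clear the summary bit when the slot becomes empty. */
static void
demo_wheel_del(struct demo_wheel *w, unsigned int slot)
{
	assert(slot < DEMO_WHEEL_SLOTS && w->count[slot] > 0);
	if (--w->count[slot] == 0)
		w->occupied &= ~(UINT64_C(1) << slot);
}

/*
 * Return the first occupied slot in [from, DEMO_WHEEL_SLOTS), or -1 if
 * there is none. Empty slots are skipped without being touched.
 */
static int
demo_wheel_next(const struct demo_wheel *w, unsigned int from)
{
	uint64_t pending;

	if (from >= DEMO_WHEEL_SLOTS)
		return -1;

	/* Mask away the slots below 'from', then scan for the lowest bit. */
	pending = w->occupied & (UINT64_MAX << from);
	if (pending == 0)
		return -1;

	return __builtin_ctzll(pending);
}
```

With the summary bitset, finding the next slot that holds timers costs
one mask and one bit scan, instead of a walk over up to 64 slot heads.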

The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
instance, much like rte_timer.c keeps a per-lcore skip list.

To avoid expensive synchronization overhead for thread-local timer
management, the HWTs are accessed only from the "owning" thread. Any
interaction another thread has with a particular lcore's timer
wheel goes over a set of DPDK rings. A side-effect of this design is
that all operations working toward a "remote" HWT must be
asynchronous.
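
To illustrate why remote operations end up asynchronous, here is a
rough sketch of the message-passing pattern, using a small
single-producer/single-consumer ring built on C11 atomics as a
stand-in for a DPDK ring. The demo_* names and the message layout are
assumptions made for this example, not part of the patchset:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8

static_assert((DEMO_RING_SIZE & (DEMO_RING_SIZE - 1)) == 0,
	      "ring size must be a power of two");

enum demo_op { DEMO_OP_ADD, DEMO_OP_CANCEL };

/* A request sent from a remote thread to the thread owning the wheel. */
struct demo_msg {
	enum demo_op op;
	uint64_t timer_id;
	uint64_t expiry;
};

struct demo_ring {
	_Atomic uint32_t head; /* next slot to write; updated by the producer */
	_Atomic uint32_t tail; /* next slot to read; updated by the consumer */
	struct demo_msg slots[DEMO_RING_SIZE];
};

/* Called by the remote thread. Returns false if the ring is full. */
static bool
demo_ring_enqueue(struct demo_ring *r, const struct demo_msg *m)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	assert(m != NULL);
	if (head - tail == DEMO_RING_SIZE)
		return false;

	r->slots[head % DEMO_RING_SIZE] = *m;
	/* Publish the slot contents before making the new head visible. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Called by the owning thread. Returns false if the ring is empty. */
static bool
demo_ring_dequeue(struct demo_ring *r, struct demo_msg *m)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	assert(m != NULL);
	if (head == tail)
		return false;

	*m = r->slots[tail % DEMO_RING_SIZE];
	/* Release the slot back to the producer. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

The remote thread only learns the outcome once the owning thread has
dequeued and handled the message, which is why such operations cannot
return a final result synchronously.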

The <rte_htimer.h> API is available only to EAL threads and registered
non-EAL threads.

The htimer API allows the application to supply the current time,
useful in case it already has retrieved this for other purposes,
saving the cost of a rdtsc instruction (or its equivalent).

Adding a timer with a relative timeout does not retrieve a new time,
but reuses the current time (as known at the time of the manage call),
again to shave off some cycles of overhead.

A semantic improvement compared to the <rte_timer.h> API is that the
htimer library can give a definite answer to the question of whether
the timer expiry callback was called, after a timer has been canceled.

The patchset includes a performance test case
'timer_htimer_htw_perf_autotest', which compares rte_timer, rte_htimer
and rte_htw timers in the same scenario.

'timer_htimer_htw_perf_autotest' suggests that rte_htimer is ~3-5x
faster than rte_timer for timer/timeout-heavy applications, in a
scenario where the timer always fires. For a scenario with a mix of
canceled and expired timers, the performance difference is greater.

In scenarios with few timeouts, rte_timer has lower overhead than
htimer, but both variants consume very little CPU time.

In certain scenarios, rte_timer does not suffer from non-constant-time
add and cancel operations. One such scenario is when the timer added is
always last in the list, in which case htimer is only ~2-3x faster.

The bitset implementation which the HWT implementation depends upon
seemed generic enough, and potentially useful outside the world of
HWTs, to justify being located in the EAL.

This patchset is very much an RFC, and the author is yet to form an
opinion on many important issues.

* If deemed a suitable replacement, should the htimer replace the
  current DPDK timer library in some particular (ABI-breaking)
  release, or should it live side-by-side with the then-legacy
  <rte_timer.h> API? A lot of things in and outside DPDK depend on
  <rte_timer.h>, so coexistence may be required to facilitate a smooth
  transition.

* Should the htimer and htw-related files be colocated with rte_timer.c
  in the timer library?

* Would it be useful for applications using asynchronous cancel to
  have the option of having the timer callback run not only in case of
  timer expiration, but also cancellation (on the target lcore)? The
  timer cb signature would need to include an additional parameter in
  that case.

* Should the rte_htimer be a nested struct, so the htw parts be separated
  from the htimer parts?

* <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
  <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
  be so?

* The rte_htimer struct is only supposed to be used by the application
  to give an indication of how much memory it needs to allocate, and
  its members are not supposed to be directly accessed (w/ the possible
  exception of the owner_lcore_id field). Should there be a dummy
  struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
  function instead, serving the same purpose? Better encapsulation,
  but more inconvenient for applications. Run-time dynamic sizing
  would force application-level dynamic allocations.

* Asynchronous cancellation is a little tricky to use for the
  application (primarily due to timer memory reclamation/race
  issues). Should this functionality be removed?
  
* Should rte_htimer_mgr_init() also retrieve the current time? If so,
  there should be a variant which allows the user to specify the
  time (to match rte_htimer_mgr_manage_time()). One pitfall with the
  currently proposed API is an application calling rte_htimer_mgr_init()
  and then immediately adding a timer with a relative timeout, in
  which case the current absolute time used is 0, which might be a
  surprise.

* Would the event timer adapter be best off using <rte_htw.h>
  directly, or <rte_htimer.h>? In the latter case, there needs to be a
  way to instantiate more HWTs (similar to the "alt" functions of
  <rte_timer.h>)?

* Should the PERIODICAL flag (and the complexity it brings) be
  removed? That would leave the application with only single-shot
  timers, and the option to re-add them in the timer callback.

* Should the async result codes and the sync cancel error codes be merged
  into one set of result codes?

* Should the rte_htimer_mgr_async_add() have a flag which allows
  buffering add request messages until rte_htimer_mgr_process() (or
  any manage function) is called? That would reduce ring signaling
  overhead (i.e., burst enqueue operations instead of single-element
  enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
  solving the same "problem" a different way. (The signature of such
  a function would not be pretty.)

* Does the functionality provided by the rte_htimer_mgr_process()
  function match its use cases? Should there be a clearer
  separation between expiry processing and asynchronous operation
  processing?

* Should the patchset be split into more commits? If so, how?
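
The rte_htimer_mgr_init() pitfall mentioned in the list above can be
shown with a small model. The demo_* names below are hypothetical and
only mimic the behavior described (a cached current time that starts
at 0 and is refreshed by the manage call); they are not the proposed
API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a timer manager that caches the time of the last manage call. */
struct demo_timer_mgr {
	uint64_t now; /* 0 until the first manage call */
};

static void
demo_mgr_init(struct demo_timer_mgr *mgr)
{
	assert(mgr != NULL);
	mgr->now = 0;
}

/* The application supplies the current time; no clock is read here. */
static void
demo_mgr_manage_time(struct demo_timer_mgr *mgr, uint64_t current_time)
{
	assert(mgr != NULL);
	mgr->now = current_time;
}

/* A relative timeout is resolved against the cached time, not a fresh one. */
static uint64_t
demo_mgr_relative_expiry(const struct demo_timer_mgr *mgr, uint64_t timeout)
{
	assert(mgr != NULL);
	return mgr->now + timeout;
}
```

A timer added with a relative timeout of 100 directly after init gets
absolute expiry 100, however large the real clock value is; after a
manage call at time 1000000, the same add yields 1000100.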

Thanks to Erik Carrillo for his assistance.

Mattias Rönnblom (2):
  eal: add bitset type
  eal: add high-performance timer facility

 app/test/meson.build                  |  12 +-
 app/test/test_bitset.c                | 645 +++++++++++++++++++
 app/test/test_htimer_mgr.c            | 674 ++++++++++++++++++++
 app/test/test_htimer_mgr_perf.c       | 322 ++++++++++
 app/test/test_htw.c                   | 478 ++++++++++++++
 app/test/test_htw_perf.c              | 181 ++++++
 app/test/test_timer_htimer_htw_perf.c | 693 ++++++++++++++++++++
 doc/api/doxy-api-index.md             |   5 +-
 doc/api/doxy-api.conf.in              |   1 +
 lib/eal/common/meson.build            |   1 +
 lib/eal/common/rte_bitset.c           |  29 +
 lib/eal/include/meson.build           |   1 +
 lib/eal/include/rte_bitset.h          | 879 ++++++++++++++++++++++++++
 lib/eal/version.map                   |   3 +
 lib/htimer/meson.build                |   7 +
 lib/htimer/rte_htimer.h               |  68 ++
 lib/htimer/rte_htimer_mgr.c           | 547 ++++++++++++++++
 lib/htimer/rte_htimer_mgr.h           | 516 +++++++++++++++
 lib/htimer/rte_htimer_msg.h           |  44 ++
 lib/htimer/rte_htimer_msg_ring.c      |  18 +
 lib/htimer/rte_htimer_msg_ring.h      |  55 ++
 lib/htimer/rte_htw.c                  | 445 +++++++++++++
 lib/htimer/rte_htw.h                  |  49 ++
 lib/htimer/version.map                |  17 +
 lib/meson.build                       |   1 +
 25 files changed, 5689 insertions(+), 2 deletions(-)
 create mode 100644 app/test/test_bitset.c
 create mode 100644 app/test/test_htimer_mgr.c
 create mode 100644 app/test/test_htimer_mgr_perf.c
 create mode 100644 app/test/test_htw.c
 create mode 100644 app/test/test_htw_perf.c
 create mode 100644 app/test/test_timer_htimer_htw_perf.c
 create mode 100644 lib/eal/common/rte_bitset.c
 create mode 100644 lib/eal/include/rte_bitset.h
 create mode 100644 lib/htimer/meson.build
 create mode 100644 lib/htimer/rte_htimer.h
 create mode 100644 lib/htimer/rte_htimer_mgr.c
 create mode 100644 lib/htimer/rte_htimer_mgr.h
 create mode 100644 lib/htimer/rte_htimer_msg.h
 create mode 100644 lib/htimer/rte_htimer_msg_ring.c
 create mode 100644 lib/htimer/rte_htimer_msg_ring.h
 create mode 100644 lib/htimer/rte_htw.c
 create mode 100644 lib/htimer/rte_htw.h
 create mode 100644 lib/htimer/version.map

-- 
2.34.1



* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 11:00 10% ` [PATCH 1/5] ethdev: support setting and querying rss algorithm Dongdong Liu
  2023-03-15 11:28  0%   ` Ivan Malov
@ 2023-03-15 13:43  3%   ` Thomas Monjalon
  2023-03-16 13:16  3%     ` Dongdong Liu
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-15 13:43 UTC (permalink / raw)
  To: Dongdong Liu, Jie Hai
  Cc: dev, ferruh.yigit, andrew.rybchenko, reshma.pattan, stable,
	yisen.zhuang, david.marchand

15/03/2023 12:00, Dongdong Liu:
> From: Jie Hai <haijie1@huawei.com>
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> -* No ABI change that would break compatibility with 22.11.
> -
> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
> +  algorithm.

We cannot break ABI compatibility until 23.11.
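
To make the concern concrete, here is a minimal sketch using stand-in
copies of the structure (the demo_* names are invented for this
example; the three existing members follow the context lines of the
quoted diff). Appending a member grows the structure, so an
application built against 22.11 would hand the library an object that
is too small:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for enum rte_eth_hash_function. */
enum demo_hash_function { DEMO_HASH_FUNCTION_DEFAULT = 0 };

/* Layout an application compiled against the old header would use. */
struct demo_rss_conf_old {
	uint8_t *rss_key;
	uint8_t rss_key_len;
	uint64_t rss_hf;
};

/* Layout a library compiled with the patch applied would expect. */
struct demo_rss_conf_new {
	uint8_t *rss_key;
	uint8_t rss_key_len;
	uint64_t rss_hf;
	enum demo_hash_function func;
};

static_assert(sizeof(struct demo_rss_conf_new) >
	      sizeof(struct demo_rss_conf_old),
	      "appending a member grows the structure");

/* Number of bytes the library would access past the caller's object. */
static size_t
demo_rss_conf_growth(void)
{
	return sizeof(struct demo_rss_conf_new) -
	       sizeof(struct demo_rss_conf_old);
}
```

Since the new member sits at or beyond the end of the old layout, a
patched library reading or writing it through a pointer supplied by an
unrebuilt application would touch memory outside that object.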





* Re: [PATCH 1/5] ethdev: support setting and querying rss algorithm
  2023-03-15 11:00 10% ` [PATCH 1/5] ethdev: support setting and querying rss algorithm Dongdong Liu
@ 2023-03-15 11:28  0%   ` Ivan Malov
  2023-03-16 13:10  3%     ` Dongdong Liu
  2023-03-15 13:43  3%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Ivan Malov @ 2023-03-15 11:28 UTC (permalink / raw)
  To: Dongdong Liu
  Cc: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan,
	stable, yisen.zhuang, Jie Hai

Hi,

On Wed, 15 Mar 2023, Dongdong Liu wrote:

> From: Jie Hai <haijie1@huawei.com>
>
> Currently, rte_eth_rss_conf supports configuring rss hash
> functions, rss key and its length, but not the rss hash algorithm.
>
> The structure ``rte_eth_rss_conf`` is extended by adding a new field,
> "func". This represents the RSS algorithms to apply. The following
> API is affected:
> 	- rte_eth_dev_configure
> 	- rte_eth_dev_rss_hash_update
> 	- rte_eth_dev_rss_hash_conf_get
>
> To prevent configuration failures caused by incorrect func input, check
> this parameter in advance. If it's incorrect, a warning is generated
> and the default value is set. Do the same for rte_eth_dev_rss_hash_update
> and rte_eth_dev_configure.
>
> To check whether the drivers report the func field, it is set to default
> value before querying.
>
> Signed-off-by: Jie Hai <haijie1@huawei.com>
> Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
> ---
> doc/guides/rel_notes/release_23_03.rst |  4 ++--
> lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
> lib/ethdev/rte_ethdev.h                |  5 +++++
> 3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index af6f37389c..7879567427 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -284,8 +284,8 @@ ABI Changes
>    Also, make sure to start the actual text at the margin.
>    =======================================================
>
> -* No ABI change that would break compatibility with 22.11.
> -
> +* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
> +  algorithm.
>
> Known Issues
> ------------
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 4d03255683..db561026bd 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
> 		goto rollback;
> 	}
>
> +	if (dev_conf->rx_adv_conf.rss_conf.func >= RTE_ETH_HASH_FUNCTION_MAX) {
> +		RTE_ETHDEV_LOG(WARNING,
> +			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
> +			port_id, dev_conf->rx_adv_conf.rss_conf.func,
> +			RTE_ETH_HASH_FUNCTION_DEFAULT);
> +		dev->data->dev_conf.rx_adv_conf.rss_conf.func =
> +			RTE_ETH_HASH_FUNCTION_DEFAULT;

I have no strong opinion, but, to me, this behaviour conceals
programming errors. For example, if an application intends
to enable hash algorithm A but, due to a programming error,
passes a gibberish value here, chances are the error will
end up unnoticed. Especially in case the application
sets the log level such that warnings are omitted.

Why not just return the error the standard way?
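
For illustration, a minimal sketch of the stricter behaviour, using
stand-in names rather than the real ethdev definitions (every demo_*
identifier is hypothetical, and the field is modelled as a plain
unsigned value to keep the example self-contained):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for enum rte_eth_hash_function; the values are illustrative. */
enum demo_hash_function {
	DEMO_HASH_FUNCTION_DEFAULT = 0,
	DEMO_HASH_FUNCTION_TOEPLITZ,
	DEMO_HASH_FUNCTION_SIMPLE_XOR,
	DEMO_HASH_FUNCTION_MAX,
};

/* Stand-in for the relevant part of struct rte_eth_rss_conf. */
struct demo_rss_conf {
	unsigned int func;
};

/*
 * Reject an out-of-range algorithm instead of replacing it with the
 * default, so the caller sees the mistake regardless of the log level.
 */
static int
demo_check_rss_func(const struct demo_rss_conf *conf)
{
	assert(conf != NULL);

	if (conf->func >= DEMO_HASH_FUNCTION_MAX)
		return -EINVAL;

	return 0;
}
```

With this shape, a caller passing a gibberish value gets -EINVAL back
from the configuration call rather than a silently adjusted setting.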

> +	}
> +
> 	/* Check if Rx RSS distribution is disabled but RSS hash is enabled. */
> 	if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
> 	    (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
> @@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
> 		return -ENOTSUP;
> 	}
>
> +	if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
> +		RTE_ETHDEV_LOG(NOTICE,
> +			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
> +			port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
> +		rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
> +	}
> +
> 	if (*dev->dev_ops->rss_hash_update == NULL)
> 		return -ENOTSUP;
> 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
> @@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
> 		return -EINVAL;
> 	}
>
> +	rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
> +
> 	if (*dev->dev_ops->rss_hash_conf_get == NULL)
> 		return -ENOTSUP;
> 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 99fe9e238b..5abe2cb36d 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -174,6 +174,7 @@ extern "C" {
>
> #include "rte_ethdev_trace_fp.h"
> #include "rte_dev_info.h"
> +#include "rte_flow.h"
>
> extern int rte_eth_dev_logtype;
>
> @@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
>  * The *rss_hf* field of the *rss_conf* structure indicates the different
>  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
>  * Supplying an *rss_hf* equal to zero disables the RSS feature.
> + *
> + * The *func* field of the *rss_conf* structure indicates the different
> + * types of hash algorithms applied by the RSS hashing.

Consider:

The *func* field of the *rss_conf* structure indicates the algorithm to
use when computing hash. Passing RTE_ETH_HASH_FUNCTION_DEFAULT allows
the PMD to use its best-effort algorithm rather than a specific one.

>  */
> struct rte_eth_rss_conf {
> 	uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
> 	uint8_t rss_key_len; /**< hash key length in bytes. */
> 	uint64_t rss_hf;     /**< Hash functions to apply - see below. */
> +	enum rte_eth_hash_function func;	/**< Hash algorithm to apply. */
> };
>
> /*
> -- 
> 2.22.0
>
>

Thank you.


* [PATCH 1/5] ethdev: support setting and querying rss algorithm
  @ 2023-03-15 11:00 10% ` Dongdong Liu
  2023-03-15 11:28  0%   ` Ivan Malov
  2023-03-15 13:43  3%   ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Dongdong Liu @ 2023-03-15 11:00 UTC (permalink / raw)
  To: dev, ferruh.yigit, thomas, andrew.rybchenko, reshma.pattan
  Cc: stable, yisen.zhuang, liudongdong3, Jie Hai

From: Jie Hai <haijie1@huawei.com>

Currently, rte_eth_rss_conf supports configuring rss hash
functions, rss key and its length, but not the rss hash algorithm.

The structure ``rte_eth_rss_conf`` is extended by adding a new field,
"func". This represents the RSS algorithms to apply. The following
API is affected:
	- rte_eth_dev_configure
	- rte_eth_dev_rss_hash_update
	- rte_eth_dev_rss_hash_conf_get

To prevent configuration failures caused by incorrect func input, check
this parameter in advance. If it's incorrect, a warning is generated
and the default value is set. Do the same for rte_eth_dev_rss_hash_update
and rte_eth_dev_configure.

To check whether the drivers report the func field, it is set to default
value before querying.
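
A minimal sketch of that last point, with stand-in types and two mock
driver callbacks (all demo_* names are hypothetical, not ethdev
symbols):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for enum rte_eth_hash_function. */
enum demo_hash_function {
	DEMO_HASH_FUNCTION_DEFAULT = 0,
	DEMO_HASH_FUNCTION_TOEPLITZ,
	DEMO_HASH_FUNCTION_MAX,
};

/* Stand-in for the relevant part of struct rte_eth_rss_conf. */
struct demo_rss_conf {
	enum demo_hash_function func;
};

typedef int (*demo_conf_get_t)(struct demo_rss_conf *conf);

/* Mock driver that predates the field: it never writes conf->func. */
static int
demo_old_driver_get(struct demo_rss_conf *conf)
{
	(void)conf;
	return 0;
}

/* Mock driver that reports the algorithm it is using. */
static int
demo_new_driver_get(struct demo_rss_conf *conf)
{
	conf->func = DEMO_HASH_FUNCTION_TOEPLITZ;
	return 0;
}

/*
 * Pre-set func to the default before calling the driver, so the caller
 * never reads back whatever value happened to be in its own struct.
 */
static int
demo_rss_hash_conf_get(demo_conf_get_t get, struct demo_rss_conf *conf)
{
	assert(get != NULL && conf != NULL);

	conf->func = DEMO_HASH_FUNCTION_DEFAULT;
	return get(conf);
}
```

Note that with this scheme the caller cannot tell a driver that does
not report the field from one that reports the default algorithm.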

Signed-off-by: Jie Hai <haijie1@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
---
 doc/guides/rel_notes/release_23_03.rst |  4 ++--
 lib/ethdev/rte_ethdev.c                | 18 ++++++++++++++++++
 lib/ethdev/rte_ethdev.h                |  5 +++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index af6f37389c..7879567427 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -284,8 +284,8 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
-* No ABI change that would break compatibility with 22.11.
-
+* ethdev: Added "func" field to ``rte_eth_rss_conf`` structure for RSS hash
+  algorithm.
 
 Known Issues
 ------------
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 4d03255683..db561026bd 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1368,6 +1368,15 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
 		goto rollback;
 	}
 
+	if (dev_conf->rx_adv_conf.rss_conf.func >= RTE_ETH_HASH_FUNCTION_MAX) {
+		RTE_ETHDEV_LOG(WARNING,
+			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
+			port_id, dev_conf->rx_adv_conf.rss_conf.func,
+			RTE_ETH_HASH_FUNCTION_DEFAULT);
+		dev->data->dev_conf.rx_adv_conf.rss_conf.func =
+			RTE_ETH_HASH_FUNCTION_DEFAULT;
+	}
+
 	/* Check if Rx RSS distribution is disabled but RSS hash is enabled. */
 	if (((dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) == 0) &&
 	    (dev_conf->rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH)) {
@@ -4553,6 +4562,13 @@ rte_eth_dev_rss_hash_update(uint16_t port_id,
 		return -ENOTSUP;
 	}
 
+	if (rss_conf->func >= RTE_ETH_HASH_FUNCTION_MAX) {
+		RTE_ETHDEV_LOG(NOTICE,
+			"Ethdev port_id=%u invalid rss hash function (%u), modified to default value (%u)\n",
+			port_id, rss_conf->func, RTE_ETH_HASH_FUNCTION_DEFAULT);
+		rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
+	}
+
 	if (*dev->dev_ops->rss_hash_update == NULL)
 		return -ENOTSUP;
 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_update)(dev,
@@ -4580,6 +4596,8 @@ rte_eth_dev_rss_hash_conf_get(uint16_t port_id,
 		return -EINVAL;
 	}
 
+	rss_conf->func = RTE_ETH_HASH_FUNCTION_DEFAULT;
+
 	if (*dev->dev_ops->rss_hash_conf_get == NULL)
 		return -ENOTSUP;
 	ret = eth_err(port_id, (*dev->dev_ops->rss_hash_conf_get)(dev,
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 99fe9e238b..5abe2cb36d 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -174,6 +174,7 @@ extern "C" {
 
 #include "rte_ethdev_trace_fp.h"
 #include "rte_dev_info.h"
+#include "rte_flow.h"
 
 extern int rte_eth_dev_logtype;
 
@@ -461,11 +462,15 @@ struct rte_vlan_filter_conf {
  * The *rss_hf* field of the *rss_conf* structure indicates the different
  * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
  * Supplying an *rss_hf* equal to zero disables the RSS feature.
+ *
+ * The *func* field of the *rss_conf* structure indicates the different
+ * types of hash algorithms applied by the RSS hashing.
  */
 struct rte_eth_rss_conf {
 	uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
 	uint8_t rss_key_len; /**< hash key length in bytes. */
 	uint64_t rss_hf;     /**< Hash functions to apply - see below. */
+	enum rte_eth_hash_function func;	/**< Hash algorithm to apply. */
 };
 
 /*
-- 
2.22.0
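
For illustration only, the fallback behaviour added by the hunks above can be
sketched as a self-contained C snippet. The enum and the function name
sanitize_hash_func() are hypothetical stand-ins and not DPDK definitions; the
patch itself works on enum rte_eth_hash_function and reports the substitution
through RTE_ETHDEV_LOG rather than a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for enum rte_eth_hash_function. */
enum mock_hash_function {
	MOCK_HASH_FUNCTION_DEFAULT = 0,
	MOCK_HASH_FUNCTION_TOEPLITZ,
	MOCK_HASH_FUNCTION_SIMPLE_XOR,
	MOCK_HASH_FUNCTION_MAX, /* first invalid value */
};

/*
 * Return func unchanged when it is in range; otherwise fall back to the
 * default algorithm and report the substitution through *modified.
 */
static enum mock_hash_function
sanitize_hash_func(unsigned int func, bool *modified)
{
	assert(modified != NULL);

	if (func >= MOCK_HASH_FUNCTION_MAX) {
		*modified = true;
		return MOCK_HASH_FUNCTION_DEFAULT;
	}
	*modified = false;
	return (enum mock_hash_function)func;
}
```

A caller would log a message when the flag is set, which is roughly what the
two hunks in rte_ethdev.c do before handing the configuration to the driver.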



* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09 13:10  0%             ` Bruce Richardson
@ 2023-03-13 15:51  0%               ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-03-13 15:51 UTC (permalink / raw)
  To: fengchengwen, Bruce Richardson
  Cc: dev, dev, David Marchand, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

09/03/2023 14:10, Bruce Richardson:
> On Thu, Mar 09, 2023 at 01:12:51PM +0100, Thomas Monjalon wrote:
> > 09/03/2023 12:23, fengchengwen:
> > > On 2023/3/9 15:29, Thomas Monjalon wrote:
> > > > 09/03/2023 02:43, fengchengwen:
> > > >> On 2023/3/7 0:13, Thomas Monjalon wrote:
> > > >>> --- a/doc/guides/rel_notes/release_22_11.rst
> > > >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> > > >>> @@ -504,7 +504,7 @@ ABI Changes
> > > >>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> > > >>>  
> > > >>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> > > >>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> > > >>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> > > >>
> > > >> Should add to release 23.03 rst.
> > > > 
> > > > Yes we could add a note in API changes.
> > > > 
> > > >> The original 22.11 still have RTE_IOVA_AS_PA definition.
> > > > 
> > > > Yes it was not a good idea to rename in the release notes.
> > > > 
> > > >>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> > > >>> -    build = false
> > > >>> -    reason = 'driver does not support disabling IOVA as PA mode'
> > > >>> +if not get_option('enable_iova_as_pa')
> > > >>>      subdir_done()
> > > >>>  endif
> > > >>
> > > >> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> > > >> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
> > > >>      subdir_done()
> > > >> endif
> > > > 
> > > > Why testing the C macro in Meson?
> > > > It looks simpler to check the Meson option in Meson.
> > > 
> > > The macro is created in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
> > > It can be regarded as an alias of enable_iova_as_pa.
> > 
> > It is not strictly an alias, because it can be overridden via CFLAGS.
> > 
> > > This commit is mainly meant to improve comprehensibility, so we should limit the usage scope of 'enable_iova_as_pa',
> > > and 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is easier to understand than 'if not get_option('enable_iova_as_pa')'.
> > 
> > To me, using Meson option in Meson files is more obvious.
> > 
> > Bruce, what do you think?
> > 
> 
> I'm not sure it matters much! However, I think of the two, using the
> reference to IOVA_IN_MBUF is clearer. It also allows the same terminology
> to be used in meson and C files. If we don't want to do a dpdk_conf lookup,
> we can always assign the option to a meson variable called iova_in_mbuf.

OK I'll query the C macro in the Meson files.




* Re: [PATCH] lib/hash: new feature adding existing key
  @ 2023-03-13 15:48  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-03-13 15:48 UTC (permalink / raw)
  To: Abdullah Ömer Yamaç; +Cc: dev, Yipeng Wang

On Mon, 13 Mar 2023 07:35:48 +0000
Abdullah Ömer Yamaç <omer.yamac@ceng.metu.edu.tr> wrote:

> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index eb2644f74b..e8b7283ec2 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -193,6 +193,8 @@ struct rte_hash {
>  	/**< If read-write concurrency support is enabled */
>  	uint8_t ext_table_support;     /**< Enable extendable bucket table */
>  	uint8_t no_free_on_del;
> +	/**< If update is prohibited on adding same key */
> +	uint8_t no_update_data;
>  	/**< If key index should be freed on calling rte_hash_del_xxx APIs.
>  	 * If this is set, rte_hash_free_key_with_position must be called to
>  	 * free the key index associated with the deleted entry.
> diff --git a/lib/hash/rte_hash.h b/lib/hash/rte_hash.h

This ends up being an ABI change, so it needs to wait for the 23.11 release.
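
As a rough illustration of why inserting a member in the middle of a
structure is treated as an ABI change, the sketch below compares two
hypothetical layouts. The struct names and the next_field member are made up
for the example, and it assumes the layout is visible to code compiled
against the older definition; it is not a copy of struct rte_hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the patch. */
struct mock_hash_v1 {
	uint8_t ext_table_support;
	uint8_t no_free_on_del;
	uint8_t next_field; /* whatever member follows in the real struct */
};

/* Hypothetical layout after inserting a new member in the middle. */
struct mock_hash_v2 {
	uint8_t ext_table_support;
	uint8_t no_free_on_del;
	uint8_t no_update_data; /* new member */
	uint8_t next_field;
};

/* Document the layout the example relies on. */
static_assert(offsetof(struct mock_hash_v1, next_field) == 2, "v1 layout");

/* Number of bytes by which the following member moved. */
static size_t
mock_field_shift(void)
{
	return offsetof(struct mock_hash_v2, next_field) -
	       offsetof(struct mock_hash_v1, next_field);
}
```

Code built against the first layout would read the new flag where it expects
next_field, which is why such a change is normally held back until an
ABI-breaking release.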


* Re: [PATCH] reorder: fix registration of dynamic field in mbuf
  @ 2023-03-13 10:19  3% ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-03-13 10:19 UTC (permalink / raw)
  To: Volodymyr Fialko, Reshma Pattan
  Cc: dev, Andrew Rybchenko, jerinj, anoobj, Thomas Monjalon

Hello,

On Mon, Mar 13, 2023 at 10:35 AM Volodymyr Fialko <vfialko@marvell.com> wrote:
>
> It's possible to initialize a reorder buffer with user-allocated memory via
> the rte_reorder_init() function. In that case rte_reorder_create() is not
> required, and the reorder dynamic field in rte_mbuf will not be registered.

Good catch.


>
> Fixes: 01f3496695b5 ("reorder: switch sequence number to dynamic mbuf field")

It seems worth backporting.
Cc: stable@dpdk.org

>
> Signed-off-by: Volodymyr Fialko <vfialko@marvell.com>
> ---
>  lib/reorder/rte_reorder.c | 40 ++++++++++++++++++++++++++++++---------
>  1 file changed, 31 insertions(+), 9 deletions(-)
>
> diff --git a/lib/reorder/rte_reorder.c b/lib/reorder/rte_reorder.c
> index 6e029c9e02..a759a9c434 100644
> --- a/lib/reorder/rte_reorder.c
> +++ b/lib/reorder/rte_reorder.c
> @@ -54,6 +54,28 @@ struct rte_reorder_buffer {
>  static void
>  rte_reorder_free_mbufs(struct rte_reorder_buffer *b);
>
> +static int
> +rte_reorder_dynf_register(void)
> +{
> +       int ret;
> +
> +       static const struct rte_mbuf_dynfield reorder_seqn_dynfield_desc = {
> +               .name = RTE_REORDER_SEQN_DYNFIELD_NAME,
> +               .size = sizeof(rte_reorder_seqn_t),
> +               .align = __alignof__(rte_reorder_seqn_t),
> +       };
> +
> +       if (rte_reorder_seqn_dynfield_offset > 0)
> +               return 0;
> +
> +       ret = rte_mbuf_dynfield_register(&reorder_seqn_dynfield_desc);
> +       if (ret < 0)
> +               return ret;
> +       rte_reorder_seqn_dynfield_offset = ret;
> +
> +       return 0;
> +}

We don't need this helper (see my comment below, for
rte_reorder_create), you can simply move this block to
rte_reorder_init().


> +
>  struct rte_reorder_buffer *
>  rte_reorder_init(struct rte_reorder_buffer *b, unsigned int bufsize,
>                 const char *name, unsigned int size)
> @@ -85,6 +107,12 @@ rte_reorder_init(struct rte_reorder_buffer *b, unsigned int bufsize,
>                 rte_errno = EINVAL;
>                 return NULL;
>         }
> +       if (rte_reorder_dynf_register()) {
> +               RTE_LOG(ERR, REORDER, "Failed to register mbuf field for reorder sequence"
> +                                     " number\n");
> +               rte_errno = ENOMEM;

I think returning this new errno code is fine from an ABI point of view.
An application would have to check for a NULL return value in any case
and can't act differently based on the rte_errno value.

However, this is a small change to the rte_reorder_init API, so its
description needs an update, see:

 * @return
 *   The initialized reorder buffer instance, or NULL on error
 *   On error case, rte_errno will be set appropriately:
 *    - EINVAL - invalid parameters



> +               return NULL;
> +       }
>
>         memset(b, 0, bufsize);
>         strlcpy(b->name, name, sizeof(b->name));
> @@ -106,11 +134,6 @@ rte_reorder_create(const char *name, unsigned socket_id, unsigned int size)
>         struct rte_reorder_list *reorder_list;
>         const unsigned int bufsize = sizeof(struct rte_reorder_buffer) +
>                                         (2 * size * sizeof(struct rte_mbuf *));
> -       static const struct rte_mbuf_dynfield reorder_seqn_dynfield_desc = {
> -               .name = RTE_REORDER_SEQN_DYNFIELD_NAME,
> -               .size = sizeof(rte_reorder_seqn_t),
> -               .align = __alignof__(rte_reorder_seqn_t),
> -       };
>
>         reorder_list = RTE_TAILQ_CAST(rte_reorder_tailq.head, rte_reorder_list);
>
> @@ -128,10 +151,9 @@ rte_reorder_create(const char *name, unsigned socket_id, unsigned int size)
>                 return NULL;
>         }
>
> -       rte_reorder_seqn_dynfield_offset =
> -               rte_mbuf_dynfield_register(&reorder_seqn_dynfield_desc);
> -       if (rte_reorder_seqn_dynfield_offset < 0) {
> -               RTE_LOG(ERR, REORDER, "Failed to register mbuf field for reorder sequence number\n");
> +       if (rte_reorder_dynf_register()) {
> +               RTE_LOG(ERR, REORDER, "Failed to register mbuf field for reorder sequence"
> +                                     " number\n");

All rte_reorder_buffer objects need to go through rte_reorder_init().
You can check rte_reorder_init() return code.


>                 rte_errno = ENOMEM;
>                 return NULL;
>         }
> --
> 2.34.1
>


-- 
David Marchand
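
A minimal sketch of the structure suggested in the review above: the field is
registered once inside the init function, and create simply relies on init's
return value. Everything here is a hypothetical mock (mock_dynfield_register()
stands in for rte_mbuf_dynfield_register(), and an int out-parameter replaces
rte_errno), so it only shows the control flow, not the actual library code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_reorder_buffer {
	int initialized;
};

static int mock_seqn_offset = -1; /* -1 means "not registered yet" */
static int mock_register_calls;   /* counts calls, for demonstration */

/* Stand-in for the dynamic field registration; returns a fake offset. */
static int
mock_dynfield_register(void)
{
	mock_register_calls++;
	return 16;
}

/* Init registers the field on first use, so every buffer goes through it. */
static struct mock_reorder_buffer *
mock_reorder_init(struct mock_reorder_buffer *b, int *err)
{
	assert(err != NULL);

	if (b == NULL) {
		*err = EINVAL;
		return NULL;
	}
	if (mock_seqn_offset < 0) {
		int ret = mock_dynfield_register();

		if (ret < 0) {
			*err = ENOMEM;
			return NULL;
		}
		mock_seqn_offset = ret;
	}
	b->initialized = 1;
	return b;
}

/* Create does no registration of its own; it checks init's result. */
static struct mock_reorder_buffer *
mock_reorder_create(struct mock_reorder_buffer *storage, int *err)
{
	return mock_reorder_init(storage, err);
}
```

With this shape, a buffer set up on user-allocated memory and one obtained
through create both end up with the field registered, and the registration
helper runs only once.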



* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09 12:12  0%           ` Thomas Monjalon
@ 2023-03-09 13:10  0%             ` Bruce Richardson
  2023-03-13 15:51  0%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-03-09 13:10 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: fengchengwen, dev, David Marchand, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

On Thu, Mar 09, 2023 at 01:12:51PM +0100, Thomas Monjalon wrote:
> 09/03/2023 12:23, fengchengwen:
> > On 2023/3/9 15:29, Thomas Monjalon wrote:
> > > 09/03/2023 02:43, fengchengwen:
> > >> On 2023/3/7 0:13, Thomas Monjalon wrote:
> > >>> --- a/doc/guides/rel_notes/release_22_11.rst
> > >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> > >>> @@ -504,7 +504,7 @@ ABI Changes
> > >>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> > >>>  
> > >>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> > >>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> > >>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> > >>
> > >> Should add to release 23.03 rst.
> > > 
> > > Yes we could add a note in API changes.
> > > 
> > >> The original 22.11 still have RTE_IOVA_AS_PA definition.
> > > 
> > > Yes it was not a good idea to rename in the release notes.
> > > 
> > >>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> > >>> -    build = false
> > >>> -    reason = 'driver does not support disabling IOVA as PA mode'
> > >>> +if not get_option('enable_iova_as_pa')
> > >>>      subdir_done()
> > >>>  endif
> > >>
> > >> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> > >> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
> > >>      subdir_done()
> > >> endif
> > > 
> > > Why testing the C macro in Meson?
> > > It looks simpler to check the Meson option in Meson.
> > 
> > The macro is created in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
> > It can be regarded as an alias of enable_iova_as_pa.
> 
> It is not strictly an alias, because it can be overridden via CFLAGS.
> 
> > This commit is mainly meant to improve comprehensibility, so we should limit the usage scope of 'enable_iova_as_pa',
> > and 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is easier to understand than 'if not get_option('enable_iova_as_pa')'.
> 
> To me, using Meson option in Meson files is more obvious.
> 
> Bruce, what do you think?
> 

I'm not sure it matters much! However, I think of the two, using the
reference to IOVA_IN_MBUF is clearer. It also allows the same terminology
to be used in meson and C files. If we don't want to do a dpdk_conf lookup,
we can always assign the option to a meson variable called iova_in_mbuf.

/Bruce


* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09 11:23  0%         ` fengchengwen
@ 2023-03-09 12:12  0%           ` Thomas Monjalon
  2023-03-09 13:10  0%             ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-09 12:12 UTC (permalink / raw)
  To: Bruce Richardson, fengchengwen
  Cc: dev, David Marchand, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

09/03/2023 12:23, fengchengwen:
> On 2023/3/9 15:29, Thomas Monjalon wrote:
> > 09/03/2023 02:43, fengchengwen:
> >> On 2023/3/7 0:13, Thomas Monjalon wrote:
> >>> --- a/doc/guides/rel_notes/release_22_11.rst
> >>> +++ b/doc/guides/rel_notes/release_22_11.rst
> >>> @@ -504,7 +504,7 @@ ABI Changes
> >>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> >>>  
> >>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> >>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> >>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> >>
> >> Should add to release 23.03 rst.
> > 
> > Yes we could add a note in API changes.
> > 
> >> The original 22.11 still have RTE_IOVA_AS_PA definition.
> > 
> > Yes it was not a good idea to rename in the release notes.
> > 
> >>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> >>> -    build = false
> >>> -    reason = 'driver does not support disabling IOVA as PA mode'
> >>> +if not get_option('enable_iova_as_pa')
> >>>      subdir_done()
> >>>  endif
> >>
> >> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> >> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
> >>      subdir_done()
> >> endif
> > 
> > Why testing the C macro in Meson?
> > It looks simpler to check the Meson option in Meson.
> 
> The macro is created in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
> It can be regarded as an alias of enable_iova_as_pa.

It is not strictly an alias, because it can be overridden via CFLAGS.

> This commit is mainly meant to improve comprehensibility, so we should limit the usage scope of 'enable_iova_as_pa',
> and 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is easier to understand than 'if not get_option('enable_iova_as_pa')'.

To me, using Meson option in Meson files is more obvious.

Bruce, what do you think?

> >> Meson 0.63.0 already supports deprecating an option in favor of a new option.
> >> When updating to the new Meson version, the drivers' meson.build will not need to be modified.
> > 
> > I don't understand this comment.
> 
> I mean: the option "enable_iova_as_pa" will need to be deprecated in the future.

Why deprecating this option?

> Based on this, I think we should limit the usage scope of 'enable_iova_as_pa'; this allows us to
> reduce the amount of change effort when it is about to be deprecated.

I don't plan to deprecate this option.
And in general, we should avoid deprecating a compilation option.




* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09  7:29  0%       ` Thomas Monjalon
@ 2023-03-09 11:23  0%         ` fengchengwen
  2023-03-09 12:12  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-09 11:23 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, David Marchand, Bruce Richardson, Qi Zhang,
	Morten Brørup, Shijith Thotton, Olivier Matz, Ruifeng Wang,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Jingjing Wu, Beilei Xing, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal



On 2023/3/9 15:29, Thomas Monjalon wrote:
> 09/03/2023 02:43, fengchengwen:
>> On 2023/3/7 0:13, Thomas Monjalon wrote:
>>> --- a/doc/guides/rel_notes/release_22_11.rst
>>> +++ b/doc/guides/rel_notes/release_22_11.rst
>>> @@ -504,7 +504,7 @@ ABI Changes
>>>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
>>>  
>>>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
>>> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
>>> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
>>
>> Should add to release 23.03 rst.
> 
> Yes we could add a note in API changes.
> 
>> The original 22.11 still have RTE_IOVA_AS_PA definition.
> 
> Yes it was not a good idea to rename in the release notes.
> 
>>> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
>>> -    build = false
>>> -    reason = 'driver does not support disabling IOVA as PA mode'
>>> +if not get_option('enable_iova_as_pa')
>>>      subdir_done()
>>>  endif
>>
>> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
>> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
>>      subdir_done()
>> endif
> 
> Why testing the C macro in Meson?
> It looks simpler to check the Meson option in Meson.

The macro is created in meson.build: config/meson.build:319:dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
It can be regarded as an alias of enable_iova_as_pa.

This commit is mainly meant to improve comprehensibility, so we should limit the usage scope of 'enable_iova_as_pa',
and 'if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0' is easier to understand than 'if not get_option('enable_iova_as_pa')'.

> 
>> Meson 0.63.0 already supports deprecating an option in favor of a new option.
>> When updating to the new Meson version, the drivers' meson.build will not need to be modified.
> 
> I don't understand this comment.

I mean: the option "enable_iova_as_pa" will need to be deprecated in the future.

Based on this, I think we should limit the usage scope of 'enable_iova_as_pa'; this allows us to
reduce the amount of change effort when it is about to be deprecated.

> 
> 
> .
> 


* [RFC 1/2] security: introduce out of place support for inline ingress
@ 2023-03-09  8:56  4% Nithin Dabilpuram
  2023-04-11 10:04  4% ` [PATCH 1/3] " Nithin Dabilpuram
  0 siblings, 1 reply; 200+ results
From: Nithin Dabilpuram @ 2023-03-09  8:56 UTC (permalink / raw)
  To: Thomas Monjalon, Akhil Goyal; +Cc: jerinj, dev, Nithin Dabilpuram

Similar to the out of place (OOP) processing support that exists for
Lookaside crypto/security sessions, Inline ingress security
sessions may also need out of place processing in use cases
where the original encrypted packet needs to be retained for post
processing. So for NICs which have this kind of HW support,
a new SA option is provided to indicate whether OOP needs to
be enabled on that Inline ingress security session or not.

Since for inline ingress sessions the packet is not received by
the CPU until the processing is done, we can only have a per-SA
option and not a per-packet option as with Lookaside sessions.

In order to return the original encrypted packet mbuf,
this patch adds a new mbuf dynamic field of 8B size
containing a pointer to the original mbuf, which will be populated
for packets associated with an Inline SA that has OOP enabled.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
---
 devtools/libabigail.abignore       |  4 +++
 lib/security/rte_security.c        | 17 +++++++++++++
 lib/security/rte_security.h        | 39 +++++++++++++++++++++++++++++-
 lib/security/rte_security_driver.h |  8 ++++++
 lib/security/version.map           |  2 ++
 5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 7a93de3ba1..9f52ffbf2e 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -34,3 +34,7 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+; Ignore change to reserved opts for new SA option
+[suppress_type]
+       name = rte_security_ipsec_sa_options
diff --git a/lib/security/rte_security.c b/lib/security/rte_security.c
index e102c55e55..c2199dd8db 100644
--- a/lib/security/rte_security.c
+++ b/lib/security/rte_security.c
@@ -27,7 +27,10 @@
 } while (0)
 
 #define RTE_SECURITY_DYNFIELD_NAME "rte_security_dynfield_metadata"
+#define RTE_SECURITY_OOP_DYNFIELD_NAME "rte_security_oop_dynfield_metadata"
+
 int rte_security_dynfield_offset = -1;
+int rte_security_oop_dynfield_offset = -1;
 
 int
 rte_security_dynfield_register(void)
@@ -42,6 +45,20 @@ rte_security_dynfield_register(void)
 	return rte_security_dynfield_offset;
 }
 
+int
+rte_security_oop_dynfield_register(void)
+{
+	static const struct rte_mbuf_dynfield dynfield_desc = {
+		.name = RTE_SECURITY_OOP_DYNFIELD_NAME,
+		.size = sizeof(rte_security_oop_dynfield_t),
+		.align = __alignof__(rte_security_oop_dynfield_t),
+	};
+
+	rte_security_oop_dynfield_offset =
+		rte_mbuf_dynfield_register(&dynfield_desc);
+	return rte_security_oop_dynfield_offset;
+}
+
 void *
 rte_security_session_create(struct rte_security_ctx *instance,
 			    struct rte_security_session_conf *conf,
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 4bacf9fcd9..866cd4e8ee 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -275,6 +275,17 @@ struct rte_security_ipsec_sa_options {
 	 */
 	uint32_t ip_reassembly_en : 1;
 
+	/** Enable out of place processing on inline inbound packets.
+	 *
+	 * * 1: Enable driver to perform Out-of-place(OOP) processing for this inline
+	 *      inbound SA if supported by driver. PMD need to register mbuf
+	 *      dynamic field using rte_security_oop_dynfield_register()
+	 *      and security session creation would fail if dynfield is not
+	 *      registered successfully.
+	 * * 0: Disable OOP processing for this session (default).
+	 */
+	uint32_t ingress_oop : 1;
+
 	/** Reserved bit fields for future extension
 	 *
 	 * User should ensure reserved_opts is cleared as it may change in
@@ -282,7 +293,7 @@ struct rte_security_ipsec_sa_options {
 	 *
 	 * Note: Reduce number of bits in reserved_opts for every new option.
 	 */
-	uint32_t reserved_opts : 17;
+	uint32_t reserved_opts : 16;
 };
 
 /** IPSec security association direction */
@@ -812,6 +823,13 @@ typedef uint64_t rte_security_dynfield_t;
 /** Dynamic mbuf field for device-specific metadata */
 extern int rte_security_dynfield_offset;
 
+/** Out-of-Place(OOP) processing field type */
+typedef struct rte_mbuf *rte_security_oop_dynfield_t;
+/** Dynamic mbuf field for pointer to original mbuf for
+ * OOP processing session.
+ */
+extern int rte_security_oop_dynfield_offset;
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
@@ -834,6 +852,25 @@ rte_security_dynfield(struct rte_mbuf *mbuf)
 		rte_security_dynfield_t *);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get pointer to mbuf field for original mbuf pointer when
+ * Out-Of-Place(OOP) processing is enabled in security session.
+ *
+ * @param       mbuf    packet to access
+ * @return pointer to mbuf field
+ */
+__rte_experimental
+static inline rte_security_oop_dynfield_t *
+rte_security_oop_dynfield(struct rte_mbuf *mbuf)
+{
+	return RTE_MBUF_DYNFIELD(mbuf,
+			rte_security_oop_dynfield_offset,
+			rte_security_oop_dynfield_t *);
+}
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
diff --git a/lib/security/rte_security_driver.h b/lib/security/rte_security_driver.h
index 421e6f7780..91e7786ab7 100644
--- a/lib/security/rte_security_driver.h
+++ b/lib/security/rte_security_driver.h
@@ -190,6 +190,14 @@ typedef int (*security_macsec_sa_stats_get_t)(void *device, uint16_t sa_id,
 __rte_internal
 int rte_security_dynfield_register(void);
 
+/**
+ * @internal
+ * Register mbuf dynamic field for Security inline ingress Out-of-Place(OOP)
+ * processing.
+ */
+__rte_internal
+int rte_security_oop_dynfield_register(void);
+
 /**
  * Update the mbuf with provided metadata.
  *
diff --git a/lib/security/version.map b/lib/security/version.map
index 07dcce9ffb..59a95f40bd 100644
--- a/lib/security/version.map
+++ b/lib/security/version.map
@@ -23,10 +23,12 @@ EXPERIMENTAL {
 	rte_security_macsec_sc_stats_get;
 	rte_security_session_stats_get;
 	rte_security_session_update;
+	rte_security_oop_dynfield_offset;
 };
 
 INTERNAL {
 	global:
 
 	rte_security_dynfield_register;
+	rte_security_oop_dynfield_register;
 };
-- 
2.25.1
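
To make the dynamic-field mechanism in this RFC easier to picture, here is a
self-contained sketch that stores a pointer to the original packet at a
registered offset inside a mock mbuf. All names (mock_mbuf, mock_oop_register,
mock_oop_set, mock_oop_get) are hypothetical; the patch itself uses
RTE_MBUF_DYNFIELD() with rte_security_oop_dynfield_offset, and the memcpy
accessors below are a swapped-in, portable way to express the same idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified packet buffer. */
struct mock_mbuf {
	uint16_t data_len;
	uint64_t dynfield[4]; /* stand-in for the dynamic field area */
};

static int mock_oop_offset = -1; /* -1 until the field is registered */

/* Stand-in for the registration call; picks a fixed offset. */
static int
mock_oop_register(void)
{
	mock_oop_offset = (int)offsetof(struct mock_mbuf, dynfield);
	return mock_oop_offset;
}

/* Driver side: remember the original (encrypted) packet. */
static void
mock_oop_set(struct mock_mbuf *m, struct mock_mbuf *orig)
{
	assert(mock_oop_offset >= 0);
	memcpy((unsigned char *)m + mock_oop_offset, &orig, sizeof(orig));
}

/* Application side: fetch the original packet for post processing. */
static struct mock_mbuf *
mock_oop_get(const struct mock_mbuf *m)
{
	struct mock_mbuf *orig;

	assert(mock_oop_offset >= 0);
	memcpy(&orig, (const unsigned char *)m + mock_oop_offset, sizeof(orig));
	return orig;
}
```

In the RFC the registration is an internal, driver-facing call, and session
creation is expected to fail if the field was not registered, so an
application would only read the field for sessions created with the new
option enabled.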



* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-09  1:43  0%     ` fengchengwen
@ 2023-03-09  7:29  0%       ` Thomas Monjalon
  2023-03-09 11:23  0%         ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-09  7:29 UTC (permalink / raw)
  To: fengchengwen
  Cc: dev, David Marchand, Bruce Richardson, Qi Zhang,
	Morten Brørup, Shijith Thotton, Olivier Matz, Ruifeng Wang,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao,
	Jingjing Wu, Beilei Xing, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

09/03/2023 02:43, fengchengwen:
> On 2023/3/7 0:13, Thomas Monjalon wrote:
> > --- a/doc/guides/rel_notes/release_22_11.rst
> > +++ b/doc/guides/rel_notes/release_22_11.rst
> > @@ -504,7 +504,7 @@ ABI Changes
> >    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
> >  
> >  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> > -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> > +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
> 
> Should add to release 23.03 rst.

Yes we could add a note in API changes.

> The original 22.11 still have RTE_IOVA_AS_PA definition.

Yes it was not a good idea to rename in the release notes.

> > -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> > -    build = false
> > -    reason = 'driver does not support disabling IOVA as PA mode'
> > +if not get_option('enable_iova_as_pa')
> >      subdir_done()
> >  endif
> 
> Suggest keep original, and replace RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
> if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
>      subdir_done()
> endif

Why testing the C macro in Meson?
It looks simpler to check the Meson option in Meson.

> Meson 0.63.0 already supports deprecating an option in favor of a new option.
> When updating to the new Meson version, the drivers' meson.build will not need to be modified.

I don't understand this comment.




* Re: [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  2023-03-06 16:13  2%   ` [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf Thomas Monjalon
@ 2023-03-09  1:43  0%     ` fengchengwen
  2023-03-09  7:29  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-03-09  1:43 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: David Marchand, Bruce Richardson, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Kevin Laatz, Pavan Nikhilesh,
	Mattias Rönnblom, Liang Ma, Peter Mccarthy, Jerin Jacob,
	Harry van Haaren, Artem V. Andreev, Andrew Rybchenko,
	Ashwin Sekhar T K, John W. Linville, Ciara Loftus, Chas Williams,
	Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

On 2023/3/7 0:13, Thomas Monjalon wrote:
> The impact of the option "enable_iova_as_pa" is explained for users.
> 
> Also the code flag "RTE_IOVA_AS_PA" is renamed as "RTE_IOVA_IN_MBUF"
> in order to be more accurate (IOVA mode is decided at runtime),
> and more readable in the code.
> 
> Similarly the drivers are using the variable "require_iova_in_mbuf"
> instead of "pmd_supports_disable_iova_as_pa" with an opposite meaning.
> By default, it is assumed that drivers require the IOVA field in mbuf.
> The drivers which support removing this field have to declare themselves.
> 
> If the option "enable_iova_as_pa" is disabled, the unsupported drivers
> will be listed with the new reason text "requires IOVA in mbuf".
> 
> Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---

...

>  compile_time_cpuflags = []
>  subdir(arch_subdir)
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 91414573bd..c67c2823a2 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -504,7 +504,7 @@ ABI Changes
>    ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
>  
>  * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
> -  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
> +  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.

This should be added to the 23.03 release notes instead.
The original 22.11 release still has the RTE_IOVA_AS_PA definition.

...

> diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
> index e1a5afa2ec..743fae9db7 100644
> --- a/drivers/net/hns3/meson.build
> +++ b/drivers/net/hns3/meson.build
> @@ -13,9 +13,7 @@ if arch_subdir != 'x86' and arch_subdir != 'arm' or not dpdk_conf.get('RTE_ARCH_
>      subdir_done()
>  endif
>  
> -if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
> -    build = false
> -    reason = 'driver does not support disabling IOVA as PA mode'
> +if not get_option('enable_iova_as_pa')
>      subdir_done()
>  endif

I suggest keeping the original check and only replacing RTE_IOVA_AS_PA with RTE_IOVA_IN_MBUF:
if dpdk_conf.get('RTE_IOVA_IN_MBUF') == 0
     subdir_done()
endif
Meson 0.63.0 already supports deprecating an option in favour of a new option.
When DPDK later updates to that Meson version, the drivers' meson.build files will not need to be modified.

>  
> diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
> index e69e23997f..dacb87dcb0 100644

...

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02 13:58  0%           ` Jerin Jacob
@ 2023-03-07  8:26  0%             ` Yan, Zhirun
  0 siblings, 0 replies; 200+ results
From: Yan, Zhirun @ 2023-03-07  8:26 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, March 2, 2023 9:58 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 27, 2023 6:23 AM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > > ndabilpuram@marvell.com; Liang, Cunming
> > > > > <cunming.liang@intel.com>; Wang, Haiyue <haiyue.wang@intel.com>
> > > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker
> > > > > model APIs
> > > > >
> > > > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan
> > > > > > <zhirun.yan@intel.com> wrote:
> > > > > >
> > > > > > Add new get/set APIs to configure graph worker model which is
> > > > > > used to determine which model will be chosen.
> > > > > >
> > > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > > ---
> > > > > > >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> > > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > > >  lib/graph/version.map               |  3 ++
> > > > > >  3 files changed, 67 insertions(+)
> > > > > >
> > > > > > > diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> > > > > > > index 54d1390786..a0ea0df153 100644
> > > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > > @@ -1,5 +1,56 @@
> > > > > >  #include "rte_graph_model_rtc.h"
> > > > > >
> > > > > > +static enum rte_graph_worker_model worker_model =
> > > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > > >
> > > > > This will break the multiprocess.
> > > >
> > > > Thanks. I will use TLS for per-thread local storage.
> > >
> > > If it needs to be used from secondary process, then it needs to be
> > > from memzone.
> > >
> >
> >
> > This field will be set by the primary process in the initial stage, and then the lcores will only
> > read it.
> > I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It
> > seems not necessary to allocate from memzone.
> >
> > >
> > >
> > > >
> > > > >
> > > > > > +
> > > > > > +/** Graph worker models */
> > > > > > > +enum rte_graph_worker_model {
> > > > > > > +#define WORKER_MODEL_DEFAULT "default"
> > > > >
> > > > > Why need strings?
> > > > > Also, every symbol in a public header file should start with
> > > > > RTE_ to avoid namespace conflict.
> > > >
> > > > It was used to config the model in app. I can put the string into example.
> > >
> > > OK
> > >
> > > >
> > > > >
> > > > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > > > > > > +#define WORKER_MODEL_RTC "rtc"
> > > > > > > +       RTE_GRAPH_MODEL_RTC,
> > > > >
> > > > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > > > > > enum itself.
> > > > Yes, will do in next version.
> > > >
> > > > >
> > > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > > >
> > > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > > RTE_GRAPH_MODEL_PIPELINE
> > > >
> > > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> > >
> > > Hybrid is very overloaded term, and it will be confusing
> > > (considering there will be new models in future).
> > > Please pick a word that really express the model working.
> > >
> >
> > In this case, the path is Node0 -> Node1 -> Node2 -> Node3, and Node1
> > and Node3 are bound to one core.
> >
> > Our model offers the ability to dispatch between cores.
> >
> > Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?
> 
> Some names that I can think of:
> 
> // MCORE->MULTI CORE
> 
> RTE_GRAPH_MODEL_MCORE_PIPELINE
> or
> RTE_GRAPH_MODEL_MCORE_DISPATCH
> or
> RTE_GRAPH_MODEL_MCORE_RING
> or
> RTE_GRAPH_MODEL_MULTI_CORE
> 

Thanks, I will use RTE_GRAPH_MODEL_MCORE_DISPATCH as the name.
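
For reference, here is a minimal sketch of how the enum might look once the review comments in this thread are applied: every public symbol carries the RTE_ prefix, RTC aliases the default inside the enum itself, the multi-core model is spelled RTE_GRAPH_MODEL_MCORE_DISPATCH, and there is no MAX entry. This is not the code of the next version. The helper app_parse_worker_model() and the "dispatch" string are hypothetical and only illustrate moving the string names out of the public header and into the example application.

```c
#include <assert.h>
#include <string.h>

/* Sketch only: enumerator names follow the review comments above. */
enum rte_graph_worker_model {
	RTE_GRAPH_MODEL_DEFAULT = 0,
	/* RTC is the default, expressed in the enum itself. */
	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
	RTE_GRAPH_MODEL_MCORE_DISPATCH,
	/* Deliberately no *_MAX entry, so new models can be appended later. */
};

/*
 * Hypothetical application-side helper (not part of the graph library):
 * map a configuration string to a worker model.
 * Returns 0 on success, -1 if the name is unknown.
 */
static int
app_parse_worker_model(const char *name, enum rte_graph_worker_model *model)
{
	assert(name != NULL && model != NULL);

	if (strcmp(name, "default") == 0 || strcmp(name, "rtc") == 0) {
		*model = RTE_GRAPH_MODEL_RTC;
		return 0;
	}
	/* "dispatch" is an assumed spelling for the new model's name. */
	if (strcmp(name, "dispatch") == 0) {
		*model = RTE_GRAPH_MODEL_MCORE_DISPATCH;
		return 0;
	}
	return -1;
}
```

With this layout the strings live only in the application, and the public header exposes nothing but RTE_-prefixed enumerators.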

> >
> > + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> > '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> > '            '     '                          '     '            '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> > ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> > '            '     '     |                    '     '      ^     '
> > + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
> >                          |                                 |
> >                          + - - - - - - - - - - - - - - - - +
> >
> >
> > > > >
> > > > >
> > > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > > +       RTE_GRAPH_MODEL_MAX,
> > > > >
> > > > > No need for MAX, it will break the ABI for future. See other
> > > > > subsystem such as cryptodev.
> > > >
> > > > Thanks, I will change it.
> > > > >
> > > > > > +};
> > > > >
> > > > > >

^ permalink raw reply	[relevance 0%]

* [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf
  @ 2023-03-06 16:13  2%   ` Thomas Monjalon
  2023-03-09  1:43  0%     ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-03-06 16:13 UTC (permalink / raw)
  To: dev
  Cc: David Marchand, Bruce Richardson, Qi Zhang, Morten Brørup,
	Shijith Thotton, Olivier Matz, Ruifeng Wang, Nithin Dabilpuram,
	Kiran Kumar K, Sunil Kumar Kori, Satha Rao, Jingjing Wu,
	Beilei Xing, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Kai Ji, Pablo de Lara, Radha Mohan Chintakuntla,
	Veerasenareddy Burru, Chengwen Feng, Kevin Laatz,
	Pavan Nikhilesh, Mattias Rönnblom, Liang Ma, Peter Mccarthy,
	Jerin Jacob, Harry van Haaren, Artem V. Andreev,
	Andrew Rybchenko, Ashwin Sekhar T K, John W. Linville,
	Ciara Loftus, Chas Williams, Min Hu (Connor),
	Gaetan Rivet, Dongdong Liu, Yisen Zhuang, Konstantin Ananyev,
	Qiming Yang, Jakub Grajciar, Tetsuya Mukawa, Jakub Palider,
	Tomasz Duszynski, Sachin Saxena, Hemant Agrawal

The impact of the option "enable_iova_as_pa" is explained for users.

Also the code flag "RTE_IOVA_AS_PA" is renamed as "RTE_IOVA_IN_MBUF"
in order to be more accurate (IOVA mode is decided at runtime),
and more readable in the code.

Similarly the drivers are using the variable "require_iova_in_mbuf"
instead of "pmd_supports_disable_iova_as_pa" with an opposite meaning.
By default, it is assumed that drivers require the IOVA field in mbuf.
The drivers which support removing this field have to declare themselves.

If the option "enable_iova_as_pa" is disabled, the unsupported drivers
will be listed with the new reason text "requires IOVA in mbuf".

Suggested-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
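
Note for readers: the renamed flag can be illustrated outside of DPDK with the small sketch below. It is not part of the patch. It mirrors the accessor pattern of rte_mbuf_iova_get()/rte_mbuf_iova_set() touched in lib/mbuf/rte_mbuf.h, using hypothetical stand-ins (struct demo_mbuf, demo_iova_t, demo_mbuf_iova_get, demo_mbuf_iova_set) in place of the real types, and it assumes a default of 1 when the build system does not define RTE_IOVA_IN_MBUF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In DPDK the build system sets this flag; defaulting to 1 is an assumption. */
#ifndef RTE_IOVA_IN_MBUF
#define RTE_IOVA_IN_MBUF 1
#endif

typedef uint64_t demo_iova_t;	/* hypothetical stand-in for rte_iova_t */

/* Hypothetical, heavily reduced stand-in for struct rte_mbuf. */
struct demo_mbuf {
	void *buf_addr;			/* virtual address of the buffer */
#if RTE_IOVA_IN_MBUF
	_Alignas(8) demo_iova_t buf_iova; /* IOVA stored in the mbuf */
#endif
};

#if RTE_IOVA_IN_MBUF
/* Same layout assumption the ice vector path checks with RTE_BUILD_BUG_ON. */
static_assert(offsetof(struct demo_mbuf, buf_iova) ==
	      offsetof(struct demo_mbuf, buf_addr) + 8,
	      "buf_iova must directly follow the 8-byte buf_addr slot");
#endif

/* Return the IOVA: the stored field if present, else the virtual address. */
static inline demo_iova_t
demo_mbuf_iova_get(const struct demo_mbuf *m)
{
#if RTE_IOVA_IN_MBUF
	return m->buf_iova;
#else
	return (demo_iova_t)(uintptr_t)m->buf_addr;
#endif
}

/* Store the IOVA when the field exists; otherwise this is a no-op. */
static inline void
demo_mbuf_iova_set(struct demo_mbuf *m, demo_iova_t iova)
{
#if RTE_IOVA_IN_MBUF
	m->buf_iova = iova;
#else
	(void)m;
	(void)iova;
#endif
}
```

The flag therefore describes whether the field exists in the mbuf, not which IOVA mode is used, since the mode is still decided at runtime.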
 app/test/test_mbuf.c                   |  2 +-
 config/arm/meson.build                 |  4 ++--
 config/meson.build                     |  2 +-
 doc/guides/rel_notes/release_22_11.rst |  2 +-
 drivers/common/cnxk/meson.build        |  2 +-
 drivers/common/iavf/meson.build        |  2 +-
 drivers/crypto/armv8/meson.build       |  2 +-
 drivers/crypto/cnxk/meson.build        |  2 +-
 drivers/crypto/ipsec_mb/meson.build    |  2 +-
 drivers/crypto/null/meson.build        |  2 +-
 drivers/crypto/openssl/meson.build     |  2 +-
 drivers/dma/cnxk/meson.build           |  2 +-
 drivers/dma/skeleton/meson.build       |  2 +-
 drivers/event/cnxk/meson.build         |  2 +-
 drivers/event/dsw/meson.build          |  2 +-
 drivers/event/opdl/meson.build         |  2 +-
 drivers/event/skeleton/meson.build     |  2 +-
 drivers/event/sw/meson.build           |  2 +-
 drivers/mempool/bucket/meson.build     |  2 +-
 drivers/mempool/cnxk/meson.build       |  2 +-
 drivers/mempool/ring/meson.build       |  2 +-
 drivers/mempool/stack/meson.build      |  2 +-
 drivers/meson.build                    |  6 +++---
 drivers/net/af_packet/meson.build      |  2 +-
 drivers/net/af_xdp/meson.build         |  2 +-
 drivers/net/bonding/meson.build        |  2 +-
 drivers/net/cnxk/meson.build           |  2 +-
 drivers/net/failsafe/meson.build       |  2 +-
 drivers/net/hns3/meson.build           |  4 +---
 drivers/net/ice/ice_rxtx_common_avx.h  | 12 ++++++------
 drivers/net/ice/ice_rxtx_vec_sse.c     |  4 ++--
 drivers/net/ice/meson.build            |  2 +-
 drivers/net/memif/meson.build          |  2 +-
 drivers/net/null/meson.build           |  2 +-
 drivers/net/pcap/meson.build           |  2 +-
 drivers/net/ring/meson.build           |  2 +-
 drivers/net/tap/meson.build            |  2 +-
 drivers/raw/cnxk_bphy/meson.build      |  2 +-
 drivers/raw/cnxk_gpio/meson.build      |  2 +-
 drivers/raw/skeleton/meson.build       |  2 +-
 lib/eal/linux/eal.c                    |  2 +-
 lib/mbuf/rte_mbuf.c                    |  2 +-
 lib/mbuf/rte_mbuf.h                    |  4 ++--
 lib/mbuf/rte_mbuf_core.h               |  8 ++++----
 lib/mbuf/rte_mbuf_dyn.c                |  2 +-
 lib/meson.build                        |  2 +-
 meson_options.txt                      |  2 +-
 47 files changed, 60 insertions(+), 62 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 6cbb03b0af..81a6632d11 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -1232,7 +1232,7 @@ test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
 		return -1;
 	}
 
-	if (RTE_IOVA_AS_PA) {
+	if (RTE_IOVA_IN_MBUF) {
 		badbuf = *buf;
 		rte_mbuf_iova_set(&badbuf, 0);
 		if (verify_mbuf_check_panics(&badbuf)) {
diff --git a/config/arm/meson.build b/config/arm/meson.build
index 451dbada7d..5ff66248de 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -319,7 +319,7 @@ soc_cn10k = {
         ['RTE_MAX_LCORE', 24],
         ['RTE_MAX_NUMA_NODES', 1],
         ['RTE_MEMPOOL_ALIGN', 128],
-        ['RTE_IOVA_AS_PA', 0]
+        ['RTE_IOVA_IN_MBUF', 0]
     ],
     'part_number': '0xd49',
     'extra_march_features': ['crypto'],
@@ -412,7 +412,7 @@ soc_cn9k = {
     'part_number': '0xb2',
     'numa': false,
     'flags': [
-        ['RTE_IOVA_AS_PA', 0]
+        ['RTE_IOVA_IN_MBUF', 0]
     ]
 }
 
diff --git a/config/meson.build b/config/meson.build
index fc3ac99a32..fa730a1b14 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -316,7 +316,7 @@ endif
 if get_option('mbuf_refcnt_atomic')
     dpdk_conf.set('RTE_MBUF_REFCNT_ATOMIC', true)
 endif
-dpdk_conf.set10('RTE_IOVA_AS_PA', get_option('enable_iova_as_pa'))
+dpdk_conf.set10('RTE_IOVA_IN_MBUF', get_option('enable_iova_as_pa'))
 
 compile_time_cpuflags = []
 subdir(arch_subdir)
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 91414573bd..c67c2823a2 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -504,7 +504,7 @@ ABI Changes
   ``rte-worker-<lcore_id>`` so that DPDK can accommodate lcores higher than 99.
 
 * mbuf: Replaced ``buf_iova`` field with ``next`` field and added a new field
-  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_AS_PA`` is 0.
+  ``dynfield2`` at its place in second cacheline if ``RTE_IOVA_IN_MBUF`` is 0.
 
 * ethdev: enum ``RTE_FLOW_ITEM`` was affected by deprecation procedure.
 
diff --git a/drivers/common/cnxk/meson.build b/drivers/common/cnxk/meson.build
index 849735921c..ce71f3d70c 100644
--- a/drivers/common/cnxk/meson.build
+++ b/drivers/common/cnxk/meson.build
@@ -87,4 +87,4 @@ sources += files('cnxk_telemetry_bphy.c',
 )
 
 deps += ['bus_pci', 'net', 'telemetry']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/common/iavf/meson.build b/drivers/common/iavf/meson.build
index af8a4983e0..af26955772 100644
--- a/drivers/common/iavf/meson.build
+++ b/drivers/common/iavf/meson.build
@@ -6,4 +6,4 @@ sources = files('iavf_adminq.c', 'iavf_common.c', 'iavf_impl.c')
 if cc.has_argument('-Wno-pointer-to-int-cast')
         cflags += '-Wno-pointer-to-int-cast'
 endif
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/armv8/meson.build b/drivers/crypto/armv8/meson.build
index 700fb80eb2..a735eb511c 100644
--- a/drivers/crypto/armv8/meson.build
+++ b/drivers/crypto/armv8/meson.build
@@ -17,4 +17,4 @@ endif
 ext_deps += dep
 deps += ['bus_vdev']
 sources = files('rte_armv8_pmd.c', 'rte_armv8_pmd_ops.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/cnxk/meson.build b/drivers/crypto/cnxk/meson.build
index a5acabab2b..3d9a0dbbf0 100644
--- a/drivers/crypto/cnxk/meson.build
+++ b/drivers/crypto/cnxk/meson.build
@@ -32,4 +32,4 @@ else
     cflags += [ '-ULA_IPSEC_DEBUG','-UCNXK_CRYPTODEV_DEBUG' ]
 endif
 
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/ipsec_mb/meson.build b/drivers/crypto/ipsec_mb/meson.build
index ec147d2110..3057e6fd10 100644
--- a/drivers/crypto/ipsec_mb/meson.build
+++ b/drivers/crypto/ipsec_mb/meson.build
@@ -41,4 +41,4 @@ sources = files(
         'pmd_zuc.c',
 )
 deps += ['bus_vdev', 'net', 'security']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/null/meson.build b/drivers/crypto/null/meson.build
index 59a7508f18..2e8b05ad28 100644
--- a/drivers/crypto/null/meson.build
+++ b/drivers/crypto/null/meson.build
@@ -9,4 +9,4 @@ endif
 
 deps += 'bus_vdev'
 sources = files('null_crypto_pmd.c', 'null_crypto_pmd_ops.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/crypto/openssl/meson.build b/drivers/crypto/openssl/meson.build
index d165c32ae8..1ec63c216d 100644
--- a/drivers/crypto/openssl/meson.build
+++ b/drivers/crypto/openssl/meson.build
@@ -15,4 +15,4 @@ endif
 deps += 'bus_vdev'
 sources = files('rte_openssl_pmd.c', 'rte_openssl_pmd_ops.c')
 ext_deps += dep
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/dma/cnxk/meson.build b/drivers/dma/cnxk/meson.build
index 252e5ff78b..b868fb14cb 100644
--- a/drivers/dma/cnxk/meson.build
+++ b/drivers/dma/cnxk/meson.build
@@ -3,4 +3,4 @@
 
 deps += ['bus_pci', 'common_cnxk', 'dmadev']
 sources = files('cnxk_dmadev.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/dma/skeleton/meson.build b/drivers/dma/skeleton/meson.build
index 2b0422ce61..77055683ad 100644
--- a/drivers/dma/skeleton/meson.build
+++ b/drivers/dma/skeleton/meson.build
@@ -5,4 +5,4 @@ deps += ['dmadev', 'kvargs', 'ring', 'bus_vdev']
 sources = files(
         'skeleton_dmadev.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/cnxk/meson.build b/drivers/event/cnxk/meson.build
index aa42ab3a90..3517e79341 100644
--- a/drivers/event/cnxk/meson.build
+++ b/drivers/event/cnxk/meson.build
@@ -479,4 +479,4 @@ foreach flag: extra_flags
 endforeach
 
 deps += ['bus_pci', 'common_cnxk', 'net_cnxk', 'crypto_cnxk']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/dsw/meson.build b/drivers/event/dsw/meson.build
index e6808c0f71..01af94165f 100644
--- a/drivers/event/dsw/meson.build
+++ b/drivers/event/dsw/meson.build
@@ -6,4 +6,4 @@ if cc.has_argument('-Wno-format-nonliteral')
     cflags += '-Wno-format-nonliteral'
 endif
 sources = files('dsw_evdev.c', 'dsw_event.c', 'dsw_xstats.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/opdl/meson.build b/drivers/event/opdl/meson.build
index 7abef44609..8613b2a746 100644
--- a/drivers/event/opdl/meson.build
+++ b/drivers/event/opdl/meson.build
@@ -9,4 +9,4 @@ sources = files(
         'opdl_test.c',
 )
 deps += ['bus_vdev']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/skeleton/meson.build b/drivers/event/skeleton/meson.build
index fa6a5e0a9f..6e788cfcee 100644
--- a/drivers/event/skeleton/meson.build
+++ b/drivers/event/skeleton/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('skeleton_eventdev.c')
 deps += ['bus_pci', 'bus_vdev']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/event/sw/meson.build b/drivers/event/sw/meson.build
index 8d815dfa84..3a3ebd72a3 100644
--- a/drivers/event/sw/meson.build
+++ b/drivers/event/sw/meson.build
@@ -9,4 +9,4 @@ sources = files(
         'sw_evdev.c',
 )
 deps += ['hash', 'bus_vdev']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/bucket/meson.build b/drivers/mempool/bucket/meson.build
index 94c060904b..d0ec523237 100644
--- a/drivers/mempool/bucket/meson.build
+++ b/drivers/mempool/bucket/meson.build
@@ -12,4 +12,4 @@ if is_windows
 endif
 
 sources = files('rte_mempool_bucket.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/cnxk/meson.build b/drivers/mempool/cnxk/meson.build
index d8bcc41ca0..50856ecde8 100644
--- a/drivers/mempool/cnxk/meson.build
+++ b/drivers/mempool/cnxk/meson.build
@@ -17,4 +17,4 @@ sources = files(
 )
 
 deps += ['eal', 'mbuf', 'kvargs', 'bus_pci', 'common_cnxk', 'mempool']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/ring/meson.build b/drivers/mempool/ring/meson.build
index 65d203d4b7..a25e9ebc16 100644
--- a/drivers/mempool/ring/meson.build
+++ b/drivers/mempool/ring/meson.build
@@ -2,4 +2,4 @@
 # Copyright(c) 2017 Intel Corporation
 
 sources = files('rte_mempool_ring.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/mempool/stack/meson.build b/drivers/mempool/stack/meson.build
index 961e90fc04..95f69042ae 100644
--- a/drivers/mempool/stack/meson.build
+++ b/drivers/mempool/stack/meson.build
@@ -4,4 +4,4 @@
 sources = files('rte_mempool_stack.c')
 
 deps += ['stack']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/meson.build b/drivers/meson.build
index 0618c31a69..2aefa146a7 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -109,7 +109,7 @@ foreach subpath:subdirs
         ext_deps = []
         pkgconfig_extra_libs = []
         testpmd_sources = []
-        pmd_supports_disable_iova_as_pa = false
+        require_iova_in_mbuf = true
 
         if not enable_drivers.contains(drv_path)
             build = false
@@ -127,9 +127,9 @@ foreach subpath:subdirs
             # pull in driver directory which should update all the local variables
             subdir(drv_path)
 
-            if dpdk_conf.get('RTE_IOVA_AS_PA') == 0 and not pmd_supports_disable_iova_as_pa and not always_enable.contains(drv_path)
+            if not get_option('enable_iova_as_pa') and require_iova_in_mbuf and not always_enable.contains(drv_path)
                 build = false
-                reason = 'driver does not support disabling IOVA as PA mode'
+                reason = 'requires IOVA in mbuf'
             endif
 
             # get dependency objs from strings
diff --git a/drivers/net/af_packet/meson.build b/drivers/net/af_packet/meson.build
index bab008d083..f45e4491d4 100644
--- a/drivers/net/af_packet/meson.build
+++ b/drivers/net/af_packet/meson.build
@@ -6,4 +6,4 @@ if not is_linux
     reason = 'only supported on Linux'
 endif
 sources = files('rte_eth_af_packet.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
index 979b914bb6..9a8dbb4d49 100644
--- a/drivers/net/af_xdp/meson.build
+++ b/drivers/net/af_xdp/meson.build
@@ -71,4 +71,4 @@ if build
   endif
 endif
 
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/bonding/meson.build b/drivers/net/bonding/meson.build
index 29022712cb..83326c0d63 100644
--- a/drivers/net/bonding/meson.build
+++ b/drivers/net/bonding/meson.build
@@ -22,4 +22,4 @@ deps += 'sched' # needed for rte_bitmap.h
 deps += ['ip_frag']
 
 headers = files('rte_eth_bond.h', 'rte_eth_bond_8023ad.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build
index c7ca24d437..c1da121a15 100644
--- a/drivers/net/cnxk/meson.build
+++ b/drivers/net/cnxk/meson.build
@@ -195,4 +195,4 @@ foreach flag: extra_flags
 endforeach
 
 headers = files('rte_pmd_cnxk.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/failsafe/meson.build b/drivers/net/failsafe/meson.build
index bf8f791984..513de17535 100644
--- a/drivers/net/failsafe/meson.build
+++ b/drivers/net/failsafe/meson.build
@@ -27,4 +27,4 @@ sources = files(
         'failsafe_ops.c',
         'failsafe_rxtx.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
index e1a5afa2ec..743fae9db7 100644
--- a/drivers/net/hns3/meson.build
+++ b/drivers/net/hns3/meson.build
@@ -13,9 +13,7 @@ if arch_subdir != 'x86' and arch_subdir != 'arm' or not dpdk_conf.get('RTE_ARCH_
     subdir_done()
 endif
 
-if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
-    build = false
-    reason = 'driver does not support disabling IOVA as PA mode'
+if not get_option('enable_iova_as_pa')
     subdir_done()
 endif
 
diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
index e69e23997f..dacb87dcb0 100644
--- a/drivers/net/ice/ice_rxtx_common_avx.h
+++ b/drivers/net/ice/ice_rxtx_common_avx.h
@@ -54,7 +54,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 		mb0 = rxep[0].mbuf;
 		mb1 = rxep[1].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 				offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -62,7 +62,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
 		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* convert pa to dma_addr hdr/data */
 		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
 		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
@@ -105,7 +105,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			mb6 = rxep[6].mbuf;
 			mb7 = rxep[7].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 			RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 					offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -142,7 +142,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 				_mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5),
 						   vaddr6_7, 1);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* convert pa to dma_addr hdr/data */
 			dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3, vaddr0_3);
 			dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7, vaddr4_7);
@@ -177,7 +177,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			mb2 = rxep[2].mbuf;
 			mb3 = rxep[3].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 			RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 					offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -198,7 +198,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 				_mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2),
 							vaddr3, 1);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 			/* convert pa to dma_addr hdr/data */
 			dma_addr0_1 = _mm256_unpackhi_epi64(vaddr0_1, vaddr0_1);
 			dma_addr2_3 = _mm256_unpackhi_epi64(vaddr2_3, vaddr2_3);
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 72dfd58308..71fdd6ffb5 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -68,7 +68,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
 		mb0 = rxep[0].mbuf;
 		mb1 = rxep[1].mbuf;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */
 		RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) !=
 				 offsetof(struct rte_mbuf, buf_addr) + 8);
@@ -76,7 +76,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
 		vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr);
 		vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr);
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 		/* convert pa to dma_addr hdr/data */
 		dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0);
 		dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1);
diff --git a/drivers/net/ice/meson.build b/drivers/net/ice/meson.build
index 123b190f72..5e90afcb9b 100644
--- a/drivers/net/ice/meson.build
+++ b/drivers/net/ice/meson.build
@@ -78,4 +78,4 @@ sources += files(
         'ice_dcf_parent.c',
         'ice_dcf_sched.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
index 28416a982f..b890984b46 100644
--- a/drivers/net/memif/meson.build
+++ b/drivers/net/memif/meson.build
@@ -12,4 +12,4 @@ sources = files(
 )
 
 deps += ['hash']
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/null/meson.build b/drivers/net/null/meson.build
index 4a483955a7..076b9937c1 100644
--- a/drivers/net/null/meson.build
+++ b/drivers/net/null/meson.build
@@ -8,4 +8,4 @@ if is_windows
 endif
 
 sources = files('rte_eth_null.c')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/pcap/meson.build b/drivers/net/pcap/meson.build
index a5a2971f0e..de2a70ef0b 100644
--- a/drivers/net/pcap/meson.build
+++ b/drivers/net/pcap/meson.build
@@ -15,4 +15,4 @@ ext_deps += pcap_dep
 if is_windows
     ext_deps += cc.find_library('iphlpapi', required: true)
 endif
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/ring/meson.build b/drivers/net/ring/meson.build
index 72792e26b0..2cd0e97e56 100644
--- a/drivers/net/ring/meson.build
+++ b/drivers/net/ring/meson.build
@@ -9,4 +9,4 @@ endif
 
 sources = files('rte_eth_ring.c')
 headers = files('rte_eth_ring.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/net/tap/meson.build b/drivers/net/tap/meson.build
index 4c9a9eac2b..b07ce68e48 100644
--- a/drivers/net/tap/meson.build
+++ b/drivers/net/tap/meson.build
@@ -35,4 +35,4 @@ foreach arg:args
     config.set(arg[0], cc.has_header_symbol(arg[1], arg[2]))
 endforeach
 configure_file(output : 'tap_autoconf.h', configuration : config)
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/raw/cnxk_bphy/meson.build b/drivers/raw/cnxk_bphy/meson.build
index ffb0ee6b7e..bb5d2ffb80 100644
--- a/drivers/raw/cnxk_bphy/meson.build
+++ b/drivers/raw/cnxk_bphy/meson.build
@@ -10,4 +10,4 @@ sources = files(
         'cnxk_bphy_irq.c',
 )
 headers = files('rte_pmd_bphy.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/raw/cnxk_gpio/meson.build b/drivers/raw/cnxk_gpio/meson.build
index f52a7be9eb..9d9a527392 100644
--- a/drivers/raw/cnxk_gpio/meson.build
+++ b/drivers/raw/cnxk_gpio/meson.build
@@ -9,4 +9,4 @@ sources = files(
         'cnxk_gpio_selftest.c',
 )
 headers = files('rte_pmd_cnxk_gpio.h')
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/drivers/raw/skeleton/meson.build b/drivers/raw/skeleton/meson.build
index bfb8fd8bcc..9d5fcf6514 100644
--- a/drivers/raw/skeleton/meson.build
+++ b/drivers/raw/skeleton/meson.build
@@ -6,4 +6,4 @@ sources = files(
         'skeleton_rawdev.c',
         'skeleton_rawdev_test.c',
 )
-pmd_supports_disable_iova_as_pa = true
+require_iova_in_mbuf = false
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index fabafbc39b..e39b6643ee 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -1134,7 +1134,7 @@ rte_eal_init(int argc, char **argv)
 		return -1;
 	}
 
-	if (rte_eal_iova_mode() == RTE_IOVA_PA && !RTE_IOVA_AS_PA) {
+	if (rte_eal_iova_mode() == RTE_IOVA_PA && !RTE_IOVA_IN_MBUF) {
 		rte_eal_init_alert("Cannot use IOVA as 'PA' as it is disabled during build");
 		rte_errno = EINVAL;
 		return -1;
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index cfd8062f1e..686e797c80 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -388,7 +388,7 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		*reason = "bad mbuf pool";
 		return -1;
 	}
-	if (RTE_IOVA_AS_PA && rte_mbuf_iova_get(m) == 0) {
+	if (RTE_IOVA_IN_MBUF && rte_mbuf_iova_get(m) == 0) {
 		*reason = "bad IO addr";
 		return -1;
 	}
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 3a82eb136d..bc41eac10d 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -146,7 +146,7 @@ static inline uint16_t rte_pktmbuf_priv_size(struct rte_mempool *mp);
 static inline rte_iova_t
 rte_mbuf_iova_get(const struct rte_mbuf *m)
 {
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	return m->buf_iova;
 #else
 	return (rte_iova_t)m->buf_addr;
@@ -164,7 +164,7 @@ rte_mbuf_iova_get(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_iova_set(struct rte_mbuf *m, rte_iova_t iova)
 {
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	m->buf_iova = iova;
 #else
 	RTE_SET_USED(m);
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index a30e1e0eaf..dfffb6e5e6 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -466,11 +466,11 @@ struct rte_mbuf {
 	RTE_MARKER cacheline0;
 
 	void *buf_addr;           /**< Virtual address of segment buffer. */
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	/**
 	 * Physical address of segment buffer.
 	 * This field is undefined if the build is configured to use only
-	 * virtual address as IOVA (i.e. RTE_IOVA_AS_PA is 0).
+	 * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
 	 * Force alignment to 8-bytes, so as to ensure we have the exact
 	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
 	 * working on vector drivers easier.
@@ -599,7 +599,7 @@ struct rte_mbuf {
 	/* second cache line - fields only used in slow path or on TX */
 	RTE_MARKER cacheline1 __rte_cache_min_aligned;
 
-#if RTE_IOVA_AS_PA
+#if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
@@ -608,7 +608,7 @@ struct rte_mbuf {
 #else
 	/**
 	 * Reserved for dynamic fields
-	 * when the next pointer is in first cache line (i.e. RTE_IOVA_AS_PA is 0).
+	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
 	uint64_t dynfield2;
 #endif
diff --git a/lib/mbuf/rte_mbuf_dyn.c b/lib/mbuf/rte_mbuf_dyn.c
index 35839e938c..5049508bea 100644
--- a/lib/mbuf/rte_mbuf_dyn.c
+++ b/lib/mbuf/rte_mbuf_dyn.c
@@ -128,7 +128,7 @@ init_shared_mem(void)
 		 */
 		memset(shm, 0, sizeof(*shm));
 		mark_free(dynfield1);
-#if !RTE_IOVA_AS_PA
+#if !RTE_IOVA_IN_MBUF
 		mark_free(dynfield2);
 #endif
 
diff --git a/lib/meson.build b/lib/meson.build
index 2bc0932ad5..fc7abd4aa3 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -93,7 +93,7 @@ dpdk_libs_deprecated += [
 disabled_libs = []
 opt_disabled_libs = run_command(list_dir_globs, get_option('disable_libs'),
         check: true).stdout().split()
-if dpdk_conf.get('RTE_IOVA_AS_PA') == 0
+if not get_option('enable_iova_as_pa')
     opt_disabled_libs += ['kni']
 endif
 foreach l:opt_disabled_libs
diff --git a/meson_options.txt b/meson_options.txt
index 08528492f7..82c8297065 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -41,7 +41,7 @@ option('max_lcores', type: 'string', value: 'default', description:
 option('max_numa_nodes', type: 'string', value: 'default', description:
        'Set the highest NUMA node supported by EAL; "default" is different per-arch, "detect" detects the highest NUMA node on the build machine.')
 option('enable_iova_as_pa', type: 'boolean', value: true, description:
-       'Support for IOVA as physical address. Disabling removes the buf_iova field of mbuf.')
+       'Support the use of physical addresses for IO addresses, such as used by UIO or VFIO in no-IOMMU mode. When disabled, DPDK can only run with IOMMU support for address mappings, but will have more space available in the mbuf structure.')
 option('mbuf_refcnt_atomic', type: 'boolean', value: true, description:
        'Atomically access the mbuf refcnt.')
 option('platform', type: 'string', value: 'native', description:
-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-03-02  8:38  0%         ` Yan, Zhirun
@ 2023-03-02 13:58  0%           ` Jerin Jacob
  2023-03-07  8:26  0%             ` Yan, Zhirun
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-03-02 13:58 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Thu, Mar 2, 2023 at 2:09 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 27, 2023 6:23 AM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > Wang, Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, February 20, 2023 9:51 PM
> > > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > > Wang, Haiyue <haiyue.wang@intel.com>
> > > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > > APIs
> > > >
> > > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> > wrote:
> > > > >
> > > > > Add new get/set APIs to configure graph worker model which is used
> > > > > to determine which model will be chosen.
> > > > >
> > > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > > ---
> > > > >  lib/graph/rte_graph_worker.h        | 51
> > +++++++++++++++++++++++++++++
> > > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > > >  lib/graph/version.map               |  3 ++
> > > > >  3 files changed, 67 insertions(+)
> > > > >
> > > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> > 100644
> > > > > --- a/lib/graph/rte_graph_worker.h
> > > > > +++ b/lib/graph/rte_graph_worker.h
> > > > > @@ -1,5 +1,56 @@
> > > > >  #include "rte_graph_model_rtc.h"
> > > > >
> > > > > +static enum rte_graph_worker_model worker_model =
> > > > > +RTE_GRAPH_MODEL_DEFAULT;
> > > >
> > > > This will break the multiprocess.
> > >
> > > Thanks. I will use TLS for per-thread local storage.
> >
> > If it needs to be used from secondary process, then it needs to be from
> > memzone.
> >
>
>
> This field will be set by the primary process in the initial stage, and then the lcores will only read it.
> I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
> not necessary to allocate from memzone.
>
> >
> >
> > >
> > > >
> > > > > +
> > > > > +/** Graph worker models */
> > > > > +enum rte_graph_worker_model {
> > > > > +#define WORKER_MODEL_DEFAULT "default"
> > > >
> > > > Why need strings?
> > > > Also, every symbol in a public header file should start with RTE_ to
> > > > avoid namespace conflict.
> > >
> > > It was used to config the model in app. I can put the string into example.
> >
> > OK
> >
> > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_DEFAULT = 0, #define WORKER_MODEL_RTC
> > > > > +"rtc"
> > > > > +       RTE_GRAPH_MODEL_RTC,
> > > >
> > > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> > enum
> > > > itself.
> > > Yes, will do in next version.
> > >
> > > >
> > > > > +#define WORKER_MODEL_GENERIC "generic"
> > > >
> > > > Generic is a very overloaded term. Use pipeline here i.e
> > > > RTE_GRAPH_MODEL_PIPELINE
> > >
> > > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> >
> > Hybrid is a very overloaded term, and it will be confusing (considering there
> > will be new models in the future).
> > Please pick a word that really expresses how the model works.
> >
>
> In this case, the path is Node0 -> Node1 -> Node2 -> Node3
> And Node1 and Node3 are bound to one core.
>
> Our model offers the ability to dispatch between cores.
>
> Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

Some names that I can think of:

// MCORE->MULTI CORE

RTE_GRAPH_MODEL_MCORE_PIPELINE
or
RTE_GRAPH_MODEL_MCORE_DISPATCH
or
RTE_GRAPH_MODEL_MCORE_RING
or
RTE_GRAPH_MODEL_MULTI_CORE
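
For illustration only, here is a rough sketch of how the enum could look
once the comments in this thread are applied: RTC aliased to the default,
no string macros in the public header, and no MAX entry. The name of the
multi-core model is still open; RTE_GRAPH_MODEL_MCORE_DISPATCH is based on
one of the candidates listed here, and graph_model_is_valid() is a
hypothetical helper, not an existing API.

```c
#include <stdbool.h>

/* Sketch only: names follow the suggestions in this thread, not merged code. */
enum rte_graph_worker_model {
	RTE_GRAPH_MODEL_DEFAULT = 0,
	/* Run-to-completion is the default model, so alias it. */
	RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT,
	/* Candidate name for the model that dispatches nodes across cores. */
	RTE_GRAPH_MODEL_MCORE_DISPATCH,
	/* No _MAX entry: adding a model later must not change existing values. */
};

/* Hypothetical helper: validate a model without relying on a MAX enumerator. */
static inline bool
graph_model_is_valid(int model)
{
	switch (model) {
	case RTE_GRAPH_MODEL_RTC: /* same value as RTE_GRAPH_MODEL_DEFAULT */
	case RTE_GRAPH_MODEL_MCORE_DISPATCH:
		return true;
	default:
		return false;
	}
}
```

Validating with a switch rather than a range check against a MAX value
means a new model can be appended without touching any exported constant.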

>
> + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
> '  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
> '            '     '                          '     '            '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
> ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
> '            '     '     |                    '     '      ^     '
> + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
>                          |                                 |
>                          + - - - - - - - - - - - - - - - - +
>
>
> > > >
> > > >
> > > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > > +       RTE_GRAPH_MODEL_MAX,
> > > >
> > > > No need for MAX, it will break the ABI for future. See other
> > > > subsystem such as cryptodev.
> > >
> > > Thanks, I will change it.
> > > >
> > > > > +};
> > > >
> > > > >

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-26 22:23  0%       ` Jerin Jacob
@ 2023-03-02  8:38  0%         ` Yan, Zhirun
  2023-03-02 13:58  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Yan, Zhirun @ 2023-03-02  8:38 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 27, 2023 6:23 AM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> Wang, Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, February 20, 2023 9:51 PM
> > > To: Yan, Zhirun <zhirun.yan@intel.com>
> > > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>;
> > > Wang, Haiyue <haiyue.wang@intel.com>
> > > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model
> > > APIs
> > >
> > > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com>
> wrote:
> > > >
> > > > Add new get/set APIs to configure graph worker model which is used
> > > > to determine which model will be chosen.
> > > >
> > > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > > ---
> > > >  lib/graph/rte_graph_worker.h        | 51
> +++++++++++++++++++++++++++++
> > > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > > >  lib/graph/version.map               |  3 ++
> > > >  3 files changed, 67 insertions(+)
> > > >
> > > > diff --git a/lib/graph/rte_graph_worker.h
> > > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153
> 100644
> > > > --- a/lib/graph/rte_graph_worker.h
> > > > +++ b/lib/graph/rte_graph_worker.h
> > > > @@ -1,5 +1,56 @@
> > > >  #include "rte_graph_model_rtc.h"
> > > >
> > > > +static enum rte_graph_worker_model worker_model =
> > > > +RTE_GRAPH_MODEL_DEFAULT;
> > >
> > > This will break the multiprocess.
> >
> > Thanks. I will use TLS for per-thread local storage.
> 
> If it needs to be used from secondary process, then it needs to be from
> memzone.
> 


This field will be set by the primary process in the initial stage, and then the lcores will only read it.
I want to use RTE_DEFINE_PER_LCORE to define the worker model here. It seems
not necessary to allocate from memzone.

> 
> 
> >
> > >
> > > > +
> > > > +/** Graph worker models */
> > > > +enum rte_graph_worker_model {
> > > > +#define WORKER_MODEL_DEFAULT "default"
> > >
> > > Why need strings?
> > > Also, every symbol in a public header file should start with RTE_ to
> > > avoid namespace conflict.
> >
> > It was used to config the model in app. I can put the string into example.
> 
> OK
> 
> >
> > >
> > > > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > > > +#define WORKER_MODEL_RTC "rtc"
> > > > +       RTE_GRAPH_MODEL_RTC,
> > >
> > > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in
> enum
> > > itself.
> > Yes, will do in next version.
> >
> > >
> > > > +#define WORKER_MODEL_GENERIC "generic"
> > >
> > > Generic is a very overloaded term. Use pipeline here i.e
> > > RTE_GRAPH_MODEL_PIPELINE
> >
> > Actually, it's not a purely pipeline mode. I prefer to change to hybrid.
> 
> Hybrid is a very overloaded term, and it will be confusing (considering there
> will be new models in the future).
> Please pick a word that really expresses how the model works.
> 

In this case, the path is Node0 -> Node1 -> Node2 -> Node3
And Node1 and Node3 are bound to one core.

Our model offers the ability to dispatch between cores.

Do you think RTE_GRAPH_MODEL_DISPATCH is a good name?

+ - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
'  Core #0   '     '  Core #1       Core #1   '     '  Core #2   '
'            '     '                          '     '            '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
'            '     '     |                    '     '      ^     '
+ - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                         |                                 |
                         + - - - - - - - - - - - - - - - - +


> > >
> > >
> > > > +       RTE_GRAPH_MODEL_GENERIC,
> > > > +       RTE_GRAPH_MODEL_MAX,
> > >
> > > No need for MAX, it will break the ABI for future. See other
> > > subsystem such as cryptodev.
> >
> > Thanks, I will change it.
> > >
> > > > +};
> > >
> > > >

^ permalink raw reply	[relevance 0%]

* RE: [RFC 0/2] Add high-performance timer facility
  2023-03-01 15:50  3%       ` Mattias Rönnblom
@ 2023-03-01 17:06  0%         ` Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2023-03-01 17:06 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Wednesday, 1 March 2023 16.50
> 
> On 2023-03-01 14:31, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Wednesday, 1 March 2023 12.18
> >>
> >> On 2023-02-28 17:01, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >>>> Sent: Tuesday, 28 February 2023 10.39
> >>>
> >>> I have been looking for a high performance timer library (for use in
> a fast
> >> path TCP stack), and this looks very useful, Mattias.
> >>>
> >>> My initial feedback is based on quickly skimming the patch source
> code, and
> >> reading this cover letter.
> >>>
> >>>>
> >>>> This patchset is an attempt to introduce a high-performance, highly
> >>>> scalable timer facility into DPDK.
> >>>>
> >>>> More specifically, the goals for the htimer library are:
> >>>>
> >>>> * Efficient handling of a handful up to hundreds of thousands of
> >>>>     concurrent timers.
> >>>> * Reduced overhead of adding and canceling timers.
> >>>> * Provide a service functionally equivalent to that of
> >>>>     <rte_timer.h>. API/ABI backward compatibility is secondary.
> >>>>
> >>>> In the author's opinion, there are two main shortcomings with the
> >>>> current DPDK timer library (i.e., rte_timer.[ch]).
> >>>>
> >>>> One is the synchronization overhead, where heavy-weight full-
> barrier
> >>>> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
> >>>> lists, but any thread may add or cancel (or otherwise access)
> timers
> >>>> managed by another lcore (and thus resides in its timer skip list).
> >>>>
> >>>> The other is an algorithmic shortcoming, with rte_timer.c's
> reliance
> >>>> on a skip list, which, seemingly, is less efficient than certain
> >>>> alternatives.
> >>>>
> >>>> This patchset implements a hierarchical timer wheel (HWT, in
> >>>
> >>> Typo: HWT or HTW?
> >>
> >> Yes. I don't understand how I managed to make so many such HTW ->
> >> HWT typos. At least I got the filenames (rte_htw.[ch]) correct.
> >>
> >>>
> >>>> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
> >>>> Hierarchical Timing Wheels: Data Structures for the Efficient
> >>>> Implementation of a Timer Facility". A HWT is a data structure
> >>>> purposely designed for this task, and used by many operating system
> >>>> kernel timer facilities.
> >>>>
> >>>> To further improve the solution described by Varghese and Lauck, a
> >>>> bitset is placed in front of each of the timer wheels in the HWT,
> >>>> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing
> time
> >>>> and expiry processing).
> >>>>
> >>>> Cycle-efficient scanning and manipulation of these bitsets are
> crucial
> >>>> for the HWT's performance.
> >>>>
> >>>> The htimer module keeps a per-lcore (or per-registered EAL thread)
> HWT
> >>>> instance, much like rte_timer.c keeps a per-lcore skip list.
> >>>>
> >>>> To avoid expensive synchronization overhead for thread-local timer
> >>>> management, the HWTs are accessed only from the "owning" thread.
> Any
> >>>> interaction any other thread has with a particular lcore's timer
> >>>> wheel goes over a set of DPDK rings. A side-effect of this design
> is
> >>>> that all operations working toward a "remote" HWT must be
> >>>> asynchronous.
> >>>>
> >>>> The <rte_htimer.h> API is available only to EAL threads and
> registered
> >>>> non-EAL threads.
> >>>>
> >>>> The htimer API allows the application to supply the current time,
> >>>> useful in case it already has retrieved this for other purposes,
> >>>> saving the cost of a rdtsc instruction (or its equivalent).
> >>>>
> >>>> A relative htimer does not retrieve a new time, but reuses the current
> >>>> time (as known via/at-the-time of the manage-call), again to shave
> off
> >>>> some cycles of overhead.
> >>>
> >>> I have a comment to the two points above.
> >>>
> >>> I agree that the application should supply the current time.
> >>>
> >>> This should be the concept throughout the library. I don't
> understand why
> >> TSC is used in the library at all?
> >>>
> >>> Please use a unit-less tick, and let the application decide what one
> tick
> >> means.
> >>>
> >>
> >> I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more
> >> sense if you think of the user of the API as not just a "monolithic"
> >> application, but rather a set of different modules, developed by
> >> different organizations, and reused across a set of applications. The
> >> idea behind the API design is they should all be able to share one
> timer
> >> service instance.
> >>
> >> The different parts of the application and any future DPDK platform
> >> modules that use the htimer service needs to agree what a tick means
> in
> >> terms of actual wall-time, if it's not mandated by the API.
> >
> > I see. Then those non-monolithic applications can agree that the unit
> of time is nanoseconds, or whatever makes sense for those applications.
> And then they can instantiate one shared HTW for that purpose.
> >
> 
> <rte_htimer_mgr.h> contains nothing but shared HTWs.
> 
> > There is no need to impose such an API limit on other users of the
> library.
> >
> >>
> >> There might be room for module-specific timer wheels as well, with
> >> different resolution or other characteristics. The event timer
> adapter's
> >> use of a timer wheel could be one example (although I'm not sure it
> is).
> >
> > We are not using the event device, and I have not looked into it, so I
> have no qualified comments to this.
> >
> >>
> >> If timer-wheel-as-a-private-lego-piece is also a valid use case, then
> >> one could consider make the <rte_htw.h> API public as well. That is
> what
> >> I think you are asking for here: a generic timer wheel that doesn't
> know
> >> anything about time sources, time source time -> tick conversion, or
> >> timer source time -> monotonic wall time conversion, and maybe is
> also
> >> not bound to a particular thread.
> >
> > Yes, that is what I had been searching the Internet for.
> >
> > (I'm not sure what you mean by "not bound to a particular thread".
> Your per-thread design seems good to me.)
> >
> > I don't want more stuff in the EAL. What I want is high-performance
> DPDK libraries we can use in our applications.
> >
> >>
> >> I picked TSC because it seemed like a good "universal time unit" for
> >> DPDK. rdtsc (and its equivalent) is also a very precise (especially
> on
> >> x86) and cheap-to-retrieve (especially on ARM, from what I
> understand).
> >
> > The TSC does have excellent performance, but on all other parameters
> it is a horrible time keeper: The measurement unit depends on the
> underlying hardware, the TSC drifts depending on temperature, it cannot
> be PTP synchronized, the list is endless!
> >
> >>
> >> That said, at the moment, I'm leaning toward nanoseconds (uint64_t
> >> format) should be the default for timer expiration time instead of
> TSC.
> >> TSC could still be an option for passing the current time, since TSC
> >> will be a common time source, and it shaves off one conversion.
> >
> > There are many reasons why nanoseconds is a much better choice than
> TSC.
> >
> >>
> >>> A unit-less tick will also let the application instantiate a HTW
> with higher
> >> resolution than the TSC. (E.g. think about oversampling in audio
> processing,
> >> or Bresenham's line drawing algorithm for 2D visuals - oversampling
> can sound
> >> and look better.)
> >
> > Some of the timing data in our application have a resolution orders of
> magnitude higher than one nanosecond. If we combined that with a HTW
> library with nanosecond resolution, we would need to keep these timer
> values in two locations: The original high-res timer in our data
> structure, and the shadow low-res (nanosecond) timer in the HTW.
> >
> 
> There is no way you will meet timers with anything approaching
> pico-second-level precision.

Correct. Our sub-nanosecond timers don't need to meet the exact time, but the higher resolution prevents loss of accuracy when a number has been added to it many times. Think of it like a special fixed-point number, where the least significant part is included to ensure accuracy in calculations, while the actual timer only considers the most significant part of the number.
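
To make the rounding argument concrete, here is a minimal sketch (not our production code) that compares a period of 1.2 ticks accumulated in whole ticks with the same period accumulated in a fixed-point value that carries fractional bits. The 16 fractional bits, the "tenths of a tick" input format and all function names are assumptions chosen purely for illustration.

```c
#include <stdint.h>

#define FRAC_BITS 16                           /* assumed number of fractional bits */
#define FP_ONE    (UINT64_C(1) << FRAC_BITS)   /* 1.0 in fixed point */
#define FP_HALF   (FP_ONE >> 1)                /* 0.5 in fixed point */

/* Convert a value given in tenths of a tick (12 == 1.2 ticks) to fixed point. */
static inline uint64_t
fp_from_tenths(uint64_t tenths)
{
	return (tenths * FP_ONE) / 10;
}

/* Round a fixed-point value to the nearest whole tick. */
static inline uint64_t
fp_round(uint64_t fp)
{
	return (fp + FP_HALF) >> FRAC_BITS;
}

/* Low resolution: round the period once, then add whole ticks n times. */
static inline uint64_t
accumulate_rounded(uint64_t period_tenths, unsigned int n)
{
	uint64_t step = fp_round(fp_from_tenths(period_tenths));
	uint64_t ticks = 0;

	for (unsigned int i = 0; i < n; i++)
		ticks += step;
	return ticks;
}

/* High resolution: add in fixed point, round only when the value is read. */
static inline uint64_t
accumulate_fixed_point(uint64_t period_tenths, unsigned int n)
{
	uint64_t step = fp_from_tenths(period_tenths);
	uint64_t fp = 0;

	for (unsigned int i = 0; i < n; i++)
		fp += step;
	return fp_round(fp);
}
```

With a period of 1.2 ticks, both variants agree after two additions (2 ticks), but after three additions the rounded variant reports 3 ticks while the fixed-point variant reports 4, which is the correctly rounded value of 3.6.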

> You will also get into a value range issue,
> since you will wrap around a 64-bit integer in a matter of days.

Yes. We use timers with different scales for individual purposes. Our highest-resolution timers are sub-nanosecond.

> 
> The HTW only stores the timeout in ticks, not TSC, nanoseconds or
> picoseconds.

Excellent. Then I'm happy.

> Generally, you don't want pico-second-level tick
> granularity, since it increases the overhead of advancing the wheel(s).

We currently use proprietary algorithms for our bandwidth scheduling. It seems that a HTW is not a good fit for this purpose. Perhaps you are offering a hammer, and it's not a good replacement for my screwdriver.

I suppose that nanosecond resolution suffices for a TCP stack, which is the use case I have been on the lookout for a timer library for. :-)

> The first (lowest-significance) few wheels will pretty much always be
> empty.
> 
> > We might also need to frequently update the HTW timers to prevent
> drifting away from the high-res timers. E.g. 1.2 + 1.2 is still 2 when
> rounded, but + 1.2 becomes 3 when it should have been 4 (3 * 1.2 = 3.6)
> rounded. This level of drifting would also make periodic timers in the
> HTW useless.
> >
> 
> Useless, for a certain class of applications. What application would
> that be?

Sorry about being unclear there. Yes, I only meant the specific application I was talking about, i.e. our application for high precision bandwidth management. For reference, 1 bit at 100 Gbit/s is 10 picoseconds.
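
For anyone who wants to check the numbers mentioned in this exchange, the arithmetic can be sketched in a few lines of C. This is only a back-of-the-envelope helper; the function names are made up for illustration and are not part of any DPDK API.

```c
#include <stdint.h>

#define PS_PER_SEC   UINT64_C(1000000000000)  /* 10^12 picoseconds per second */
#define SECS_PER_DAY UINT64_C(86400)

/* Duration of one bit, in picoseconds, at a given line rate in bit/s. */
static inline uint64_t
bit_time_ps(uint64_t rate_bps)
{
	return PS_PER_SEC / rate_bps;
}

/* Whole days until a 64-bit counter wraps, given its ticks per second. */
static inline uint64_t
wrap_days_u64(uint64_t ticks_per_sec)
{
	return UINT64_MAX / ticks_per_sec / SECS_PER_DAY;
}
```

bit_time_ps() gives 10 for a rate of 100 Gbit/s. wrap_days_u64() gives 213 days for a picosecond-resolution counter, compared to 213503 days (roughly 584 years) for a nanosecond-resolution one.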

> 
> > Please note: I haven't really considered merging the high-res timing
> in our application with this HTW, and I'm also not saying that PERIODIC
> timers in the HTW are required or even useful for our application. I'm
> only providing arguments for a unit-less time!
> >
> >>>
> >>> For reference (supporting my suggestion), the dynamic timestamp
> field in the
> >> rte_mbuf structure is also defined as being unit-less. (I think
> NVIDIA
> >> implements it as nanoseconds, but that's an implementation specific
> choice.)
> >>>
> >>>>
> >>>> A semantic improvement compared to the <rte_timer.h> API is that
> the
> >>>> htimer library can give a definite answer on the question if the
> timer
> >>>> expiry callback was called, after a timer has been canceled.
> >>>>
> >>>> Below is a performance data from DPDK's 'app/test' micro
> benchmarks,
> >>>> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
> >>>> test_htimer_mgr_perf.c) aren't identical in their structure, but
> the
> >>>> numbers give some indication of the difference.
> >>>>
> >>>> Use case               htimer  timer
> >>>> ------------------------------------
> >>>> Add timer                 28    253
> >>>> Cancel timer              10    412
> >>>> Async add (source lcore)  64
> >>>> Async add (target lcore)  13
> >>>>
> >>>> (AMD 5900X CPU. Time in TSC.)
> >>>>
> >>>> Prototype integration of the htimer library into real, timer-heavy,
> >>>> applications indicates that htimer may result in significant
> >>>> application-level performance gains.
> >>>>
> >>>> The bitset implementation which the HWT implementation depends upon
> >>>> seemed generic-enough and potentially useful outside the world of
> >>>> HWTs, to justify being located in the EAL.
> >>>>
> >>>> This patchset is very much an RFC, and the author is yet to form an
> >>>> opinion on many important issues.
> >>>>
> >>>> * If deemed a suitable replacement, should the htimer replace the
> >>>>     current DPDK timer library in some particular (ABI-breaking)
> >>>>     release, or should it live side-by-side with the then-legacy
> >>>>     <rte_timer.h> API? A lot of things in and outside DPDK depend
> on
> >>>>     <rte_timer.h>, so coexistence may be required to facilitate a
> smooth
> >>>>     transition.
> >>>
> >>> It's my immediate impression that they are totally different in both
> design
> >> philosophy and API.
> >>>
> >>> Personal opinion: I would call it an entirely different library.
> >>>
> >>>>
> >>>> * Should the htimer and htw-related files be colocated with
> rte_timer.c
> >>>>     in the timer library?
> >>>
> >>> Personal opinion: No. This is an entirely different library, and
> should live
> >> for itself in a directory of its own.
> >>>
> >>>>
> >>>> * Would it be useful for applications using asynchronous cancel to
> >>>>     have the option of having the timer callback run not only in
> case of
> >>>>     timer expiration, but also cancellation (on the target lcore)?
> The
> >>>>     timer cb signature would need to include an additional
> parameter in
> >>>>     that case.
> >>>
> >>> If one thread cancels something in another thread, some
> synchronization
> >> between the threads is going to be required anyway. So we could
> rephrase your
> >> question: Will the burden of the otherwise required synchronization
> between
> >> the two threads be significantly reduced if the library has the
> ability to run
> >> the callback on asynchronous cancel?
> >>>
> >>
> >> Yes.
> >>
> >> Intuitively, it seems convenient that if you hand off a timer to a
> >> different lcore, the timer callback will be called exactly once,
> >> regardless if the timer was canceled or expired.
> >>
> >> But, as you indicate, you may still need synchronization to solve the
> >> resource reclamation issue.
> >>
> >>> Is such a feature mostly "Must have" or "Nice to have"?
> >>>
> >>> More thoughts in this area...
> >>>
> >>> If adding an additional callback parameter, it could be an enum, so
> the
> >> callback could be expanded to support "timeout (a.k.a. timer fired)",
> "cancel"
> >> and more events we have not yet come up with, e.g. "early kick".
> >>>
> >>
> >> Yes, or an int.
> >>
> >>> Here's an idea off the top of my head: An additional callback
> parameter has
> >> a (small) performance cost incurred with every timer fired (which is
> a very
> >> large multiplier). It might not be required. As an alternative to an
> "what
> >> happened" parameter to the callback, the callback could investigate
> the state
> >> of the object for which the timer fired, and draw its own conclusion
> on how to
> >> proceed. Obviously, this also has a performance cost, but perhaps the
> callback
> >> works on the object's state anyway, making this cost insignificant.
> >>>
> >>
> >> It's not obvious to me that you, in the timer callback, can determine
> >> what happened, if the same callback is called both in the cancel and
> the
> >> expired case.
> >>
> >> It's not the cost of an extra integer passed in a register (or checking a
> >> flag, if the timer callback should be called at all at cancellation) that is
> >> the concern for me; it's the extra bit of API complexity.
> >
> > Then introduce the library without this feature. More features can be
> added later.
> >
> > The library will be introduced as "experimental", so we are free to
> improve it and modify the ABI along the way.
> >
> >>
> >>> Here's another alternative to adding a "what happened" parameter to
> the
> >> callback:
> >>>
> >>> The rte_htimer could have one more callback pointer, which (if set)
> will be
> >> called on cancellation of the timer.
> >>>
> >>
> >> This will grow the timer struct with 16 bytes.
> >
> > If the rte_htimer struct stays within one cache line, it should be
> acceptable.
> >
> 
> Timer structs are often embedded in other structures, and need not
> themselves be cache line aligned (although the "parent" struct may need
> to be, e.g. if it's dynamically allocated).
> 
> So smaller is better. Just consider if you want your attosecond-level
> time stamp in a struct:
> 
> struct my_timer {
>      uint64_t high_precision_time_high_bits;
>      uint64_t high_precision_time_low_bits;
>      struct rte_htimer timer;
> };
> 
> ...and you allocate those structs from a mempool. If rte_htimer is small
> enough, you will fit on one cache line.

Ahh... I somehow assumed they only existed as stand-alone elements inside the HTW.

Then I obviously agree that shorter is better.
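
A rough sketch of that sizing concern, for my own understanding. The layout of fake_htimer below is purely hypothetical (I have not copied it from the patch; only owner_lcore_id is a field name mentioned in the cover letter), and the 64-byte cache line is an assumption that holds for x86 but not for every platform.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumed cache line size in bytes */

/* Hypothetical stand-in for struct rte_htimer; the real layout is in the patch. */
struct fake_htimer {
	void *prev;                 /* linkage within a wheel slot */
	void *next;
	uint64_t expiration_time;   /* absolute expiry, in ticks */
	void (*cb)(void *cb_arg);   /* expiry callback */
	void *cb_arg;
	uint32_t owner_lcore_id;
	uint32_t flags;
};

/* The application struct from the example, embedding the timer. */
struct my_timer {
	uint64_t high_precision_time_high_bits;
	uint64_t high_precision_time_low_bits;
	struct fake_htimer timer;
};

/* Fails to compile if the embedding struct outgrows one cache line. */
_Static_assert(sizeof(struct my_timer) <= CACHE_LINE_SIZE,
	       "my_timer no longer fits in one cache line");
```

With this made-up layout on a 64-bit build, my_timer is exactly 64 bytes, so a second callback pointer plus argument (16 more bytes) would push it onto a second cache line.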

> 
> > On the other hand, this approach is less generic than passing an
> additional parameter. (E.g. add yet another callback pointer for "early
> kick"?)
> >
> > BTW, async cancel is a form of inter-thread communication. Does this
> library really need to provide any inter-thread communication
> mechanisms? Doesn't an inter-thread communication mechanism belong in a
> separate library?
> >
> 
> Yes, <rte_htimer_mgr.h> needs this because:
> 1) Being able to schedule timers on a remote lcore is a useful feature
> (especially since we don't have much else in terms of deferred work
> mechanisms in DPDK).

Although remote procedures are a useful feature, providing such a feature doesn't necessarily belong in a library that merely uses remote procedures.

> 2) htimer aspires to be a plug-in replacement for <rte_timer.h> (albeit
> an ABI-breaking one).

This is a good argument.

But I would much rather have a highly tuned stand-alone HTW library than a plug-in replacement of the old <rte_timer.h>.

> 
> The pure HTW is in rte_htw.[ch].
> 
> Plus, with the current design, async operations basically come for free
> (if you don't use them), from a performance perspective. The extra
> overhead boils down to occasionally polling an empty ring, which is an
> inexpensive operation.

OK. Then no worries.

> 
> >>
> >>>>
> >>>> * Should the rte_htimer be a nested struct, so the htw parts be
> separated
> >>>>     from the htimer parts?
> >>>>
> >>>> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
> >>>>     <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should
> it
> >>>>     be so?
> >>>>
> >>>> * rte_htimer struct is only supposed to be used by the application
> >>>>     to give an indication of how much memory it needs to allocate,
> >>>>     and its members are not supposed to be directly accessed (w/ the
> >>>>     possible exception of the owner_lcore_id field). Should there be
> >>>>     a dummy struct, or a #define RTE_HTIMER_MEMSIZE or a
> >>>>     rte_htimer_get_memsize() function instead, serving the same
> >>>>     purpose? Better encapsulation, but more inconvenient for
> >>>>     applications. Run-time dynamic sizing would force
> >>>>     application-level dynamic allocations.
> >>>>
> >>>> * Asynchronous cancellation is a little tricky to use for the
> >>>>     application (primarily due to timer memory reclamation/race
> >>>>     issues). Should this functionality be removed?
> >>>>
> >>>> * Should rte_htimer_mgr_init() also retrieve the current time? If
> >>>>     so, there should be a variant which allows the user to specify
> >>>>     the time (to match rte_htimer_mgr_manage_time()). One pitfall
> >>>>     with the current proposed API is an application calling
> >>>>     rte_htimer_mgr_init() and then immediately adding a timer with a
> >>>>     relative timeout, in which case the current absolute time used
> >>>>     is 0, which might be a surprise.
> >>>>
> >>>> * Should libdivide (optionally) be used to avoid the div in the TSC
> >>>>     -> tick conversion? (Doesn't improve performance on Zen 3, but may
> >>>>     do on other CPUs.) Consider <rte_reciprocal.h> as well.
> >>>>
> >>>> * Should the TSC-per-tick be rounded up to a power of 2, so shifts
> >>>>     can be used for conversion? Very minor performance gains to be
> >>>>     found there, at least on Zen 3 cores.
> >>>>
> >>>> * Should it be possible to supply the time in rte_htimer_mgr_add()
> >>>>     and/or rte_htimer_mgr_manage_time() functions as ticks, rather
> >>>>     than as TSC? Should it be possible to also use nanoseconds?
> >>>>     rte_htimer_mgr_manage_time() would need a flags parameter in
> >>>>     that case.
> >>>
> >>> Do not use TSC anywhere in this library. Let the application decide
> >>> the meaning of a tick.
> >>>
> >>>>
> >>>> * Would the event timer adapter be best off using <rte_htw.h>
> >>>>     directly, or <rte_htimer.h>? In the latter case, there needs to
> >>>>     be a way to instantiate more HWTs (similar to the "alt"
> >>>>     functions of <rte_timer.h>)?
> >>>>
> >>>> * Should the PERIODICAL flag (and the complexity it brings) be
> >>>>     removed? And leave the application with only single-shot
> >>>>     timers, and the option to re-add them in the timer callback.
> >>>
> >>> First thought: Yes, keep it lean and remove the periodical stuff.
> >>>
> >>> Second thought: This needs a more detailed analysis.
> >>>
> >>>   From one angle:
> >>>
> >>> How many PERIODICAL versus ONESHOT timers do we expect?
> >>>
> >>
> >> I suspect you should be prepared for the ratio being anything.
> >
> > In theory, anything is possible. But I'm asking that we consider
> > realistic use cases.
> >
> >>
> >>> Intuitively, I would use this library for ONESHOT timers, and
> >>> perhaps implement my periodical timers by other means.
> >>>
> >>> If the PERIODICAL:ONESHOT ratio is low, we can probably live with
> >>> the extra cost of cancel+add for a few periodical timers.
> >>>
> >>>   From another angle:
> >>>
> >>> What is the performance gain with the PERIODICAL flag?
> >>>
> >>
> >> None, pretty much. It's just there for convenience.
> >
> > OK, then I suggest that you remove it, unless you get objections.
> >
> > The library can be expanded with useful features at any time later.
> > Useless features are (nearly) impossible to remove, once they are in
> > there - they are just "technical debt" with associated maintenance
> > costs, added complexity weaving into other features, etc.
> >
> >>
> >>> Without a periodical timer, cancel+add costs 10+28 cycles. How many
> >>> cycles would a "move" function, performing both cancel and add, use?
> >>>
> >>> And then compare that to the cost (in cycles) of repeating a timer
> >>> with PERIODICAL?
> >>>
> >>> Furthermore, not having the PERIODICAL flag probably improves the
> >>> performance for non-periodical timers. How many cycles could we gain
> >>> here?
> >>>
> >>>
> >>> Another, vaguely related, idea:
> >>>
> >>> The callback pointer might not need to be stored per rte_htimer, but
> >>> could instead be common for the rte_htw.
> >>>
> >>
> >> Do you mean rte_htw, or rte_htimer_mgr?
> >>
> >> If you make one common callback, all the different parts of the
> >> application need to be coordinated (in a big switch-statement, or
> >> something of that sort), or have some convention for using an
> >> application-specific wrapper structure (accessed via container_of()).
> >>
> >> This is a problem if the timer service API consumer is a set of
> >> largely uncoordinated software modules.
> >>
> >> Btw, the eventdev API has the same issue, and the proposed event
> >> dispatcher is one way to help facilitate application-internal
> >> decoupling.
> >>
> >> For a module-private rte_htw instance your suggestion may work, but
> >> not for <rte_htimer_mgr.h>.
> >
> > I was speculating that a common callback pointer might provide a
> > performance benefit for single-purpose HTW instances. (The same concept
> > applies if there are multiple callbacks, e.g. a "Timer Fired", a "Timer
> > Cancelled", and an "Early Kick" callback pointer - i.e. having the
> > callback pointers per HTW instance, instead of per timer.)
> >
> >>
> >>> When a timer fires, the callback probably needs to check/update the
> >>> state of the object for which the timer fired anyway, so why not just
> >>> let the application use that state to determine the appropriate
> >>> action. This might provide some performance benefit.
> >>>
> >>> It might complicate using one HTW for multiple different purposes,
> >>> though. Probably a useless idea, but I wanted to share the idea
> >>> anyway. It might trigger other, better ideas in the community.
> >>>
> >>>>
> >>>> * Should the async result codes and the sync cancel error codes be
> >>>>     merged into one set of result codes?
> >>>>
> >>>> * Should the rte_htimer_mgr_async_add() have a flag which allows
> >>>>     buffering add request messages until rte_htimer_mgr_process()
> >>>>     is called? Or any manage function. Would reduce ring signaling
> >>>>     overhead (i.e., burst enqueue operations instead of
> >>>>     single-element enqueue). Could also be a
> >>>>     rte_htimer_mgr_async_add_burst() function, solving the same
> >>>>     "problem" a different way. (The signature of such a function
> >>>>     would not be pretty.)
> >>>>
> >>>> * Does the functionality provided by the rte_htimer_mgr_process()
> >>>>     function match its use cases? Should there be a clearer
> >>>>     separation between expiry processing and asynchronous operation
> >>>>     processing?
> >>>>
> >>>> * Should the patchset be split into more commits? If so, how?
> >>>>
> >>>> Thanks to Erik Carrillo for his assistance.
> >>>>
> >>>> Mattias Rönnblom (2):
> >>>>     eal: add bitset type
> >>>>     eal: add high-performance timer facility
> >



* Re: [RFC 0/2] Add high-performance timer facility
  2023-03-01 13:31  3%     ` Morten Brørup
@ 2023-03-01 15:50  3%       ` Mattias Rönnblom
  2023-03-01 17:06  0%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Mattias Rönnblom @ 2023-03-01 15:50 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

On 2023-03-01 14:31, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Wednesday, 1 March 2023 12.18
>>
>> On 2023-02-28 17:01, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>> Sent: Tuesday, 28 February 2023 10.39
>>>
>>> I have been looking for a high performance timer library (for use in a fast
>> path TCP stack), and this looks very useful, Mattias.
>>>
>>> My initial feedback is based on quickly skimming the patch source code, and
>> reading this cover letter.
>>>
>>>>
>>>> This patchset is an attempt to introduce a high-performance, highly
>>>> scalable timer facility into DPDK.
>>>>
>>>> More specifically, the goals for the htimer library are:
>>>>
>>>> * Efficient handling of a handful up to hundreds of thousands of
>>>>     concurrent timers.
>>>> * Reduced overhead of adding and canceling timers.
>>>> * Provide a service functionally equivalent to that of
>>>>     <rte_timer.h>. API/ABI backward compatibility is secondary.
>>>>
>>>> In the author's opinion, there are two main shortcomings with the
>>>> current DPDK timer library (i.e., rte_timer.[ch]).
>>>>
>>>> One is the synchronization overhead, where heavy-weight full-barrier
>>>> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
>>>> lists, but any thread may add or cancel (or otherwise access) timers
>>>> managed by another lcore (and thus resides in its timer skip list).
>>>>
>>>> The other is an algorithmic shortcoming, with rte_timer.c's reliance
>>>> on a skip list, which, seemingly, is less efficient than certain
>>>> alternatives.
>>>>
>>>> This patchset implements a hierarchical timer wheel (HWT, in
>>>
>>> Typo: HWT or HTW?
>>
>> Yes. I don't understand how I managed to make so many such HTW ->
>> HWT typos. At least I got the filenames (rte_htw.[ch]) correct.
>>
>>>
>>>> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
>>>> Hierarchical Timing Wheels: Data Structures for the Efficient
>>>> Implementation of a Timer Facility". A HWT is a data structure
>>>> purposely designed for this task, and used by many operating system
>>>> kernel timer facilities.
>>>>
>>>> To further improve the solution described by Varghese and Lauck, a
>>>> bitset is placed in front of each of the timer wheels in the HWT,
>>>> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
>>>> and expiry processing).
>>>>
>>>> Cycle-efficient scanning and manipulation of these bitsets are crucial
>>>> for the HWT's performance.
>>>>
>>>> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
>>>> instance, much like rte_timer.c keeps a per-lcore skip list.
>>>>
>>>> To avoid expensive synchronization overhead for thread-local timer
>>>> management, the HWTs are accessed only from the "owning" thread.  Any
>>>> interaction any other thread has with a particular lcore's timer
>>>> wheel goes over a set of DPDK rings. A side-effect of this design is
>>>> that all operations working toward a "remote" HWT must be
>>>> asynchronous.
>>>>
>>>> The <rte_htimer.h> API is available only to EAL threads and registered
>>>> non-EAL threads.
>>>>
>>>> The htimer API allows the application to supply the current time,
>>>> useful in case it already has retrieved this for other purposes,
>>>> saving the cost of a rdtsc instruction (or its equivalent).
>>>>
>>>> Relative htimer does not retrieve a new time, but reuses the current
>>>> time (as known via/at-the-time of the manage-call), again to shave off
>>>> some cycles of overhead.
>>>
>>> I have a comment to the two points above.
>>>
>>> I agree that the application should supply the current time.
>>>
>>> This should be the concept throughout the library. I don't understand why
>> TSC is used in the library at all?
>>>
>>> Please use a unit-less tick, and let the application decide what one tick
>> means.
>>>
>>
>> I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more
>> sense if you think of the user of the API as not just a "monolithic"
>> application, but rather a set of different modules, developed by
>> different organizations, and reused across a set of applications. The
>> idea behind the API design is they should all be able to share one timer
>> service instance.
>>
>> The different parts of the application and any future DPDK platform
>> modules that use the htimer service need to agree what a tick means in
>> terms of actual wall-time, if it's not mandated by the API.
> 
> I see. Then those non-monolithic applications can agree that the unit of time is nanoseconds, or whatever makes sense for those applications. And then they can instantiate one shared HTW for that purpose.
> 

<rte_htimer_mgr.h> contains nothing but shared HTWs.

> There is no need to impose such an API limit on other users of the library.
> 
>>
>> There might be room for module-specific timer wheels as well, with
>> different resolution or other characteristics. The event timer adapter's
>> use of a timer wheel could be one example (although I'm not sure it is).
> 
> We are not using the event device, and I have not looked into it, so I have no qualified comments to this.
> 
>>
>> If timer-wheel-as-a-private-lego-piece is also a valid use case, then
>> one could consider making the <rte_htw.h> API public as well. That is what
>> I think you are asking for here: a generic timer wheel that doesn't know
>> anything about time sources, time source time -> tick conversion, or
>> timer source time -> monotonic wall time conversion, and maybe is also
>> not bound to a particular thread.
> 
> Yes, that is what I had been searching the Internet for.
> 
> (I'm not sure what you mean by "not bound to a particular thread". Your per-thread design seems good to me.)
> 
> I don't want more stuff in the EAL. What I want is high-performance DPDK libraries we can use in our applications.
> 
>>
>> I picked TSC because it seemed like a good "universal time unit" for
>> DPDK. rdtsc (and its equivalent) is also very precise (especially on
>> x86) and cheap-to-retrieve (especially on ARM, from what I understand).
> 
> The TSC does have excellent performance, but on all other parameters it is a horrible time keeper: The measurement unit depends on the underlying hardware, the TSC drifts depending on temperature, it cannot be PTP synchronized, the list is endless!
> 
>>
>> That said, at the moment, I'm leaning toward nanoseconds (uint64_t
>> format) should be the default for timer expiration time instead of TSC.
>> TSC could still be an option for passing the current time, since TSC
>> will be a common time source, and it shaves off one conversion.
> 
> There are many reasons why nanoseconds is a much better choice than TSC.
> 
>>
>>> A unit-less tick will also let the application instantiate a HTW with higher
>> resolution than the TSC. (E.g. think about oversampling in audio processing,
>> or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound
>> and look better.)
> 
> Some of the timing data in our application have a resolution orders of magnitude higher than one nanosecond. If we combined that with a HTW library with nanosecond resolution, we would need to keep these timer values in two locations: The original high-res timer in our data structure, and the shadow low-res (nanosecond) timer in the HTW.
> 

There is no way you will meet timers with anything approaching 
pico-second-level precision. You will also get into a value range issue, 
since you will wrap around a 64-bit integer in a matter of days.
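
To put a rough number on the value range concern, here is a minimal sketch. 
It assumes the timeout is kept as a uint64_t count of time units starting 
at zero; the helper name days_until_wrap() is made up for illustration and 
is not part of the proposed API.

```c
#include <assert.h>
#include <stdint.h>

/* Time units per second for two candidate resolutions. */
#define NS_PER_S UINT64_C(1000000000)
#define PS_PER_S UINT64_C(1000000000000)
#define SECONDS_PER_DAY UINT64_C(86400)

/*
 * Whole days until a uint64_t counter that starts at zero wraps around,
 * given how many counter units make up one second.
 * Hypothetical helper, for illustration only.
 */
static uint64_t
days_until_wrap(uint64_t units_per_second)
{
	assert(units_per_second != 0);

	return UINT64_MAX / units_per_second / SECONDS_PER_DAY;
}
```

With these assumptions, days_until_wrap(PS_PER_S) is 213 days, while 
days_until_wrap(NS_PER_S) is 213503 days (several centuries).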

The HTW only stores the timeout in ticks, not TSC, nanoseconds or 
picoseconds. Generally, you don't want pico-second-level tick 
granularity, since it increases the overhead of advancing the wheel(s). 
The first (lowest-significance) few wheels will pretty much always be empty.

> We might also need to frequently update the HTW timers to prevent drifting away from the high-res timers. E.g. with a period of 1.2 ticks, two periods are still 2 when rounded (1.2 + 1.2 = 2.4), but adding the rounded period a third time gives 3 when it should have been 4 (3 * 1.2 = 3.6, rounded). This level of drifting would also make periodic timers in the HTW useless.
> 

Useless, for a certain class of applications. What application would 
that be?
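
For reference, the drift described in the quoted paragraph can be 
reproduced with plain integer arithmetic. This is only a sketch: time is 
expressed in tenths of a tick so that a 1.2-tick period is the integer 12, 
and all function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round a time given in tenths of a tick to the nearest whole tick. */
static uint64_t
round_to_ticks(uint64_t tenths)
{
	return (tenths + 5) / 10;
}

/*
 * Expiry tick of the n:th period when the timer is re-armed with the
 * already-rounded period each time (the rounding error accumulates).
 */
static uint64_t
expiry_rounded_each_period(uint64_t period_tenths, unsigned int n)
{
	uint64_t tick = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		tick += round_to_ticks(period_tenths);

	return tick;
}

/*
 * Expiry tick of the n:th period when the exact time is accumulated
 * and rounded only once (no accumulated error).
 */
static uint64_t
expiry_rounded_once(uint64_t period_tenths, unsigned int n)
{
	/* Guard against overflow in the multiplication below. */
	assert(n == 0 || period_tenths <= UINT64_MAX / n);

	return round_to_ticks(period_tenths * n);
}
```

For a period of 12 tenths, both variants give tick 2 after two periods, 
but after three periods the first gives 3 and the second gives 4.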

> Please note: I haven't really considered merging the high-res timing in our application with this HTW, and I'm also not saying that PERIODIC timers in the HTW are required or even useful for our application. I'm only providing arguments for a unit-less time!
> 
>>>
>>> For reference (supporting my suggestion), the dynamic timestamp field in the
>> rte_mbuf structure is also defined as being unit-less. (I think NVIDIA
>> implements it as nanoseconds, but that's an implementation specific choice.)
>>>
>>>>
>>>> A semantic improvement compared to the <rte_timer.h> API is that the
>>>> htimer library can give a definite answer on the question if the timer
>>>> expiry callback was called, after a timer has been canceled.
>>>>
>>>> Below is performance data from DPDK's 'app/test' micro benchmarks,
>>>> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
>>>> test_htimer_mgr_perf.c) aren't identical in their structure, but the
>>>> numbers give some indication of the difference.
>>>>
>>>> Use case               htimer  timer
>>>> ------------------------------------
>>>> Add timer                 28    253
>>>> Cancel timer              10    412
>>>> Async add (source lcore)  64
>>>> Async add (target lcore)  13
>>>>
>>>> (AMD 5900X CPU. Time in TSC.)
>>>>
>>>> Prototype integration of the htimer library into real, timer-heavy,
>>>> applications indicates that htimer may result in significant
>>>> application-level performance gains.
>>>>
>>>> The bitset implementation which the HWT implementation depends upon
>>>> seemed generic-enough and potentially useful outside the world of
>>>> HWTs, to justify being located in the EAL.
>>>>
>>>> This patchset is very much an RFC, and the author is yet to form an
>>>> opinion on many important issues.
>>>>
>>>> * If deemed a suitable replacement, should the htimer replace the
>>>>     current DPDK timer library in some particular (ABI-breaking)
>>>>     release, or should it live side-by-side with the then-legacy
>>>>     <rte_timer.h> API? A lot of things in and outside DPDK depend on
>>>>     <rte_timer.h>, so coexistence may be required to facilitate a smooth
>>>>     transition.
>>>
>>> It's my immediate impression that they are totally different in both design
>> philosophy and API.
>>>
>>> Personal opinion: I would call it an entirely different library.
>>>
>>>>
>>>> * Should the htimer and htw-related files be colocated with rte_timer.c
>>>>     in the timer library?
>>>
>>> Personal opinion: No. This is an entirely different library, and should live
>> for itself in a directory of its own.
>>>
>>>>
>>>> * Would it be useful for applications using asynchronous cancel to
>>>>     have the option of having the timer callback run not only in case of
>>>>     timer expiration, but also cancellation (on the target lcore)? The
>>>>     timer cb signature would need to include an additional parameter in
>>>>     that case.
>>>
>>> If one thread cancels something in another thread, some synchronization
>> between the threads is going to be required anyway. So we could rephrase your
>> question: Will the burden of the otherwise required synchronization between
>> the two threads be significantly reduced if the library has the ability to run
>> the callback on asynchronous cancel?
>>>
>>
>> Yes.
>>
>> Intuitively, it seems convenient that if you hand off a timer to a
>> different lcore, the timer callback will be called exactly once,
>> regardless if the timer was canceled or expired.
>>
>> But, as you indicate, you may still need synchronization to solve the
>> resource reclamation issue.
>>
>>> Is such a feature mostly "Must have" or "Nice to have"?
>>>
>>> More thoughts in this area...
>>>
>>> If adding an additional callback parameter, it could be an enum, so the
>> callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel"
>> and more events we have not yet come up with, e.g. "early kick".
>>>
>>
>> Yes, or an int.
>>
>>> Here's an idea off the top of my head: An additional callback parameter has
>> a (small) performance cost incurred with every timer fired (which is a very
>> large multiplier). It might not be required. As an alternative to a "what
>> happened" parameter to the callback, the callback could investigate the state
>> of the object for which the timer fired, and draw its own conclusion on how to
>> proceed. Obviously, this also has a performance cost, but perhaps the callback
>> works on the object's state anyway, making this cost insignificant.
>>>
>>
>> It's not obvious to me that you, in the timer callback, can determine
>> what happened, if the same callback is called both in the cancel and the
>> expired case.
>>
>> It's not the cost of an extra integer passed in a register (or checking a
>> flag, if the timer callback should be called at all at cancellation) that
>> is the concern for me; it's the extra bit of API complexity.
> 
> Then introduce the library without this feature. More features can be added later.
> 
> The library will be introduced as "experimental", so we are free to improve it and modify the ABI along the way.
> 
>>
>>> Here's another alternative to adding a "what happened" parameter to the
>> callback:
>>>
>>> The rte_htimer could have one more callback pointer, which (if set) will be
>> called on cancellation of the timer.
>>>
>>
>> This will grow the timer struct by 16 bytes.
> 
> If the rte_htimer struct stays within one cache line, it should be acceptable.
> 

Timer structs are often embedded in other structures, and need not 
themselves be cache line aligned (although the "parent" struct may need 
to be, e.g. if it's dynamically allocated).

So smaller is better. Just consider if you want your attosecond-level 
time stamp in a struct:

struct my_timer {
     uint64_t high_precision_time_high_bits;
     uint64_t high_precision_time_low_bits;
     struct rte_htimer timer;
};

...and you allocate those structs from a mempool. If rte_htimer is small 
enough, you will fit on one cache line.
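
A compile-time check along these lines can catch the struct outgrowing a 
cache line. This is a sketch under stated assumptions: struct sketch_htimer 
is a made-up stand-in (the real rte_htimer layout is whatever the patch 
defines), and the 64-byte line size is assumed; in a DPDK build one would 
use RTE_CACHE_LINE_SIZE instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; DPDK provides RTE_CACHE_LINE_SIZE. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for struct rte_htimer; NOT the real layout. */
struct sketch_htimer {
	void *prev;
	void *next;
	uint64_t expiration_time;
	void (*cb)(struct sketch_htimer *timer, void *cb_arg);
	void *cb_arg;
	uint32_t owner_lcore_id;
	uint32_t flags;
};

/* Application struct embedding the timer, as in the example above. */
struct my_timer {
	uint64_t high_precision_time_high_bits;
	uint64_t high_precision_time_low_bits;
	struct sketch_htimer timer;
};

/* True if an object of the given size fits within one cache line. */
static bool
fits_in_cache_line(size_t size)
{
	return size <= CACHE_LINE_SIZE;
}

/* Fail the build if the embedding struct spills over a cache line. */
static_assert(sizeof(struct my_timer) <= CACHE_LINE_SIZE,
	      "struct my_timer does not fit in one cache line");
```

With this particular stand-in on a typical 64-bit target, struct my_timer 
is exactly 64 bytes, so any extra field in the timer struct would push it 
onto a second line.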

> On the other hand, this approach is less generic than passing an additional parameter. (E.g. add yet another callback pointer for "early kick"?)
> 
> BTW, async cancel is a form of inter-thread communication. Does this library really need to provide any inter-thread communication mechanisms? Doesn't an inter-thread communication mechanism belong in a separate library?
> 

Yes, <rte_htimer_mgr.h> needs this because:
1) Being able to schedule timers on a remote lcore is a useful feature 
(especially since we don't have much else in terms of deferred work 
mechanisms in DPDK).
2) htimer aspires to be a plug-in replacement for <rte_timer.h> (albeit 
an ABI-breaking one).

The pure HTW is in rte_htw.[ch].

Plus, with the current design, async operations basically come for free 
(if you don't use them), from a performance perspective. The extra 
overhead boils down to occasionally polling an empty ring, which is an 
inexpensive operation.
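
To illustrate why polling an empty ring is cheap, here is a minimal 
single-producer/single-consumer ring written with C11 atomics. It is a 
stand-alone sketch and not the DPDK rte_ring implementation; the names 
msg_ring, ring_enqueue() and ring_dequeue() are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of two");

/* Hypothetical SPSC message ring; not the rte_ring layout. */
struct msg_ring {
	_Atomic unsigned int head; /* next slot to write (producer) */
	_Atomic unsigned int tail; /* next slot to read (consumer) */
	void *slots[RING_SIZE];
};

/* Producer side. Returns false if the ring is full. */
static bool
ring_enqueue(struct msg_ring *r, void *msg)
{
	unsigned int head =
		atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail =
		atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;

	r->slots[head % RING_SIZE] = msg;
	/* Publish the slot contents before making the slot visible. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);

	return true;
}

/* Consumer side. Returns NULL if the ring is empty. */
static void *
ring_dequeue(struct msg_ring *r)
{
	unsigned int tail =
		atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head =
		atomic_load_explicit(&r->head, memory_order_acquire);
	void *msg;

	/* The empty case costs two loads and one comparison. */
	if (head == tail)
		return NULL;

	msg = r->slots[tail % RING_SIZE];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);

	return msg;
}
```

In this sketch, the empty-ring path touches only the two index words, 
which stay in the polling core's cache as long as no other thread writes 
to them.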

>>
>>>>
>>>> * Should the rte_htimer be a nested struct, so the htw parts be separated
>>>>     from the htimer parts?
>>>>
>>>> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
>>>>     <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
>>>>     be so?
>>>>
>>>> * rte_htimer struct is only supposed to be used by the application to
>>>>     give an indication of how much memory it needs to allocate, and
>>>>     its members are not supposed to be directly accessed (w/ the possible
>>>>     exception of the owner_lcore_id field). Should there be a dummy
>>>>     struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
>>>>     function instead, serving the same purpose? Better encapsulation,
>>>>     but more inconvenient for applications. Run-time dynamic sizing
>>>>     would force application-level dynamic allocations.
>>>>
>>>> * Asynchronous cancellation is a little tricky to use for the
>>>>     application (primarily due to timer memory reclamation/race
>>>>     issues). Should this functionality be removed?
>>>>
>>>> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
>>>>     there should be a variant which allows the user to specify the
>>>>     time (to match rte_htimer_mgr_manage_time()). One pitfall with the
>>>>     current proposed API is an application calling rte_htimer_mgr_init()
>>>>     and then immediately adding a timer with a relative timeout, in
>>>>     which case the current absolute time used is 0, which might be a
>>>>     surprise.
>>>>
>>>> * Should libdivide (optionally) be used to avoid the div in the TSC ->
>>>>     tick conversion? (Doesn't improve performance on Zen 3, but may
>>>>     do on other CPUs.) Consider <rte_reciprocal.h> as well.
>>>>
>>>> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
>>>>     used for conversion? Very minor performance gains to be found there,
>>>>     at least on Zen 3 cores.
>>>>
>>>> * Should it be possible to supply the time in rte_htimer_mgr_add()
>>>>     and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
>>>>     as TSC? Should it be possible to also use nanoseconds?
>>>>     rte_htimer_mgr_manage_time() would need a flags parameter in that
>>>>     case.
>>>
>>> Do not use TSC anywhere in this library. Let the application decide the
>> meaning of a tick.
>>>
>>>>
>>>> * Would the event timer adapter be best off using <rte_htw.h>
>>>>     directly, or <rte_htimer.h>? In the latter case, there needs to be a
>>>>     way to instantiate more HWTs (similar to the "alt" functions of
>>>>     <rte_timer.h>)?
>>>>
>>>> * Should the PERIODICAL flag (and the complexity it brings) be
>>>>     removed? And leave the application with only single-shot timers, and
>>>>     the option to re-add them in the timer callback.
>>>
>>> First thought: Yes, keep it lean and remove the periodical stuff.
>>>
>>> Second thought: This needs a more detailed analysis.
>>>
>>>   From one angle:
>>>
>>> How many PERIODICAL versus ONESHOT timers do we expect?
>>>
>>
>> I suspect you should be prepared for the ratio being anything.
> 
> In theory, anything is possible. But I'm asking that we consider realistic use cases.
> 
>>
>>> Intuitively, I would use this library for ONESHOT timers, and perhaps
>> implement my periodical timers by other means.
>>>
>>> If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra
>> cost of cancel+add for a few periodical timers.
>>>
>>>   From another angle:
>>>
>>> What is the performance gain with the PERIODICAL flag?
>>>
>>
>> None, pretty much. It's just there for convenience.
> 
> OK, then I suggest that you remove it, unless you get objections.
> 
> The library can be expanded with useful features at any time later. Useless features are (nearly) impossible to remove, once they are in there - they are just "technical debt" with associated maintenance costs, added complexity weaving into other features, etc..
> 
>>
>>> Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles
>> would a "move" function, performing both cancel and add, use?
>>>
>>> And then compare that to the cost (in cycles) of repeating a timer with
>> PERIODICAL?
>>>
>>> Furthermore, not having the PERIODICAL flag probably improves the
>> performance for non-periodical timers. How many cycles could we gain here?
>>>
>>>
>>> Another, vaguely related, idea:
>>>
>>> The callback pointer might not need to be stored per rte_htimer, but could
>> instead be common for the rte_htw.
>>>
>>
>> Do you mean rte_htw, or rte_htimer_mgr?
>>
>> If you make one common callback, all the different parts of the
>> application need to be coordinated (in a big switch-statement, or
>> something of that sort), or have some convention for using an
>> application-specific wrapper structure (accessed via container_of()).
>>
>> This is a problem if the timer service API consumer is a set of largely
>> uncoordinated software modules.
>>
>> Btw, the eventdev API has the same issue, and the proposed event
>> dispatcher is one way to help facilitate application-internal decoupling.
>>
>> For a module-private rte_htw instance your suggestion may work, but not
>> for <rte_htimer_mgr.h>.
> 
> I was speculating that a common callback pointer might provide a performance benefit for single-purpose HTW instances. (The same concept applies if there are multiple callbacks, e.g. a "Timer Fired", a "Timer Cancelled", and an "Early Kick" callback pointer - i.e. having the callback pointers per HTW instance, instead of per timer.)
> 
>>
>>> When a timer fires, the callback probably needs to check/update the state of
>> the object for which the timer fired anyway, so why not just let the
>> application use that state to determine the appropriate action. This might
>> provide some performance benefit.
>>>
>>> It might complicate using one HTW for multiple different purposes, though.
>> Probably a useless idea, but I wanted to share the idea anyway. It might
>> trigger other, better ideas in the community.
>>>
>>>>
>>>> * Should the async result codes and the sync cancel error codes be merged
>>>>     into one set of result codes?
>>>>
>>>> * Should the rte_htimer_mgr_async_add() have a flag which allows
>>>>     buffering add request messages until rte_htimer_mgr_process() is
>>>>     called? Or any manage function. Would reduce ring signaling overhead
>>>>     (i.e., burst enqueue operations instead of single-element
>>>>     enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
>>>>     solving the same "problem" a different way. (The signature of such
>>>>     a function would not be pretty.)
>>>>
>>>> * Does the functionality provided by the rte_htimer_mgr_process()
>>>>     function match its use cases? Should there be a clearer
>>>>     separation between expiry processing and asynchronous operation
>>>>     processing?
>>>>
>>>> * Should the patchset be split into more commits? If so, how?
>>>>
>>>> Thanks to Erik Carrillo for his assistance.
>>>>
>>>> Mattias Rönnblom (2):
>>>>     eal: add bitset type
>>>>     eal: add high-performance timer facility
> 



* RE: [RFC 0/2] Add high-performance timer facility
  2023-03-01 11:18  0%   ` Mattias Rönnblom
@ 2023-03-01 13:31  3%     ` Morten Brørup
  2023-03-01 15:50  3%       ` Mattias Rönnblom
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-03-01 13:31 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Wednesday, 1 March 2023 12.18
> 
> On 2023-02-28 17:01, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Tuesday, 28 February 2023 10.39
> >
> > I have been looking for a high performance timer library (for use in a fast
> path TCP stack), and this looks very useful, Mattias.
> >
> > My initial feedback is based on quickly skimming the patch source code, and
> reading this cover letter.
> >
> >>
> >> This patchset is an attempt to introduce a high-performance, highly
> >> scalable timer facility into DPDK.
> >>
> >> More specifically, the goals for the htimer library are:
> >>
> >> * Efficient handling of a handful up to hundreds of thousands of
> >>    concurrent timers.
> >> * Reduced overhead of adding and canceling timers.
> >> * Provide a service functionally equivalent to that of
> >>    <rte_timer.h>. API/ABI backward compatibility is secondary.
> >>
> >> In the author's opinion, there are two main shortcomings with the
> >> current DPDK timer library (i.e., rte_timer.[ch]).
> >>
> >> One is the synchronization overhead, where heavy-weight full-barrier
> >> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
> >> lists, but any thread may add or cancel (or otherwise access) timers
> >> managed by another lcore (and thus resides in its timer skip list).
> >>
> >> The other is an algorithmic shortcoming, with rte_timer.c's reliance
> >> on a skip list, which, seemingly, is less efficient than certain
> >> alternatives.
> >>
> >> This patchset implements a hierarchical timer wheel (HWT, in
> >
> > Typo: HWT or HTW?
> 
> Yes. I don't understand how I could have managed to make so many such HTW ->
> HWT typos. At least I got the filenames (rte_htw.[ch]) correct.
> 
> >
> >> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
> >> Hierarchical Timing Wheels: Data Structures for the Efficient
> >> Implementation of a Timer Facility". A HWT is a data structure
> >> purposely designed for this task, and used by many operating system
> >> kernel timer facilities.
> >>
> >> To further improve the solution described by Varghese and Lauck, a
> >> bitset is placed in front of each of the timer wheels in the HWT,
> >> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
> >> and expiry processing).
> >>
> >> Cycle-efficient scanning and manipulation of these bitsets are crucial
> >> for the HWT's performance.
> >>
> >> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
> >> instance, much like rte_timer.c keeps a per-lcore skip list.
> >>
> >> To avoid expensive synchronization overhead for thread-local timer
> >> management, the HWTs are accessed only from the "owning" thread.  Any
> >> interaction any other thread has with a particular lcore's timer
> >> wheel goes over a set of DPDK rings. A side-effect of this design is
> >> that all operations working toward a "remote" HWT must be
> >> asynchronous.
> >>
> >> The <rte_htimer.h> API is available only to EAL threads and registered
> >> non-EAL threads.
> >>
> >> The htimer API allows the application to supply the current time,
> >> useful in case it already has retrieved this for other purposes,
> >> saving the cost of a rdtsc instruction (or its equivalent).
> >>
> >> Relative htimer does not retrieve a new time, but reuses the current
> >> time (as known via/at-the-time of the manage-call), again to shave off
> >> some cycles of overhead.
> >
> > I have a comment to the two points above.
> >
> > I agree that the application should supply the current time.
> >
> > This should be the concept throughout the library. I don't understand why TSC is used in the library at all?
> >
> > Please use a unit-less tick, and let the application decide what one tick means.
> >
> 
> I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more
> sense if you think of the user of the API as not just a "monolithic"
> application, but rather a set of different modules, developed by
> different organizations, and reused across a set of applications. The
> idea behind the API design is they should all be able to share one timer
> service instance.
> 
> The different parts of the application and any future DPDK platform
> modules that use the htimer service need to agree on what a tick means in
> terms of actual wall-time, if it's not mandated by the API.

I see. Then those non-monolithic applications can agree that the unit of time is nanoseconds, or whatever makes sense for those applications. And then they can instantiate one shared HTW for that purpose.

There is no need to impose such an API limit on other users of the library.

> 
> There might be room for module-specific timer wheels as well, with
> different resolution or other characteristics. The event timer adapter's
> use of a timer wheel could be one example (although I'm not sure it is).

We are not using the event device, and I have not looked into it, so I have no qualified comments on this.

> 
> If timer-wheel-as-a-private-lego-piece is also a valid use case, then
> one could consider making the <rte_htw.h> API public as well. That is what
> I think you are asking for here: a generic timer wheel that doesn't know
> anything about time sources, time source time -> tick conversion, or
> timer source time -> monotonic wall time conversion, and maybe is also
> not bound to a particular thread.

Yes, that is what I had been searching the Internet for.

(I'm not sure what you mean by "not bound to a particular thread". Your per-thread design seems good to me.)

I don't want more stuff in the EAL. What I want is high-performance DPDK libraries we can use in our applications.

> 
> I picked TSC because it seemed like a good "universal time unit" for
> DPDK. rdtsc (and its equivalent) is also a very precise (especially on
> x86) and cheap-to-retrieve (especially on ARM, from what I understand).

The TSC does have excellent performance, but on all other parameters it is a horrible time keeper: The measurement unit depends on the underlying hardware, the TSC drifts depending on temperature, it cannot be PTP synchronized, the list is endless!

> 
> That said, at the moment, I'm leaning toward nanoseconds (uint64_t
> format) should be the default for timer expiration time instead of TSC.
> TSC could still be an option for passing the current time, since TSC
> will be a common time source, and it shaves off one conversion.

There are many reasons why nanoseconds is a much better choice than TSC.

> 
> > A unit-less tick will also let the application instantiate a HTW with higher resolution than the TSC. (E.g. think about oversampling in audio processing, or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound and look better.)

Some of the timing data in our application have a resolution orders of magnitude higher than one nanosecond. If we combined that with a HTW library with nanosecond resolution, we would need to keep these timer values in two locations: The original high-res timer in our data structure, and the shadow low-res (nanosecond) timer in the HTW.

We might also need to frequently update the HTW timers to prevent them from drifting away from the high-res timers. E.g. with a period of 1.2, the low-res timer adds the rounded value 1 each time: after two periods it is at 2, which still matches 1.2 + 1.2 = 2.4 rounded, but after three periods it is at 3 when it should have been 4 (3 * 1.2 = 3.6, rounded). This level of drifting would also make periodic timers in the HTW useless.
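
To make the drift concrete, here is a minimal sketch in C. It is only an illustration: the function names are hypothetical and not part of the patch, and the period of 1.2 ticks is expressed in tenths of a tick so the arithmetic stays exact.

```c
#include <stdint.h>

/* Illustration only: hypothetical names, not part of the htimer patch. */

/* Round a time given in tenths of a tick to the nearest whole tick. */
static uint64_t
round_to_ticks(uint64_t tenths)
{
	return (tenths + 5) / 10;
}

/* Expiry of the n-th period when the rounded low-res period is accumulated. */
static uint64_t
shadow_expiry(uint64_t period_tenths, unsigned int n)
{
	uint64_t expiry = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		expiry += round_to_ticks(period_tenths);

	return expiry;
}

/* Expiry of the n-th period when the high-res time is accumulated first. */
static uint64_t
exact_expiry(uint64_t period_tenths, unsigned int n)
{
	return round_to_ticks(period_tenths * n);
}
```

With a period of 12 tenths, both functions agree for two periods, and disagree from the third period on, which is the drift described above.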

Please note: I haven't really considered merging the high-res timing in our application with this HTW, and I'm also not saying that PERIODIC timers in the HTW are required or even useful for our application. I'm only providing arguments for a unit-less time!

> >
> > For reference (supporting my suggestion), the dynamic timestamp field in the rte_mbuf structure is also defined as being unit-less. (I think NVIDIA implements it as nanoseconds, but that's an implementation specific choice.)
> >
> >>
> >> A semantic improvement compared to the <rte_timer.h> API is that the
> >> htimer library can give a definite answer on the question if the timer
> >> expiry callback was called, after a timer has been canceled.
> >>
> >> Below is performance data from DPDK's 'app/test' micro benchmarks,
> >> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
> >> test_htimer_mgr_perf.c) aren't identical in their structure, but the
> >> numbers give some indication of the difference.
> >>
> >> Use case               htimer  timer
> >> ------------------------------------
> >> Add timer                 28    253
> >> Cancel timer              10    412
> >> Async add (source lcore)  64
> >> Async add (target lcore)  13
> >>
> >> (AMD 5900X CPU. Time in TSC.)
> >>
> >> Prototype integration of the htimer library into real, timer-heavy,
> >> applications indicates that htimer may result in significant
> >> application-level performance gains.
> >>
> >> The bitset implementation which the HWT implementation depends upon
> >> seemed generic-enough and potentially useful outside the world of
> >> HWTs, to justify being located in the EAL.
> >>
> >> This patchset is very much an RFC, and the author is yet to form an
> >> opinion on many important issues.
> >>
> >> * If deemed a suitable replacement, should the htimer replace the
> >>    current DPDK timer library in some particular (ABI-breaking)
> >>    release, or should it live side-by-side with the then-legacy
> >>    <rte_timer.h> API? A lot of things in and outside DPDK depend on
> >>    <rte_timer.h>, so coexistence may be required to facilitate a smooth
> >>    transition.
> >
> > It's my immediate impression that they are totally different in both design philosophy and API.
> >
> > Personal opinion: I would call it an entirely different library.
> >
> >>
> >> * Should the htimer and htw-related files be colocated with rte_timer.c
> >>    in the timer library?
> >
> > Personal opinion: No. This is an entirely different library, and should live for itself in a directory of its own.
> >
> >>
> >> * Would it be useful for applications using asynchronous cancel to
> >>    have the option of having the timer callback run not only in case of
> >>    timer expiration, but also cancellation (on the target lcore)? The
> >>    timer cb signature would need to include an additional parameter in
> >>    that case.
> >
> > If one thread cancels something in another thread, some synchronization between the threads is going to be required anyway. So we could rephrase your question: Will the burden of the otherwise required synchronization between the two threads be significantly reduced if the library has the ability to run the callback on asynchronous cancel?
> >
> 
> Yes.
> 
> Intuitively, it seems convenient that if you hand off a timer to a
> different lcore, the timer callback will be called exactly once,
> regardless if the timer was canceled or expired.
> 
> But, as you indicate, you may still need synchronization to solve the
> resource reclamation issue.
> 
> > Is such a feature mostly "Must have" or "Nice to have"?
> >
> > More thoughts in this area...
> >
> > If adding an additional callback parameter, it could be an enum, so the callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel" and more events we have not yet come up with, e.g. "early kick".
> >
> 
> Yes, or an int.
> 
> > Here's an idea off the top of my head: An additional callback parameter has a (small) performance cost incurred with every timer fired (which is a very large multiplier). It might not be required. As an alternative to a "what happened" parameter to the callback, the callback could investigate the state of the object for which the timer fired, and draw its own conclusion on how to proceed. Obviously, this also has a performance cost, but perhaps the callback works on the object's state anyway, making this cost insignificant.
> >
> 
> It's not obvious to me that you, in the timer callback, can determine
> what happened, if the same callback is called both in the cancel and the
> expired case.
> 
> The cost of an extra integer passed in a register (or checking a flag,
> if the timer callback should be called at all at cancellation) is not
> the concern for me; it's the extra bit of API complexity.

Then introduce the library without this feature. More features can be added later.

The library will be introduced as "experimental", so we are free to improve it and modify the ABI along the way.

> 
> > Here's another alternative to adding a "what happened" parameter to the callback:
> >
> > The rte_htimer could have one more callback pointer, which (if set) will be called on cancellation of the timer.
> >
> 
> This will grow the timer struct by 16 bytes.

If the rte_htimer struct stays within one cache line, it should be acceptable.

On the other hand, this approach is less generic than passing an additional parameter. (E.g. add yet another callback pointer for "early kick"?)

BTW, async cancel is a form of inter-thread communication. Does this library really need to provide any inter-thread communication mechanisms? Doesn't an inter-thread communication mechanism belong in a separate library?

> 
> >>
> >> * Should the rte_htimer be a nested struct, so the htw parts be separated
> >>    from the htimer parts?
> >>
> >> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
> >>    <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
> >>    be so?
> >>
> >> * rte_htimer struct is only supposed to be used by the application to
> >>    give an indication of how much memory it needs to allocate, and
> >>    its members are not supposed to be directly accessed (w/ the possible
> >>    exception of the owner_lcore_id field). Should there be a dummy
> >>    struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
> >>    function instead, serving the same purpose? Better encapsulation,
> >>    but more inconvenient for applications. Run-time dynamic sizing
> >>    would force application-level dynamic allocations.
> >>
> >> * Asynchronous cancellation is a little tricky to use for the
> >>    application (primarily due to timer memory reclamation/race
> >>    issues). Should this functionality be removed?
> >>
> >> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
> >>    there should be a variant which allows the user to specify the
> >>    time (to match rte_htimer_mgr_manage_time()). One pitfall with the
> >>    current proposed API is an application calling rte_htimer_mgr_init()
> >>    and then immediately adding a timer with a relative timeout, in
> >>    which case the current absolute time used is 0, which might be a
> >>    surprise.
> >>
> >> * Should libdivide (optionally) be used to avoid the div in the TSC ->
> >>    tick conversion? (Doesn't improve performance on Zen 3, but may
> >>    do on other CPUs.) Consider <rte_reciprocal.h> as well.
> >>
> >> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
> >>    used for conversion? Very minor performance gains to be found there,
> >>    at least on Zen 3 cores.
> >>
> >> * Should it be possible to supply the time in rte_htimer_mgr_add()
> >>    and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
> >>    as TSC? Should it be possible to also use nanoseconds?
> >>    rte_htimer_mgr_manage_time() would need a flags parameter in that
> >>    case.
> >
> > Do not use TSC anywhere in this library. Let the application decide the meaning of a tick.
> >
> >>
> >> * Would the event timer adapter be best off using <rte_htw.h>
> >>    directly, or <rte_htimer.h>? In the latter case, there needs to be a
> >>    way to instantiate more HWTs (similar to the "alt" functions of
> >>    <rte_timer.h>)?
> >>
> >> * Should the PERIODICAL flag (and the complexity it brings) be
> >>    removed? And leave the application with only single-shot timers, and
> >>    the option to re-add them in the timer callback.
> >
> > First thought: Yes, keep it lean and remove the periodical stuff.
> >
> > Second thought: This needs a more detailed analysis.
> >
> >  From one angle:
> >
> > How many PERIODICAL versus ONESHOT timers do we expect?
> >
> 
> I suspect you should be prepared for the ratio being anything.

In theory, anything is possible. But I'm asking that we consider realistic use cases.

> 
> > Intuitively, I would use this library for ONESHOT timers, and perhaps implement my periodical timers by other means.
> >
> > If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra cost of cancel+add for a few periodical timers.
> >
> >  From another angle:
> >
> > What is the performance gain with the PERIODICAL flag?
> >
> 
> None, pretty much. It's just there for convenience.

OK, then I suggest that you remove it, unless you get objections.

The library can be expanded with useful features at any time later. Useless features are (nearly) impossible to remove, once they are in there - they are just "technical debt" with associated maintenance costs, added complexity weaving into other features, etc.

> 
> > Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles would a "move" function, performing both cancel and add, use?
> >
> > And then compare that to the cost (in cycles) of repeating a timer with PERIODICAL?
> >
> > Furthermore, not having the PERIODICAL flag probably improves the performance for non-periodical timers. How many cycles could we gain here?
> >
> >
> > Another, vaguely related, idea:
> >
> > The callback pointer might not need to be stored per rte_htimer, but could instead be common for the rte_htw.
> >
> 
> Do you mean rte_htw, or rte_htimer_mgr?
> 
> If you make one common callback, all the different parts of the
> application need to be coordinated (in a big switch-statement, or
> something of that sort), or have some convention for using an
> application-specific wrapper structure (accessed via container_of()).
> 
> This is a problem if the timer service API consumer is a set of largely
> uncoordinated software modules.
> 
> Btw, the eventdev API has the same issue, and the proposed event
> dispatcher is one way to help facilitate application-internal decoupling.
> 
> For a module-private rte_htw instance your suggestion may work, but not
> for <rte_htimer_mgr.h>.

I was speculating that a common callback pointer might provide a performance benefit for single-purpose HTW instances. (The same concept applies if there are multiple callbacks, e.g. a "Timer Fired", a "Timer Cancelled", and an "Early Kick" callback pointer - i.e. having the callback pointers per HTW instance, instead of per timer.)
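
A rough sketch in C of what I mean by per-instance callbacks. All type and function names here are hypothetical (this is not the rte_htw API from the patch); the point is only that the timer struct carries no callback pointer, and the wheel holds one pointer per event type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the rte_htw API. */
enum sketch_event {
	SKETCH_EVENT_FIRED,
	SKETCH_EVENT_CANCELED,
	SKETCH_EVENT_COUNT
};

struct sketch_timer {
	uint64_t expiry; /* note: no per-timer callback pointer */
};

typedef void (*sketch_cb)(struct sketch_timer *timer, void *wheel_arg);

struct sketch_wheel {
	sketch_cb cbs[SKETCH_EVENT_COUNT]; /* shared by all timers in the wheel */
	void *arg;
};

/* Invoke the wheel-wide callback registered for 'event', if any. */
static void
sketch_wheel_notify(struct sketch_wheel *wheel, struct sketch_timer *timer,
		    enum sketch_event event)
{
	sketch_cb cb = wheel->cbs[event];

	if (cb != NULL)
		cb(timer, wheel->arg);
}

/* Example callback: counts invocations in the counter passed as wheel_arg. */
static void
sketch_count_cb(struct sketch_timer *timer, void *wheel_arg)
{
	(void)timer;
	(*(unsigned int *)wheel_arg)++;
}
```

Leaving an entry NULL means that event is simply not reported for this wheel, so an instance that only cares about "Timer Fired" pays nothing for the other events beyond the NULL check.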

> 
> > When a timer fires, the callback probably needs to check/update the state of the object for which the timer fired anyway, so why not just let the application use that state to determine the appropriate action. This might provide some performance benefit.
> >
> > It might complicate using one HTW for multiple different purposes, though. Probably a useless idea, but I wanted to share the idea anyway. It might trigger other, better ideas in the community.
> >
> >>
> >> * Should the async result codes and the sync cancel error codes be merged
> >>    into one set of result codes?
> >>
> >> * Should the rte_htimer_mgr_async_add() have a flag which allows
> >>    buffering add request messages until rte_htimer_mgr_process() is
> >>    called? Or any manage function. Would reduce ring signaling overhead
> >>    (i.e., burst enqueue operations instead of single-element
> >>    enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
> >>    solving the same "problem" a different way. (The signature of such
> >>    a function would not be pretty.)
> >>
> >> * Does the functionality provided by the rte_htimer_mgr_process()
> >>    function match its use cases? Should there be a clearer
> >>    separation between expiry processing and asynchronous operation
> >>    processing?
> >>
> >> * Should the patchset be split into more commits? If so, how?
> >>
> >> Thanks to Erik Carrillo for his assistance.
> >>
> >> Mattias Rönnblom (2):
> >>    eal: add bitset type
> >>    eal: add high-performance timer facility


^ permalink raw reply	[relevance 3%]

* Re: [RFC 0/2] Add high-performance timer facility
  2023-02-28 16:01  0% ` Morten Brørup
@ 2023-03-01 11:18  0%   ` Mattias Rönnblom
  2023-03-01 13:31  3%     ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Mattias Rönnblom @ 2023-03-01 11:18 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Erik Gabriel Carrillo, David Marchand, Maria Lingemark, Stefan Sundkvist

On 2023-02-28 17:01, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Tuesday, 28 February 2023 10.39
> 
> I have been looking for a high performance timer library (for use in a fast path TCP stack), and this looks very useful, Mattias.
> 
> My initial feedback is based on quickly skimming the patch source code, and reading this cover letter.
> 
>>
>> This patchset is an attempt to introduce a high-performance, highly
>> scalable timer facility into DPDK.
>>
>> More specifically, the goals for the htimer library are:
>>
>> * Efficient handling of a handful up to hundreds of thousands of
>>    concurrent timers.
>> * Reduced overhead of adding and canceling timers.
>> * Provide a service functionally equivalent to that of
>>    <rte_timer.h>. API/ABI backward compatibility is secondary.
>>
>> In the author's opinion, there are two main shortcomings with the
>> current DPDK timer library (i.e., rte_timer.[ch]).
>>
>> One is the synchronization overhead, where heavy-weight full-barrier
>> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
>> lists, but any thread may add or cancel (or otherwise access) timers
>> managed by another lcore (and thus reside in its timer skip list).
>>
>> The other is an algorithmic shortcoming, with rte_timer.c's reliance
>> on a skip list, which, seemingly, is less efficient than certain
>> alternatives.
>>
>> This patchset implements a hierarchical timer wheel (HWT, in
> 
> Typo: HWT or HTW?

Yes. I don't understand how I could have managed to make so many such HTW ->
HWT typos. At least I got the filenames (rte_htw.[ch]) correct.

> 
>> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
>> Hierarchical Timing Wheels: Data Structures for the Efficient
>> Implementation of a Timer Facility". A HWT is a data structure
>> purposely designed for this task, and used by many operating system
>> kernel timer facilities.
>>
>> To further improve the solution described by Varghese and Lauck, a
>> bitset is placed in front of each of the timer wheels in the HWT,
>> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
>> and expiry processing).
>>
>> Cycle-efficient scanning and manipulation of these bitsets are crucial
>> for the HWT's performance.
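
As a rough illustration of the kind of bitset scan meant here, a minimal single-word sketch in C follows. It is not the bitset code from the patch: the names are hypothetical, the wheel is assumed to have 64 slots so one uint64_t covers it, and __builtin_ctzll() is a GCC/Clang builtin.

```c
#include <stdint.h>

#define SKETCH_WHEEL_SLOTS 64 /* assumption: one 64-bit word covers the wheel */

/*
 * Return the index of the first non-empty slot at or after 'start',
 * wrapping around the wheel, or -1 if every slot is empty.
 * Bit i of 'occupied' is set when slot i holds at least one timer.
 */
static int
sketch_next_occupied_slot(uint64_t occupied, unsigned int start)
{
	uint64_t rotated;

	if (occupied == 0)
		return -1;

	start %= SKETCH_WHEEL_SLOTS;

	/* Rotate right so that bit 'start' becomes bit 0. */
	if (start == 0)
		rotated = occupied;
	else
		rotated = (occupied >> start) |
			(occupied << (SKETCH_WHEEL_SLOTS - start));

	/* Count trailing zeros to find the distance to the next set bit. */
	return (int)((start + (unsigned int)__builtin_ctzll(rotated)) %
		     SKETCH_WHEEL_SLOTS);
}
```

The manage function can then jump straight to the next slot that holds timers instead of visiting every empty slot in between.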
>>
>> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
>> instance, much like rte_timer.c keeps a per-lcore skip list.
>>
>> To avoid expensive synchronization overhead for thread-local timer
>> management, the HWTs are accessed only from the "owning" thread.  Any
>> interaction any other thread has with a particular lcore's timer
>> wheel goes over a set of DPDK rings. A side-effect of this design is
>> that all operations working toward a "remote" HWT must be
>> asynchronous.
>>
>> The <rte_htimer.h> API is available only to EAL threads and registered
>> non-EAL threads.
>>
>> The htimer API allows the application to supply the current time,
>> useful in case it already has retrieved this for other purposes,
>> saving the cost of a rdtsc instruction (or its equivalent).
>>
>> Relative htimer does not retrieve a new time, but reuses the current
>> time (as known via/at-the-time of the manage-call), again to shave off
>> some cycles of overhead.
> 
> I have a comment to the two points above.
> 
> I agree that the application should supply the current time.
> 
> This should be the concept throughout the library. I don't understand why TSC is used in the library at all?
> 
> Please use a unit-less tick, and let the application decide what one tick means.
> 

I suspect the design of rte_htimer_mgr.h (and rte_timer.h) makes more 
sense if you think of the user of the API as not just a "monolithic" 
application, but rather a set of different modules, developed by 
different organizations, and reused across a set of applications. The 
idea behind the API design is they should all be able to share one timer 
service instance.

The different parts of the application and any future DPDK platform 
modules that use the htimer service need to agree on what a tick means in
terms of actual wall-time, if it's not mandated by the API.

There might be room for module-specific timer wheels as well, with 
different resolution or other characteristics. The event timer adapter's 
use of a timer wheel could be one example (although I'm not sure it is).

If timer-wheel-as-a-private-lego-piece is also a valid use case, then 
one could consider making the <rte_htw.h> API public as well. That is what
I think you are asking for here: a generic timer wheel that doesn't know
anything about time sources, time source time -> tick conversion, or 
timer source time -> monotonic wall time conversion, and maybe is also 
not bound to a particular thread.

I picked TSC because it seemed like a good "universal time unit" for 
DPDK. rdtsc (and its equivalent) is also a very precise (especially on 
x86) and cheap-to-retrieve (especially on ARM, from what I understand).

That said, at the moment, I'm leaning toward nanoseconds (uint64_t 
format) should be the default for timer expiration time instead of TSC. 
TSC could still be an option for passing the current time, since TSC 
will be a common time source, and it shaves off one conversion.
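
For reference, the conversion in question could look something like the sketch below. The function name is hypothetical and not part of the patch; 'hz' is the counter frequency, which in a DPDK application would typically come from rte_get_tsc_hz().

```c
#include <stdint.h>

/*
 * Convert a cycle count to nanoseconds, truncating toward zero.
 * Whole seconds and the sub-second remainder are converted separately
 * so that the multiplication does not overflow 64 bits.
 * Assumption: hz is non-zero and below roughly 18 GHz, so that
 * (cycles % hz) * 1000000000 still fits in a uint64_t.
 */
static uint64_t
sketch_cycles_to_ns(uint64_t cycles, uint64_t hz)
{
	uint64_t secs = cycles / hz;
	uint64_t rem = cycles % hz;

	return secs * UINT64_C(1000000000) +
		rem * UINT64_C(1000000000) / hz;
}
```

This costs two divisions per call, which is the conversion a TSC-based "current time" parameter would avoid.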

> A unit-less tick will also let the application instantiate a HTW with higher resolution than the TSC. (E.g. think about oversampling in audio processing, or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound and look better.)
> 
> For reference (supporting my suggestion), the dynamic timestamp field in the rte_mbuf structure is also defined as being unit-less. (I think NVIDIA implements it as nanoseconds, but that's an implementation specific choice.)
> 
>>
>> A semantic improvement compared to the <rte_timer.h> API is that the
>> htimer library can give a definite answer on the question if the timer
>> expiry callback was called, after a timer has been canceled.
>>
>> Below is performance data from DPDK's 'app/test' micro benchmarks,
>> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
>> test_htimer_mgr_perf.c) aren't identical in their structure, but the
>> numbers give some indication of the difference.
>>
>> Use case               htimer  timer
>> ------------------------------------
>> Add timer                 28    253
>> Cancel timer              10    412
>> Async add (source lcore)  64
>> Async add (target lcore)  13
>>
>> (AMD 5900X CPU. Time in TSC.)
>>
>> Prototype integration of the htimer library into real, timer-heavy,
>> applications indicates that htimer may result in significant
>> application-level performance gains.
>>
>> The bitset implementation which the HWT implementation depends upon
>> seemed generic-enough and potentially useful outside the world of
>> HWTs, to justify being located in the EAL.
>>
>> This patchset is very much an RFC, and the author is yet to form an
>> opinion on many important issues.
>>
>> * If deemed a suitable replacement, should the htimer replace the
>>    current DPDK timer library in some particular (ABI-breaking)
>>    release, or should it live side-by-side with the then-legacy
>>    <rte_timer.h> API? A lot of things in and outside DPDK depend on
>>    <rte_timer.h>, so coexistence may be required to facilitate a smooth
>>    transition.
> 
> It's my immediate impression that they are totally different in both design philosophy and API.
> 
> Personal opinion: I would call it an entirely different library.
> 
>>
>> * Should the htimer and htw-related files be colocated with rte_timer.c
>>    in the timer library?
> 
> Personal opinion: No. This is an entirely different library, and should live for itself in a directory of its own.
> 
>>
>> * Would it be useful for applications using asynchronous cancel to
>>    have the option of having the timer callback run not only in case of
>>    timer expiration, but also cancellation (on the target lcore)? The
>>    timer cb signature would need to include an additional parameter in
>>    that case.
> 
> If one thread cancels something in another thread, some synchronization between the threads is going to be required anyway. So we could rephrase your question: Will the burden of the otherwise required synchronization between the two threads be significantly reduced if the library has the ability to run the callback on asynchronous cancel?
> 

Yes.

Intuitively, it seems convenient that if you hand off a timer to a 
different lcore, the timer callback will be called exactly once, 
regardless if the timer was canceled or expired.

But, as you indicate, you may still need synchronization to solve the 
resource reclamation issue.

> Is such a feature mostly "Must have" or "Nice to have"?
> 
> More thoughts in this area...
> 
> If adding an additional callback parameter, it could be an enum, so the callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel" and more events we have not yet come up with, e.g. "early kick".
> 

Yes, or an int.

> Here's an idea off the top of my head: An additional callback parameter has a (small) performance cost incurred with every timer fired (which is a very large multiplier). It might not be required. As an alternative to a "what happened" parameter to the callback, the callback could investigate the state of the object for which the timer fired, and draw its own conclusion on how to proceed. Obviously, this also has a performance cost, but perhaps the callback works on the object's state anyway, making this cost insignificant.
> 

It's not obvious to me that you, in the timer callback, can determine 
what happened, if the same callback is called both in the cancel and the 
expired case.

The cost of an extra integer passed in a register (or checking a flag,
if the timer callback should be called at all at cancellation) is not
the concern for me; it's the extra bit of API complexity.
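
To show what the extra parameter would mean for the API, here is a minimal sketch in C. Everything in it is hypothetical (names, struct layout, the 'pending' flag); it is not the rte_htimer callback signature from the patch, only an illustration of a callback that is invoked exactly once with the reason.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; not the rte_htimer API. */
enum ex_reason {
	EX_REASON_EXPIRED,
	EX_REASON_CANCELED
};

struct ex_timer;

typedef void (*ex_timer_cb)(struct ex_timer *timer, enum ex_reason reason,
			    void *arg);

struct ex_timer {
	ex_timer_cb cb;
	void *cb_arg;
	bool pending;
};

/* Complete a timer: the callback runs at most once, with the reason. */
static void
ex_timer_finish(struct ex_timer *timer, enum ex_reason reason)
{
	if (!timer->pending)
		return;

	timer->pending = false;
	timer->cb(timer, reason, timer->cb_arg);
}

/* Example callback: records how often it ran and why it ran last. */
struct ex_log {
	unsigned int calls;
	enum ex_reason last;
};

static void
ex_record_cb(struct ex_timer *timer, enum ex_reason reason, void *arg)
{
	struct ex_log *log = arg;

	(void)timer;
	log->calls++;
	log->last = reason;
}
```

The application code in the callback then needs a branch on 'reason', which is the added API surface I am hesitant about.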

> Here's another alternative to adding a "what happened" parameter to the callback:
> 
> The rte_htimer could have one more callback pointer, which (if set) will be called on cancellation of the timer.
> 

This will grow the timer struct by 16 bytes.

>>
>> * Should the rte_htimer be a nested struct, so the htw parts be separated
>>    from the htimer parts?
>>
>> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
>>    <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
>>    be so?
>>
>> * rte_htimer struct is only supposed to be used by the application to
>>    give an indication of how much memory it needs to allocate, and
>>    its members are not supposed to be directly accessed (w/ the possible
>>    exception of the owner_lcore_id field). Should there be a dummy
>>    struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
>>    function instead, serving the same purpose? Better encapsulation,
>>    but more inconvenient for applications. Run-time dynamic sizing
>>    would force application-level dynamic allocations.
>>
>> * Asynchronous cancellation is a little tricky to use for the
>>    application (primarily due to timer memory reclamation/race
>>    issues). Should this functionality be removed?
>>
>> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
>>    there should be a variant which allows the user to specify the
>>    time (to match rte_htimer_mgr_manage_time()). One pitfall with the
>>    current proposed API is an application calling rte_htimer_mgr_init()
>>    and then immediately adding a timer with a relative timeout, in
>>    which case the current absolute time used is 0, which might be a
>>    surprise.
>>
>> * Should libdivide (optionally) be used to avoid the div in the TSC ->
>>    tick conversion? (Doesn't improve performance on Zen 3, but may
>>    do on other CPUs.) Consider <rte_reciprocal.h> as well.
>>
>> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
>>    used for conversion? Very minor performance gains to be found there,
>>    at least on Zen 3 cores.
>>
>> * Should it be possible to supply the time in rte_htimer_mgr_add()
>>    and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
>>    as TSC? Should it be possible to also use nanoseconds?
>>    rte_htimer_mgr_manage_time() would need a flags parameter in that
>>    case.
> 
> Do not use TSC anywhere in this library. Let the application decide the meaning of a tick.
> 
>>
>> * Would the event timer adapter be best off using <rte_htw.h>
>>    directly, or <rte_htimer.h>? In the latter case, there needs to be a
>>    way to instantiate more HWTs (similar to the "alt" functions of
>>    <rte_timer.h>)?
>>
>> * Should the PERIODICAL flag (and the complexity it brings) be
>>    removed? And leave the application with only single-shot timers, and
>>    the option to re-add them in the timer callback.
> 
> First thought: Yes, keep it lean and remove the periodical stuff.
> 
> Second thought: This needs a more detailed analysis.
> 
>  From one angle:
> 
> How many PERIODICAL versus ONESHOT timers do we expect?
> 

I suspect you should be prepared for the ratio being anything.

> Intuitively, I would use this library for ONESHOT timers, and perhaps implement my periodical timers by other means.
> 
> If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra cost of cancel+add for a few periodical timers.
> 
>  From another angle:
> 
> What is the performance gain with the PERIODICAL flag?
> 

None, pretty much. It's just there for convenience.
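
To make the "re-add in the callback" alternative concrete, here is a
minimal sketch of a periodic timer built on a single-shot one. The
toy_* names and struct layout are hypothetical and only stand in for
the htimer API; nothing below is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_timer;
typedef void (*toy_timer_cb)(struct toy_timer *timer, void *arg);

/* Hypothetical single-shot timer; field names are illustrative only. */
struct toy_timer {
	uint64_t expiration; /* absolute time, in ticks */
	toy_timer_cb cb;
	void *arg;
	bool pending;
};

static void
toy_timer_add(struct toy_timer *timer, uint64_t expiration,
	      toy_timer_cb cb, void *arg)
{
	timer->expiration = expiration;
	timer->cb = cb;
	timer->arg = arg;
	timer->pending = true;
}

/* Fire the timer if it has expired. Returns the number of expirations. */
static int
toy_timer_manage(struct toy_timer *timer, uint64_t now)
{
	if (!timer->pending || timer->expiration > now)
		return 0;

	timer->pending = false; /* single-shot: disarm before the callback */
	timer->cb(timer, timer->arg);
	return 1;
}

struct toy_periodic {
	uint64_t period;
	unsigned int count;
};

/*
 * Emulate a periodic timer: the callback re-adds the timer, relative to
 * the previous expiration so that the period does not drift.
 */
static void
toy_periodic_cb(struct toy_timer *timer, void *arg)
{
	struct toy_periodic *p = arg;

	p->count++;
	toy_timer_add(timer, timer->expiration + p->period,
		      toy_periodic_cb, arg);
}
```

The application pays one add per period in this scheme, which is the
cost being weighed against keeping the flag.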

> Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles would a "move" function, performing both cancel and add, use?
> 
> And then compare that to the cost (in cycles) of repeating a timer with PERIODICAL?
> 
> Furthermore, not having the PERIODICAL flag probably improves the performance for non-periodical timers. How many cycles could we gain here?
> 
> 
> Another, vaguely related, idea:
> 
> The callback pointer might not need to be stored per rte_htimer, but could instead be common for the rte_htw.
> 

Do you mean rte_htw, or rte_htimer_mgr?

If you make one common callback, all the different parts of the 
application need to be coordinated (in a big switch-statement, or 
something of that sort), or have some convention for using an 
application-specific wrapper structure (accessed via container_of()).

This is a problem if the timer service API consumer is a set of largely 
uncoordinated software modules.

Btw, the eventdev API has the same issue, and the proposed event 
dispatcher is one way to help facilitate application-internal decoupling.

For a module-private rte_htw instance your suggestion may work, but not 
for <rte_htimer_mgr.h>.
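
As a rough illustration of the wrapper-structure convention mentioned
above, consider the sketch below. Every name in it (toy_timer,
toy_wrapper, toy_owner and so on) is made up for this example and is
not part of the RFC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer object; not the RFC's rte_htimer. */
struct toy_timer {
	uint64_t expiration;
};

/* Hypothetical tag so the single callback can tell owners apart. */
enum toy_owner { TOY_OWNER_CONN, TOY_OWNER_FLOW };

/* Convention: every module embeds the timer in a tagged wrapper. */
struct toy_wrapper {
	enum toy_owner owner;
	struct toy_timer timer;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int conn_expired;
static int flow_expired;

/* The one common callback: recover the wrapper, then switch on the owner. */
static void
toy_common_cb(struct toy_timer *timer)
{
	struct toy_wrapper *w =
		toy_container_of(timer, struct toy_wrapper, timer);

	switch (w->owner) {
	case TOY_OWNER_CONN:
		conn_expired++;
		break;
	case TOY_OWNER_FLOW:
		flow_expired++;
		break;
	}
}
```

The switch statement is exactly the coordination point in question:
every module that uses timers has to be known to it.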

> When a timer fires, the callback probably needs to check/update the state of the object for which the timer fired anyway, so why not just let the application use that state to determine the appropriate action. This might provide some performance benefit.
> 
> It might complicate using one HTW for multiple different purposes, though. Probably a useless idea, but I wanted to share the idea anyway. It might trigger other, better ideas in the community.
> 
>>
>> * Should the async result codes and the sync cancel error codes be merged
>>    into one set of result codes?
>>
>> * Should the rte_htimer_mgr_async_add() have a flag which allows
>>    buffering add request messages until rte_htimer_mgr_process() is
>>    called? Or any manage function. Would reduce ring signaling overhead
>>    (i.e., burst enqueue operations instead of single-element
>>    enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
>>    solving the same "problem" a different way. (The signature of such
>>    a function would not be pretty.)
>>
>> * Does the functionality provided by the rte_htimer_mgr_process()
>>    function match its use cases? Should there be a more clear
>>    separation between expiry processing and asynchronous operation
>>    processing?
>>
>> * Should the patchset be split into more commits? If so, how?
>>
>> Thanks to Erik Carrillo for his assistance.
>>
>> Mattias Rönnblom (2):
>>    eal: add bitset type
>>    eal: add high-performance timer facility


^ permalink raw reply	[relevance 0%]

* RE: [RFC 0/2] Add high-performance timer facility
  2023-02-28  9:39  3% [RFC 0/2] Add high-performance timer facility Mattias Rönnblom
@ 2023-02-28 16:01  0% ` Morten Brørup
  2023-03-01 11:18  0%   ` Mattias Rönnblom
  2023-03-15 17:03  3% ` [RFC v2 " Mattias Rönnblom
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-28 16:01 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Erik Gabriel Carrillo, David Marchand, maria.lingemark, Stefan Sundkvist

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Tuesday, 28 February 2023 10.39

I have been looking for a high performance timer library (for use in a fast path TCP stack), and this looks very useful, Mattias.

My initial feedback is based on quickly skimming the patch source code, and reading this cover letter.

> 
> This patchset is an attempt to introduce a high-performance, highly
> scalable timer facility into DPDK.
> 
> More specifically, the goals for the htimer library are:
> 
> * Efficient handling of a handful up to hundreds of thousands of
>   concurrent timers.
> * Reduced overhead of adding and canceling timers.
> * Provide a service functionally equivalent to that of
>   <rte_timer.h>. API/ABI backward compatibility is secondary.
> 
> In the author's opinion, there are two main shortcomings with the
> current DPDK timer library (i.e., rte_timer.[ch]).
> 
> One is the synchronization overhead, where heavy-weight full-barrier
> type synchronization is used. rte_timer.c uses per-EAL/lcore skip
> lists, but any thread may add or cancel (or otherwise access) timers
> managed by another lcore (and thus resides in its timer skip list).
> 
> The other is an algorithmic shortcoming, with rte_timer.c's reliance
> on a skip list, which, seemingly, is less efficient than certain
> alternatives.
> 
> This patchset implements a hierarchical timer wheel (HWT, in

Typo: HWT or HTW?

> rte_htw.c), as per the Varghese and Lauck paper "Hashed and
> Hierarchical Timing Wheels: Data Structures for the Efficient
> Implementation of a Timer Facility". A HWT is a data structure
> purposely designed for this task, and used by many operating system
> kernel timer facilities.
> 
> To further improve the solution described by Varghese and Lauck, a
> bitset is placed in front of each of the timer wheels in the HWT,
> reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
> and expiry processing).
> 
> Cycle-efficient scanning and manipulation of these bitsets are crucial
> for the HWT's performance.
> 
> The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
> instance, much like rte_timer.c keeps a per-lcore skip list.
> 
> To avoid expensive synchronization overhead for thread-local timer
> management, the HWTs are accessed only from the "owning" thread.  Any
> interaction any other thread has with a particular lcore's timer
> wheel goes over a set of DPDK rings. A side-effect of this design is
> that all operations working toward a "remote" HWT must be
> asynchronous.
> 
> The <rte_htimer.h> API is available only to EAL threads and registered
> non-EAL threads.
> 
> The htimer API allows the application to supply the current time,
> useful in case it already has retrieved this for other purposes,
> saving the cost of a rdtsc instruction (or its equivalent).
> 
> Relative htimer does not retrieve a new time, but reuses the current
> time (as known via/at-the-time of the manage-call), again to shave off
> some cycles of overhead.

I have a comment on the two points above.

I agree that the application should supply the current time.

This should be the concept throughout the library. I don't understand why TSC is used in the library at all?

Please use a unit-less tick, and let the application decide what one tick means.

A unit-less tick will also let the application instantiate a HTW with higher resolution than the TSC. (E.g. think about oversampling in audio processing, or Bresenham's line drawing algorithm for 2D visuals - oversampling can sound and look better.)

For reference (supporting my suggestion), the dynamic timestamp field in the rte_mbuf structure is also defined as being unit-less. (I think NVIDIA implements it as nanoseconds, but that's an implementation specific choice.)
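
A minimal sketch of what I mean, assuming a hypothetical manager that only ever sees unit-less ticks. The toy_mgr_* and app_* names and the 10 us tick are invented for this example and are not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical timer manager that only understands unit-less ticks. */
struct toy_mgr {
	uint64_t now; /* current time, in application-defined ticks */
};

/* The application supplies the current time; no clock is read here. */
static void
toy_mgr_advance(struct toy_mgr *mgr, uint64_t now_ticks)
{
	mgr->now = now_ticks;
}

/* Absolute expiration for a relative timeout, both in ticks. */
static uint64_t
toy_mgr_expiration(const struct toy_mgr *mgr, uint64_t timeout_ticks)
{
	return mgr->now + timeout_ticks;
}

/*
 * Application side: this application happens to define one tick as
 * 10 us and derives it from a nanosecond clock. Another application
 * could just as well pass TSC cycles, or packet counts.
 */
#define APP_NS_PER_TICK UINT64_C(10000)

static uint64_t
app_ns_to_ticks(uint64_t ns)
{
	return ns / APP_NS_PER_TICK;
}
```

With this split, the conversion cost and the choice of clock source stay entirely on the application side.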

> 
> A semantic improvement compared to the <rte_timer.h> API is that the
> htimer library can give a definite answer on the question if the timer
> expiry callback was called, after a timer has been canceled.
> 
> Below is performance data from DPDK's 'app/test' micro benchmarks,
> using 10k concurrent timers. The benchmarks (test_timer_perf.c and
> test_htimer_mgr_perf.c) aren't identical in their structure, but the
> numbers give some indication of the difference.
> 
> Use case               htimer  timer
> ------------------------------------
> Add timer                 28    253
> Cancel timer              10    412
> Async add (source lcore)  64
> Async add (target lcore)  13
> 
> (AMD 5900X CPU. Time in TSC.)
> 
> Prototype integration of the htimer library into real, timer-heavy,
> applications indicates that htimer may result in significant
> application-level performance gains.
> 
> The bitset implementation which the HWT implementation depends upon
> seemed generic-enough and potentially useful outside the world of
> HWTs, to justify being located in the EAL.
> 
> This patchset is very much an RFC, and the author is yet to form an
> opinion on many important issues.
> 
> * If deemed a suitable replacement, should the htimer replace the
>   current DPDK timer library in some particular (ABI-breaking)
>   release, or should it live side-by-side with the then-legacy
>   <rte_timer.h> API? A lot of things in and outside DPDK depend on
>   <rte_timer.h>, so coexistence may be required to facilitate a smooth
>   transition.

It's my immediate impression that they are totally different in both design philosophy and API.

Personal opinion: I would call it an entirely different library.

> 
> * Should the htimer and htw-related files be colocated with rte_timer.c
>   in the timer library?

Personal opinion: No. This is an entirely different library, and should live on its own, in a directory of its own.

> 
> * Would it be useful for applications using asynchronous cancel to
>   have the option of having the timer callback run not only in case of
>   timer expiration, but also cancellation (on the target lcore)? The
>   timer cb signature would need to include an additional parameter in
>   that case.

If one thread cancels something in another thread, some synchronization between the threads is going to be required anyway. So we could rephrase your question: Will the burden of the otherwise required synchronization between the two threads be significantly reduced if the library has the ability to run the callback on asynchronous cancel?

Is such a feature mostly "Must have" or "Nice to have"?

More thoughts in this area...

If adding an additional callback parameter, it could be an enum, so the callback could be expanded to support "timeout (a.k.a. timer fired)", "cancel" and more events we have not yet come up with, e.g. "early kick".

Here's an idea off the top of my head: An additional callback parameter has a (small) performance cost incurred with every timer fired (which is a very large multiplier). It might not be required. As an alternative to a "what happened" parameter to the callback, the callback could investigate the state of the object for which the timer fired, and draw its own conclusion on how to proceed. Obviously, this also has a performance cost, but perhaps the callback works on the object's state anyway, making this cost insignificant.

Here's another alternative to adding a "what happened" parameter to the callback:

The rte_htimer could have one more callback pointer, which (if set) will be called on cancellation of the timer.
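
To make the enum variant concrete, here is a rough sketch. All names (toy_timer_event, toy_timer_cb, toy_conn, etc.) are hypothetical and do not come from the patch; the second-callback-pointer variant would simply replace the enum with another function pointer in the timer struct.

```c
#include <stdint.h>

/* Hypothetical "what happened" parameter; more events could be added. */
enum toy_timer_event {
	TOY_TIMER_EXPIRED,
	TOY_TIMER_CANCELED
};

struct toy_timer;
typedef void (*toy_timer_cb)(struct toy_timer *timer,
			     enum toy_timer_event event, void *arg);

struct toy_timer {
	toy_timer_cb cb;
	void *arg;
};

/* Called by the (imaginary) timer library on the owning lcore. */
static void
toy_timer_expire(struct toy_timer *timer)
{
	timer->cb(timer, TOY_TIMER_EXPIRED, timer->arg);
}

static void
toy_timer_cancel(struct toy_timer *timer)
{
	timer->cb(timer, TOY_TIMER_CANCELED, timer->arg);
}

/* Application object guarded by the timer. */
struct toy_conn {
	int retransmits;
	int released;
};

static void
toy_conn_cb(struct toy_timer *timer, enum toy_timer_event event, void *arg)
{
	struct toy_conn *conn = arg;

	(void)timer;

	switch (event) {
	case TOY_TIMER_EXPIRED:
		conn->retransmits++;
		break;
	case TOY_TIMER_CANCELED:
		/* Safe point to reclaim timer memory on the target lcore. */
		conn->released = 1;
		break;
	}
}
```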

> 
> * Should the rte_htimer be a nested struct, so the htw parts be separated
>   from the htimer parts?
> 
> * <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
>   <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
>   be so?
> 
> * rte_htimer struct is only supposed to be used by the application to
>   give an indication of how much memory it needs to allocate, and
>   its members are not supposed to be directly accessed (w/ the possible
>   exception of the owner_lcore_id field). Should there be a dummy
>   struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
>   function instead, serving the same purpose? Better encapsulation,
>   but more inconvenient for applications. Run-time dynamic sizing
>   would force application-level dynamic allocations.
> 
> * Asynchronous cancellation is a little tricky to use for the
>   application (primarily due to timer memory reclamation/race
>   issues). Should this functionality be removed?
> 
> * Should rte_htimer_mgr_init() also retrieve the current time? If so,
>   there should be a variant which allows the user to specify the
>   time (to match rte_htimer_mgr_manage_time()). One pitfall with the
>   current proposed API is an application calling rte_htimer_mgr_init()
>   and then immediately adding a timer with a relative timeout, in
>   which case the current absolute time used is 0, which might be a
>   surprise.
> 
> * Should libdivide (optionally) be used to avoid the div in the TSC ->
>   tick conversion? (Doesn't improve performance on Zen 3, but may
>   do on other CPUs.) Consider <rte_reciprocal.h> as well.
> 
> * Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
>   used for conversion? Very minor performance gains to be found there,
>   at least on Zen 3 cores.
> 
> * Should it be possible to supply the time in rte_htimer_mgr_add()
>   and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
>   as TSC? Should it be possible to also use nanoseconds?
>   rte_htimer_mgr_manage_time() would need a flags parameter in that
>   case.

Do not use TSC anywhere in this library. Let the application decide the meaning of a tick.

> 
> * Would the event timer adapter be best off using <rte_htw.h>
>   directly, or <rte_htimer.h>? In the latter case, there needs to be a
>   way to instantiate more HWTs (similar to the "alt" functions of
>   <rte_timer.h>)?
> 
> * Should the PERIODICAL flag (and the complexity it brings) be
>   removed? And leave the application with only single-shot timers, and
>   the option to re-add them in the timer callback.

First thought: Yes, keep it lean and remove the periodical stuff.

Second thought: This needs a more detailed analysis.

From one angle:

How many PERIODICAL versus ONESHOT timers do we expect?

Intuitively, I would use this library for ONESHOT timers, and perhaps implement my periodical timers by other means.

If the PERIODICAL:ONESHOT ratio is low, we can probably live with the extra cost of cancel+add for a few periodical timers.

From another angle:

What is the performance gain with the PERIODICAL flag?

Without a periodical timer, cancel+add costs 10+28 cycles. How many cycles would a "move" function, performing both cancel and add, use?

And then compare that to the cost (in cycles) of repeating a timer with PERIODICAL?

Furthermore, not having the PERIODICAL flag probably improves the performance for non-periodical timers. How many cycles could we gain here?


Another, vaguely related, idea:

The callback pointer might not need to be stored per rte_htimer, but could instead be common for the rte_htw.

When a timer fires, the callback probably needs to check/update the state of the object for which the timer fired anyway, so why not just let the application use that state to determine the appropriate action. This might provide some performance benefit.

It might complicate using one HTW for multiple different purposes, though. Probably a useless idea, but I wanted to share the idea anyway. It might trigger other, better ideas in the community.

> 
> * Should the async result codes and the sync cancel error codes be merged
>   into one set of result codes?
> 
> * Should the rte_htimer_mgr_async_add() have a flag which allows
>   buffering add request messages until rte_htimer_mgr_process() is
>   called? Or any manage function. Would reduce ring signaling overhead
>   (i.e., burst enqueue operations instead of single-element
>   enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
>   solving the same "problem" a different way. (The signature of such
>   a function would not be pretty.)
> 
> * Does the functionality provided by the rte_htimer_mgr_process()
>   function match its use cases? Should there be a more clear
>   separation between expiry processing and asynchronous operation
>   processing?
> 
> * Should the patchset be split into more commits? If so, how?
> 
> Thanks to Erik Carrillo for his assistance.
> 
> Mattias Rönnblom (2):
>   eal: add bitset type
>   eal: add high-performance timer facility

^ permalink raw reply	[relevance 0%]

* [RFC 0/2] Add high-performance timer facility
@ 2023-02-28  9:39  3% Mattias Rönnblom
  2023-02-28 16:01  0% ` Morten Brørup
  2023-03-15 17:03  3% ` [RFC v2 " Mattias Rönnblom
  0 siblings, 2 replies; 200+ results
From: Mattias Rönnblom @ 2023-02-28  9:39 UTC (permalink / raw)
  To: dev
  Cc: Erik Gabriel Carrillo, David Marchand, maria.lingemark,
	Stefan Sundkvist, Mattias Rönnblom

This patchset is an attempt to introduce a high-performance, highly
scalable timer facility into DPDK.

More specifically, the goals for the htimer library are:

* Efficient handling of a handful up to hundreds of thousands of
  concurrent timers.
* Reduced overhead of adding and canceling timers.
* Provide a service functionally equivalent to that of
  <rte_timer.h>. API/ABI backward compatibility is secondary.

In the author's opinion, there are two main shortcomings with the
current DPDK timer library (i.e., rte_timer.[ch]).

One is the synchronization overhead, where heavy-weight full-barrier
type synchronization is used. rte_timer.c uses per-EAL/lcore skip
lists, but any thread may add or cancel (or otherwise access) timers
managed by another lcore (and thus resides in its timer skip list).

The other is an algorithmic shortcoming, with rte_timer.c's reliance
on a skip list, which, seemingly, is less efficient than certain
alternatives.

This patchset implements a hierarchical timer wheel (HWT, in
rte_htw.c), as per the Varghese and Lauck paper "Hashed and
Hierarchical Timing Wheels: Data Structures for the Efficient
Implementation of a Timer Facility". A HWT is a data structure
purposely designed for this task, and used by many operating system
kernel timer facilities.

To further improve the solution described by Varghese and Lauck, a
bitset is placed in front of each of the timer wheels in the HWT,
reducing overhead of rte_htimer_mgr_manage() (i.e., progressing time
and expiry processing).

Cycle-efficient scanning and manipulation of these bitsets are crucial
for the HWT's performance.
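
As a simplified illustration of the idea (not the code in rte_htw.c),
the sketch below puts a 64-bit occupancy word in front of a
single-level, 64-slot wheel, so that finding the next non-empty slot
is a rotate plus a count-trailing-zeros instead of a walk over the
slots. The toy_* names are hypothetical, and __builtin_ctzll() assumes
GCC or clang.

```c
#include <stdint.h>

#define TOY_WHEEL_SLOTS 64

/* Hypothetical single-level wheel; one bit per slot marks "non-empty". */
struct toy_wheel {
	uint64_t occupied;                   /* bit i set <=> slot i in use */
	unsigned int count[TOY_WHEEL_SLOTS]; /* stand-in for per-slot lists */
};

static void
toy_wheel_add(struct toy_wheel *w, unsigned int slot)
{
	slot %= TOY_WHEEL_SLOTS;
	w->count[slot]++;
	w->occupied |= UINT64_C(1) << slot;
}

/*
 * Return the first occupied slot at or after 'start' (wrapping around),
 * or -1 if the wheel is empty.
 */
static int
toy_wheel_next(const struct toy_wheel *w, unsigned int start)
{
	uint64_t rotated;

	if (w->occupied == 0)
		return -1;

	start %= TOY_WHEEL_SLOTS;

	/* Rotate right so that bit 0 corresponds to slot 'start'. */
	rotated = start == 0 ? w->occupied :
		(w->occupied >> start) |
		(w->occupied << (TOY_WHEEL_SLOTS - start));

	return (int)((start + (unsigned int)__builtin_ctzll(rotated)) %
		     TOY_WHEEL_SLOTS);
}
```

With sparse wheels, this lets the manage call skip over long runs of
empty slots in a handful of instructions.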

The htimer module keeps a per-lcore (or per-registered EAL thread) HWT
instance, much like rte_timer.c keeps a per-lcore skip list.

To avoid expensive synchronization overhead for thread-local timer
management, the HWTs are accessed only from the "owning" thread.  Any
interaction any other thread has with a particular lcore's timer
wheel goes over a set of DPDK rings. A side-effect of this design is
that all operations working toward a "remote" HWT must be
asynchronous.

The <rte_htimer.h> API is available only to EAL threads and registered
non-EAL threads.

The htimer API allows the application to supply the current time,
useful in case it already has retrieved this for other purposes,
saving the cost of a rdtsc instruction (or its equivalent).

Relative htimer does not retrieve a new time, but reuses the current
time (as known via/at-the-time of the manage-call), again to shave off
some cycles of overhead.

A semantic improvement compared to the <rte_timer.h> API is that the
htimer library can give a definite answer on the question if the timer
expiry callback was called, after a timer has been canceled.

Below is performance data from DPDK's 'app/test' micro benchmarks,
using 10k concurrent timers. The benchmarks (test_timer_perf.c and
test_htimer_mgr_perf.c) aren't identical in their structure, but the
numbers give some indication of the difference.

Use case               htimer  timer
------------------------------------
Add timer                 28    253
Cancel timer              10    412
Async add (source lcore)  64
Async add (target lcore)  13

(AMD 5900X CPU. Time in TSC.)

Prototype integration of the htimer library into real, timer-heavy,
applications indicates that htimer may result in significant
application-level performance gains.

The bitset implementation which the HWT implementation depends upon
seemed generic-enough and potentially useful outside the world of
HWTs, to justify being located in the EAL.

This patchset is very much an RFC, and the author is yet to form an
opinion on many important issues.

* If deemed a suitable replacement, should the htimer replace the
  current DPDK timer library in some particular (ABI-breaking)
  release, or should it live side-by-side with the then-legacy
  <rte_timer.h> API? A lot of things in and outside DPDK depend on
  <rte_timer.h>, so coexistence may be required to facilitate a smooth
  transition.

* Should the htimer and htw-related files be colocated with rte_timer.c
  in the timer library?

* Would it be useful for applications using asynchronous cancel to
  have the option of having the timer callback run not only in case of
  timer expiration, but also cancellation (on the target lcore)? The
  timer cb signature would need to include an additional parameter in
  that case.

* Should the rte_htimer be a nested struct, so the htw parts be separated
  from the htimer parts?

* <rte_htimer.h> is kept separate from <rte_htimer_mgr.h>, so that
  <rte_htw.h> may avoid a dependency on <rte_htimer_mgr.h>. Should it
  be so?

* rte_htimer struct is only supposed to be used by the application to
  give an indication of how much memory it needs to allocate, and
  its members are not supposed to be directly accessed (w/ the possible
  exception of the owner_lcore_id field). Should there be a dummy
  struct, or a #define RTE_HTIMER_MEMSIZE or a rte_htimer_get_memsize()
  function instead, serving the same purpose? Better encapsulation,
  but more inconvenient for applications. Run-time dynamic sizing
  would force application-level dynamic allocations.

* Asynchronous cancellation is a little tricky to use for the
  application (primarily due to timer memory reclamation/race
  issues). Should this functionality be removed?
  
* Should rte_htimer_mgr_init() also retrieve the current time? If so,
  there should be a variant which allows the user to specify the
  time (to match rte_htimer_mgr_manage_time()). One pitfall with the
  current proposed API is an application calling rte_htimer_mgr_init()
  and then immediately adding a timer with a relative timeout, in
  which case the current absolute time used is 0, which might be a
  surprise.

* Should libdivide (optionally) be used to avoid the div in the TSC ->
  tick conversion? (Doesn't improve performance on Zen 3, but may
  do on other CPUs.) Consider <rte_reciprocal.h> as well.

* Should the TSC-per-tick be rounded up to a power of 2, so shifts can be
  used for conversion? Very minor performance gains to be found there,
  at least on Zen 3 cores.

* Should it be possible to supply the time in rte_htimer_mgr_add()
  and/or rte_htimer_mgr_manage_time() functions as ticks, rather than
  as TSC? Should it be possible to also use nanoseconds?
  rte_htimer_mgr_manage_time() would need a flags parameter in that
  case.

* Would the event timer adapter be best off using <rte_htw.h>
  directly, or <rte_htimer.h>? In the latter case, there needs to be a
  way to instantiate more HWTs (similar to the "alt" functions of
  <rte_timer.h>)?

* Should the PERIODICAL flag (and the complexity it brings) be
  removed? And leave the application with only single-shot timers, and
  the option to re-add them in the timer callback.

* Should the async result codes and the sync cancel error codes be merged
  into one set of result codes?

* Should the rte_htimer_mgr_async_add() have a flag which allows
  buffering add request messages until rte_htimer_mgr_process() is
  called? Or any manage function. Would reduce ring signaling overhead
  (i.e., burst enqueue operations instead of single-element
  enqueue). Could also be a rte_htimer_mgr_async_add_burst() function,
  solving the same "problem" a different way. (The signature of such
  a function would not be pretty.)

* Does the functionality provided by the rte_htimer_mgr_process()
  function match its use cases? Should there be a more clear
  separation between expiry processing and asynchronous operation
  processing?

* Should the patchset be split into more commits? If so, how?
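
For the TSC -> tick conversion questions above, the trade-off can be
sketched as follows. This is an illustration only; the toy_* helpers
are hypothetical and the 3700 cycles-per-tick figure is an arbitrary
example value.

```c
#include <stdint.h>

/* Round up to the next power of two (1 <= v <= 2^63). */
static uint64_t
toy_round_up_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;

	return p;
}

/* Base-2 logarithm of a power of two. */
static unsigned int
toy_log2_pow2(uint64_t p)
{
	unsigned int shift = 0;

	while ((UINT64_C(1) << shift) < p)
		shift++;

	return shift;
}

/* Exact conversion: one integer division per call. */
static uint64_t
toy_tsc_to_tick_div(uint64_t tsc, uint64_t tsc_per_tick)
{
	return tsc / tsc_per_tick;
}

/* Power-of-2 conversion: a shift, at the cost of a coarser tick. */
static uint64_t
toy_tsc_to_tick_shift(uint64_t tsc, unsigned int shift)
{
	return tsc >> shift;
}
```

Rounding 3700 up to 4096 replaces the division with a 12-bit shift,
but also makes each tick roughly 11% longer than requested; in the
worst case the tick nearly doubles.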

Thanks to Erik Carrillo for his assistance.

Mattias Rönnblom (2):
  eal: add bitset type
  eal: add high-performance timer facility

 app/test/meson.build             |  10 +-
 app/test/test_bitset.c           | 646 +++++++++++++++++++++++
 app/test/test_htimer_mgr.c       | 674 ++++++++++++++++++++++++
 app/test/test_htimer_mgr_perf.c  | 324 ++++++++++++
 app/test/test_htw.c              | 478 +++++++++++++++++
 app/test/test_htw_perf.c         | 181 +++++++
 doc/api/doxy-api-index.md        |   5 +-
 doc/api/doxy-api.conf.in         |   1 +
 lib/eal/common/meson.build       |   1 +
 lib/eal/common/rte_bitset.c      |  29 +
 lib/eal/include/meson.build      |   1 +
 lib/eal/include/rte_bitset.h     | 878 +++++++++++++++++++++++++++++++
 lib/eal/version.map              |   3 +
 lib/htimer/meson.build           |   7 +
 lib/htimer/rte_htimer.h          |  65 +++
 lib/htimer/rte_htimer_mgr.c      | 488 +++++++++++++++++
 lib/htimer/rte_htimer_mgr.h      | 497 +++++++++++++++++
 lib/htimer/rte_htimer_msg.h      |  44 ++
 lib/htimer/rte_htimer_msg_ring.c |  18 +
 lib/htimer/rte_htimer_msg_ring.h |  49 ++
 lib/htimer/rte_htw.c             | 437 +++++++++++++++
 lib/htimer/rte_htw.h             |  49 ++
 lib/htimer/version.map           |  17 +
 lib/meson.build                  |   1 +
 24 files changed, 4901 insertions(+), 2 deletions(-)
 create mode 100644 app/test/test_bitset.c
 create mode 100644 app/test/test_htimer_mgr.c
 create mode 100644 app/test/test_htimer_mgr_perf.c
 create mode 100644 app/test/test_htw.c
 create mode 100644 app/test/test_htw_perf.c
 create mode 100644 lib/eal/common/rte_bitset.c
 create mode 100644 lib/eal/include/rte_bitset.h
 create mode 100644 lib/htimer/meson.build
 create mode 100644 lib/htimer/rte_htimer.h
 create mode 100644 lib/htimer/rte_htimer_mgr.c
 create mode 100644 lib/htimer/rte_htimer_mgr.h
 create mode 100644 lib/htimer/rte_htimer_msg.h
 create mode 100644 lib/htimer/rte_htimer_msg_ring.c
 create mode 100644 lib/htimer/rte_htimer_msg_ring.h
 create mode 100644 lib/htimer/rte_htw.c
 create mode 100644 lib/htimer/rte_htw.h
 create mode 100644 lib/htimer/version.map

-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* Re: [RFC PATCH] drivers/net: fix RSS multi-queue mode check
  @ 2023-02-28  8:23  3%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-02-28  8:23 UTC (permalink / raw)
  To: lihuisong (C),
	Ajit Khaparde, Somnath Kotur, Rahul Lakkireddy, Simei Su,
	Wenjun Wu, Marcin Wojtas, Michal Krawczyk, Shai Brandes,
	Evgeny Schemeilin, Igor Chauskin, John Daley, Hyong Youb Kim,
	Qi Zhang, Xiao Wang, Junfeng Guo, Ziyang Xuan, Xiaoyun Wang,
	Guoyang Zhou, Dongdong Liu, Yisen Zhuang, Yuying Zhang,
	Beilei Xing, Jingjing Wu, Qiming Yang, Shijith Thotton,
	Srisivasubramanian Srinivasan, Long Li, Chaoyong He,
	Niklas Söderlund, Jiawen Wu, Rasesh Mody,
	Devendra Singh Rawat, Jerin Jacob, Maciej Czekaj, Jian Wang,
	Jochen Behrens, Andrew Rybchenko
  Cc: Thomas Monjalon, dev, stable

On 2/28/2023 1:24 AM, lihuisong (C) wrote:
> 
> On 2023/2/27 17:57, Ferruh Yigit wrote:
>> On 2/27/2023 1:34 AM, lihuisong (C) wrote:
>>> On 2023/2/24 0:04, Ferruh Yigit wrote:
>>>> 'rxmode.mq_mode' is an enum which should be an abstraction over values,
>>>> instead of mask it with 'RTE_ETH_MQ_RX_RSS_FLAG' to detect if RSS is
>>>> supported, directly compare with 'RTE_ETH_MQ_RX_RSS' enum element.
>>>>
>>>> Most of the time only 'RTE_ETH_MQ_RX_RSS' is requested by user, that is
>>>> why output is almost same, but there may be cases driver doesn't
>>>> support
>>>> RSS combinations, like 'RTE_ETH_MQ_RX_VMDQ_DCB_RSS' but that is hidden
>>>> by masking with 'RTE_ETH_MQ_RX_RSS_FLAG'.
>>> Hi Ferruh,
>>>
>>> It seems that this fully changes the usage of the mq_mode.
>>> It will prevent the RSS, DCB and VMDQ functions from working properly.
>>>
>>> For example,
>>> Both user and driver enable RSS and DCB functions based on xxx_DCB_FLAG
>>> and xxx_RSS_FLAG in rxmode.mq_mode.
>>> If we directly compare with 'RTE_ETH_MQ_RX_RSS' enum element now, how do
>>> we enable RSS+DCB mode?
>>>
>> Hi Huisong,
>>
>> Technically 'RSS+DCB' mode can be set by user setting 'rxmode.mq_mode'
>> to 'RTE_ETH_MQ_RX_DCB_RSS' and PMD checking the same.
> This is not a good approach,
> because it has a great impact on users and PMDs and will add
> cyclomatic complexity to the PMDs.
>>
>> Overall I think it is not a good idea to use enum items as masked values,
> I agree with what you are doing.
> It is better to change rxmode.mq_mode and txmode.mq_mode type from
> 'enum' to 'u32'.
> In this way, PMD code logic don't need to be modified and the impact on
> PMDs and user is minimal.
> What do you think?

If the bitmask feature of mq_mode is used and needed, I agree that changing
the underlying data type causes less disturbance in the logic.

But changing the underlying data type has ABI implications, so for now I will
drop this patch. Thanks for the feedback.
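
For reference, the difference between the two checks can be shown with
a small stand-alone sketch. The TOY_* definitions below are local
stand-ins that mimic the flag-composed enum layout discussed in this
thread; they are assumptions made for illustration, not the ethdev
definitions themselves.

```c
#include <stdbool.h>

/* Local stand-ins, assumed to mirror the flag-composed enum layout. */
#define TOY_MQ_RX_RSS_FLAG  (1u << 0)
#define TOY_MQ_RX_DCB_FLAG  (1u << 1)
#define TOY_MQ_RX_VMDQ_FLAG (1u << 2)

enum toy_rx_mq_mode {
	TOY_MQ_RX_NONE = 0,
	TOY_MQ_RX_RSS = TOY_MQ_RX_RSS_FLAG,
	TOY_MQ_RX_DCB = TOY_MQ_RX_DCB_FLAG,
	TOY_MQ_RX_DCB_RSS = TOY_MQ_RX_RSS_FLAG | TOY_MQ_RX_DCB_FLAG,
	TOY_MQ_RX_VMDQ_DCB_RSS = TOY_MQ_RX_RSS_FLAG | TOY_MQ_RX_DCB_FLAG |
				 TOY_MQ_RX_VMDQ_FLAG,
};

/* Current driver style: true for every mode that includes RSS. */
static bool
toy_rss_by_mask(enum toy_rx_mq_mode mode)
{
	return (mode & TOY_MQ_RX_RSS_FLAG) != 0;
}

/* Style proposed in the RFC: true for plain RSS only. */
static bool
toy_rss_by_equality(enum toy_rx_mq_mode mode)
{
	return mode == TOY_MQ_RX_RSS;
}
```

The two checks agree for plain RSS and diverge for the combined modes,
which is where the behavior change discussed above comes from.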

>> but that seems done intentionally in the past:
>> Commit 4bdefaade6d1 ("ethdev: VMDQ enhancements")
> Seems it was.
>>
>> Since this can be in use already, following patch only changes where
>> 'RTE_ETH_RX_OFFLOAD_RSS_HASH' is set, rest of the usage remaining same.
>>
>> And even for 'RTE_ETH_RX_OFFLOAD_RSS_HASH', I think intention was to
>> override this offload config in PMD when explicitly RSS mode is enabled,
>> but I made the set as RFC to get feedback on this. We may keep as it is
>> if some other modes with 'RTE_ETH_MQ_RX_RSS_FLAG' uses this offload.
>>
>>>> Fixes: 73fb89dd6a00 ("drivers/net: fix RSS hash offload flag if no
>>>> RSS")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@amd.com>
>>>>
>>>> ---
>>>>
>>>> There are more usage like "rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG" in
>>>> drivers, not sure to fix all in this commit or not, feedback welcomed.
>>>> ---
>>>>    drivers/net/bnxt/bnxt_ethdev.c       | 2 +-
>>>>    drivers/net/cxgbe/cxgbe_ethdev.c     | 2 +-
>>>>    drivers/net/e1000/igb_ethdev.c       | 4 ++--
>>>>    drivers/net/ena/ena_ethdev.c         | 2 +-
>>>>    drivers/net/enic/enic_ethdev.c       | 2 +-
>>>>    drivers/net/fm10k/fm10k_ethdev.c     | 2 +-
>>>>    drivers/net/gve/gve_ethdev.c         | 2 +-
>>>>    drivers/net/hinic/hinic_pmd_ethdev.c | 2 +-
>>>>    drivers/net/hns3/hns3_ethdev.c       | 2 +-
>>>>    drivers/net/hns3/hns3_ethdev_vf.c    | 2 +-
>>>>    drivers/net/i40e/i40e_ethdev.c       | 2 +-
>>>>    drivers/net/iavf/iavf_ethdev.c       | 2 +-
>>>>    drivers/net/ice/ice_dcf_ethdev.c     | 2 +-
>>>>    drivers/net/ice/ice_ethdev.c         | 2 +-
>>>>    drivers/net/igc/igc_ethdev.c         | 2 +-
>>>>    drivers/net/ixgbe/ixgbe_ethdev.c     | 4 ++--
>>>>    drivers/net/liquidio/lio_ethdev.c    | 2 +-
>>>>    drivers/net/mana/mana.c              | 2 +-
>>>>    drivers/net/netvsc/hn_ethdev.c       | 2 +-
>>>>    drivers/net/nfp/nfp_common.c         | 2 +-
>>>>    drivers/net/ngbe/ngbe_ethdev.c       | 2 +-
>>>>    drivers/net/qede/qede_ethdev.c       | 2 +-
>>>>    drivers/net/thunderx/nicvf_ethdev.c  | 2 +-
>>>>    drivers/net/txgbe/txgbe_ethdev.c     | 2 +-
>>>>    drivers/net/txgbe/txgbe_ethdev_vf.c  | 2 +-
>>>>    drivers/net/vmxnet3/vmxnet3_ethdev.c | 2 +-
>>>>    26 files changed, 28 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bnxt/bnxt_ethdev.c
>>>> b/drivers/net/bnxt/bnxt_ethdev.c
>>>> index 753e86b4b2af..14c0d5f8c72b 100644
>>>> --- a/drivers/net/bnxt/bnxt_ethdev.c
>>>> +++ b/drivers/net/bnxt/bnxt_ethdev.c
>>>> @@ -1143,7 +1143,7 @@ static int bnxt_dev_configure_op(struct
>>>> rte_eth_dev *eth_dev)
>>>>        bp->rx_cp_nr_rings = bp->rx_nr_rings;
>>>>        bp->tx_cp_nr_rings = bp->tx_nr_rings;
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rx_offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>        eth_dev->data->dev_conf.rxmode.offloads = rx_offloads;
>>>>    diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> b/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> index 45bbeaef0ceb..0e9ccc0587ba 100644
>>>> --- a/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> +++ b/drivers/net/cxgbe/cxgbe_ethdev.c
>>>> @@ -440,7 +440,7 @@ int cxgbe_dev_configure(struct rte_eth_dev
>>>> *eth_dev)
>>>>          CXGBE_FUNC_TRACE();
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            eth_dev->data->dev_conf.rxmode.offloads |=
>>>>                RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>    diff --git a/drivers/net/e1000/igb_ethdev.c
>>>> b/drivers/net/e1000/igb_ethdev.c
>>>> index 8858f975f8cc..8e6b43c2ff2d 100644
>>>> --- a/drivers/net/e1000/igb_ethdev.c
>>>> +++ b/drivers/net/e1000/igb_ethdev.c
>>>> @@ -1146,7 +1146,7 @@ eth_igb_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> @@ -3255,7 +3255,7 @@ igbvf_dev_configure(struct rte_eth_dev *dev)
>>>>        PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: %d",
>>>>                 dev->data->port_id);
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /*
>>>> diff --git a/drivers/net/ena/ena_ethdev.c
>>>> b/drivers/net/ena/ena_ethdev.c
>>>> index efcb163027c8..6929d7066fbd 100644
>>>> --- a/drivers/net/ena/ena_ethdev.c
>>>> +++ b/drivers/net/ena/ena_ethdev.c
>>>> @@ -2307,7 +2307,7 @@ static int ena_dev_configure(struct rte_eth_dev
>>>> *dev)
>>>>          adapter->state = ENA_ADAPTER_STATE_CONFIG;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>        dev->data->dev_conf.txmode.offloads |=
>>>> RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
>>>>    diff --git a/drivers/net/enic/enic_ethdev.c
>>>> b/drivers/net/enic/enic_ethdev.c
>>>> index cdf091559196..f3a7bc161408 100644
>>>> --- a/drivers/net/enic/enic_ethdev.c
>>>> +++ b/drivers/net/enic/enic_ethdev.c
>>>> @@ -323,7 +323,7 @@ static int enicpmd_dev_configure(struct
>>>> rte_eth_dev *eth_dev)
>>>>            return ret;
>>>>        }
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            eth_dev->data->dev_conf.rxmode.offloads |=
>>>>                RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>    diff --git a/drivers/net/fm10k/fm10k_ethdev.c
>>>> b/drivers/net/fm10k/fm10k_ethdev.c
>>>> index 8b83063f0a2d..49d7849ba5ea 100644
>>>> --- a/drivers/net/fm10k/fm10k_ethdev.c
>>>> +++ b/drivers/net/fm10k/fm10k_ethdev.c
>>>> @@ -450,7 +450,7 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> diff --git a/drivers/net/gve/gve_ethdev.c
>>>> b/drivers/net/gve/gve_ethdev.c
>>>> index cf28a4a3b710..f34755a369fb 100644
>>>> --- a/drivers/net/gve/gve_ethdev.c
>>>> +++ b/drivers/net/gve/gve_ethdev.c
>>>> @@ -92,7 +92,7 @@ gve_dev_configure(struct rte_eth_dev *dev)
>>>>    {
>>>>        struct gve_priv *priv = dev->data->dev_private;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (dev->data->dev_conf.rxmode.offloads &
>>>> RTE_ETH_RX_OFFLOAD_TCP_LRO)
>>>> diff --git a/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> b/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> index 7aa5e7d8e929..872ee97b1e97 100644
>>>> --- a/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> +++ b/drivers/net/hinic/hinic_pmd_ethdev.c
>>>> @@ -311,7 +311,7 @@ static int hinic_dev_configure(struct rte_eth_dev
>>>> *dev)
>>>>            return -EINVAL;
>>>>        }
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* mtu size is 256~9600 */
>>>> diff --git a/drivers/net/hns3/hns3_ethdev.c
>>>> b/drivers/net/hns3/hns3_ethdev.c
>>>> index 6babf67fcec2..fd3e499a3d38 100644
>>>> --- a/drivers/net/hns3/hns3_ethdev.c
>>>> +++ b/drivers/net/hns3/hns3_ethdev.c
>>>> @@ -2016,7 +2016,7 @@ hns3_dev_configure(struct rte_eth_dev *dev)
>>>>                goto cfg_err;
>>>>        }
>>>>    -    if ((uint32_t)mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) {
>>>> +    if (mq_mode == RTE_ETH_MQ_RX_RSS) {
>>>>            conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>            rss_conf = conf->rx_adv_conf.rss_conf;
>>>>            ret = hns3_dev_rss_hash_update(dev, &rss_conf);
>>>> diff --git a/drivers/net/hns3/hns3_ethdev_vf.c
>>>> b/drivers/net/hns3/hns3_ethdev_vf.c
>>>> index d051a1357b9f..00eb22d05558 100644
>>>> --- a/drivers/net/hns3/hns3_ethdev_vf.c
>>>> +++ b/drivers/net/hns3/hns3_ethdev_vf.c
>>>> @@ -494,7 +494,7 @@ hns3vf_dev_configure(struct rte_eth_dev *dev)
>>>>        }
>>>>          /* When RSS is not configured, redirect the packet queue 0 */
>>>> -    if ((uint32_t)mq_mode & RTE_ETH_MQ_RX_RSS_FLAG) {
>>>> +    if (mq_mode == RTE_ETH_MQ_RX_RSS) {
>>>>            conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>            rss_conf = conf->rx_adv_conf.rss_conf;
>>>>            ret = hns3_dev_rss_hash_update(dev, &rss_conf);
>>>> diff --git a/drivers/net/i40e/i40e_ethdev.c
>>>> b/drivers/net/i40e/i40e_ethdev.c
>>>> index 7726a89d99fb..3c3dbc285c96 100644
>>>> --- a/drivers/net/i40e/i40e_ethdev.c
>>>> +++ b/drivers/net/i40e/i40e_ethdev.c
>>>> @@ -1884,7 +1884,7 @@ i40e_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->tx_simple_allowed = true;
>>>>        ad->tx_vec_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          ret = i40e_dev_init_vlan(dev);
>>>> diff --git a/drivers/net/iavf/iavf_ethdev.c
>>>> b/drivers/net/iavf/iavf_ethdev.c
>>>> index 3196210f2c1d..39860c08b606 100644
>>>> --- a/drivers/net/iavf/iavf_ethdev.c
>>>> +++ b/drivers/net/iavf/iavf_ethdev.c
>>>> @@ -638,7 +638,7 @@ iavf_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->rx_vec_allowed = true;
>>>>        ad->tx_vec_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* Large VF setting */
>>>> diff --git a/drivers/net/ice/ice_dcf_ethdev.c
>>>> b/drivers/net/ice/ice_dcf_ethdev.c
>>>> index dcbf2af5b039..f61a30716e5e 100644
>>>> --- a/drivers/net/ice/ice_dcf_ethdev.c
>>>> +++ b/drivers/net/ice/ice_dcf_ethdev.c
>>>> @@ -711,7 +711,7 @@ ice_dcf_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->rx_bulk_alloc_allowed = true;
>>>>        ad->tx_simple_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          return 0;
>>>> diff --git a/drivers/net/ice/ice_ethdev.c
>>>> b/drivers/net/ice/ice_ethdev.c
>>>> index 0d011bbffa77..96595fd7afaf 100644
>>>> --- a/drivers/net/ice/ice_ethdev.c
>>>> +++ b/drivers/net/ice/ice_ethdev.c
>>>> @@ -3403,7 +3403,7 @@ ice_dev_configure(struct rte_eth_dev *dev)
>>>>        ad->rx_bulk_alloc_allowed = true;
>>>>        ad->tx_simple_allowed = true;
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (dev->data->nb_rx_queues) {
>>>> diff --git a/drivers/net/igc/igc_ethdev.c
>>>> b/drivers/net/igc/igc_ethdev.c
>>>> index fab2ab6d1ce7..49f2b3738b84 100644
>>>> --- a/drivers/net/igc/igc_ethdev.c
>>>> +++ b/drivers/net/igc/igc_ethdev.c
>>>> @@ -375,7 +375,7 @@ eth_igc_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          ret  = igc_check_mq_mode(dev);
>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> index 88118bc30560..328ccf918e86 100644
>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>> @@ -2431,7 +2431,7 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> @@ -5321,7 +5321,7 @@ ixgbevf_dev_configure(struct rte_eth_dev *dev)
>>>>        PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: %d",
>>>>                 dev->data->port_id);
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /*
>>>> diff --git a/drivers/net/liquidio/lio_ethdev.c
>>>> b/drivers/net/liquidio/lio_ethdev.c
>>>> index ebcfbb1a5c0f..07fbaeda1ee6 100644
>>>> --- a/drivers/net/liquidio/lio_ethdev.c
>>>> +++ b/drivers/net/liquidio/lio_ethdev.c
>>>> @@ -1722,7 +1722,7 @@ lio_dev_configure(struct rte_eth_dev *eth_dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (eth_dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (eth_dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            eth_dev->data->dev_conf.rxmode.offloads |=
>>>>                RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>    diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
>>>> index 43221e743e87..76de691a8252 100644
>>>> --- a/drivers/net/mana/mana.c
>>>> +++ b/drivers/net/mana/mana.c
>>>> @@ -78,7 +78,7 @@ mana_dev_configure(struct rte_eth_dev *dev)
>>>>        struct mana_priv *priv = dev->data->dev_private;
>>>>        struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
>>>>    -    if (dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev_conf->rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev_conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (dev->data->nb_rx_queues != dev->data->nb_tx_queues) {
>>>> diff --git a/drivers/net/netvsc/hn_ethdev.c
>>>> b/drivers/net/netvsc/hn_ethdev.c
>>>> index d0bbc0a4c0c0..4950b061799c 100644
>>>> --- a/drivers/net/netvsc/hn_ethdev.c
>>>> +++ b/drivers/net/netvsc/hn_ethdev.c
>>>> @@ -721,7 +721,7 @@ static int hn_dev_configure(struct rte_eth_dev
>>>> *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev_conf->rxmode.mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev_conf->rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev_conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          unsupported = txmode->offloads & ~HN_TX_OFFLOAD_CAPS;
>>>> diff --git a/drivers/net/nfp/nfp_common.c
>>>> b/drivers/net/nfp/nfp_common.c
>>>> index 907777a9e44d..a774fad3fba2 100644
>>>> --- a/drivers/net/nfp/nfp_common.c
>>>> +++ b/drivers/net/nfp/nfp_common.c
>>>> @@ -161,7 +161,7 @@ nfp_net_configure(struct rte_eth_dev *dev)
>>>>        rxmode = &dev_conf->rxmode;
>>>>        txmode = &dev_conf->txmode;
>>>>    -    if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rxmode->offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* Checking TX mode */
>>>> diff --git a/drivers/net/ngbe/ngbe_ethdev.c
>>>> b/drivers/net/ngbe/ngbe_ethdev.c
>>>> index c32d954769b0..5b53781c4aaf 100644
>>>> --- a/drivers/net/ngbe/ngbe_ethdev.c
>>>> +++ b/drivers/net/ngbe/ngbe_ethdev.c
>>>> @@ -918,7 +918,7 @@ ngbe_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* set flag to update link status after init */
>>>> diff --git a/drivers/net/qede/qede_ethdev.c
>>>> b/drivers/net/qede/qede_ethdev.c
>>>> index a4923670d6ba..11ddd8abf16a 100644
>>>> --- a/drivers/net/qede/qede_ethdev.c
>>>> +++ b/drivers/net/qede/qede_ethdev.c
>>>> @@ -1272,7 +1272,7 @@ static int qede_dev_configure(struct rte_eth_dev
>>>> *eth_dev)
>>>>          PMD_INIT_FUNC_TRACE(edev);
>>>>    -    if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rxmode->offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* We need to have min 1 RX queue.There is no min check in
>>>> diff --git a/drivers/net/thunderx/nicvf_ethdev.c
>>>> b/drivers/net/thunderx/nicvf_ethdev.c
>>>> index ab1e714d9767..b9cd09332510 100644
>>>> --- a/drivers/net/thunderx/nicvf_ethdev.c
>>>> +++ b/drivers/net/thunderx/nicvf_ethdev.c
>>>> @@ -1984,7 +1984,7 @@ nicvf_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (rxmode->mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            rxmode->offloads |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (!rte_eal_has_hugepages()) {
>>>> diff --git a/drivers/net/txgbe/txgbe_ethdev.c
>>>> b/drivers/net/txgbe/txgbe_ethdev.c
>>>> index a502618bc5a2..08ad5a087e23 100644
>>>> --- a/drivers/net/txgbe/txgbe_ethdev.c
>>>> +++ b/drivers/net/txgbe/txgbe_ethdev.c
>>>> @@ -1508,7 +1508,7 @@ txgbe_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /* multiple queue mode checking */
>>>> diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> b/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> index 3b1f7c913b7b..02a59fc696e5 100644
>>>> --- a/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> +++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
>>>> @@ -577,7 +577,7 @@ txgbevf_dev_configure(struct rte_eth_dev *dev)
>>>>        PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: %d",
>>>>                 dev->data->port_id);
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          /*
>>>> diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> b/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> index fd946dec5c80..8efde46ae0ad 100644
>>>> --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
>>>> @@ -531,7 +531,7 @@ vmxnet3_dev_configure(struct rte_eth_dev *dev)
>>>>          PMD_INIT_FUNC_TRACE();
>>>>    -    if (dev->data->dev_conf.rxmode.mq_mode &
>>>> RTE_ETH_MQ_RX_RSS_FLAG)
>>>> +    if (dev->data->dev_conf.rxmode.mq_mode == RTE_ETH_MQ_RX_RSS)
>>>>            dev->data->dev_conf.rxmode.offloads |=
>>>> RTE_ETH_RX_OFFLOAD_RSS_HASH;
>>>>          if (!VMXNET3_VERSION_GE_6(hw)) {
>> .
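
For anyone skimming the diff above: the bit test and the equality test give different answers for the combined multi-queue modes. A minimal sketch follows, assuming the flag layout in rte_ethdev.h (RSS, DCB and VMDq flags in bits 0, 1 and 2). The EX_-prefixed names are local stand-ins so that it builds without DPDK headers; they are not DPDK identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the Rx multi-queue flags (assumed bit layout). */
#define EX_MQ_RX_RSS_FLAG  (UINT32_C(1) << 0)
#define EX_MQ_RX_DCB_FLAG  (UINT32_C(1) << 1)
#define EX_MQ_RX_VMDQ_FLAG (UINT32_C(1) << 2)

/* Illustrative subset of the Rx multi-queue modes built from those flags. */
enum ex_rx_mq_mode {
	EX_MQ_RX_NONE = 0,
	EX_MQ_RX_RSS = EX_MQ_RX_RSS_FLAG,
	EX_MQ_RX_DCB = EX_MQ_RX_DCB_FLAG,
	EX_MQ_RX_DCB_RSS = EX_MQ_RX_RSS_FLAG | EX_MQ_RX_DCB_FLAG,
	EX_MQ_RX_VMDQ_ONLY = EX_MQ_RX_VMDQ_FLAG,
	EX_MQ_RX_VMDQ_RSS = EX_MQ_RX_RSS_FLAG | EX_MQ_RX_VMDQ_FLAG,
};

/* Old check: true for every mode that has the RSS bit set. */
static bool
ex_rss_by_flag(enum ex_rx_mq_mode mode)
{
	return ((uint32_t)mode & EX_MQ_RX_RSS_FLAG) != 0;
}

/* New check: true only for plain RSS, false for DCB_RSS and VMDQ_RSS. */
static bool
ex_rss_by_equality(enum ex_rx_mq_mode mode)
{
	return mode == EX_MQ_RX_RSS;
}
```

With these definitions, EX_MQ_RX_VMDQ_RSS passes the flag test but fails the equality test, so a driver switching to '==' stops forcing the RSS hash offload for the combined modes. Whether that is the intended behaviour for each driver is for the thread to settle; the sketch only shows that the two checks are not interchangeable.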



* Re: [PATCH v7 01/21] net/cpfl: support device initialization
  @ 2023-02-27 23:38  3%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-02-27 23:38 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko, Jerin Jacob Kollanukkaran,
	Qi Z Zhang, David Marchand
  Cc: dev, Mingxia Liu, yuying.zhang, beilei.xing, techboard

On 2/27/2023 3:45 PM, Thomas Monjalon wrote:
> 27/02/2023 14:46, Ferruh Yigit:
>> On 2/16/2023 12:29 AM, Mingxia Liu wrote:
>>> +static int
>>> +cpfl_dev_configure(struct rte_eth_dev *dev)
>>> +{
>>> +	struct rte_eth_conf *conf = &dev->data->dev_conf;
>>> +
>>> +	if (conf->link_speeds & RTE_ETH_LINK_SPEED_FIXED) {
>>> +		PMD_INIT_LOG(ERR, "Setting link speed is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->txmode.mq_mode != RTE_ETH_MQ_TX_NONE) {
>>> +		PMD_INIT_LOG(ERR, "Multi-queue TX mode %d is not supported",
>>> +			     conf->txmode.mq_mode);
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->lpbk_mode != 0) {
>>> +		PMD_INIT_LOG(ERR, "Loopback operation mode %d is not supported",
>>> +			     conf->lpbk_mode);
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->dcb_capability_en != 0) {
>>> +		PMD_INIT_LOG(ERR, "Priority Flow Control(PFC) is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->intr_conf.lsc != 0) {
>>> +		PMD_INIT_LOG(ERR, "LSC interrupt is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->intr_conf.rxq != 0) {
>>> +		PMD_INIT_LOG(ERR, "RXQ interrupt is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	if (conf->intr_conf.rmv != 0) {
>>> +		PMD_INIT_LOG(ERR, "RMV interrupt is not supported");
>>> +		return -ENOTSUP;
>>> +	}
>>> +
>>> +	return 0;
>>
>> This is '.dev_configure()' dev ops of a driver, there is nothing wrong
>> with the function but it is a good example to highlight a point.
>>
>>
>> 'rte_eth_dev_configure()' can fail for various reasons; what can an
>> application do in this case?
>> It is not clear why configuration failed, and there is no way to figure
>> out the failed config option dynamically.
> 
> There are some capabilities to read before calling "configure".
> 

Yes, but there are some PMD-specific cases as well, like SPEED_FIXED
not being supported above. How can an app manage this?

"struct rte_eth_dev_info" is mainly used for capabilities (although it
is a mixed bag), but it is not symmetric with the config/setup functions:
for a given config/setup function there is no clearly matching capability
struct/function.

>> An application developer can read the log and find out what caused the
>> failure, but what can they do next? Put a conditional check for the
>> particular device, assuming application supports multiple devices,
>> before configuration?
> 
> Which failures cannot be guessed with capability flags?
> 

At least for the sample above, as far as I can see, some capabilities are missing:
- txmode.mq_mode
- rxmode.mq_mode
- lpbk_mode
- intr_conf.rxq

We can go through the whole list to detect gaps if we plan to take action.

>> I think we need a better error value, to help the application detect what
>> went wrong and adapt dynamically, perhaps a bitmask of errors, one bit per
>> config option, what do you think?
> 
> I am not sure we can change such an old API.
> 

Yes, that is hard, but if we keep the return value negative, it can
still be backward compatible.

Or the API can keep the same interface but set a global 'reason' variable,
similar to 'errno', so that new application code can optionally fetch it
with a new API and investigate it.
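
To make the idea concrete, here is a rough sketch of the 'errno'-like variant combined with the bitmask suggested above. Everything in it is hypothetical: the ex_ names, the reason bits and the trimmed-down config struct are made up for illustration and are not an existing or proposed ethdev API. I also made the variable thread-local, as 'errno' is, which is an assumption on top of the "global" wording.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reason bits, one per config option a driver may reject. */
enum ex_cfg_reason {
	EX_CFG_REASON_LINK_SPEED = 1u << 0,
	EX_CFG_REASON_TX_MQ_MODE = 1u << 1,
	EX_CFG_REASON_LPBK_MODE  = 1u << 2,
	EX_CFG_REASON_INTR_RXQ   = 1u << 3,
};

/* Hypothetical, heavily reduced stand-in for the device configuration. */
struct ex_conf {
	int link_speed_fixed;
	int tx_mq_mode;
	int lpbk_mode;
	int intr_rxq;
};

/* Per-thread "last configure failure" mask, playing the role of errno. */
static _Thread_local uint32_t ex_cfg_reason;

/* Keeps the old contract (0 or a negative errno) and records why it failed. */
static int
ex_dev_configure(const struct ex_conf *conf)
{
	uint32_t reason = 0;

	if (conf->link_speed_fixed != 0)
		reason |= EX_CFG_REASON_LINK_SPEED;
	if (conf->tx_mq_mode != 0)
		reason |= EX_CFG_REASON_TX_MQ_MODE;
	if (conf->lpbk_mode != 0)
		reason |= EX_CFG_REASON_LPBK_MODE;
	if (conf->intr_rxq != 0)
		reason |= EX_CFG_REASON_INTR_RXQ;

	ex_cfg_reason = reason;
	return reason != 0 ? -ENOTSUP : 0;
}

/* New, optional accessor; existing applications never need to call it. */
static uint32_t
ex_cfg_last_reason(void)
{
	return ex_cfg_reason;
}
```

An application that gets -ENOTSUP could then read the mask, clear the offending options and retry, instead of special-casing a device by name. Collecting all failures before returning, rather than stopping at the first one, is a design choice of this sketch; it lets the application fix everything in one pass.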

>> And I think this is another reason why we should not make a single API
>> too overloaded and complex.
> 
> Right, and I would support a work to have some of those "configure" features
> available as small functions.
> 

If there is enough appetite, we can put something in the deprecation notice
for the next ABI release.



* RE: [EXT] Re: [PATCH v11 1/4] lib: add generic support for reading PMU events
  2023-02-21  0:48  3%                     ` Konstantin Ananyev
@ 2023-02-27  8:12  0%                       ` Tomasz Duszynski
  0 siblings, 0 replies; 200+ results
From: Tomasz Duszynski @ 2023-02-27  8:12 UTC (permalink / raw)
  To: Konstantin Ananyev, Konstantin Ananyev, dev



>-----Original Message-----
>From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
>Sent: Tuesday, February 21, 2023 1:48 AM
>To: Tomasz Duszynski <tduszynski@marvell.com>; Konstantin Ananyev <konstantin.ananyev@huawei.com>;
>dev@dpdk.org
>Subject: Re: [EXT] Re: [PATCH v11 1/4] lib: add generic support for reading PMU events
>
>
>>>>>>>>>> diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h
>>>>>>>>>> new file mode 100644
>>>>>>>>>> index 0000000000..6b664c3336
>>>>>>>>>> --- /dev/null
>>>>>>>>>> +++ b/lib/pmu/rte_pmu.h
>>>>>>>>>> @@ -0,0 +1,212 @@
>>>>>>>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>>>>>>>> + * Copyright(c) 2023 Marvell
>>>>>>>>>> + */
>>>>>>>>>> +
>>>>>>>>>> +#ifndef _RTE_PMU_H_
>>>>>>>>>> +#define _RTE_PMU_H_
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @file
>>>>>>>>>> + *
>>>>>>>>>> + * PMU event tracing operations
>>>>>>>>>> + *
>>>>>>>>>> + * This file defines generic API and types necessary to setup PMU and
>>>>>>>>>> + * read selected counters in runtime.
>>>>>>>>>> + */
>>>>>>>>>> +
>>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>>> +extern "C" {
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>> +#include <linux/perf_event.h>
>>>>>>>>>> +
>>>>>>>>>> +#include <rte_atomic.h>
>>>>>>>>>> +#include <rte_branch_prediction.h>
>>>>>>>>>> +#include <rte_common.h>
>>>>>>>>>> +#include <rte_compat.h>
>>>>>>>>>> +#include <rte_spinlock.h>
>>>>>>>>>> +
>>>>>>>>>> +/** Maximum number of events in a group */
>>>>>>>>>> +#define MAX_NUM_GROUP_EVENTS 8
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * A structure describing a group of events.
>>>>>>>>>> + */
>>>>>>>>>> +struct rte_pmu_event_group {
>>>>>>>>>> +	struct perf_event_mmap_page *mmap_pages[MAX_NUM_GROUP_EVENTS]; /**< array of user pages */
>>>>>>>>>> +	int fds[MAX_NUM_GROUP_EVENTS]; /**< array of event descriptors */
>>>>>>>>>> +	bool enabled; /**< true if group was enabled on particular lcore */
>>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event_group) next; /**< list entry */
>>>>>>>>>> +} __rte_cache_aligned;
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * A structure describing an event.
>>>>>>>>>> + */
>>>>>>>>>> +struct rte_pmu_event {
>>>>>>>>>> +	char *name; /**< name of an event */
>>>>>>>>>> +	unsigned int index; /**< event index into fds/mmap_pages */
>>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event) next; /**< list entry */
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * A PMU state container.
>>>>>>>>>> + */
>>>>>>>>>> +struct rte_pmu {
>>>>>>>>>> +	char *name; /**< name of core PMU listed under /sys/bus/event_source/devices */
>>>>>>>>>> +	rte_spinlock_t lock; /**< serialize access to event group list */
>>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event_group) event_group_list; /**< list of event groups */
>>>>>>>>>> +	unsigned int num_group_events; /**< number of events in a group */
>>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event) event_list; /**< list of matching events */
>>>>>>>>>> +	unsigned int initialized; /**< initialization counter */
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +/** lcore event group */
>>>>>>>>>> +RTE_DECLARE_PER_LCORE(struct rte_pmu_event_group, _event_group);
>>>>>>>>>> +
>>>>>>>>>> +/** PMU state container */
>>>>>>>>>> +extern struct rte_pmu rte_pmu;
>>>>>>>>>> +
>>>>>>>>>> +/** Each architecture supporting PMU needs to provide its own version */
>>>>>>>>>> +#ifndef rte_pmu_pmc_read
>>>>>>>>>> +#define rte_pmu_pmc_read(index) ({ 0; })
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Read PMU counter.
>>>>>>>>>> + *
>>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>>> + *
>>>>>>>>>> + * @param pc
>>>>>>>>>> + *   Pointer to the mmapped user page.
>>>>>>>>>> + * @return
>>>>>>>>>> + *   Counter value read from hardware.
>>>>>>>>>> + */
>>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>>> +__rte_pmu_read_userpage(struct perf_event_mmap_page *pc)
>>>>>>>>>> +{
>>>>>>>>>> +	uint64_t width, offset;
>>>>>>>>>> +	uint32_t seq, index;
>>>>>>>>>> +	int64_t pmc;
>>>>>>>>>> +
>>>>>>>>>> +	for (;;) {
>>>>>>>>>> +		seq = pc->lock;
>>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>>
>>>>>>>>> Are you sure that compiler_barrier() is enough here?
>>>>>>>>> On some archs CPU itself has freedom to re-order reads.
>>>>>>>>> Or I am missing something obvious here?
>>>>>>>>>
>>>>>>>>
>>>>>>>> It's a matter of not keeping old stuff cached in registers and
>>>>>>>> making sure that we have two reads of lock. CPU reordering won't
>>>>>>>> do any harm here.
>>>>>>>
>>>>>>> Sorry, I didn't get you here:
>>>>>>> Suppose CPU will re-order reads and will read lock *after* index or offset value.
>>>>>>> Wouldn't it mean that in that case index and/or offset can contain old/invalid values?
>>>>>>>
>>>>>>
>>>>>> This number is just an indicator whether kernel did change something or not.
>>>>>
>>>>> You are talking about pc->lock, right?
>>>>> Yes, I do understand that it is sort of seqlock.
>>>>> That's why I am puzzled why we do not care about possible cpu read-reordering.
>>>>> Manual for perf_event_open() also has a code snippet with compiler barrier only...
>>>>>
>>>>>> If cpu reordering will come into play then this will not change
>>>>>> anything from pov of this loop.
>>>>>> All we want is fresh data when needed and no involvement of
>>>>>> compiler when it comes to reordering code.
>>>>>
>>>>> Ok, can you probably explain to me why the following could not happen:
>>>>> T0:
>>>>> pc->seqlock==0; pc->index==I1; pc->offset==O1;
>>>>> T1:
>>>>>       cpu #0 read pmu (due to cpu read reorder, we get index value before seqlock):
>>>>>        index=pc->index;  //index==I1;
>>>>> T2:
>>>>>       cpu #1 kernel perf_event_update_userpage:
>>>>>       pc->lock++; // pc->lock==1
>>>>>       pc->index=I2;
>>>>>       pc->offset=O2;
>>>>>       ...
>>>>>       pc->lock++; //pc->lock==2
>>>>> T3:
>>>>>       cpu #0 continue with read pmu:
>>>>>       seq=pc->lock; //seq == 2
>>>>>        offset=pc->offset; // offset == O2
>>>>>        ....
>>>>>        pmc = rte_pmu_pmc_read(index - 1);  // Note that we read at I1, not I2
>>>>>        offset += pmc; //offset == O2 + pmcread(I1-1);
>>>>>        if (pc->lock == seq) // they are equal, return
>>>>>              return offset;
>>>>>
>>>>> Or, it can happen, but by some reason we don't care much?
>>>>>
>>>>
>>>> This code does self-monitoring and user page (whole group actually)
>>>> is per thread running on current cpu. Hence I am not sure what you
>>>> are trying to prove with that example.
>>>
>>> I am not trying to prove anything so far.
>>> I am asking is such situation possible or not, and if not, why?
>>> My current understanding (possibly wrong) is that after you mmaped
>>> these pages, kernel still can asynchronously update them.
>>> So, when reading the data from these pages you have to check 'lock'
>>> value before and after accessing other data.
>>> If so, why possible cpu read-reordering doesn't matter?
>>>
>>
>> Look. I'll reiterate that.
>>
>> 1. That user page/group/PMU config is per process. Other processes do not access that.
>
>Ok, that's clear.
>
>
>>     All this happens on the very same CPU where current thread is running.
>
>Ok... but can't this page be updated by kernel thread running simultaneously on different CPU?
>

I already pointed out that the event/counter configuration is bound to the current CPU. How could
another CPU possibly update that configuration? This cannot work.


If you think there is a problem with the code (or it is simply broken on your setup), that the
logic has an obvious flaw, and you can provide meaningful evidence of that, then I'd be more than
happy to apply a fix. Otherwise this discussion will get us nowhere.
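
For readers following the ordering question, this is roughly what a sequence-counter read looks like when the writer may run on a different CPU than the reader. It is a generic C11 sketch, not the code from the patch and not a statement about what the kernel needs for the perf user page. The ex_ names and the reduced page layout are invented for illustration, and the odd/even convention for the counter is an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a page guarded by a sequence counter. */
struct ex_page {
	atomic_uint lock;        /* odd while an update is in progress */
	_Atomic uint32_t index;
	_Atomic uint64_t offset;
};

/* Writer side: bump to odd, publish the data, bump to even. */
static void
ex_page_write(struct ex_page *p, uint32_t index, uint64_t offset)
{
	unsigned int seq = atomic_load_explicit(&p->lock, memory_order_relaxed);

	atomic_store_explicit(&p->lock, seq + 1, memory_order_relaxed);
	/* Order the counter bump before the data stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->index, index, memory_order_relaxed);
	atomic_store_explicit(&p->offset, offset, memory_order_relaxed);
	/* Release store orders the data stores before the final bump. */
	atomic_store_explicit(&p->lock, seq + 2, memory_order_release);
}

/* Reader side: retry until the counter is even and unchanged around the reads. */
static void
ex_page_read(struct ex_page *p, uint32_t *index, uint64_t *offset)
{
	unsigned int before, after;

	do {
		/* Acquire load keeps the data loads from moving above it. */
		before = atomic_load_explicit(&p->lock, memory_order_acquire);
		*index = atomic_load_explicit(&p->index, memory_order_relaxed);
		*offset = atomic_load_explicit(&p->offset, memory_order_relaxed);
		/* Acquire fence keeps the data loads from moving below the re-check. */
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&p->lock, memory_order_relaxed);
	} while ((before & 1u) != 0 || before != after);
}
```

If the writer only ever runs on the CPU that executes the reader, the hardware fences buy nothing and plain compiler barriers are sufficient, which is the crux of the disagreement above. The sketch is only meant to show what the cross-CPU variant would cost and where the barriers would sit.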

>
>> 2. Suppose you've already read seq. Now for some reason kernel updates data in page seq was
>>    read from.
>> 3. Kernel will enter critical section during update. seq changes along with other data
>>    without app knowing about it.
>>     If you want nitty gritty details consult kernel sources.
>
>Look, I don't have to beg you to answer these questions.
>In fact, I expect the library author to document all such narrow things
>clearly, either in the PG or in source code comments (ideally in both).
>If not, then from my perspective the patch is not ready and
>shouldn't be accepted.
>I don't know whether a compiler barrier is enough here or not, but I think
>it is definitely worth a clear explanation in the docs.
>I suppose it wouldn't be only me who will get confused here.
>So please take an effort and document it clearly why you believe there
>is no race-condition.
>
>> 4. app resumes and has some stale data but *WILL* read new seq. Code loops again because
>>    values do not match.
>
>If the kernel will always execute update for this page in the same
>thread context, then yes, - user code will always note the difference
>after resume.
>But why can't it happen that your user-thread reads this page on one
>CPU, while some kernel code on other CPU updates it simultaneously?
>
>
>> 5. Otherwise seq values match and data is valid.
>>
>>> Also there was another question below, which you probably  missed, so I copied it here:
>>> Another question - do we really need  to have __rte_pmu_read_userpage() and rte_pmu_read() as
>>> static inline functions in public header?
>>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>>> definitions also public.
>>>
>>
>> These functions need to be inlined otherwise performance takes a hit.
>
>I understand that performance might be affected, but how big is the hit?
>I expect the actual PMU read will not be free anyway, right?
>If the difference is small, it might be worth making such a change;
>removing unneeded structures from public headers would help a lot in
>the future in terms of ABI/API stability.
>
>
>
>>>>
>>>>>>>>
>>>>>>>>>> +		index = pc->index;
>>>>>>>>>> +		offset = pc->offset;
>>>>>>>>>> +		width = pc->pmc_width;
>>>>>>>>>> +
>>>>>>>>>> +		/* index set to 0 means that particular counter cannot be used */
>>>>>>>>>> +		if (likely(pc->cap_user_rdpmc && index)) {
>>>>>>>>>> +			pmc = rte_pmu_pmc_read(index - 1);
>>>>>>>>>> +			pmc <<= 64 - width;
>>>>>>>>>> +			pmc >>= 64 - width;
>>>>>>>>>> +			offset += pmc;
>>>>>>>>>> +		}
>>>>>>>>>> +
>>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>>> +
>>>>>>>>>> +		if (likely(pc->lock == seq))
>>>>>>>>>> +			return offset;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	return 0;
>>>>>>>>>> +}
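
A side note on the two shifts in the function above: as I read them, shifting left and then right by 64 - width sign-extends the raw counter value from pmc_width bits to 64 bits, relying on arithmetic right shift of a signed value. Below is a small sketch of the same operation written without signed shifts. The function name is made up, and the final unsigned-to-signed conversion assumes two's complement, which is implementation-defined before C23.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low 'width' bits of 'raw' to a 64-bit signed value.
 * Hypothetical helper for illustration; 'width' must be in 1..64.
 */
static int64_t
ex_sign_extend(uint64_t raw, unsigned int width)
{
	assert(width >= 1 && width <= 64);

	uint64_t sign = UINT64_C(1) << (width - 1); /* sign bit of the field */
	uint64_t mask = sign | (sign - 1);          /* low 'width' bits set */

	raw &= mask;
	/* Flip the sign bit, then subtract it: maps the field to [-2^(w-1), 2^(w-1)). */
	return (int64_t)((raw ^ sign) - sign);
}
```

For a 48-bit counter, an all-ones raw value comes out as -1 and 0x7FFFFFFFFFFF stays positive, which matches what the shift pair produces on compilers that implement arithmetic right shift.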
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Enable group of events on the calling lcore.
>>>>>>>>>> + *
>>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>>> + *
>>>>>>>>>> + * @return
>>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +int
>>>>>>>>>> +__rte_pmu_enable_group(void);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Initialize PMU library.
>>>>>>>>>> + *
>>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>>> + *
>>>>>>>>>> + * @return
>>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +int
>>>>>>>>>> +rte_pmu_init(void);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Finalize PMU library. This should be called after PMU counters are no longer being read.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +void
>>>>>>>>>> +rte_pmu_fini(void);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Add event to the group of enabled events.
>>>>>>>>>> + *
>>>>>>>>>> + * @param name
>>>>>>>>>> + *   Name of an event listed under /sys/bus/event_source/devices/pmu/events.
>>>>>>>>>> + * @return
>>>>>>>>>> + *   Event index in case of success, negative value otherwise.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +int
>>>>>>>>>> +rte_pmu_add_event(const char *name);
>>>>>>>>>> +
>>>>>>>>>> +/**
>>>>>>>>>> + * @warning
>>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>>> + *
>>>>>>>>>> + * Read hardware counter configured to count occurrences of an event.
>>>>>>>>>> + *
>>>>>>>>>> + * @param index
>>>>>>>>>> + *   Index of an event to be read.
>>>>>>>>>> + * @return
>>>>>>>>>> + *   Event value read from register. In case of errors or lack of support
>>>>>>>>>> + *   0 is returned. In other words, stream of zeros in a trace file
>>>>>>>>>> + *   indicates problem with reading particular PMU event register.
>>>>>>>>>> + */
>>>>>
>>>>> Another question - do we really need  to have
>>>>> __rte_pmu_read_userpage() and rte_pmu_read() as static inline functions in public header?
>>>>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>>>>> definitions also public.
>>>>>
>>>>>>>>>> +__rte_experimental
>>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>>> +rte_pmu_read(unsigned int index)
>>>>>>>>>> +{
>>>>>>>>>> +	struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
>>>>>>>>>> +	int ret;
>>>>>>>>>> +
>>>>>>>>>> +	if (unlikely(!rte_pmu.initialized))
>>>>>>>>>> +		return 0;
>>>>>>>>>> +
>>>>>>>>>> +	if (unlikely(!group->enabled)) {
>>>>>>>>>> +		ret = __rte_pmu_enable_group();
>>>>>>>>>> +		if (ret)
>>>>>>>>>> +			return 0;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	if (unlikely(index >= rte_pmu.num_group_events))
>>>>>>>>>> +		return 0;
>>>>>>>>>> +
>>>>>>>>>> +	return __rte_pmu_read_userpage(group->mmap_pages[index]);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>>> +}
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>
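
For readers following the seq/lock discussion above, here is a minimal sketch of the read loop being debated. It is not the patch's code: struct fake_userpage, sign_extend() and read_counter() are hypothetical names, raw_pmc stands in for whatever the hardware counter read returns, and the loop retries until the sequence is stable instead of giving up and returning 0. The plain (non-atomic) data reads are only safe here because nothing writes concurrently; whether that holds for the real page is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the perf mmap page; not the real layout. */
struct fake_userpage {
	_Atomic uint32_t lock;  /* sequence counter bumped around each update */
	uint64_t offset;        /* software-maintained part of the count */
	uint16_t pmc_width;     /* number of valid bits in the hardware counter */
	uint64_t raw_pmc;       /* stands in for the value a counter read returns */
};

/* Sign-extend a width-bit counter value to 64 bits (1 <= width <= 64). */
static inline int64_t
sign_extend(uint64_t raw, unsigned int width)
{
	unsigned int shift = 64 - width;

	/* Shift left as unsigned, then rely on an arithmetic right shift
	 * of the signed value (the behaviour of GCC and Clang). */
	return (int64_t)(raw << shift) >> shift;
}

/* Retry until the sequence counter is unchanged across the data reads. */
static uint64_t
read_counter(struct fake_userpage *pc)
{
	uint64_t offset;
	uint32_t seq;

	do {
		seq = atomic_load_explicit(&pc->lock, memory_order_acquire);
		offset = pc->offset;
		offset += (uint64_t)sign_extend(pc->raw_pmc, pc->pmc_width);
		/* Keep the data reads ahead of the second sequence read. */
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&pc->lock, memory_order_relaxed) != seq);

	return offset;
}
```

If the writer can run on another CPU while the reader is inside the loop, the data fields would also need to be read with atomic or volatile accesses; the sketch does not attempt that.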


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-24  6:31  0%     ` Yan, Zhirun
@ 2023-02-26 22:23  0%       ` Jerin Jacob
  2023-03-02  8:38  0%         ` Yan, Zhirun
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-02-26 22:23 UTC (permalink / raw)
  To: Yan, Zhirun
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue

On Fri, Feb 24, 2023 at 12:01 PM Yan, Zhirun <zhirun.yan@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, February 20, 2023 9:51 PM
> > To: Yan, Zhirun <zhirun.yan@intel.com>
> > Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> > ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> > Haiyue <haiyue.wang@intel.com>
> > Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> >
> > On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> > >
> > > Add new get/set APIs to configure graph worker model which is used to
> > > determine which model will be chosen.
> > >
> > > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > > ---
> > >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> > >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> > >  lib/graph/version.map               |  3 ++
> > >  3 files changed, 67 insertions(+)
> > >
> > > diff --git a/lib/graph/rte_graph_worker.h
> > > b/lib/graph/rte_graph_worker.h index 54d1390786..a0ea0df153 100644
> > > --- a/lib/graph/rte_graph_worker.h
> > > +++ b/lib/graph/rte_graph_worker.h
> > > @@ -1,5 +1,56 @@
> > >  #include "rte_graph_model_rtc.h"
> > >
> > > +static enum rte_graph_worker_model worker_model =
> > > +RTE_GRAPH_MODEL_DEFAULT;
> >
> > This will break the multiprocess.
>
> Thanks. I will use TLS for per-thread local storage.

If it needs to be used from a secondary process, then it needs to be allocated from a memzone.
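
To illustrate the point, here is a small sketch that uses plain POSIX calls rather than DPDK: fork() and an anonymous shared mapping stand in for primary/secondary processes and a memzone. This is only an analogy (real secondary processes are started separately and would find the area with something like rte_memzone_lookup()), and private_model, shared_model and demo_shared_vs_private() are made-up names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Per-process copy, like a static variable defined in a header. */
static int private_model;

/*
 * Returns 1 if the parent sees the child's write to the shared mapping
 * but not its write to the static variable, 0 otherwise, -1 on error.
 */
static int
demo_shared_vs_private(void)
{
	int *shared_model;
	int status, ok;
	pid_t pid;

	shared_model = mmap(NULL, sizeof(*shared_model), PROT_READ | PROT_WRITE,
			    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (shared_model == MAP_FAILED)
		return -1;

	*shared_model = 0;
	private_model = 0;

	pid = fork();
	if (pid < 0) {
		munmap(shared_model, sizeof(*shared_model));
		return -1;
	}
	if (pid == 0) {
		/* Child: both writes succeed, only one is visible outside. */
		private_model = 1;
		*shared_model = 1;
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0) {
		munmap(shared_model, sizeof(*shared_model));
		return -1;
	}

	ok = (private_model == 0 && *shared_model == 1);
	munmap(shared_model, sizeof(*shared_model));
	return ok;
}
```

A thread-local variable behaves like private_model here: it is per thread within one process and is never visible to another process.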



>
> >
> > > +
> > > +/** Graph worker models */
> > > +enum rte_graph_worker_model {
> > > +#define WORKER_MODEL_DEFAULT "default"
> >
> > Why need strings?
> > Also, every symbol in a public header file should start with RTE_ to avoid
> > namespace conflict.
>
> It was used to configure the model in the app. I can move the string into the example.

OK

>
> >
> > > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > > +#define WORKER_MODEL_RTC "rtc"
> > > +       RTE_GRAPH_MODEL_RTC,
> >
> > Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in enum
> > itself.
> Yes, will do in next version.
>
> >
> > > +#define WORKER_MODEL_GENERIC "generic"
> >
> > Generic is a very overloaded term. Use pipeline here i.e
> > RTE_GRAPH_MODEL_PIPELINE
>
> Actually, it's not a purely pipeline mode. I prefer to change to hybrid.

Hybrid is a very overloaded term, and it will be confusing (considering
there will be new models in the future).
Please pick a word that really expresses how the model works.

> >
> >
> > > +       RTE_GRAPH_MODEL_GENERIC,
> > > +       RTE_GRAPH_MODEL_MAX,
> >
> > No need for MAX, it will break the ABI for future. See other subsystem such as
> > cryptodev.
>
> Thanks, I will change it.
> >
> > > +};
> >
> > >
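
A rough sketch of what the enum comments above could look like when applied together: the default is an alias of RTC, there is no MAX sentinel, and validity is checked by an explicit function. All names carry an EX_/ex_ prefix to mark them as illustrative, and EX_GRAPH_MODEL_OTHER is only a placeholder for whatever the second model ends up being called.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names only; none of these are DPDK symbols. */
enum ex_graph_worker_model {
	EX_GRAPH_MODEL_RTC = 0,
	EX_GRAPH_MODEL_DEFAULT = EX_GRAPH_MODEL_RTC, /* alias, not a new value */
	EX_GRAPH_MODEL_OTHER,                        /* placeholder second model */
	/* No EX_GRAPH_MODEL_MAX: later models can be appended freely. */
};

/* Validity lives in the library, so applications never compare against MAX. */
static bool
ex_model_is_valid(int model)
{
	switch (model) {
	case EX_GRAPH_MODEL_RTC:
	case EX_GRAPH_MODEL_OTHER:
		return true;
	default:
		return false;
	}
}
```

The usual concern with a MAX entry is that its value changes whenever a model is added, so an application that sized an array or range check with it at build time silently disagrees with a newer library.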

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] vhost: fix madvise arguments alignment
  2023-02-23 16:57  0%     ` Mike Pattrick
@ 2023-02-24 15:05  4%       ` Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2023-02-24 15:05 UTC (permalink / raw)
  To: Mike Pattrick; +Cc: Maxime Coquelin, dev, david.marchand, chenbo.xia

[-- Attachment #1: Type: text/plain, Size: 16088 bytes --]

UNH CI detected an ABI failure for this patch, but the result was not posted
due to a bug on our end, so I'm reporting it manually now. Maxime, I see you
already predicted the issue though!

1 function with some indirect sub-type change:

  [C] 'function int rte_vhost_get_mem_table(int, rte_vhost_memory**)' at vhost.c:922:1 has some indirect sub-type changes:
    parameter 2 of type 'rte_vhost_memory**' has sub-type changes:
      in pointed to type 'rte_vhost_memory*':
        in pointed to type 'struct rte_vhost_memory' at rte_vhost.h:145:1:
          type size hasn't changed
          1 data member change:
            type of 'rte_vhost_mem_region regions[]' changed:
              array element type 'struct rte_vhost_mem_region' changed:
                type size changed from 448 to 512 (in bits)
                1 data member insertion:
                  'uint64_t alignment', at offset 448 (in bits) at rte_vhost.h:139:1
              type size hasn't changed

Error: ABI issue reported for abidiff --suppr dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1 reference/include --headers-dir2 build_install/include reference/dump/librte_vhost.dump build_install/dump/librte_vhost.dump
ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue).
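
For anyone wondering where the 448 and 512 come from, here is a small sketch with two replica structs. They are not the real definitions: region_before and region_after are made-up names, the field list is reproduced from memory of rte_vhost.h and should be treated as an assumption, and the sizes hold on a typical 64-bit (LP64) target such as the one the CI ran on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replica of the region layout before the patch (assumed field list). */
struct region_before {
	uint64_t guest_phys_addr;
	uint64_t guest_user_addr;
	uint64_t host_user_addr;
	uint64_t size;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;                 /* followed by 4 bytes of tail padding on LP64 */
};

/* Same layout with the member the patch appends. */
struct region_after {
	uint64_t guest_phys_addr;
	uint64_t guest_user_addr;
	uint64_t host_user_addr;
	uint64_t size;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
	uint64_t alignment;     /* new member, lands at bit offset 448 */
};

/* Byte offset of element i in an array, as old and new binaries compute it. */
static size_t
elem_offset_before(size_t i)
{
	return i * sizeof(struct region_before);
}

static size_t
elem_offset_after(size_t i)
{
	return i * sizeof(struct region_after);
}
```

Because the regions are stored as an array inside rte_vhost_memory, an application built against the old header would step through regions[] with the old stride and read every element after the first from the wrong offset.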


On Thu, Feb 23, 2023 at 11:57 AM Mike Pattrick <mkp@redhat.com> wrote:

> On Thu, Feb 23, 2023 at 11:12 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
> >
> > Hi Mike,
> >
> > Thanks for looking into this issue.
> >
> > On 2/23/23 05:35, Mike Pattrick wrote:
> > > The arguments passed to madvise should be aligned to the alignment of
> > > the backing memory. Now we keep track of each region's alignment and use
> > > it when setting coredump preferences. To facilitate this, a new member
> > > was added to rte_vhost_mem_region. A new function was added to easily
> > > translate a memory address back to its region alignment. Unneeded calls to
> > > madvise were reduced, as the cache removal case should already be
> > > covered by the cache insertion case. The previously inline function
> > > mem_set_dump was removed from a header file and made not inline.
> > >
> > > Fixes: 338ad77c9ed3 ("vhost: exclude VM hugepages from coredumps")
> > >
> > > Signed-off-by: Mike Pattrick <mkp@redhat.com>
> > > ---
> > > Since v1:
> > >   - Corrected a cast for 32bit compiles
> > > ---
> > >   lib/vhost/iotlb.c      |  9 +++---
> > >   lib/vhost/rte_vhost.h  |  1 +
> > >   lib/vhost/vhost.h      | 12 ++------
> > >   lib/vhost/vhost_user.c | 63
> +++++++++++++++++++++++++++++++++++-------
> > >   4 files changed, 60 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> > > index a0b8fd7302..5293507b63 100644
> > > --- a/lib/vhost/iotlb.c
> > > +++ b/lib/vhost/iotlb.c
> > > @@ -149,7 +149,6 @@ vhost_user_iotlb_cache_remove_all(struct
> vhost_virtqueue *vq)
> > >       rte_rwlock_write_lock(&vq->iotlb_lock);
> > >
> > >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> > > -             mem_set_dump((void *)(uintptr_t)node->uaddr, node->size,
> true);
> >
> > Hmm, it should have been called with enable=false here since we are
> > removing the entry from the IOTLB cache. It should be kept in order to
> > "DONTDUMP" pages evicted from the cache.
>
> Here I was thinking that if we add an entry and then remove a
> different entry, they could be in the same page. But I should have
> kept an enable=false in remove_all().
>
> And now that I think about it again, I could just check if there are
> any active cache entries in the page on every evict/remove, they're
> sorted so that should be an easy check. Unless there are any
> objections I'll go forward with that.
>
> >
> > >               TAILQ_REMOVE(&vq->iotlb_list, node, next);
> > >               vhost_user_iotlb_pool_put(vq, node);
> > >       }
> > > @@ -171,7 +170,6 @@ vhost_user_iotlb_cache_random_evict(struct
> vhost_virtqueue *vq)
> > >
> > >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> > >               if (!entry_idx) {
> > > -                     mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size, true);
> >
> > Same here.
> >
> > >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> > >                       vhost_user_iotlb_pool_put(vq, node);
> > >                       vq->iotlb_cache_nr--;
> > > @@ -224,14 +222,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net
> *dev, struct vhost_virtqueue *vq
> > >                       vhost_user_iotlb_pool_put(vq, new_node);
> > >                       goto unlock;
> > >               } else if (node->iova > new_node->iova) {
> > > -                     mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size, true);
> > > +                     mem_set_dump((void *)(uintptr_t)new_node->uaddr,
> new_node->size, true,
> > > +                             hua_to_alignment(dev->mem, (void
> *)(uintptr_t)node->uaddr));
> > >                       TAILQ_INSERT_BEFORE(node, new_node, next);
> > >                       vq->iotlb_cache_nr++;
> > >                       goto unlock;
> > >               }
> > >       }
> > >
> > > -     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> > > +     mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size,
> true,
> > > +             hua_to_alignment(dev->mem, (void
> *)(uintptr_t)new_node->uaddr));
> > >       TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
> > >       vq->iotlb_cache_nr++;
> > >
> > > @@ -259,7 +259,6 @@ vhost_user_iotlb_cache_remove(struct
> vhost_virtqueue *vq,
> > >                       break;
> > >
> > >               if (iova < node->iova + node->size) {
> > > -                     mem_set_dump((void *)(uintptr_t)node->uaddr,
> node->size, true);
> > >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> > >                       vhost_user_iotlb_pool_put(vq, node);
> > >                       vq->iotlb_cache_nr--;
> > > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> > > index a395843fe9..c5c97ea67e 100644
> > > --- a/lib/vhost/rte_vhost.h
> > > +++ b/lib/vhost/rte_vhost.h
> > > @@ -136,6 +136,7 @@ struct rte_vhost_mem_region {
> > >       void     *mmap_addr;
> > >       uint64_t mmap_size;
> > >       int fd;
> > > +     uint64_t alignment;
> >
> > It is not possible to do this as it breaks the ABI.
> > You have to store the information somewhere else, or simply call
> > get_blk_size() in hua_to_alignment() since the fd is not closed.
> >
>
> Sorry about that! You're right, checking the fd per operation should
> be easy enough.
>
> Thanks for the review,
>
> M
>
> > >   };
> > >
> > >   /**
> > > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> > > index 5750f0c005..a2467ba509 100644
> > > --- a/lib/vhost/vhost.h
> > > +++ b/lib/vhost/vhost.h
> > > @@ -1009,14 +1009,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
> > >       return true;
> > >   }
> > >
> > > -static __rte_always_inline void
> > > -mem_set_dump(__rte_unused void *ptr, __rte_unused size_t size,
> __rte_unused bool enable)
> > > -{
> > > -#ifdef MADV_DONTDUMP
> > > -     if (madvise(ptr, size, enable ? MADV_DODUMP : MADV_DONTDUMP) ==
> -1) {
> > > -             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > > -                     "VHOST_CONFIG: could not set coredump preference
> (%s).\n", strerror(errno));
> > > -     }
> > > -#endif
> > > -}
> > > +uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> > > +void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t
> alignment);
> > >   #endif /* _VHOST_NET_CDEV_H_ */
> > > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > > index d702d082dd..6d09597fbe 100644
> > > --- a/lib/vhost/vhost_user.c
> > > +++ b/lib/vhost/vhost_user.c
> > > @@ -737,6 +737,40 @@ log_addr_to_gpa(struct virtio_net *dev, struct
> vhost_virtqueue *vq)
> > >       return log_gpa;
> > >   }
> > >
> > > +uint64_t
> > > +hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> > > +{
> > > +     struct rte_vhost_mem_region *r;
> > > +     uint32_t i;
> > > +     uintptr_t hua = (uintptr_t)ptr;
> > > +
> > > +     for (i = 0; i < mem->nregions; i++) {
> > > +             r = &mem->regions[i];
> > > +             if (hua >= r->host_user_addr &&
> > > +                     hua < r->host_user_addr + r->size) {
> > > +                     return r->alignment;
> > > +             }
> > > +     }
> > > +
> > > +     /* If region isn't found, don't align at all */
> > > +     return 1;
> > > +}
> > > +
> > > +void
> > > +mem_set_dump(void *ptr, size_t size, bool enable, uint64_t pagesz)
> > > +{
> > > +#ifdef MADV_DONTDUMP
> > > +     void *start = RTE_PTR_ALIGN_FLOOR(ptr, pagesz);
> > > +     uintptr_t end = RTE_ALIGN_CEIL((uintptr_t)ptr + size, pagesz);
> > > +     size_t len = end - (uintptr_t)start;
> > > +
> > > +     if (madvise(start, len, enable ? MADV_DODUMP : MADV_DONTDUMP) ==
> -1) {
> > > +             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > > +                     "VHOST_CONFIG: could not set coredump preference
> (%s).\n", strerror(errno));
> > > +     }
> > > +#endif
> > > +}
> > > +
> > >   static void
> > >   translate_ring_addresses(struct virtio_net **pdev, struct
> vhost_virtqueue **pvq)
> > >   {
> > > @@ -767,6 +801,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       return;
> > >               }
> > >
> > > +             mem_set_dump(vq->desc_packed, len, true,
> > > +                     hua_to_alignment(dev->mem, vq->desc_packed));
> > >               numa_realloc(&dev, &vq);
> > >               *pdev = dev;
> > >               *pvq = vq;
> > > @@ -782,6 +818,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       return;
> > >               }
> > >
> > > +             mem_set_dump(vq->driver_event, len, true,
> > > +                     hua_to_alignment(dev->mem, vq->driver_event));
> > >               len = sizeof(struct vring_packed_desc_event);
> > >               vq->device_event = (struct vring_packed_desc_event *)
> > >                                       (uintptr_t)ring_addr_to_vva(dev,
> > > @@ -793,9 +831,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       return;
> > >               }
> > >
> > > -             mem_set_dump(vq->desc_packed, len, true);
> > > -             mem_set_dump(vq->driver_event, len, true);
> > > -             mem_set_dump(vq->device_event, len, true);
> > > +             mem_set_dump(vq->device_event, len, true,
> > > +                     hua_to_alignment(dev->mem, vq->device_event));
> > >               vq->access_ok = true;
> > >               return;
> > >       }
> > > @@ -812,6 +849,7 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >               return;
> > >       }
> > >
> > > +     mem_set_dump(vq->desc, len, true, hua_to_alignment(dev->mem,
> vq->desc));
> > >       numa_realloc(&dev, &vq);
> > >       *pdev = dev;
> > >       *pvq = vq;
> > > @@ -827,6 +865,7 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >               return;
> > >       }
> > >
> > > +     mem_set_dump(vq->avail, len, true, hua_to_alignment(dev->mem,
> vq->avail));
> > >       len = sizeof(struct vring_used) +
> > >               sizeof(struct vring_used_elem) * vq->size;
> > >       if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> > > @@ -839,6 +878,8 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >               return;
> > >       }
> > >
> > > +     mem_set_dump(vq->used, len, true, hua_to_alignment(dev->mem,
> vq->used));
> > > +
> > >       if (vq->last_used_idx != vq->used->idx) {
> > >               VHOST_LOG_CONFIG(dev->ifname, WARNING,
> > >                       "last_used_idx (%u) and vq->used->idx (%u)
> mismatches;\n",
> > > @@ -849,9 +890,6 @@ translate_ring_addresses(struct virtio_net **pdev,
> struct vhost_virtqueue **pvq)
> > >                       "some packets maybe resent for Tx and dropped
> for Rx\n");
> > >       }
> > >
> > > -     mem_set_dump(vq->desc, len, true);
> > > -     mem_set_dump(vq->avail, len, true);
> > > -     mem_set_dump(vq->used, len, true);
> > >       vq->access_ok = true;
> > >
> > >       VHOST_LOG_CONFIG(dev->ifname, DEBUG, "mapped address desc:
> %p\n", vq->desc);
> > > @@ -1230,7 +1268,8 @@ vhost_user_mmap_region(struct virtio_net *dev,
> > >       region->mmap_addr = mmap_addr;
> > >       region->mmap_size = mmap_size;
> > >       region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr +
> mmap_offset;
> > > -     mem_set_dump(mmap_addr, mmap_size, false);
> > > +     region->alignment = alignment;
> > > +     mem_set_dump(mmap_addr, mmap_size, false, alignment);
> > >
> > >       if (dev->async_copy) {
> > >               if (add_guest_pages(dev, region, alignment) < 0) {
> > > @@ -1535,7 +1574,6 @@ inflight_mem_alloc(struct virtio_net *dev, const
> char *name, size_t size, int *f
> > >               return NULL;
> > >       }
> > >
> > > -     mem_set_dump(ptr, size, false);
> > >       *fd = mfd;
> > >       return ptr;
> > >   }
> > > @@ -1566,6 +1604,7 @@ vhost_user_get_inflight_fd(struct virtio_net
> **pdev,
> > >       uint64_t pervq_inflight_size, mmap_size;
> > >       uint16_t num_queues, queue_size;
> > >       struct virtio_net *dev = *pdev;
> > > +     uint64_t alignment;
> > >       int fd, i, j;
> > >       int numa_node = SOCKET_ID_ANY;
> > >       void *addr;
> > > @@ -1628,6 +1667,8 @@ vhost_user_get_inflight_fd(struct virtio_net
> **pdev,
> > >               dev->inflight_info->fd = -1;
> > >       }
> > >
> > > +     alignment = get_blk_size(fd);
> > > +     mem_set_dump(addr, mmap_size, false, alignment);
> > >       dev->inflight_info->addr = addr;
> > >       dev->inflight_info->size = ctx->msg.payload.inflight.mmap_size =
> mmap_size;
> > >       dev->inflight_info->fd = ctx->fds[0] = fd;
> > > @@ -1744,10 +1785,10 @@ vhost_user_set_inflight_fd(struct virtio_net
> **pdev,
> > >               dev->inflight_info->fd = -1;
> > >       }
> > >
> > > -     mem_set_dump(addr, mmap_size, false);
> > >       dev->inflight_info->fd = fd;
> > >       dev->inflight_info->addr = addr;
> > >       dev->inflight_info->size = mmap_size;
> > > +     mem_set_dump(addr, mmap_size, false, get_blk_size(fd));
> > >
> > >       for (i = 0; i < num_queues; i++) {
> > >               vq = dev->virtqueue[i];
> > > @@ -2242,6 +2283,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> > >       struct virtio_net *dev = *pdev;
> > >       int fd = ctx->fds[0];
> > >       uint64_t size, off;
> > > +     uint64_t alignment;
> > >       void *addr;
> > >       uint32_t i;
> > >
> > > @@ -2280,6 +2322,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> > >        * fail when offset is not page size aligned.
> > >        */
> > >       addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED,
> fd, 0);
> > > +     alignment = get_blk_size(fd);
> > >       close(fd);
> > >       if (addr == MAP_FAILED) {
> > >               VHOST_LOG_CONFIG(dev->ifname, ERR, "mmap log base
> failed!\n");
> > > @@ -2296,7 +2339,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> > >       dev->log_addr = (uint64_t)(uintptr_t)addr;
> > >       dev->log_base = dev->log_addr + off;
> > >       dev->log_size = size;
> > > -     mem_set_dump(addr, size, false);
> > > +     mem_set_dump(addr, size + off, false, alignment);
> > >
> > >       for (i = 0; i < dev->nr_vring; i++) {
> > >               struct vhost_virtqueue *vq = dev->virtqueue[i];
> >
>
>
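
As a side note on the range arithmetic in the quoted mem_set_dump(): madvise() rejects a start address that is not page-aligned, so the patch rounds the start down and the end up to the backing page size. Below is a minimal sketch of that computation using plain bit masks instead of the RTE_PTR_ALIGN_FLOOR/RTE_ALIGN_CEIL macros; struct dump_range and dump_range_align() are hypothetical names, and pagesz is assumed to be a non-zero power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Page-aligned range covering [addr, addr + size). */
struct dump_range {
	uintptr_t start;
	size_t len;
};

/* pagesz must be a non-zero power of two (1 means "do not align"). */
static struct dump_range
dump_range_align(uintptr_t addr, size_t size, uint64_t pagesz)
{
	uintptr_t mask = (uintptr_t)pagesz - 1;
	uintptr_t start = addr & ~mask;               /* round start down */
	uintptr_t end = (addr + size + mask) & ~mask; /* round end up */
	struct dump_range r = { start, (size_t)(end - start) };

	return r;
}
```

With hugepage-backed guest memory the page size can be 2 MB or 1 GB, which is why the alignment has to come from the backing file rather than from the system page size.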

-- 

Patrick Robb

Technical Service Manager

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

www.iol.unh.edu

[-- Attachment #2: Type: text/html, Size: 24631 bytes --]

^ permalink raw reply	[relevance 4%]

* RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-23  7:11  0%     ` Ruifeng Wang
@ 2023-02-24  9:45  0%     ` Ruifeng Wang
  1 sibling, 0 replies; 200+ results
From: Ruifeng Wang @ 2023-02-24  9:45 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin, nd

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, February 23, 2023 5:56 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Yipeng Wang <yipeng1.wang@intel.com>;
> Sameh Gobriel <sameh.gobriel@intel.com>; Bruce Richardson <bruce.richardson@intel.com>;
> Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> 
> The code for setting algorithm for hash is not at all perf sensitive, and doing it inline
> has a couple of problems. First, it means that if multiple files include the header, then
> the initialization gets done multiple times. But also, it makes it harder to fix usage of
> RTE_LOG().
> 
> Despite what the checking script says, this is not an ABI change: the previous version
> inlined the same code; therefore both old and new code will work the same.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/hash/meson.build     |  1 +
>  lib/hash/rte_crc_arm64.h |  8 ++---
>  lib/hash/rte_crc_x86.h   | 10 +++---
>  lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
>  lib/hash/rte_hash_crc.h  | 48 ++--------------------------
>  lib/hash/version.map     |  7 +++++
>  6 files changed, 88 insertions(+), 54 deletions(-)
>  create mode 100644 lib/hash/rte_hash_crc.c
> 
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 1/3] ethdev: enable direct rearm with separate API
  @ 2023-02-24  8:55  0%           ` Feifei Wang
  0 siblings, 0 replies; 200+ results
From: Feifei Wang @ 2023-02-24  8:55 UTC (permalink / raw)
  To: Morten Brørup, thomas, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, konstantin.v.ananyev, nd, Honnappa Nagarahalli,
	Ruifeng Wang, nd, nd

Sorry for my delayed reply.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, January 4, 2023 6:11 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>; thomas@monjalon.net;
> Ferruh Yigit <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; konstantin.v.ananyev@yandex.ru; nd <nd@arm.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH v3 1/3] ethdev: enable direct rearm with separate API
> 
> > From: Feifei Wang [mailto:Feifei.Wang2@arm.com]
> > Sent: Wednesday, 4 January 2023 09.51
> >
> > Hi, Morten
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, January 4, 2023 4:22 PM
> > >
> > > > From: Feifei Wang [mailto:feifei.wang2@arm.com]
> > > > Sent: Wednesday, 4 January 2023 08.31
> > > >
> > > > Add 'tx_fill_sw_ring' and 'rx_flush_descriptor' API into direct
> > rearm
> > > > mode for separate Rx and Tx Operation. And this can support
> > different
> > > > multiple sources in direct rearm mode. For examples, Rx driver is
> > > > ixgbe, and Tx driver is i40e.
> > > >
> > > > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Suggested-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > ---
> > >
> > > This feature looks very promising for performance. I am pleased to
> > see
> > > progress on it.
> > >
> > Thanks very much for your reviewing.
> >
> > > Please confirm that the fast path functions are still thread safe,
> > i.e. one EAL
> > > thread may be calling rte_eth_rx_burst() while another EAL thread is
> > calling
> > > rte_eth_tx_burst().
> > >
> > For the multiple threads safe, like we say in cover letter, current
> > direct-rearm support Rx and Tx in the same thread. If we consider
> > multiple threads like 'pipeline model', there need to add 'lock' in
> > the data path which can decrease the performance.
> > Thus, the first step we do is try to enable direct-rearm in the single
> > thread, and then we will consider to enable direct rearm in multiple
> > threads and improve the performance.
> 
> OK, doing it in steps is a good idea for a feature like this - makes it easier to
> understand and review.
> 
> When proceeding to add support for the "pipeline model", perhaps the
> lockless principles from the rte_ring can be used in this feature too.
> 
> From a high level perspective, I'm somewhat worried that releasing a "work-
> in-progress" version of this feature in some DPDK version will cause API/ABI
> breakage discussions when progressing to the next steps of the
> implementation to make the feature more complete. Not only support for
> thread safety across simultaneous RX and TX, but also support for multiple
> mbuf pools per RX queue [1]. Marking the functions experimental should
> alleviate such discussions, but there is a risk of pushback to not break the
> API/ABI anyway.
> 
> [1]:
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/ethdev/rte_ethdev.h#L1
> 105
> 

[Feifei] I think the subsequent upgrades will not significantly affect the stability
of the API we currently define.

For thread safety across simultaneous RX and TX, the future lockless changes will
happen in the PMD layer, such as CAS load/store for the rxq queue index of the PMD.
Thus, this does not affect the stability of the upper API.

For multiple mbuf pools per RX queue, direct-rearm just puts Tx buffers into Rx buffers, and
it does not care which mempool a buffer comes from.
Buffers from different mempools are eventually freed into their respective sources in the
non-FAST_FREE path.
I think this is a mistake in the cover letter. The previous direct-rearm version supported only
FAST_FREE, so it required that buffers come from the same mempool. The latest version also
supports the non-FAST_FREE path, but we forgot to update the cover letter.
> [...]
> 
> > > > --- a/lib/ethdev/ethdev_driver.h
> > > > +++ b/lib/ethdev/ethdev_driver.h
> > > > @@ -59,6 +59,10 @@ struct rte_eth_dev {
> > > >  	eth_rx_descriptor_status_t rx_descriptor_status;
> > > >  	/** Check the status of a Tx descriptor */
> > > >  	eth_tx_descriptor_status_t tx_descriptor_status;
> > > > +	/** Fill Rx sw-ring with Tx buffers in direct rearm mode */
> > > > +	eth_tx_fill_sw_ring_t tx_fill_sw_ring;
> > >
> > > What is "Rx sw-ring"? Please confirm that this is not an Intel PMD
> > specific
> > > term and/or implementation detail, e.g. by providing a conceptual
> > > implementation for a non-Intel PMD, e.g. mlx5.
> > Rx sw_ring is used  to store mbufs in intel PMD. This is the same as
> > 'rxq->elts'
> > in mlx5.
> 
> Sounds good.
> 
> Then all we need is consensus on a generic name for this, unless "Rx sw-ring"
> already is the generic name. (I'm not a PMD developer, so I might be
> completely off track here.) Naming is often debatable, so I'll stop talking
> about it now - I only wanted to highlight that we should avoid vendor-
> specific terms in public APIs intended to be implemented by multiple vendors.
> On the other hand... if no other vendors raise their voices before merging
> into the DPDK main repository, they forfeit their right to complain about it. ;-)
> 
> > Agree with that we need to providing a conceptual implementation for
> > all PMDs.
> 
> My main point is that we should ensure that the feature is not too tightly
> coupled with the way Intel PMDs implement mbuf handling. Providing a
> conceptual implementation for a non-Intel PMD is one way of checking this.
> 
> The actual implementation in other PMDs could be left up to the various NIC
> vendors.

Yes. And we will rename our API to make it suitable for all vendors:
rte_eth_direct_rearm        -> rte_eth_buf_cycle             (upper API for direct rearm)
rte_eth_tx_fill_sw_ring     -> rte_eth_tx_buf_stash          (Tx queue fills Rx ring buffer)
rte_eth_rx_flush_descriptor -> rte_eth_rx_descriptors_refill (Rx queue flushes its descriptors)

struct rte_eth_rxq_rearm_data {
	void *rx_sw_ring;
	uint16_t *rearm_start;
	uint16_t *rearm_nb;
};

->

struct rxq_recycle_info {
	struct rte_mbuf **buf_ring;
	uint16_t *offset;	/* e.g. (uint16_t *)(&rq->ci) */
	uint16_t *end;
	uint16_t ring_size;
};

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  2023-02-20 13:50  3%   ` Jerin Jacob
@ 2023-02-24  6:31  0%     ` Yan, Zhirun
  2023-02-26 22:23  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Yan, Zhirun @ 2023-02-24  6:31 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, kirankumark, ndabilpuram, Liang, Cunming, Wang, Haiyue



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 20, 2023 9:51 PM
> To: Yan, Zhirun <zhirun.yan@intel.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; kirankumark@marvell.com;
> ndabilpuram@marvell.com; Liang, Cunming <cunming.liang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>
> Subject: Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
> 
> On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
> >
> > Add new get/set APIs to configure graph worker model which is used to
> > determine which model will be chosen.
> >
> > Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> > ---
> >  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
> >  lib/graph/rte_graph_worker_common.h | 13 ++++++++
> >  lib/graph/version.map               |  3 ++
> >  3 files changed, 67 insertions(+)
> >
> > diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> > index 54d1390786..a0ea0df153 100644
> > --- a/lib/graph/rte_graph_worker.h
> > +++ b/lib/graph/rte_graph_worker.h
> > @@ -1,5 +1,56 @@
> >  #include "rte_graph_model_rtc.h"
> >
> > +static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;
> 
> This will break the multiprocess.

Thanks. I will use TLS for per-thread local storage.

> 
> > +
> > +/** Graph worker models */
> > +enum rte_graph_worker_model {
> > +#define WORKER_MODEL_DEFAULT "default"
> 
> Why need strings?
> Also, every symbol in a public header file should start with RTE_ to avoid
> namespace conflict.

It was used to configure the model in the application. I can move the strings into the example.

> 
> > +       RTE_GRAPH_MODEL_DEFAULT = 0,
> > +#define WORKER_MODEL_RTC "rtc"
> > +       RTE_GRAPH_MODEL_RTC,
> 
> Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in enum
> itself.
Yes, will do in next version.

> 
> > +#define WORKER_MODEL_GENERIC "generic"
> 
> Generic is a very overloaded term. Use pipeline here i.e
> RTE_GRAPH_MODEL_PIPELINE

Actually, it's not purely a pipeline mode. I would prefer to rename it to hybrid.
> 
> 
> > +       RTE_GRAPH_MODEL_GENERIC,
> > +       RTE_GRAPH_MODEL_MAX,
> 
> No need for MAX, it will break the ABI for future. See other subsystem such as
> cryptodev.

Thanks, I will change it.
> 
> > +};
> 
> >
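
Taken together, the comments above point at an enum with no string macros,
no MAX sentinel, a default that aliases RTC, and per-thread rather than
file-static storage. A minimal sketch under those assumptions follows. The
names are deliberately unprefixed stand-ins, not the API a later revision
will define, and C11 _Thread_local is used here in place of any DPDK
per-lcore mechanism.

```c
#include <errno.h>

/* Hypothetical model enum: the default aliases RTC and there is no MAX
 * sentinel, so appending a model later does not change existing values. */
enum graph_worker_model {
	GRAPH_MODEL_RTC = 0,
	GRAPH_MODEL_DEFAULT = GRAPH_MODEL_RTC,
	GRAPH_MODEL_HYBRID,
};

/* One copy per thread instead of one static copy per translation unit. */
static _Thread_local enum graph_worker_model worker_model = GRAPH_MODEL_DEFAULT;

/* Validate with a switch rather than a comparison against MAX. */
static int
graph_worker_model_set(enum graph_worker_model model)
{
	switch (model) {
	case GRAPH_MODEL_RTC:
	case GRAPH_MODEL_HYBRID:
		worker_model = model;
		return 0;
	default:
		return -EINVAL;
	}
}

static enum graph_worker_model
graph_worker_model_get(void)
{
	return worker_model;
}
```

The switch keeps validation local to the library, so a new model only needs
a new case label rather than a public upper bound.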

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] vhost: fix madvise arguments alignment
  2023-02-23 16:12  3%   ` Maxime Coquelin
@ 2023-02-23 16:57  0%     ` Mike Pattrick
  2023-02-24 15:05  4%       ` Patrick Robb
  0 siblings, 1 reply; 200+ results
From: Mike Pattrick @ 2023-02-23 16:57 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, david.marchand, chenbo.xia

On Thu, Feb 23, 2023 at 11:12 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> Hi Mike,
>
> Thanks for looking into this issue.
>
> On 2/23/23 05:35, Mike Pattrick wrote:
> > The arguments passed to madvise should be aligned to the alignment of
> > the backing memory. Now we keep track of each region's alignment and use
> > it when setting coredump preferences. To facilitate this, a new member
> > was added to rte_vhost_mem_region. A new function was added to easily
> > translate memory address back to region alignment. Unneeded calls to
> > madvise were reduced, as the cache removal case should already be
> > covered by the cache insertion case. The previously inline function
> > mem_set_dump was removed from a header file and made not inline.
> >
> > Fixes: 338ad77c9ed3 ("vhost: exclude VM hugepages from coredumps")
> >
> > Signed-off-by: Mike Pattrick <mkp@redhat.com>
> > ---
> > Since v1:
> >   - Corrected a cast for 32bit compiles
> > ---
> >   lib/vhost/iotlb.c      |  9 +++---
> >   lib/vhost/rte_vhost.h  |  1 +
> >   lib/vhost/vhost.h      | 12 ++------
> >   lib/vhost/vhost_user.c | 63 +++++++++++++++++++++++++++++++++++-------
> >   4 files changed, 60 insertions(+), 25 deletions(-)
> >
> > diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> > index a0b8fd7302..5293507b63 100644
> > --- a/lib/vhost/iotlb.c
> > +++ b/lib/vhost/iotlb.c
> > @@ -149,7 +149,6 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
> >       rte_rwlock_write_lock(&vq->iotlb_lock);
> >
> >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> > -             mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
>
> Hmm, it should have been called with enable=false here since we are
> removing the entry from the IOTLB cache. It should be kept in order to
> "DONTDUMP" pages evicted from the cache.

Here I was thinking that if we add an entry and then remove a
different entry, they could be in the same page. But I should have
kept an enable=false call in remove_all().

And now that I think about it again, I could just check whether there
are any active cache entries in the page on every evict/remove. They're
sorted, so that should be an easy check. Unless there are any
objections I'll go forward with that.
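
A rough sketch of the neighbour check described above, in C. It assumes the
entries adjacent in the sorted list are also the only ones that can share a
page with the entry being removed, and that the alignment is a power of two.
The struct and function names are hypothetical simplifications, not the
vhost code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cache entry; the real IOTLB entry has more fields. */
struct entry {
	uint64_t uaddr;	/* host user address of the mapping */
	uint64_t size;	/* length in bytes */
};

struct span {
	uint64_t start;
	uint64_t end;	/* exclusive; start == end means nothing to mark */
};

static uint64_t
align_floor(uint64_t v, uint64_t align)
{
	return v & ~(align - 1);	/* align must be a power of two */
}

static uint64_t
align_ceil(uint64_t v, uint64_t align)
{
	return align_floor(v + align - 1, align);
}

/* True when the last page of a is also the first page of b. */
static bool
share_page(const struct entry *a, const struct entry *b, uint64_t align)
{
	if (a == NULL || b == NULL)
		return false;
	return align_floor(a->uaddr + a->size - 1, align) ==
		align_floor(b->uaddr, align);
}

/*
 * Pages of node that can be marked DONTDUMP on removal: the whole pages it
 * touches, minus any page still used by the previous or next entry.
 */
static struct span
dontdump_span(const struct entry *prev, const struct entry *node,
	const struct entry *next, uint64_t align)
{
	struct span s;

	s.start = align_floor(node->uaddr, align);
	s.end = align_ceil(node->uaddr + node->size, align);

	if (share_page(prev, node, align))
		s.start += align;
	if (share_page(node, next, align))
		s.end -= align;
	if (s.end < s.start)
		s.end = s.start;
	return s;
}
```

An empty span means every page the entry touches is still in use by a
neighbour, so no madvise call is needed for that removal.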

>
> >               TAILQ_REMOVE(&vq->iotlb_list, node, next);
> >               vhost_user_iotlb_pool_put(vq, node);
> >       }
> > @@ -171,7 +170,6 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
> >
> >       RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> >               if (!entry_idx) {
> > -                     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
>
> Same here.
>
> >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> >                       vhost_user_iotlb_pool_put(vq, node);
> >                       vq->iotlb_cache_nr--;
> > @@ -224,14 +222,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
> >                       vhost_user_iotlb_pool_put(vq, new_node);
> >                       goto unlock;
> >               } else if (node->iova > new_node->iova) {
> > -                     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> > +                     mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> > +                             hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
> >                       TAILQ_INSERT_BEFORE(node, new_node, next);
> >                       vq->iotlb_cache_nr++;
> >                       goto unlock;
> >               }
> >       }
> >
> > -     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> > +     mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> > +             hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
> >       TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
> >       vq->iotlb_cache_nr++;
> >
> > @@ -259,7 +259,6 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
> >                       break;
> >
> >               if (iova < node->iova + node->size) {
> > -                     mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> >                       TAILQ_REMOVE(&vq->iotlb_list, node, next);
> >                       vhost_user_iotlb_pool_put(vq, node);
> >                       vq->iotlb_cache_nr--;
> > diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> > index a395843fe9..c5c97ea67e 100644
> > --- a/lib/vhost/rte_vhost.h
> > +++ b/lib/vhost/rte_vhost.h
> > @@ -136,6 +136,7 @@ struct rte_vhost_mem_region {
> >       void     *mmap_addr;
> >       uint64_t mmap_size;
> >       int fd;
> > +     uint64_t alignment;
>
> It is not possible to do this, as it breaks the ABI.
> You have to store the information somewhere else, or simply call
> get_blk_size() in hua_to_alignment() since the fd is not closed.
>

Sorry about that! You're right, checking the fd per operation should
be easy enough.
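
For reference, a minimal sketch in C of looking up the alignment from the fd
and widening a range to it before calling madvise. fd_blk_size() is a
hypothetical helper assumed to behave like get_blk_size(), using
fstat()'s st_blksize. align_range() and struct madv_range are likewise
illustrative names, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Block size of the file backing fd, or 0 if it cannot be determined. */
static uint64_t
fd_blk_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return 0;
	return (uint64_t)st.st_blksize;
}

struct madv_range {
	uintptr_t start;
	size_t len;
};

/*
 * Widen [addr, addr + size) to alignment boundaries. An alignment that is
 * zero or not a power of two leaves the range untouched.
 */
static struct madv_range
align_range(uintptr_t addr, size_t size, uint64_t align)
{
	struct madv_range r = { addr, size };
	uintptr_t mask, end;

	if (align == 0 || (align & (align - 1)) != 0)
		return r;

	mask = (uintptr_t)align - 1;
	end = (addr + size + mask) & ~mask;
	r.start = addr & ~mask;
	r.len = end - r.start;
	return r;
}
```

The resulting start and len would then be what gets passed to madvise(), so
a partial-page request covers whole pages.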

Thanks for the review,

M

> >   };
> >
> >   /**
> > diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> > index 5750f0c005..a2467ba509 100644
> > --- a/lib/vhost/vhost.h
> > +++ b/lib/vhost/vhost.h
> > @@ -1009,14 +1009,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
> >       return true;
> >   }
> >
> > -static __rte_always_inline void
> > -mem_set_dump(__rte_unused void *ptr, __rte_unused size_t size, __rte_unused bool enable)
> > -{
> > -#ifdef MADV_DONTDUMP
> > -     if (madvise(ptr, size, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> > -             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > -                     "VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> > -     }
> > -#endif
> > -}
> > +uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> > +void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
> >   #endif /* _VHOST_NET_CDEV_H_ */
> > diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> > index d702d082dd..6d09597fbe 100644
> > --- a/lib/vhost/vhost_user.c
> > +++ b/lib/vhost/vhost_user.c
> > @@ -737,6 +737,40 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
> >       return log_gpa;
> >   }
> >
> > +uint64_t
> > +hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> > +{
> > +     struct rte_vhost_mem_region *r;
> > +     uint32_t i;
> > +     uintptr_t hua = (uintptr_t)ptr;
> > +
> > +     for (i = 0; i < mem->nregions; i++) {
> > +             r = &mem->regions[i];
> > +             if (hua >= r->host_user_addr &&
> > +                     hua < r->host_user_addr + r->size) {
> > +                     return r->alignment;
> > +             }
> > +     }
> > +
> > +     /* If region isn't found, don't align at all */
> > +     return 1;
> > +}
> > +
> > +void
> > +mem_set_dump(void *ptr, size_t size, bool enable, uint64_t pagesz)
> > +{
> > +#ifdef MADV_DONTDUMP
> > +     void *start = RTE_PTR_ALIGN_FLOOR(ptr, pagesz);
> > +     uintptr_t end = RTE_ALIGN_CEIL((uintptr_t)ptr + size, pagesz);
> > +     size_t len = end - (uintptr_t)start;
> > +
> > +     if (madvise(start, len, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> > +             rte_log(RTE_LOG_INFO, vhost_config_log_level,
> > +                     "VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> > +     }
> > +#endif
> > +}
> > +
> >   static void
> >   translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >   {
> > @@ -767,6 +801,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       return;
> >               }
> >
> > +             mem_set_dump(vq->desc_packed, len, true,
> > +                     hua_to_alignment(dev->mem, vq->desc_packed));
> >               numa_realloc(&dev, &vq);
> >               *pdev = dev;
> >               *pvq = vq;
> > @@ -782,6 +818,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       return;
> >               }
> >
> > +             mem_set_dump(vq->driver_event, len, true,
> > +                     hua_to_alignment(dev->mem, vq->driver_event));
> >               len = sizeof(struct vring_packed_desc_event);
> >               vq->device_event = (struct vring_packed_desc_event *)
> >                                       (uintptr_t)ring_addr_to_vva(dev,
> > @@ -793,9 +831,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       return;
> >               }
> >
> > -             mem_set_dump(vq->desc_packed, len, true);
> > -             mem_set_dump(vq->driver_event, len, true);
> > -             mem_set_dump(vq->device_event, len, true);
> > +             mem_set_dump(vq->device_event, len, true,
> > +                     hua_to_alignment(dev->mem, vq->device_event));
> >               vq->access_ok = true;
> >               return;
> >       }
> > @@ -812,6 +849,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >               return;
> >       }
> >
> > +     mem_set_dump(vq->desc, len, true, hua_to_alignment(dev->mem, vq->desc));
> >       numa_realloc(&dev, &vq);
> >       *pdev = dev;
> >       *pvq = vq;
> > @@ -827,6 +865,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >               return;
> >       }
> >
> > +     mem_set_dump(vq->avail, len, true, hua_to_alignment(dev->mem, vq->avail));
> >       len = sizeof(struct vring_used) +
> >               sizeof(struct vring_used_elem) * vq->size;
> >       if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> > @@ -839,6 +878,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >               return;
> >       }
> >
> > +     mem_set_dump(vq->used, len, true, hua_to_alignment(dev->mem, vq->used));
> > +
> >       if (vq->last_used_idx != vq->used->idx) {
> >               VHOST_LOG_CONFIG(dev->ifname, WARNING,
> >                       "last_used_idx (%u) and vq->used->idx (%u) mismatches;\n",
> > @@ -849,9 +890,6 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
> >                       "some packets maybe resent for Tx and dropped for Rx\n");
> >       }
> >
> > -     mem_set_dump(vq->desc, len, true);
> > -     mem_set_dump(vq->avail, len, true);
> > -     mem_set_dump(vq->used, len, true);
> >       vq->access_ok = true;
> >
> >       VHOST_LOG_CONFIG(dev->ifname, DEBUG, "mapped address desc: %p\n", vq->desc);
> > @@ -1230,7 +1268,8 @@ vhost_user_mmap_region(struct virtio_net *dev,
> >       region->mmap_addr = mmap_addr;
> >       region->mmap_size = mmap_size;
> >       region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
> > -     mem_set_dump(mmap_addr, mmap_size, false);
> > +     region->alignment = alignment;
> > +     mem_set_dump(mmap_addr, mmap_size, false, alignment);
> >
> >       if (dev->async_copy) {
> >               if (add_guest_pages(dev, region, alignment) < 0) {
> > @@ -1535,7 +1574,6 @@ inflight_mem_alloc(struct virtio_net *dev, const char *name, size_t size, int *f
> >               return NULL;
> >       }
> >
> > -     mem_set_dump(ptr, size, false);
> >       *fd = mfd;
> >       return ptr;
> >   }
> > @@ -1566,6 +1604,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
> >       uint64_t pervq_inflight_size, mmap_size;
> >       uint16_t num_queues, queue_size;
> >       struct virtio_net *dev = *pdev;
> > +     uint64_t alignment;
> >       int fd, i, j;
> >       int numa_node = SOCKET_ID_ANY;
> >       void *addr;
> > @@ -1628,6 +1667,8 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
> >               dev->inflight_info->fd = -1;
> >       }
> >
> > +     alignment = get_blk_size(fd);
> > +     mem_set_dump(addr, mmap_size, false, alignment);
> >       dev->inflight_info->addr = addr;
> >       dev->inflight_info->size = ctx->msg.payload.inflight.mmap_size = mmap_size;
> >       dev->inflight_info->fd = ctx->fds[0] = fd;
> > @@ -1744,10 +1785,10 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev,
> >               dev->inflight_info->fd = -1;
> >       }
> >
> > -     mem_set_dump(addr, mmap_size, false);
> >       dev->inflight_info->fd = fd;
> >       dev->inflight_info->addr = addr;
> >       dev->inflight_info->size = mmap_size;
> > +     mem_set_dump(addr, mmap_size, false, get_blk_size(fd));
> >
> >       for (i = 0; i < num_queues; i++) {
> >               vq = dev->virtqueue[i];
> > @@ -2242,6 +2283,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> >       struct virtio_net *dev = *pdev;
> >       int fd = ctx->fds[0];
> >       uint64_t size, off;
> > +     uint64_t alignment;
> >       void *addr;
> >       uint32_t i;
> >
> > @@ -2280,6 +2322,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> >        * fail when offset is not page size aligned.
> >        */
> >       addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > +     alignment = get_blk_size(fd);
> >       close(fd);
> >       if (addr == MAP_FAILED) {
> >               VHOST_LOG_CONFIG(dev->ifname, ERR, "mmap log base failed!\n");
> > @@ -2296,7 +2339,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
> >       dev->log_addr = (uint64_t)(uintptr_t)addr;
> >       dev->log_base = dev->log_addr + off;
> >       dev->log_size = size;
> > -     mem_set_dump(addr, size, false);
> > +     mem_set_dump(addr, size + off, false, alignment);
> >
> >       for (i = 0; i < dev->nr_vring; i++) {
> >               struct vhost_virtqueue *vq = dev->virtqueue[i];
>


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] vhost: fix madvise arguments alignment
  @ 2023-02-23 16:12  3%   ` Maxime Coquelin
  2023-02-23 16:57  0%     ` Mike Pattrick
  0 siblings, 1 reply; 200+ results
From: Maxime Coquelin @ 2023-02-23 16:12 UTC (permalink / raw)
  To: Mike Pattrick, dev; +Cc: david.marchand, chenbo.xia

Hi Mike,

Thanks for looking into this issue.

On 2/23/23 05:35, Mike Pattrick wrote:
> The arguments passed to madvise should be aligned to the alignment of
> the backing memory. Now we keep track of each region's alignment and use
> it when setting coredump preferences. To facilitate this, a new member
> was added to rte_vhost_mem_region. A new function was added to easily
> translate memory address back to region alignment. Unneeded calls to
> madvise were reduced, as the cache removal case should already be
> covered by the cache insertion case. The previously inline function
> mem_set_dump was removed from a header file and made not inline.
> 
> Fixes: 338ad77c9ed3 ("vhost: exclude VM hugepages from coredumps")
> 
> Signed-off-by: Mike Pattrick <mkp@redhat.com>
> ---
> Since v1:
>   - Corrected a cast for 32bit compiles
> ---
>   lib/vhost/iotlb.c      |  9 +++---
>   lib/vhost/rte_vhost.h  |  1 +
>   lib/vhost/vhost.h      | 12 ++------
>   lib/vhost/vhost_user.c | 63 +++++++++++++++++++++++++++++++++++-------
>   4 files changed, 60 insertions(+), 25 deletions(-)
> 
> diff --git a/lib/vhost/iotlb.c b/lib/vhost/iotlb.c
> index a0b8fd7302..5293507b63 100644
> --- a/lib/vhost/iotlb.c
> +++ b/lib/vhost/iotlb.c
> @@ -149,7 +149,6 @@ vhost_user_iotlb_cache_remove_all(struct vhost_virtqueue *vq)
>   	rte_rwlock_write_lock(&vq->iotlb_lock);
>   
>   	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
> -		mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);

Hmm, it should have been called with enable=false here since we are
removing the entry from the IOTLB cache. It should be kept in order to
"DONTDUMP" pages evicted from the cache.

>   		TAILQ_REMOVE(&vq->iotlb_list, node, next);
>   		vhost_user_iotlb_pool_put(vq, node);
>   	}
> @@ -171,7 +170,6 @@ vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq)
>   
>   	RTE_TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
>   		if (!entry_idx) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);

Same here.

>   			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>   			vhost_user_iotlb_pool_put(vq, node);
>   			vq->iotlb_cache_nr--;
> @@ -224,14 +222,16 @@ vhost_user_iotlb_cache_insert(struct virtio_net *dev, struct vhost_virtqueue *vq
>   			vhost_user_iotlb_pool_put(vq, new_node);
>   			goto unlock;
>   		} else if (node->iova > new_node->iova) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> +			mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> +				hua_to_alignment(dev->mem, (void *)(uintptr_t)node->uaddr));
>   			TAILQ_INSERT_BEFORE(node, new_node, next);
>   			vq->iotlb_cache_nr++;
>   			goto unlock;
>   		}
>   	}
>   
> -	mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
> +	mem_set_dump((void *)(uintptr_t)new_node->uaddr, new_node->size, true,
> +		hua_to_alignment(dev->mem, (void *)(uintptr_t)new_node->uaddr));
>   	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
>   	vq->iotlb_cache_nr++;
>   
> @@ -259,7 +259,6 @@ vhost_user_iotlb_cache_remove(struct vhost_virtqueue *vq,
>   			break;
>   
>   		if (iova < node->iova + node->size) {
> -			mem_set_dump((void *)(uintptr_t)node->uaddr, node->size, true);
>   			TAILQ_REMOVE(&vq->iotlb_list, node, next);
>   			vhost_user_iotlb_pool_put(vq, node);
>   			vq->iotlb_cache_nr--;
> diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
> index a395843fe9..c5c97ea67e 100644
> --- a/lib/vhost/rte_vhost.h
> +++ b/lib/vhost/rte_vhost.h
> @@ -136,6 +136,7 @@ struct rte_vhost_mem_region {
>   	void	 *mmap_addr;
>   	uint64_t mmap_size;
>   	int fd;
> +	uint64_t alignment;

It is not possible to do this, as it breaks the ABI.
You have to store the information somewhere else, or simply call
get_blk_size() in hua_to_alignment() since the fd is not closed.
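
To illustrate the ABI concern, here is a small C sketch comparing the two
layouts. The last three members come from the hunk above; the first four are
my assumption about the rest of the public struct, and the _v1/_v2 names and
region_offset() are purely illustrative. An application built against the
old header steps through an array of regions using the old element size, so
it would read the wrong bytes once the library uses the larger struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout an already-built application was compiled against. */
struct mem_region_v1 {
	uint64_t guest_phys_addr;	/* assumed member */
	uint64_t guest_user_addr;	/* assumed member */
	uint64_t host_user_addr;	/* assumed member */
	uint64_t size;			/* assumed member */
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
};

/* Same struct after the patch appends a member. */
struct mem_region_v2 {
	uint64_t guest_phys_addr;
	uint64_t guest_user_addr;
	uint64_t host_user_addr;
	uint64_t size;
	void *mmap_addr;
	uint64_t mmap_size;
	int fd;
	uint64_t alignment;
};

/* Byte offset of element idx in an array whose elements are elem_size bytes. */
static size_t
region_offset(size_t elem_size, size_t idx)
{
	return elem_size * idx;
}
```

Existing member offsets are unchanged, but the element size grows, which is
enough to break callers that index into an array of these structs.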

>   };
>   
>   /**
> diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
> index 5750f0c005..a2467ba509 100644
> --- a/lib/vhost/vhost.h
> +++ b/lib/vhost/vhost.h
> @@ -1009,14 +1009,6 @@ mbuf_is_consumed(struct rte_mbuf *m)
>   	return true;
>   }
>   
> -static __rte_always_inline void
> -mem_set_dump(__rte_unused void *ptr, __rte_unused size_t size, __rte_unused bool enable)
> -{
> -#ifdef MADV_DONTDUMP
> -	if (madvise(ptr, size, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> -		rte_log(RTE_LOG_INFO, vhost_config_log_level,
> -			"VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> -	}
> -#endif
> -}
> +uint64_t hua_to_alignment(struct rte_vhost_memory *mem, void *ptr);
> +void mem_set_dump(void *ptr, size_t size, bool enable, uint64_t alignment);
>   #endif /* _VHOST_NET_CDEV_H_ */
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index d702d082dd..6d09597fbe 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -737,6 +737,40 @@ log_addr_to_gpa(struct virtio_net *dev, struct vhost_virtqueue *vq)
>   	return log_gpa;
>   }
>   
> +uint64_t
> +hua_to_alignment(struct rte_vhost_memory *mem, void *ptr)
> +{
> +	struct rte_vhost_mem_region *r;
> +	uint32_t i;
> +	uintptr_t hua = (uintptr_t)ptr;
> +
> +	for (i = 0; i < mem->nregions; i++) {
> +		r = &mem->regions[i];
> +		if (hua >= r->host_user_addr &&
> +			hua < r->host_user_addr + r->size) {
> +			return r->alignment;
> +		}
> +	}
> +
> +	/* If region isn't found, don't align at all */
> +	return 1;
> +}
> +
> +void
> +mem_set_dump(void *ptr, size_t size, bool enable, uint64_t pagesz)
> +{
> +#ifdef MADV_DONTDUMP
> +	void *start = RTE_PTR_ALIGN_FLOOR(ptr, pagesz);
> +	uintptr_t end = RTE_ALIGN_CEIL((uintptr_t)ptr + size, pagesz);
> +	size_t len = end - (uintptr_t)start;
> +
> +	if (madvise(start, len, enable ? MADV_DODUMP : MADV_DONTDUMP) == -1) {
> +		rte_log(RTE_LOG_INFO, vhost_config_log_level,
> +			"VHOST_CONFIG: could not set coredump preference (%s).\n", strerror(errno));
> +	}
> +#endif
> +}
> +
>   static void
>   translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   {
> @@ -767,6 +801,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			return;
>   		}
>   
> +		mem_set_dump(vq->desc_packed, len, true,
> +			hua_to_alignment(dev->mem, vq->desc_packed));
>   		numa_realloc(&dev, &vq);
>   		*pdev = dev;
>   		*pvq = vq;
> @@ -782,6 +818,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			return;
>   		}
>   
> +		mem_set_dump(vq->driver_event, len, true,
> +			hua_to_alignment(dev->mem, vq->driver_event));
>   		len = sizeof(struct vring_packed_desc_event);
>   		vq->device_event = (struct vring_packed_desc_event *)
>   					(uintptr_t)ring_addr_to_vva(dev,
> @@ -793,9 +831,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			return;
>   		}
>   
> -		mem_set_dump(vq->desc_packed, len, true);
> -		mem_set_dump(vq->driver_event, len, true);
> -		mem_set_dump(vq->device_event, len, true);
> +		mem_set_dump(vq->device_event, len, true,
> +			hua_to_alignment(dev->mem, vq->device_event));
>   		vq->access_ok = true;
>   		return;
>   	}
> @@ -812,6 +849,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   		return;
>   	}
>   
> +	mem_set_dump(vq->desc, len, true, hua_to_alignment(dev->mem, vq->desc));
>   	numa_realloc(&dev, &vq);
>   	*pdev = dev;
>   	*pvq = vq;
> @@ -827,6 +865,7 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   		return;
>   	}
>   
> +	mem_set_dump(vq->avail, len, true, hua_to_alignment(dev->mem, vq->avail));
>   	len = sizeof(struct vring_used) +
>   		sizeof(struct vring_used_elem) * vq->size;
>   	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
> @@ -839,6 +878,8 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   		return;
>   	}
>   
> +	mem_set_dump(vq->used, len, true, hua_to_alignment(dev->mem, vq->used));
> +
>   	if (vq->last_used_idx != vq->used->idx) {
>   		VHOST_LOG_CONFIG(dev->ifname, WARNING,
>   			"last_used_idx (%u) and vq->used->idx (%u) mismatches;\n",
> @@ -849,9 +890,6 @@ translate_ring_addresses(struct virtio_net **pdev, struct vhost_virtqueue **pvq)
>   			"some packets maybe resent for Tx and dropped for Rx\n");
>   	}
>   
> -	mem_set_dump(vq->desc, len, true);
> -	mem_set_dump(vq->avail, len, true);
> -	mem_set_dump(vq->used, len, true);
>   	vq->access_ok = true;
>   
>   	VHOST_LOG_CONFIG(dev->ifname, DEBUG, "mapped address desc: %p\n", vq->desc);
> @@ -1230,7 +1268,8 @@ vhost_user_mmap_region(struct virtio_net *dev,
>   	region->mmap_addr = mmap_addr;
>   	region->mmap_size = mmap_size;
>   	region->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset;
> -	mem_set_dump(mmap_addr, mmap_size, false);
> +	region->alignment = alignment;
> +	mem_set_dump(mmap_addr, mmap_size, false, alignment);
>   
>   	if (dev->async_copy) {
>   		if (add_guest_pages(dev, region, alignment) < 0) {
> @@ -1535,7 +1574,6 @@ inflight_mem_alloc(struct virtio_net *dev, const char *name, size_t size, int *f
>   		return NULL;
>   	}
>   
> -	mem_set_dump(ptr, size, false);
>   	*fd = mfd;
>   	return ptr;
>   }
> @@ -1566,6 +1604,7 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
>   	uint64_t pervq_inflight_size, mmap_size;
>   	uint16_t num_queues, queue_size;
>   	struct virtio_net *dev = *pdev;
> +	uint64_t alignment;
>   	int fd, i, j;
>   	int numa_node = SOCKET_ID_ANY;
>   	void *addr;
> @@ -1628,6 +1667,8 @@ vhost_user_get_inflight_fd(struct virtio_net **pdev,
>   		dev->inflight_info->fd = -1;
>   	}
>   
> +	alignment = get_blk_size(fd);
> +	mem_set_dump(addr, mmap_size, false, alignment);
>   	dev->inflight_info->addr = addr;
>   	dev->inflight_info->size = ctx->msg.payload.inflight.mmap_size = mmap_size;
>   	dev->inflight_info->fd = ctx->fds[0] = fd;
> @@ -1744,10 +1785,10 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev,
>   		dev->inflight_info->fd = -1;
>   	}
>   
> -	mem_set_dump(addr, mmap_size, false);
>   	dev->inflight_info->fd = fd;
>   	dev->inflight_info->addr = addr;
>   	dev->inflight_info->size = mmap_size;
> +	mem_set_dump(addr, mmap_size, false, get_blk_size(fd));
>   
>   	for (i = 0; i < num_queues; i++) {
>   		vq = dev->virtqueue[i];
> @@ -2242,6 +2283,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>   	struct virtio_net *dev = *pdev;
>   	int fd = ctx->fds[0];
>   	uint64_t size, off;
> +	uint64_t alignment;
>   	void *addr;
>   	uint32_t i;
>   
> @@ -2280,6 +2322,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>   	 * fail when offset is not page size aligned.
>   	 */
>   	addr = mmap(0, size + off, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +	alignment = get_blk_size(fd);
>   	close(fd);
>   	if (addr == MAP_FAILED) {
>   		VHOST_LOG_CONFIG(dev->ifname, ERR, "mmap log base failed!\n");
> @@ -2296,7 +2339,7 @@ vhost_user_set_log_base(struct virtio_net **pdev,
>   	dev->log_addr = (uint64_t)(uintptr_t)addr;
>   	dev->log_base = dev->log_addr + off;
>   	dev->log_size = size;
> -	mem_set_dump(addr, size, false);
> +	mem_set_dump(addr, size + off, false, alignment);
>   
>   	for (i = 0; i < dev->nr_vring; i++) {
>   		struct vhost_virtqueue *vq = dev->virtqueue[i];


^ permalink raw reply	[relevance 3%]

* RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-23  7:11  0%     ` Ruifeng Wang
@ 2023-02-23  7:27  0%       ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2023-02-23  7:27 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin, nd, nd

> -----Original Message-----
> From: Ruifeng Wang
> Sent: Thursday, February 23, 2023 3:11 PM
> To: Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org
> Cc: Yipeng Wang <yipeng1.wang@intel.com>; Sameh Gobriel <sameh.gobriel@intel.com>; Bruce
> Richardson <bruce.richardson@intel.com>; Vladimir Medvedkin <vladimir.medvedkin@intel.com>;
> nd <nd@arm.com>
> Subject: RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> 
> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Thursday, February 23, 2023 5:56 AM
> > To: dev@dpdk.org
> > Cc: Stephen Hemminger <stephen@networkplumber.org>; Yipeng Wang
> > <yipeng1.wang@intel.com>; Sameh Gobriel <sameh.gobriel@intel.com>;
> > Bruce Richardson <bruce.richardson@intel.com>; Vladimir Medvedkin
> > <vladimir.medvedkin@intel.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Subject: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> >
> > The code for setting algorithm for hash is not at all perf sensitive,
> > and doing it inline has a couple of problems. First, it means that if
> > multiple files include the header, then the initialization gets done
> > multiple times. But also, it makes it harder to fix usage of RTE_LOG().
> >
> > Despite what the checking script says, this is not an ABI change: the
> > previous version inlined the same code, so both old and new code will
> > work the same.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  lib/hash/meson.build     |  1 +
> >  lib/hash/rte_crc_arm64.h |  8 ++---
> >  lib/hash/rte_crc_x86.h   | 10 +++---
> >  lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
> >  lib/hash/rte_hash_crc.h  | 48 ++--------------------------
> >  lib/hash/version.map     |  7 +++++
> >  6 files changed, 88 insertions(+), 54 deletions(-)
> >  create mode 100644 lib/hash/rte_hash_crc.c
> >
> > diff --git a/lib/hash/meson.build b/lib/hash/meson.build
> > index e56ee8572564..c345c6f561fc 100644
> > --- a/lib/hash/meson.build
> > +++ b/lib/hash/meson.build
> > @@ -19,6 +19,7 @@ indirect_headers += files(
> >
> >  sources = files(
> >      'rte_cuckoo_hash.c',
> > +    'rte_hash_crc.c',
> 
> I suppose this list is alphabetically ordered.
> 
> >      'rte_fbk_hash.c',
> >      'rte_thash.c',
> >      'rte_thash_gfni.c'
> <snip>
> > diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
> > index 0249ad16c5b6..e8145ee44204 100644
> > --- a/lib/hash/rte_hash_crc.h
> > +++ b/lib/hash/rte_hash_crc.h
> > @@ -20,8 +20,6 @@ extern "C" {
> >  #include <rte_branch_prediction.h>
> >  #include <rte_common.h>
> >  #include <rte_config.h>
> > -#include <rte_cpuflags.h>
> 
> A couple of files need update with this change.
> rte_cpuflags.h should be included in rte_fbk_hash.c (for ARM) and rte_efd.c.

OK, I see those changes are already in other patches of the same series.
Please ignore this comment.
Thanks.

> 
> > -#include <rte_log.h>
> >
> >  #include "rte_crc_sw.h"
> >
> <snip>

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
@ 2023-02-23  7:11  0%     ` Ruifeng Wang
  2023-02-23  7:27  0%       ` Ruifeng Wang
  2023-02-24  9:45  0%     ` Ruifeng Wang
  1 sibling, 1 reply; 200+ results
From: Ruifeng Wang @ 2023-02-23  7:11 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin, nd

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, February 23, 2023 5:56 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Yipeng Wang <yipeng1.wang@intel.com>;
> Sameh Gobriel <sameh.gobriel@intel.com>; Bruce Richardson <bruce.richardson@intel.com>;
> Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH v11 21/22] hash: move rte_hash_set_alg out header
> 
> The code for setting the hash CRC algorithm is not at all perf sensitive, and doing it inline
> has a couple of problems. First, it means that if multiple files include the header, then
> the initialization gets done multiple times. Second, it makes it harder to fix usage of
> RTE_LOG().
> 
> Despite what the checking script says, this is not an ABI change: the previous version
> inlined the same code, so both old and new code will work the same.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/hash/meson.build     |  1 +
>  lib/hash/rte_crc_arm64.h |  8 ++---
>  lib/hash/rte_crc_x86.h   | 10 +++---
>  lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
>  lib/hash/rte_hash_crc.h  | 48 ++--------------------------
>  lib/hash/version.map     |  7 +++++
>  6 files changed, 88 insertions(+), 54 deletions(-)
>  create mode 100644 lib/hash/rte_hash_crc.c
> 
> diff --git a/lib/hash/meson.build b/lib/hash/meson.build
> index e56ee8572564..c345c6f561fc 100644
> --- a/lib/hash/meson.build
> +++ b/lib/hash/meson.build
> @@ -19,6 +19,7 @@ indirect_headers += files(
> 
>  sources = files(
>      'rte_cuckoo_hash.c',
> +    'rte_hash_crc.c',

I suppose this list is alphabetically ordered.

>      'rte_fbk_hash.c',
>      'rte_thash.c',
>      'rte_thash_gfni.c'
<snip>
> diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
> index 0249ad16c5b6..e8145ee44204 100644
> --- a/lib/hash/rte_hash_crc.h
> +++ b/lib/hash/rte_hash_crc.h
> @@ -20,8 +20,6 @@ extern "C" {
>  #include <rte_branch_prediction.h>
>  #include <rte_common.h>
>  #include <rte_config.h>
> -#include <rte_cpuflags.h>

A couple of files need updating with this change.
rte_cpuflags.h should be included in rte_fbk_hash.c (for ARM) and rte_efd.c.

> -#include <rte_log.h>
> 
>  #include "rte_crc_sw.h"
> 
<snip>

^ permalink raw reply	[relevance 0%]

* [PATCH v11 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
@ 2023-02-22 21:55  2%   ` Stephen Hemminger
  2023-02-23  7:11  0%     ` Ruifeng Wang
  2023-02-24  9:45  0%     ` Ruifeng Wang
  0 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2023-02-22 21:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, Ruifeng Wang

The code for setting the hash CRC algorithm is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. Second, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build     |  1 +
 lib/hash/rte_crc_arm64.h |  8 ++---
 lib/hash/rte_crc_x86.h   | 10 +++---
 lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h  | 48 ++--------------------------
 lib/hash/version.map     |  7 +++++
 6 files changed, 88 insertions(+), 54 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_crc_arm64.h b/lib/hash/rte_crc_arm64.h
index c9f52510871b..414fe065caa8 100644
--- a/lib/hash/rte_crc_arm64.h
+++ b/lib/hash/rte_crc_arm64.h
@@ -53,7 +53,7 @@ crc32c_arm64_u64(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -67,7 +67,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u64(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_crc_x86.h b/lib/hash/rte_crc_x86.h
index 205bc182be77..3b865e251db2 100644
--- a/lib/hash/rte_crc_x86.h
+++ b/lib/hash/rte_crc_x86.h
@@ -67,7 +67,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -110,11 +110,11 @@ static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
 #ifdef RTE_ARCH_X86_64
-	if (likely(crc32_alg == CRC32_SSE42_x64))
+	if (likely(rte_hash_crc32_alg == CRC32_SSE42_x64))
 		return crc32c_sse42_u64(data, init_val);
 #endif
 
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u64_mimic(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..1439d8a71f6a
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
+#define RTE_LOGTYPE_HASH_CRC hash_crc_logtype
+
+uint8_t rte_hash_crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	rte_hash_crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		rte_hash_crc32_alg = CRC32_SSE42;
+	else
+		rte_hash_crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		rte_hash_crc32_alg = CRC32_ARM64;
+#endif
+
+	if (rte_hash_crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e8145ee44204 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -31,7 +29,7 @@ extern "C" {
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
 
-static uint8_t crc32_alg = CRC32_SW;
+extern uint8_t rte_hash_crc32_alg;
 
 #if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 #include "rte_crc_arm64.h"
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..8b22aad5626b 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
@@ -56,3 +57,9 @@ EXPERIMENTAL {
 	rte_thash_gfni;
 	rte_thash_gfni_bulk;
 };
+
+INTERNAL {
+	global:
+
+	rte_hash_crc32_alg;
+};
-- 
2.39.1
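
For readers following the thread, the x86 branch of rte_hash_crc_set_alg()
in the patch above can be sketched as a pure function. This is only an
illustration under stated assumptions: the CPU feature check is passed in
as a boolean instead of calling rte_cpu_get_flag_enabled(), the warning
logs are left out, the sketch_* names are hypothetical, and the values of
CRC32_SW, CRC32_SSE42 and CRC32_x64 are assumed (the diff context only
shows CRC32_SSE42_x64 and CRC32_ARM64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CRC32_SSE42_x64 and CRC32_ARM64 appear in the diff context;
 * the other three values are assumptions made for this sketch. */
#define CRC32_SW        (1U << 0)
#define CRC32_SSE42     (1U << 1)
#define CRC32_x64       (1U << 2)
#define CRC32_SSE42_x64 (CRC32_x64 | CRC32_SSE42)
#define CRC32_ARM64     (1U << 3)

/* Hypothetical stand-in for the x86 path of rte_hash_crc_set_alg():
 * returns the value that would end up in rte_hash_crc32_alg. */
static uint8_t
sketch_select_x86(uint8_t alg, bool cpu_has_em64t)
{
	/* An explicit software request is honoured as is. */
	if (alg == CRC32_SW)
		return CRC32_SW;

	/* Without 64-bit support, or when plain SSE4.2 was requested,
	 * settle on the 32-bit SSE4.2 path. */
	if (!cpu_has_em64t || alg == CRC32_SSE42)
		return CRC32_SSE42;

	return CRC32_SSE42_x64;
}

/* Walks through the three possible outcomes. */
void
sketch_select_demo(void)
{
	assert(sketch_select_x86(CRC32_SW, true) == CRC32_SW);
	assert(sketch_select_x86(CRC32_SSE42_x64, false) == CRC32_SSE42);
	assert(sketch_select_x86(CRC32_SSE42_x64, true) == CRC32_SSE42_x64);
}
```

Keeping the decision in one non-inline function matches the intent stated
in the commit message: it runs once from the library rather than once per
file that includes the header.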


^ permalink raw reply	[relevance 2%]

* [PATCH v11 00/22] Convert static log type values in libraries
                     ` (6 preceding siblings ...)
  2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
@ 2023-02-22 21:55  2% ` Stephen Hemminger
  2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-22 21:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
leave them there, mark as deprecated, or remove them.
This version removes them since there is no guarantee in current
DPDK policies that says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v11 - fix include check on arm cross build

v10 - add necessary rte_compat.h in thash_gfni stub for arm

v9 - fix handling of crc32 alg in lib/hash.
     make it an internal global variable.
     fix gfni stubs for case where they are not used.

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++---------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 +++-
 lib/hash/rte_crc_arm64.h          |  8 ++--
 lib/hash/rte_crc_x86.h            | 10 ++---
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 68 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 48 ++--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 50 +++++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 30 ++++----------
 lib/hash/version.map              | 11 +++++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 75 files changed, 409 insertions(+), 177 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1
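
As background for the series, the difference between a static and a
dynamic log type can be sketched in plain C. A static type is a fixed
number compiled into a shared header; a dynamic type is registered by name
at startup and gets its id and level at run time. The code below is a
stand-alone toy registry, not DPDK's implementation: every sketch_* and
SKETCH_* name is hypothetical, and the numeric levels are illustrative
(lower means more severe).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_TYPES   8
#define SKETCH_LOG_WARNING 5U
#define SKETCH_LOG_INFO    7U
#define SKETCH_LOG_DEBUG   8U

struct sketch_logtype {
	const char *name;   /* registered name, owned by the caller */
	uint32_t level;     /* most verbose level that is still printed */
};

static struct sketch_logtype sketch_types[SKETCH_MAX_TYPES];
static int sketch_ntypes;

/* Register a name at run time and return its id, or -1 if the table is
 * full. Registering an existing name returns the existing id unchanged. */
static int
sketch_log_register(const char *name, uint32_t level)
{
	int i;

	for (i = 0; i < sketch_ntypes; i++)
		if (strcmp(sketch_types[i].name, name) == 0)
			return i;

	if (sketch_ntypes == SKETCH_MAX_TYPES)
		return -1;

	sketch_types[sketch_ntypes].name = name;
	sketch_types[sketch_ntypes].level = level;
	return sketch_ntypes++;
}

/* A message is emitted when its level is at least as severe as the
 * level configured for the type. Unknown ids are never enabled. */
static int
sketch_log_enabled(int type, uint32_t level)
{
	if (type < 0 || type >= sketch_ntypes)
		return 0;
	return level <= sketch_types[type].level;
}

/* Registers one type at INFO and checks which levels pass. */
void
sketch_logtype_demo(void)
{
	int id = sketch_log_register("sketch.demo", SKETCH_LOG_INFO);

	assert(id >= 0);
	assert(sketch_log_enabled(id, SKETCH_LOG_WARNING));
	assert(!sketch_log_enabled(id, SKETCH_LOG_DEBUG));
}
```

With this shape each library owns its own type and no central list of
numbers needs editing when a library is added or removed.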


^ permalink raw reply	[relevance 2%]

* [PATCH v10 21/22] hash: move rte_hash_set_alg out header
  2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
@ 2023-02-22 16:08  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-22 16:08 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, Ruifeng Wang

The code for setting the hash CRC algorithm is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. Second, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build     |  1 +
 lib/hash/rte_crc_arm64.h |  8 ++---
 lib/hash/rte_crc_x86.h   | 10 +++---
 lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h  | 48 ++--------------------------
 lib/hash/version.map     |  7 +++++
 6 files changed, 88 insertions(+), 54 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_crc_arm64.h b/lib/hash/rte_crc_arm64.h
index c9f52510871b..414fe065caa8 100644
--- a/lib/hash/rte_crc_arm64.h
+++ b/lib/hash/rte_crc_arm64.h
@@ -53,7 +53,7 @@ crc32c_arm64_u64(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -67,7 +67,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u64(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_crc_x86.h b/lib/hash/rte_crc_x86.h
index 205bc182be77..3b865e251db2 100644
--- a/lib/hash/rte_crc_x86.h
+++ b/lib/hash/rte_crc_x86.h
@@ -67,7 +67,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -110,11 +110,11 @@ static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
 #ifdef RTE_ARCH_X86_64
-	if (likely(crc32_alg == CRC32_SSE42_x64))
+	if (likely(rte_hash_crc32_alg == CRC32_SSE42_x64))
 		return crc32c_sse42_u64(data, init_val);
 #endif
 
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u64_mimic(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..1439d8a71f6a
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
+#define RTE_LOGTYPE_HASH_CRC hash_crc_logtype
+
+uint8_t rte_hash_crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	rte_hash_crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		rte_hash_crc32_alg = CRC32_SSE42;
+	else
+		rte_hash_crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		rte_hash_crc32_alg = CRC32_ARM64;
+#endif
+
+	if (rte_hash_crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e8145ee44204 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -31,7 +29,7 @@ extern "C" {
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
 
-static uint8_t crc32_alg = CRC32_SW;
+extern uint8_t rte_hash_crc32_alg;
 
 #if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 #include "rte_crc_arm64.h"
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..8b22aad5626b 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
@@ -56,3 +57,9 @@ EXPERIMENTAL {
 	rte_thash_gfni;
 	rte_thash_gfni_bulk;
 };
+
+INTERNAL {
+	global:
+
+	rte_hash_crc32_alg;
+};
-- 
2.39.1
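
The order of checks in the x86 rte_hash_crc_8byte() shown in the diff can
be sketched as follows. This is a hedged illustration, not the library
code: it assumes a 64-bit x86 build (the RTE_ARCH_X86_64 branch is always
compiled in), it returns a label for the chosen path instead of computing
a CRC, the sketch_* names are hypothetical, and the values of CRC32_SW,
CRC32_SSE42 and CRC32_x64 are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* CRC32_SSE42_x64 and CRC32_ARM64 appear in the diff context;
 * the other three values are assumptions made for this sketch. */
#define CRC32_SW        (1U << 0)
#define CRC32_SSE42     (1U << 1)
#define CRC32_x64       (1U << 2)
#define CRC32_SSE42_x64 (CRC32_x64 | CRC32_SSE42)
#define CRC32_ARM64     (1U << 3)

enum sketch_path {
	SKETCH_PATH_SW,          /* table-driven software fallback */
	SKETCH_PATH_SSE42_MIMIC, /* 64-bit input handled with 32-bit steps */
	SKETCH_PATH_SSE42_U64    /* native 64-bit SSE4.2 instruction */
};

/* Mirrors the order of the checks: exact match on the 64-bit flag
 * combination first, then a mask test for any SSE4.2 support. */
static enum sketch_path
sketch_dispatch_8byte(uint8_t alg)
{
	if (alg == CRC32_SSE42_x64)
		return SKETCH_PATH_SSE42_U64;

	if (alg & CRC32_SSE42)
		return SKETCH_PATH_SSE42_MIMIC;

	return SKETCH_PATH_SW;
}

/* One call per possible path. */
void
sketch_dispatch_demo(void)
{
	assert(sketch_dispatch_8byte(CRC32_SSE42_x64) == SKETCH_PATH_SSE42_U64);
	assert(sketch_dispatch_8byte(CRC32_SSE42) == SKETCH_PATH_SSE42_MIMIC);
	assert(sketch_dispatch_8byte(CRC32_SW) == SKETCH_PATH_SW);
}
```

The equality test has to come first: CRC32_SSE42_x64 also has the
CRC32_SSE42 bit set, so testing the mask first would always take the
slower mimic path.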


^ permalink raw reply	[relevance 2%]

* [PATCH v10 00/22] Convert static log type values in libraries
                     ` (5 preceding siblings ...)
  2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-22 16:07  2% ` Stephen Hemminger
  2023-02-22 16:08  2%   ` [PATCH v10 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-22 16:07 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
leave them there, mark as deprecated, or remove them.
This version removes them since there is no guarantee in current
DPDK policies that says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v10 - add necessary rte_compat.h in thash_gfni stub for arm

v9 - fix handling of crc32 alg in lib/hash.
     make it an internal global variable.
     fix gfni stubs for case where they are not used.

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++---------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 +++-
 lib/hash/rte_crc_arm64.h          |  8 ++--
 lib/hash/rte_crc_x86.h            | 10 ++---
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 68 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 48 ++--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 50 +++++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 29 +++----------
 lib/hash/version.map              | 11 +++++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 75 files changed, 406 insertions(+), 179 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* [PATCH v2] mem: fix displaying heap ID failed for heap info command
  @ 2023-02-22  7:49  4% ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2023-02-22  7:49 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, mb, hkalra, huangdaode, fengchengwen, lihuisong

The telemetry lib has added an allowed character set for dictionary names.
Please see commit 2537fb0c5f34 ("telemetry: limit characters allowed in
dictionary names").

The space character is not in this set, so the heap ID in /eal/heap_info
cannot be displayed. Additionally, 'Heap' is misspelled as 'Head'. So use
'Heap_id' to replace 'Head id'.

Fixes: e6732d0d6e26 ("mem: add telemetry infos")
Fixes: 2537fb0c5f34 ("telemetry: limit characters allowed in dictionary names")
Cc: stable@dpdk.org

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Acked-by: Chengwen Feng <fengchengwen@huawei.com>
---
 -v2: add announcement in rel_notes.
---
 doc/guides/rel_notes/release_23_03.rst | 2 ++
 lib/eal/common/eal_common_memory.c     | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 49c18617a5..bdee535046 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -237,6 +237,8 @@ API Changes
 * The experimental structures ``struct rte_graph_param``, ``struct rte_graph``
   and ``struct graph`` were updated to support pcap trace in the graph library.
 
+* The ``Head id`` key in the output of the ``/eal/heap_info`` telemetry command
+  is renamed to ``Heap_id`` to ensure that it can be displayed.
 
 ABI Changes
 -----------
diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index c917b981bc..c2a4c8f9e7 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -1139,7 +1139,7 @@ handle_eal_heap_info_request(const char *cmd __rte_unused, const char *params,
 	malloc_heap_get_stats(heap, &sock_stats);
 
 	rte_tel_data_start_dict(d);
-	rte_tel_data_add_dict_uint(d, "Head id", heap_id);
+	rte_tel_data_add_dict_uint(d, "Heap_id", heap_id);
 	rte_tel_data_add_dict_string(d, "Name", heap->name);
 	rte_tel_data_add_dict_uint(d, "Heap_size",
 				   sock_stats.heap_totalsz_bytes);
-- 
2.33.0


^ permalink raw reply	[relevance 4%]
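
The heap-info fix above depends on which characters the telemetry library accepts in dictionary names. The following is a minimal sketch of such a check, not the library's code: it assumes the allowed set is ASCII alphanumerics plus '_' and '/', and the function name tel_name_is_valid is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the telemetry name check.
 * Assumption: only ASCII alphanumerics, '_' and '/' are accepted.
 * Rejecting NULL and empty names is a choice made for this sketch.
 */
static bool
tel_name_is_valid(const char *name)
{
	if (name == NULL || *name == '\0')
		return false;

	for (; *name != '\0'; name++) {
		unsigned char c = (unsigned char)*name;

		if (!isalnum(c) && c != '_' && c != '/')
			return false;
	}

	return true;
}
```

Under these assumptions "Head id" is rejected because of the space, while "Heap_id" and "Heap_size" pass.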

* [PATCH v9 21/22] hash: move rte_hash_set_alg out header
  2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-21 19:02  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-21 19:02 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, Ruifeng Wang

The code for setting the hash algorithm is not at all performance
sensitive, and doing it inline has a couple of problems. First, if
multiple files include the header, the initialization is done
multiple times. Second, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build     |  1 +
 lib/hash/rte_crc_arm64.h |  8 ++---
 lib/hash/rte_crc_x86.h   | 10 +++---
 lib/hash/rte_hash_crc.c  | 68 ++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h  | 48 ++--------------------------
 lib/hash/version.map     |  7 +++++
 6 files changed, 88 insertions(+), 54 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_crc_arm64.h b/lib/hash/rte_crc_arm64.h
index c9f52510871b..414fe065caa8 100644
--- a/lib/hash/rte_crc_arm64.h
+++ b/lib/hash/rte_crc_arm64.h
@@ -53,7 +53,7 @@ crc32c_arm64_u64(uint64_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -67,7 +67,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_ARM64))
+	if (likely(rte_hash_crc32_alg & CRC32_ARM64))
 		return crc32c_arm64_u64(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_crc_x86.h b/lib/hash/rte_crc_x86.h
index 205bc182be77..3b865e251db2 100644
--- a/lib/hash/rte_crc_x86.h
+++ b/lib/hash/rte_crc_x86.h
@@ -67,7 +67,7 @@ crc32c_sse42_u64(uint64_t data, uint64_t init_val)
 static inline uint32_t
 rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u8(data, init_val);
 
 	return crc32c_1byte(data, init_val);
@@ -81,7 +81,7 @@ rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u16(data, init_val);
 
 	return crc32c_2bytes(data, init_val);
@@ -95,7 +95,7 @@ rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
 static inline uint32_t
 rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
 {
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u32(data, init_val);
 
 	return crc32c_1word(data, init_val);
@@ -110,11 +110,11 @@ static inline uint32_t
 rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
 {
 #ifdef RTE_ARCH_X86_64
-	if (likely(crc32_alg == CRC32_SSE42_x64))
+	if (likely(rte_hash_crc32_alg == CRC32_SSE42_x64))
 		return crc32c_sse42_u64(data, init_val);
 #endif
 
-	if (likely(crc32_alg & CRC32_SSE42))
+	if (likely(rte_hash_crc32_alg & CRC32_SSE42))
 		return crc32c_sse42_u64_mimic(data, init_val);
 
 	return crc32c_2words(data, init_val);
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..1439d8a71f6a
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
+#define RTE_LOGTYPE_HASH_CRC hash_crc_logtype
+
+uint8_t rte_hash_crc32_alg = CRC32_SW;
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	rte_hash_crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		rte_hash_crc32_alg = CRC32_SSE42;
+	else
+		rte_hash_crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		rte_hash_crc32_alg = CRC32_ARM64;
+#endif
+
+	if (rte_hash_crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH_CRC,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e8145ee44204 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -31,7 +29,7 @@ extern "C" {
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
 
-static uint8_t crc32_alg = CRC32_SW;
+extern uint8_t rte_hash_crc32_alg;
 
 #if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 #include "rte_crc_arm64.h"
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..8b22aad5626b 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
@@ -56,3 +57,9 @@ EXPERIMENTAL {
 	rte_thash_gfni;
 	rte_thash_gfni_bulk;
 };
+
+INTERNAL {
+	global:
+
+	rte_hash_crc32_alg;
+};
-- 
2.39.1


^ permalink raw reply	[relevance 2%]
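
The selection logic moved by the patch above can be modelled without DPDK. This is a sketch of the x86 branch only, under stated assumptions: the values of CRC32_SW, CRC32_SSE42 and CRC32_x64 are assumed (only CRC32_SSE42_x64 and CRC32_ARM64 are visible in the diff), the cpu_has_em64t parameter stands in for rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T), the warnings are omitted, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The first three values are assumptions; the last two match the diff. */
#define CRC32_SW        (1U << 0)
#define CRC32_SSE42     (1U << 1)
#define CRC32_x64       (1U << 2)
#define CRC32_SSE42_x64 (CRC32_x64 | CRC32_SSE42)
#define CRC32_ARM64     (1U << 3)

/* Paths the x86 8-byte helper can take on a 64-bit build. */
enum crc_path { PATH_SW, PATH_SSE42_MIMIC, PATH_SSE42_U64 };

/*
 * Hypothetical model of the x86 branch of rte_hash_crc_set_alg().
 * Returns the value that would be stored in rte_hash_crc32_alg.
 */
static uint8_t
crc_select_alg_x86(uint8_t alg, bool cpu_has_em64t)
{
	if (alg == CRC32_SW)
		return CRC32_SW;

	/* Any other request ends up on an SSE4.2 variant on x86. */
	if (!cpu_has_em64t || alg == CRC32_SSE42)
		return CRC32_SSE42;

	return CRC32_SSE42_x64;
}

/* Mirrors the order of checks in the x86 rte_hash_crc_8byte(). */
static enum crc_path
crc_8byte_path(uint8_t selected)
{
	if (selected == CRC32_SSE42_x64)
		return PATH_SSE42_U64;
	if (selected & CRC32_SSE42)
		return PATH_SSE42_MIMIC;

	return PATH_SW;
}
```

Because the selected value now lives in one global defined in rte_hash_crc.c, every translation unit that includes the header reads the same result, instead of each one keeping its own static copy.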

* [PATCH v9 00/22] Convert static logtypes in libraries
                     ` (4 preceding siblings ...)
  2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-21 19:01  2% ` Stephen Hemminger
  2023-02-21 19:02  2%   ` [PATCH v9 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
  2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-21 19:01 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
leave them there, mark them as deprecated, or remove them.
This version removes them since nothing in current
DPDK policies says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v9 - fix handling of crc32 alg in lib/hash.
     make it an internal global variable.
     fix gfni stubs for case where they are not used.

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++---------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 +++-
 lib/hash/rte_crc_arm64.h          |  8 ++--
 lib/hash/rte_crc_x86.h            | 10 ++---
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 68 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 48 ++--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 50 +++++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              | 11 +++++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 75 files changed, 405 insertions(+), 179 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 2%]
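
To illustrate the idea behind the series above, here is a small model of dynamic log type registration in plain C. It is not the rte_log API: every name, the ID range, and the level numbering (smaller means more severe) are assumptions made for the sketch. In the real patches the registration is done with macros such as RTE_LOG_REGISTER_SUFFIX, as seen in lib/hash/rte_hash_crc.c.

```c
#include <string.h>

#define MODEL_FIRST_DYNAMIC_ID 32 /* assumed: lower IDs reserved for static types */
#define MODEL_MAX_TYPES        64

/* Assumed numbering: smaller value means more severe. */
#define MODEL_LOG_WARNING 5
#define MODEL_LOG_INFO    7
#define MODEL_LOG_DEBUG   8

struct model_logtype {
	const char *name; /* must outlive the registry, e.g. a string literal */
	int level;        /* most verbose level that is still emitted */
};

static struct model_logtype model_types[MODEL_MAX_TYPES];
static int model_next_id = MODEL_FIRST_DYNAMIC_ID;

/* Register a name at run time; a repeated name returns the existing ID. */
static int
model_log_register(const char *name, int level)
{
	int id;

	for (id = MODEL_FIRST_DYNAMIC_ID; id < model_next_id; id++) {
		if (strcmp(model_types[id].name, name) == 0)
			return id;
	}

	if (model_next_id >= MODEL_MAX_TYPES)
		return -1;

	id = model_next_id++;
	model_types[id].name = name;
	model_types[id].level = level;

	return id;
}

/* A message is emitted when its level is at least as severe as the type's. */
static int
model_log_enabled(int id, int level)
{
	if (id < MODEL_FIRST_DYNAMIC_ID || id >= model_next_id)
		return 0;

	return level <= model_types[id].level;
}
```

In this model each library gets its ID at startup instead of from a fixed constant in a shared header, which is why the static definitions can be dropped from rte_log.h.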

* Re: [PATCH v8 21/22] hash: move rte_hash_set_alg out header
  2023-02-20 23:35  3%   ` [PATCH v8 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
@ 2023-02-21 15:02  0%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2023-02-21 15:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Tue, Feb 21, 2023 at 12:38 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> The code for setting the hash algorithm is not at all performance
> sensitive, and doing it inline has a couple of problems. First, if
> multiple files include the header, the initialization is done
> multiple times. Second, it makes it harder to fix usage of RTE_LOG().
>
> Despite what the checking script says, this is not an ABI change: the
> previous version inlined the same code, so both old and new code
> will work the same.

I suppose you are referring to:
http://mails.dpdk.org/archives/test-report/2023-February/356872.html
ERROR: symbol rte_hash_crc_set_alg is added in the DPDK_23 section,
but is expected to be added in the EXPERIMENTAL section of the version
map

I agree that this is irrelevant and can be ignored in this particular case.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* [PATCH v2 2/2] net/nfp: modify RSS's processing logic
  @ 2023-02-21  3:55  3%     ` Chaoyong He
  0 siblings, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-21  3:55 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

The initial logic only supports single type metadata; this
commit adds support for chained type metadata. This commit also
makes the relation between the RSS capability (v1/v2) and these
two types of metadata clearer.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/net/nfp/nfp_common.c    |  23 +++++++
 drivers/net/nfp/nfp_common.h    |   7 +++
 drivers/net/nfp/nfp_ctrl.h      |  18 +++++-
 drivers/net/nfp/nfp_ethdev.c    |   7 +--
 drivers/net/nfp/nfp_ethdev_vf.c |   7 +--
 drivers/net/nfp/nfp_rxtx.c      | 108 ++++++++++++++++++++------------
 6 files changed, 121 insertions(+), 49 deletions(-)

diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index a545a10013..a1e37ada11 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -1584,6 +1584,29 @@ nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name)
 	return 0;
 }
 
+void
+nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+{
+	/*
+	 * ABI 4.x and ctrl vNIC always use chained metadata. In other cases, single
+	 * metadata is used if the hw capability supports only RSS(v1); RSS(v2)
+	 * also indicates that chained metadata is in use.
+	 */
+	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) == 4) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
+	} else if ((hw->cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
+		/*
+		 * RSS is incompatible with chained metadata. hw->cap represents the
+		 * firmware's capability rather than its configuration, so clear the RSS
+		 * bit here to let later code use hw->cap to identify RSS unambiguously.
+		 */
+		hw->cap &= ~NFP_NET_CFG_CTRL_RSS;
+	} else {
+		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
+	}
+}
+
 /*
  * Local variables:
  * c-file-style: "Linux"
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 980f3cad89..d33675eb99 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -127,6 +127,11 @@ enum nfp_qcp_ptr {
 	NFP_QCP_WRITE_PTR
 };
 
+enum nfp_net_meta_format {
+	NFP_NET_METAFORMAT_SINGLE,
+	NFP_NET_METAFORMAT_CHAINED,
+};
+
 struct nfp_pf_dev {
 	/* Backpointer to associated pci device */
 	struct rte_pci_device *pci_dev;
@@ -203,6 +208,7 @@ struct nfp_net_hw {
 	uint32_t max_mtu;
 	uint32_t mtu;
 	uint32_t rx_offset;
+	enum nfp_net_meta_format meta_format;
 
 	/* Current values for control */
 	uint32_t ctrl;
@@ -455,6 +461,7 @@ int nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
 int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
+void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
 
 #define NFP_NET_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct nfp_net_adapter *)adapter)->hw)
diff --git a/drivers/net/nfp/nfp_ctrl.h b/drivers/net/nfp/nfp_ctrl.h
index 1069ff9485..bdc39f8974 100644
--- a/drivers/net/nfp/nfp_ctrl.h
+++ b/drivers/net/nfp/nfp_ctrl.h
@@ -110,6 +110,7 @@
 #define   NFP_NET_CFG_CTRL_MSIX_TX_OFF    (0x1 << 26) /* Disable MSIX for TX */
 #define   NFP_NET_CFG_CTRL_LSO2           (0x1 << 28) /* LSO/TSO (version 2) */
 #define   NFP_NET_CFG_CTRL_RSS2           (0x1 << 29) /* RSS (version 2) */
+#define   NFP_NET_CFG_CTRL_CSUM_COMPLETE  (0x1 << 30) /* Checksum complete */
 #define   NFP_NET_CFG_CTRL_LIVE_ADDR      (0x1U << 31)/* live MAC addr change */
 #define NFP_NET_CFG_UPDATE              0x0004
 #define   NFP_NET_CFG_UPDATE_GEN          (0x1 <<  0) /* General update */
@@ -135,6 +136,8 @@
 #define NFP_NET_CFG_CTRL_LSO_ANY (NFP_NET_CFG_CTRL_LSO | NFP_NET_CFG_CTRL_LSO2)
 #define NFP_NET_CFG_CTRL_RSS_ANY (NFP_NET_CFG_CTRL_RSS | NFP_NET_CFG_CTRL_RSS2)
 
+#define NFP_NET_CFG_CTRL_CHAIN_META (NFP_NET_CFG_CTRL_RSS2 | \
+					NFP_NET_CFG_CTRL_CSUM_COMPLETE)
 /*
  * Read-only words (0x0030 - 0x0050):
  * @NFP_NET_CFG_VERSION:     Firmware version number
@@ -218,7 +221,7 @@
 
 /*
  * RSS configuration (0x0100 - 0x01ac):
- * Used only when NFP_NET_CFG_CTRL_RSS is enabled
+ * Used only when NFP_NET_CFG_CTRL_RSS_ANY is enabled
  * @NFP_NET_CFG_RSS_CFG:     RSS configuration word
  * @NFP_NET_CFG_RSS_KEY:     RSS "secret" key
  * @NFP_NET_CFG_RSS_ITBL:    RSS indirection table
@@ -334,6 +337,19 @@
 /* PF multiport offset */
 #define NFP_PF_CSR_SLICE_SIZE	(32 * 1024)
 
+/*
+ * nfp_net_cfg_ctrl_rss() - Get RSS flag based on firmware's capability
+ * @hw_cap: The firmware's capabilities
+ */
+static inline uint32_t
+nfp_net_cfg_ctrl_rss(uint32_t hw_cap)
+{
+	if ((hw_cap & NFP_NET_CFG_CTRL_RSS2) != 0)
+		return NFP_NET_CFG_CTRL_RSS2;
+
+	return NFP_NET_CFG_CTRL_RSS;
+}
+
 #endif /* _NFP_CTRL_H_ */
 /*
  * Local variables:
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index fed7b1ab13..47d5dff16c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -134,10 +134,7 @@ nfp_net_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -611,6 +608,8 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index c1f8a0fa0f..7834b2ee0c 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -95,10 +95,7 @@ nfp_netvf_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -373,6 +370,8 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 17a04cec5e..1c5a230145 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -116,26 +116,18 @@ nfp_net_rx_queue_count(void *rx_queue)
 	return count;
 }
 
-/* nfp_net_parse_meta() - Parse the metadata from packet */
-static void
-nfp_net_parse_meta(struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
+/* nfp_net_parse_chained_meta() - Parse the chained metadata from packet */
+static bool
+nfp_net_parse_chained_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
 {
+	uint8_t *meta_offset;
 	uint32_t meta_info;
 	uint32_t vlan_info;
-	uint8_t *meta_offset;
-	struct nfp_net_hw *hw = rxq->hw;
 
-	if (unlikely((NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2) ||
-			NFP_DESC_META_LEN(rxd) == 0))
-		return;
-
-	meta_offset = rte_pktmbuf_mtod(mbuf, uint8_t *);
-	meta_offset -= NFP_DESC_META_LEN(rxd);
-	meta_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-	meta_offset += 4;
+	meta_info = rte_be_to_cpu_32(meta_header);
+	meta_offset = meta_base + 4;
 
 	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
 		switch (meta_info & NFP_NET_META_FIELD_MASK) {
@@ -157,9 +149,11 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
 			break;
 		default:
 			/* Unsupported metadata can be a performance issue */
-			return;
+			return false;
 		}
 	}
+
+	return true;
 }
 
 /*
@@ -170,33 +164,18 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
  */
 static void
 nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
-	uint32_t hash;
-	uint32_t hash_type;
 	struct nfp_net_hw *hw = rxq->hw;
 
 	if ((hw->ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
 		return;
 
-	if (likely((hw->cap & NFP_NET_CFG_CTRL_RSS_ANY) != 0 &&
-			NFP_DESC_META_LEN(rxd) != 0)) {
-		hash = meta->hash;
-		hash_type = meta->hash_type;
-	} else {
-		if ((rxd->rxd.flags & PCIE_DESC_RX_RSS) == 0)
-			return;
-
-		hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET);
-		hash_type = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET);
-	}
-
-	mbuf->hash.rss = hash;
+	mbuf->hash.rss = meta->hash;
 	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
 
-	switch (hash_type) {
+	switch (meta->hash_type) {
 	case NFP_NET_RSS_IPV4:
 		mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4;
 		break;
@@ -223,6 +202,21 @@ nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
 	}
 }
 
+/*
+ * nfp_net_parse_single_meta() - Parse the single metadata
+ *
+ * The RSS hash and hash-type are prepended to the packet data.
+ * Get them from the metadata area.
+ */
+static inline void
+nfp_net_parse_single_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	meta->hash_type = rte_be_to_cpu_32(meta_header);
+	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
+}
+
 /*
  * nfp_net_parse_meta_vlan() - Set mbuf vlan_strip data based on metadata info
  *
@@ -304,6 +298,45 @@ nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
 	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
 }
 
+/* nfp_net_parse_meta() - Parse the metadata from packet */
+static void
+nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb)
+{
+	uint8_t *meta_base;
+	rte_be32_t meta_header;
+	struct nfp_meta_parsed meta = {};
+
+	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
+		return;
+
+	meta_base = rte_pktmbuf_mtod(mb, uint8_t *);
+	meta_base -= NFP_DESC_META_LEN(rxds);
+	meta_header = *(rte_be32_t *)meta_base;
+
+	switch (hw->meta_format) {
+	case NFP_NET_METAFORMAT_CHAINED:
+		if (nfp_net_parse_chained_meta(meta_base, meta_header, &meta)) {
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+			nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
+			nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		} else {
+			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
+		}
+		break;
+	case NFP_NET_METAFORMAT_SINGLE:
+		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
+			nfp_net_parse_single_meta(meta_base, meta_header, &meta);
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+		}
+		break;
+	default:
+		PMD_RX_LOG(DEBUG, "RX metadata do not exist.");
+	}
+}
+
 /*
  * RX path design:
  *
@@ -341,7 +374,6 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	struct nfp_net_hw *hw;
 	struct rte_mbuf *mb;
 	struct rte_mbuf *new_mb;
-	struct nfp_meta_parsed meta;
 	uint16_t nb_hold;
 	uint64_t dma_addr;
 	uint16_t avail;
@@ -437,11 +469,7 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		mb->next = NULL;
 		mb->port = rxq->port_id;
 
-		memset(&meta, 0, sizeof(meta));
-		nfp_net_parse_meta(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_hash(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		nfp_net_parse_meta(rxds, rxq, hw, mb);
 
 		/* Checking the checksum flag */
 		nfp_net_rx_cksum(rxq, rxds, mb);
-- 
2.29.3


^ permalink raw reply	[relevance 3%]

* [PATCH v2 2/2] net/nfp: modify RSS's processing logic
  @ 2023-02-21  3:29  3%   ` Chaoyong He
    1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-21  3:29 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

The initial logic only supports the single type of metadata, and this
commit adds support for the chained type of metadata. This commit also
makes the relation between the RSS capability (v1/v2) and these
two types of metadata clearer.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/net/nfp/nfp_common.c    |  23 +++++++
 drivers/net/nfp/nfp_common.h    |   7 +++
 drivers/net/nfp/nfp_ctrl.h      |  18 +++++-
 drivers/net/nfp/nfp_ethdev.c    |   7 +--
 drivers/net/nfp/nfp_ethdev_vf.c |   7 +--
 drivers/net/nfp/nfp_rxtx.c      | 108 ++++++++++++++++++++------------
 6 files changed, 121 insertions(+), 49 deletions(-)

diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index a545a10013..a1e37ada11 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -1584,6 +1584,29 @@ nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name)
 	return 0;
 }
 
+void
+nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+{
+	/*
+	 * ABI 4.x and ctrl vNIC always use chained metadata. In other cases we allow use
+	 * of single metadata if only RSS(v1) is supported by the hw capability, while
+	 * RSS(v2) also indicates that we are using chained metadata.
+	 */
+	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) == 4) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+	} else if ((hw->cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+		/*
+		 * RSS(v1) is incompatible with chained metadata. hw->cap only represents
+		 * the firmware's capability rather than the firmware's configuration, so
+		 * clear the RSS(v1) bit here to let later code use hw->cap to identify RSS.
+		 */
+		hw->cap &= ~NFP_NET_CFG_CTRL_RSS;
+	} else {
+		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
+	}
+}
+
 /*
  * Local variables:
  * c-file-style: "Linux"
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 980f3cad89..d33675eb99 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -127,6 +127,11 @@ enum nfp_qcp_ptr {
 	NFP_QCP_WRITE_PTR
 };
 
+enum nfp_net_meta_format {
+	NFP_NET_METAFORMAT_SINGLE,
+	NFP_NET_METAFORMAT_CHANINED,
+};
+
 struct nfp_pf_dev {
 	/* Backpointer to associated pci device */
 	struct rte_pci_device *pci_dev;
@@ -203,6 +208,7 @@ struct nfp_net_hw {
 	uint32_t max_mtu;
 	uint32_t mtu;
 	uint32_t rx_offset;
+	enum nfp_net_meta_format meta_format;
 
 	/* Current values for control */
 	uint32_t ctrl;
@@ -455,6 +461,7 @@ int nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
 int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
+void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
 
 #define NFP_NET_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct nfp_net_adapter *)adapter)->hw)
diff --git a/drivers/net/nfp/nfp_ctrl.h b/drivers/net/nfp/nfp_ctrl.h
index 1069ff9485..bdc39f8974 100644
--- a/drivers/net/nfp/nfp_ctrl.h
+++ b/drivers/net/nfp/nfp_ctrl.h
@@ -110,6 +110,7 @@
 #define   NFP_NET_CFG_CTRL_MSIX_TX_OFF    (0x1 << 26) /* Disable MSIX for TX */
 #define   NFP_NET_CFG_CTRL_LSO2           (0x1 << 28) /* LSO/TSO (version 2) */
 #define   NFP_NET_CFG_CTRL_RSS2           (0x1 << 29) /* RSS (version 2) */
+#define   NFP_NET_CFG_CTRL_CSUM_COMPLETE  (0x1 << 30) /* Checksum complete */
 #define   NFP_NET_CFG_CTRL_LIVE_ADDR      (0x1U << 31)/* live MAC addr change */
 #define NFP_NET_CFG_UPDATE              0x0004
 #define   NFP_NET_CFG_UPDATE_GEN          (0x1 <<  0) /* General update */
@@ -135,6 +136,8 @@
 #define NFP_NET_CFG_CTRL_LSO_ANY (NFP_NET_CFG_CTRL_LSO | NFP_NET_CFG_CTRL_LSO2)
 #define NFP_NET_CFG_CTRL_RSS_ANY (NFP_NET_CFG_CTRL_RSS | NFP_NET_CFG_CTRL_RSS2)
 
+#define NFP_NET_CFG_CTRL_CHAIN_META (NFP_NET_CFG_CTRL_RSS2 | \
+					NFP_NET_CFG_CTRL_CSUM_COMPLETE)
 /*
  * Read-only words (0x0030 - 0x0050):
  * @NFP_NET_CFG_VERSION:     Firmware version number
@@ -218,7 +221,7 @@
 
 /*
  * RSS configuration (0x0100 - 0x01ac):
- * Used only when NFP_NET_CFG_CTRL_RSS is enabled
+ * Used only when NFP_NET_CFG_CTRL_RSS_ANY is enabled
  * @NFP_NET_CFG_RSS_CFG:     RSS configuration word
  * @NFP_NET_CFG_RSS_KEY:     RSS "secret" key
  * @NFP_NET_CFG_RSS_ITBL:    RSS indirection table
@@ -334,6 +337,19 @@
 /* PF multiport offset */
 #define NFP_PF_CSR_SLICE_SIZE	(32 * 1024)
 
+/*
+ * nfp_net_cfg_ctrl_rss() - Get RSS flag based on firmware's capability
+ * @hw_cap: The firmware's capabilities
+ */
+static inline uint32_t
+nfp_net_cfg_ctrl_rss(uint32_t hw_cap)
+{
+	if ((hw_cap & NFP_NET_CFG_CTRL_RSS2) != 0)
+		return NFP_NET_CFG_CTRL_RSS2;
+
+	return NFP_NET_CFG_CTRL_RSS;
+}
+
 #endif /* _NFP_CTRL_H_ */
 /*
  * Local variables:
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index fed7b1ab13..47d5dff16c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -134,10 +134,7 @@ nfp_net_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -611,6 +608,8 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index c1f8a0fa0f..7834b2ee0c 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -95,10 +95,7 @@ nfp_netvf_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -373,6 +370,8 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 17a04cec5e..1c5a230145 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -116,26 +116,18 @@ nfp_net_rx_queue_count(void *rx_queue)
 	return count;
 }
 
-/* nfp_net_parse_meta() - Parse the metadata from packet */
-static void
-nfp_net_parse_meta(struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
+/* nfp_net_parse_chained_meta() - Parse the chained metadata from packet */
+static bool
+nfp_net_parse_chained_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
 {
+	uint8_t *meta_offset;
 	uint32_t meta_info;
 	uint32_t vlan_info;
-	uint8_t *meta_offset;
-	struct nfp_net_hw *hw = rxq->hw;
 
-	if (unlikely((NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2) ||
-			NFP_DESC_META_LEN(rxd) == 0))
-		return;
-
-	meta_offset = rte_pktmbuf_mtod(mbuf, uint8_t *);
-	meta_offset -= NFP_DESC_META_LEN(rxd);
-	meta_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-	meta_offset += 4;
+	meta_info = rte_be_to_cpu_32(meta_header);
+	meta_offset = meta_base + 4;
 
 	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
 		switch (meta_info & NFP_NET_META_FIELD_MASK) {
@@ -157,9 +149,11 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
 			break;
 		default:
 			/* Unsupported metadata can be a performance issue */
-			return;
+			return false;
 		}
 	}
+
+	return true;
 }
 
 /*
@@ -170,33 +164,18 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
  */
 static void
 nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
-	uint32_t hash;
-	uint32_t hash_type;
 	struct nfp_net_hw *hw = rxq->hw;
 
 	if ((hw->ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
 		return;
 
-	if (likely((hw->cap & NFP_NET_CFG_CTRL_RSS_ANY) != 0 &&
-			NFP_DESC_META_LEN(rxd) != 0)) {
-		hash = meta->hash;
-		hash_type = meta->hash_type;
-	} else {
-		if ((rxd->rxd.flags & PCIE_DESC_RX_RSS) == 0)
-			return;
-
-		hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET);
-		hash_type = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET);
-	}
-
-	mbuf->hash.rss = hash;
+	mbuf->hash.rss = meta->hash;
 	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
 
-	switch (hash_type) {
+	switch (meta->hash_type) {
 	case NFP_NET_RSS_IPV4:
 		mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4;
 		break;
@@ -223,6 +202,21 @@ nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
 	}
 }
 
+/*
+ * nfp_net_parse_single_meta() - Parse the single metadata
+ *
+ * The RSS hash and hash-type are prepended to the packet data.
+ * Get it from metadata area.
+ */
+static inline void
+nfp_net_parse_single_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	meta->hash_type = rte_be_to_cpu_32(meta_header);
+	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
+}
+
 /*
  * nfp_net_parse_meta_vlan() - Set mbuf vlan_strip data based on metadata info
  *
@@ -304,6 +298,45 @@ nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
 	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
 }
 
+/* nfp_net_parse_meta() - Parse the metadata from packet */
+static void
+nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb)
+{
+	uint8_t *meta_base;
+	rte_be32_t meta_header;
+	struct nfp_meta_parsed meta = {};
+
+	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
+		return;
+
+	meta_base = rte_pktmbuf_mtod(mb, uint8_t *);
+	meta_base -= NFP_DESC_META_LEN(rxds);
+	meta_header = *(rte_be32_t *)meta_base;
+
+	switch (hw->meta_format) {
+	case NFP_NET_METAFORMAT_CHANINED:
+		if (nfp_net_parse_chained_meta(meta_base, meta_header, &meta)) {
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+			nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
+			nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		} else {
+			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
+		}
+		break;
+	case NFP_NET_METAFORMAT_SINGLE:
+		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
+			nfp_net_parse_single_meta(meta_base, meta_header, &meta);
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+		}
+		break;
+	default:
+		PMD_RX_LOG(DEBUG, "RX metadata do not exist.");
+	}
+}
+
 /*
  * RX path design:
  *
@@ -341,7 +374,6 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	struct nfp_net_hw *hw;
 	struct rte_mbuf *mb;
 	struct rte_mbuf *new_mb;
-	struct nfp_meta_parsed meta;
 	uint16_t nb_hold;
 	uint64_t dma_addr;
 	uint16_t avail;
@@ -437,11 +469,7 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		mb->next = NULL;
 		mb->port = rxq->port_id;
 
-		memset(&meta, 0, sizeof(meta));
-		nfp_net_parse_meta(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_hash(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		nfp_net_parse_meta(rxds, rxq, hw, mb);
 
 		/* Checking the checksum flag */
 		nfp_net_rx_cksum(rxq, rxds, mb);
-- 
2.29.3


^ permalink raw reply	[relevance 3%]

* [PATCH 2/2] net/nfp: modify RSS's processing logic
  @ 2023-02-21  3:10  3% ` Chaoyong He
    1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-21  3:10 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

The initial logic only supports the single type of metadata, and this
commit adds support for the chained type of metadata. This commit also
makes the relation between the RSS capability (v1/v2) and these
two types of metadata clearer.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/net/nfp/nfp_common.c    |  23 +++++++
 drivers/net/nfp/nfp_common.h    |   7 +++
 drivers/net/nfp/nfp_ctrl.h      |  18 +++++-
 drivers/net/nfp/nfp_ethdev.c    |   7 +--
 drivers/net/nfp/nfp_ethdev_vf.c |   7 +--
 drivers/net/nfp/nfp_rxtx.c      | 108 ++++++++++++++++++++------------
 6 files changed, 121 insertions(+), 49 deletions(-)

diff --git a/drivers/net/nfp/nfp_common.c b/drivers/net/nfp/nfp_common.c
index a545a10013..a1e37ada11 100644
--- a/drivers/net/nfp/nfp_common.c
+++ b/drivers/net/nfp/nfp_common.c
@@ -1584,6 +1584,29 @@ nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name)
 	return 0;
 }
 
+void
+nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+{
+	/*
+	 * ABI 4.x and ctrl vNIC always use chained metadata. In other cases we allow use
+	 * of single metadata if only RSS(v1) is supported by the hw capability, while
+	 * RSS(v2) also indicates that we are using chained metadata.
+	 */
+	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) == 4) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+	} else if ((hw->cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHANINED;
+		/*
+		 * RSS(v1) is incompatible with chained metadata. hw->cap only represents
+		 * the firmware's capability rather than the firmware's configuration, so
+		 * clear the RSS(v1) bit here to let later code use hw->cap to identify RSS.
+		 */
+		hw->cap &= ~NFP_NET_CFG_CTRL_RSS;
+	} else {
+		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
+	}
+}
+
 /*
  * Local variables:
  * c-file-style: "Linux"
diff --git a/drivers/net/nfp/nfp_common.h b/drivers/net/nfp/nfp_common.h
index 980f3cad89..d33675eb99 100644
--- a/drivers/net/nfp/nfp_common.h
+++ b/drivers/net/nfp/nfp_common.h
@@ -127,6 +127,11 @@ enum nfp_qcp_ptr {
 	NFP_QCP_WRITE_PTR
 };
 
+enum nfp_net_meta_format {
+	NFP_NET_METAFORMAT_SINGLE,
+	NFP_NET_METAFORMAT_CHANINED,
+};
+
 struct nfp_pf_dev {
 	/* Backpointer to associated pci device */
 	struct rte_pci_device *pci_dev;
@@ -203,6 +208,7 @@ struct nfp_net_hw {
 	uint32_t max_mtu;
 	uint32_t mtu;
 	uint32_t rx_offset;
+	enum nfp_net_meta_format meta_format;
 
 	/* Current values for control */
 	uint32_t ctrl;
@@ -455,6 +461,7 @@ int nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
 int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
+void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
 
 #define NFP_NET_DEV_PRIVATE_TO_HW(adapter)\
 	(&((struct nfp_net_adapter *)adapter)->hw)
diff --git a/drivers/net/nfp/nfp_ctrl.h b/drivers/net/nfp/nfp_ctrl.h
index 1069ff9485..bdc39f8974 100644
--- a/drivers/net/nfp/nfp_ctrl.h
+++ b/drivers/net/nfp/nfp_ctrl.h
@@ -110,6 +110,7 @@
 #define   NFP_NET_CFG_CTRL_MSIX_TX_OFF    (0x1 << 26) /* Disable MSIX for TX */
 #define   NFP_NET_CFG_CTRL_LSO2           (0x1 << 28) /* LSO/TSO (version 2) */
 #define   NFP_NET_CFG_CTRL_RSS2           (0x1 << 29) /* RSS (version 2) */
+#define   NFP_NET_CFG_CTRL_CSUM_COMPLETE  (0x1 << 30) /* Checksum complete */
 #define   NFP_NET_CFG_CTRL_LIVE_ADDR      (0x1U << 31)/* live MAC addr change */
 #define NFP_NET_CFG_UPDATE              0x0004
 #define   NFP_NET_CFG_UPDATE_GEN          (0x1 <<  0) /* General update */
@@ -135,6 +136,8 @@
 #define NFP_NET_CFG_CTRL_LSO_ANY (NFP_NET_CFG_CTRL_LSO | NFP_NET_CFG_CTRL_LSO2)
 #define NFP_NET_CFG_CTRL_RSS_ANY (NFP_NET_CFG_CTRL_RSS | NFP_NET_CFG_CTRL_RSS2)
 
+#define NFP_NET_CFG_CTRL_CHAIN_META (NFP_NET_CFG_CTRL_RSS2 | \
+					NFP_NET_CFG_CTRL_CSUM_COMPLETE)
 /*
  * Read-only words (0x0030 - 0x0050):
  * @NFP_NET_CFG_VERSION:     Firmware version number
@@ -218,7 +221,7 @@
 
 /*
  * RSS configuration (0x0100 - 0x01ac):
- * Used only when NFP_NET_CFG_CTRL_RSS is enabled
+ * Used only when NFP_NET_CFG_CTRL_RSS_ANY is enabled
  * @NFP_NET_CFG_RSS_CFG:     RSS configuration word
  * @NFP_NET_CFG_RSS_KEY:     RSS "secret" key
  * @NFP_NET_CFG_RSS_ITBL:    RSS indirection table
@@ -334,6 +337,19 @@
 /* PF multiport offset */
 #define NFP_PF_CSR_SLICE_SIZE	(32 * 1024)
 
+/*
+ * nfp_net_cfg_ctrl_rss() - Get RSS flag based on firmware's capability
+ * @hw_cap: The firmware's capabilities
+ */
+static inline uint32_t
+nfp_net_cfg_ctrl_rss(uint32_t hw_cap)
+{
+	if ((hw_cap & NFP_NET_CFG_CTRL_RSS2) != 0)
+		return NFP_NET_CFG_CTRL_RSS2;
+
+	return NFP_NET_CFG_CTRL_RSS;
+}
+
 #endif /* _NFP_CTRL_H_ */
 /*
  * Local variables:
diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index fed7b1ab13..47d5dff16c 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -134,10 +134,7 @@ nfp_net_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -611,6 +608,8 @@ nfp_net_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index c1f8a0fa0f..7834b2ee0c 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -95,10 +95,7 @@ nfp_netvf_start(struct rte_eth_dev *dev)
 	if (rxmode->mq_mode & RTE_ETH_MQ_RX_RSS) {
 		nfp_net_rss_config_default(dev);
 		update |= NFP_NET_CFG_UPDATE_RSS;
-		if (hw->cap & NFP_NET_CFG_CTRL_RSS2)
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS2;
-		else
-			new_ctrl |= NFP_NET_CFG_CTRL_RSS;
+		new_ctrl |= nfp_net_cfg_ctrl_rss(hw->cap);
 	}
 
 	/* Enable device */
@@ -373,6 +370,8 @@ nfp_netvf_init(struct rte_eth_dev *eth_dev)
 	if (hw->cap & NFP_NET_CFG_CTRL_LSO2)
 		hw->cap &= ~NFP_NET_CFG_CTRL_TXVLAN;
 
+	nfp_net_init_metadata_format(hw);
+
 	if (NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2)
 		hw->rx_offset = NFP_NET_RX_OFFSET;
 	else
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 17a04cec5e..1c5a230145 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -116,26 +116,18 @@ nfp_net_rx_queue_count(void *rx_queue)
 	return count;
 }
 
-/* nfp_net_parse_meta() - Parse the metadata from packet */
-static void
-nfp_net_parse_meta(struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
+/* nfp_net_parse_chained_meta() - Parse the chained metadata from packet */
+static bool
+nfp_net_parse_chained_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
 {
+	uint8_t *meta_offset;
 	uint32_t meta_info;
 	uint32_t vlan_info;
-	uint8_t *meta_offset;
-	struct nfp_net_hw *hw = rxq->hw;
 
-	if (unlikely((NFD_CFG_MAJOR_VERSION_of(hw->ver) < 2) ||
-			NFP_DESC_META_LEN(rxd) == 0))
-		return;
-
-	meta_offset = rte_pktmbuf_mtod(mbuf, uint8_t *);
-	meta_offset -= NFP_DESC_META_LEN(rxd);
-	meta_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-	meta_offset += 4;
+	meta_info = rte_be_to_cpu_32(meta_header);
+	meta_offset = meta_base + 4;
 
 	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
 		switch (meta_info & NFP_NET_META_FIELD_MASK) {
@@ -157,9 +149,11 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
 			break;
 		default:
 			/* Unsupported metadata can be a performance issue */
-			return;
+			return false;
 		}
 	}
+
+	return true;
 }
 
 /*
@@ -170,33 +164,18 @@ nfp_net_parse_meta(struct nfp_meta_parsed *meta,
  */
 static void
 nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
-	uint32_t hash;
-	uint32_t hash_type;
 	struct nfp_net_hw *hw = rxq->hw;
 
 	if ((hw->ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
 		return;
 
-	if (likely((hw->cap & NFP_NET_CFG_CTRL_RSS_ANY) != 0 &&
-			NFP_DESC_META_LEN(rxd) != 0)) {
-		hash = meta->hash;
-		hash_type = meta->hash_type;
-	} else {
-		if ((rxd->rxd.flags & PCIE_DESC_RX_RSS) == 0)
-			return;
-
-		hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET);
-		hash_type = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET);
-	}
-
-	mbuf->hash.rss = hash;
+	mbuf->hash.rss = meta->hash;
 	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
 
-	switch (hash_type) {
+	switch (meta->hash_type) {
 	case NFP_NET_RSS_IPV4:
 		mbuf->packet_type |= RTE_PTYPE_INNER_L3_IPV4;
 		break;
@@ -223,6 +202,21 @@ nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
 	}
 }
 
+/*
+ * nfp_net_parse_single_meta() - Parse the single metadata
+ *
+ * The RSS hash and hash-type are prepended to the packet data.
+ * Get it from metadata area.
+ */
+static inline void
+nfp_net_parse_single_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	meta->hash_type = rte_be_to_cpu_32(meta_header);
+	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
+}
+
 /*
  * nfp_net_parse_meta_vlan() - Set mbuf vlan_strip data based on metadata info
  *
@@ -304,6 +298,45 @@ nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
 	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
 }
 
+/* nfp_net_parse_meta() - Parse the metadata from packet */
+static void
+nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb)
+{
+	uint8_t *meta_base;
+	rte_be32_t meta_header;
+	struct nfp_meta_parsed meta = {};
+
+	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
+		return;
+
+	meta_base = rte_pktmbuf_mtod(mb, uint8_t *);
+	meta_base -= NFP_DESC_META_LEN(rxds);
+	meta_header = *(rte_be32_t *)meta_base;
+
+	switch (hw->meta_format) {
+	case NFP_NET_METAFORMAT_CHANINED:
+		if (nfp_net_parse_chained_meta(meta_base, meta_header, &meta)) {
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+			nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
+			nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		} else {
+			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
+		}
+		break;
+	case NFP_NET_METAFORMAT_SINGLE:
+		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
+			nfp_net_parse_single_meta(meta_base, meta_header, &meta);
+			nfp_net_parse_meta_hash(&meta, rxq, mb);
+		}
+		break;
+	default:
+		PMD_RX_LOG(DEBUG, "RX metadata do not exist.");
+	}
+}
+
 /*
  * RX path design:
  *
@@ -341,7 +374,6 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	struct nfp_net_hw *hw;
 	struct rte_mbuf *mb;
 	struct rte_mbuf *new_mb;
-	struct nfp_meta_parsed meta;
 	uint16_t nb_hold;
 	uint64_t dma_addr;
 	uint16_t avail;
@@ -437,11 +469,7 @@ nfp_net_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		mb->next = NULL;
 		mb->port = rxq->port_id;
 
-		memset(&meta, 0, sizeof(meta));
-		nfp_net_parse_meta(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_hash(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_vlan(&meta, rxds, rxq, mb);
-		nfp_net_parse_meta_qinq(&meta, rxq, mb);
+		nfp_net_parse_meta(rxds, rxq, hw, mb);
 
 		/* Checking the checksum flag */
 		nfp_net_rx_cksum(rxq, rxds, mb);
-- 
2.29.3


^ permalink raw reply	[relevance 3%]

* Re: [EXT] Re: [PATCH v11 1/4] lib: add generic support for reading PMU events
  @ 2023-02-21  0:48  3%                     ` Konstantin Ananyev
  2023-02-27  8:12  0%                       ` Tomasz Duszynski
  0 siblings, 1 reply; 200+ results
From: Konstantin Ananyev @ 2023-02-21  0:48 UTC (permalink / raw)
  To: Tomasz Duszynski, Konstantin Ananyev, dev


>>>>>>>>> diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h
>>>>>>>>> new file mode 100644
>>>>>>>>> index 0000000000..6b664c3336
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/lib/pmu/rte_pmu.h
>>>>>>>>> @@ -0,0 +1,212 @@
>>>>>>>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>>>>>>>> + * Copyright(c) 2023 Marvell  */
>>>>>>>>> +
>>>>>>>>> +#ifndef _RTE_PMU_H_
>>>>>>>>> +#define _RTE_PMU_H_
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @file
>>>>>>>>> + *
>>>>>>>>> + * PMU event tracing operations
>>>>>>>>> + *
>>>>>>>>> + * This file defines generic API and types necessary to setup PMU and
>>>>>>>>> + * read selected counters in runtime.
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>> +extern "C" {
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>> +#include <linux/perf_event.h>
>>>>>>>>> +
>>>>>>>>> +#include <rte_atomic.h>
>>>>>>>>> +#include <rte_branch_prediction.h>
>>>>>>>>> +#include <rte_common.h>
>>>>>>>>> +#include <rte_compat.h>
>>>>>>>>> +#include <rte_spinlock.h>
>>>>>>>>> +
>>>>>>>>> +/** Maximum number of events in a group */
>>>>>>>>> +#define MAX_NUM_GROUP_EVENTS 8
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * A structure describing a group of events.
>>>>>>>>> + */
>>>>>>>>> +struct rte_pmu_event_group {
>>>>>>>>> +	struct perf_event_mmap_page *mmap_pages[MAX_NUM_GROUP_EVENTS]; /**< array of user pages */
>>>>>>>>> +	int fds[MAX_NUM_GROUP_EVENTS]; /**< array of event descriptors */
>>>>>>>>> +	bool enabled; /**< true if group was enabled on particular lcore */
>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event_group) next; /**< list entry */
>>>>>>>>> +} __rte_cache_aligned;
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * A structure describing an event.
>>>>>>>>> + */
>>>>>>>>> +struct rte_pmu_event {
>>>>>>>>> +	char *name; /**< name of an event */
>>>>>>>>> +	unsigned int index; /**< event index into fds/mmap_pages */
>>>>>>>>> +	TAILQ_ENTRY(rte_pmu_event) next; /**< list entry */
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * A PMU state container.
>>>>>>>>> + */
>>>>>>>>> +struct rte_pmu {
>>>>>>>>> +	char *name; /**< name of core PMU listed under /sys/bus/event_source/devices */
>>>>>>>>> +	rte_spinlock_t lock; /**< serialize access to event group list */
>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event_group) event_group_list; /**< list of event groups */
>>>>>>>>> +	unsigned int num_group_events; /**< number of events in a group */
>>>>>>>>> +	TAILQ_HEAD(, rte_pmu_event) event_list; /**< list of matching events */
>>>>>>>>> +	unsigned int initialized; /**< initialization counter */
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/** lcore event group */
>>>>>>>>> +RTE_DECLARE_PER_LCORE(struct rte_pmu_event_group, _event_group);
>>>>>>>>> +
>>>>>>>>> +/** PMU state container */
>>>>>>>>> +extern struct rte_pmu rte_pmu;
>>>>>>>>> +
>>>>>>>>> +/** Each architecture supporting PMU needs to provide its own version */
>>>>>>>>> +#ifndef rte_pmu_pmc_read
>>>>>>>>> +#define rte_pmu_pmc_read(index) ({ 0; })
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Read PMU counter.
>>>>>>>>> + *
>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>> + *
>>>>>>>>> + * @param pc
>>>>>>>>> + *   Pointer to the mmapped user page.
>>>>>>>>> + * @return
>>>>>>>>> + *   Counter value read from hardware.
>>>>>>>>> + */
>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>> +__rte_pmu_read_userpage(struct perf_event_mmap_page *pc)
>>>>>>>>> +{
>>>>>>>>> +	uint64_t width, offset;
>>>>>>>>> +	uint32_t seq, index;
>>>>>>>>> +	int64_t pmc;
>>>>>>>>> +
>>>>>>>>> +	for (;;) {
>>>>>>>>> +		seq = pc->lock;
>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>
>>>>>>>> Are you sure that compiler_barrier() is enough here?
>>>>>>>> On some archs CPU itself has freedom to re-order reads.
>>>>>>>> Or I am missing something obvious here?
>>>>>>>>
>>>>>>>
>>>>>>> It's a matter of not keeping old stuff cached in registers and
>>>>>>> making sure that we have two reads of lock. CPU reordering won't
>>>>>>> do any harm here.
>>>>>>
>>>>>> Sorry, I didn't get you here:
>>>>>> Suppose CPU will re-order reads and will read lock *after* index or offset value.
>>>>>> Wouldn't it mean that in that case index and/or offset can contain old/invalid values?
>>>>>>
>>>>>
>>>>> This number is just an indicator whether kernel did change something or not.
>>>>
>>>> You are talking about pc->lock, right?
>>>> Yes, I do understand that it is sort of seqlock.
>>>> That's why I am puzzled why we do not care about possible cpu read-reordering.
>>>> Manual for perf_event_open() also has a code snippet with compiler barrier only...
>>>>
>>>>> If cpu reordering will come into play then this will not change anything from pov of this loop.
>>>>> All we want is fresh data when needed and no involvement of
>>>>> compiler when it comes to reordering code.
>>>>
>>>> Ok, can you probably explain to me why the following could not happen:
>>>> T0:
>>>> pc->lock==0; pc->index==I1; pc->offset==O1;
>>>> T1:
>>>>       cpu #0 read pmu (due to cpu read reorder, we get index value before seqlock):
>>>>        index=pc->index;  //index==I1;
>>>> T2:
>>>>       cpu #1 kernel perf_event_update_userpage:
>>>>       pc->lock++; // pc->lock==1
>>>>       pc->index=I2;
>>>>       pc->offset=O2;
>>>>       ...
>>>>       pc->lock++; //pc->lock==2
>>>> T3:
>>>>       cpu #0 continue with read pmu:
>>>>       seq=pc->lock; //seq == 2
>>>>        offset=pc->offset; // offset == O2
>>>>        ....
>>>>        pmc = rte_pmu_pmc_read(index - 1);  // Note that we read at I1, not I2
>>>>        offset += pmc; //offset == O2 + pmcread(I1-1);
>>>>        if (pc->lock == seq) // they are equal, return
>>>>              return offset;
>>>>
>>>> Or, it can happen, but for some reason we don't care much?
>>>>
>>>
>>> This code does self-monitoring and user page (whole group actually) is
>>> per thread running on current cpu. Hence I am not sure what are you trying to prove with that example.
>>
>> I am not trying to prove anything so far.
>> I am asking is such situation possible or not, and if not, why?
>> My current understanding (possibly wrong) is that after you mmaped these pages, kernel still can
>> asynchronously update them.
>> So, when reading the data from these pages you have to check 'lock' value before and after
>> accessing other data.
>> If so, why possible cpu read-reordering doesn't matter?
>>
> 
> Look. I'll reiterate that.
> 
> 1. That user page/group/PMU config is per process. Other processes do not access that.

Ok, that's clear.


>     All this happens on the very same CPU where current thread is running.

Ok... but can't this page be updated by a kernel thread running
simultaneously on a different CPU?


> 2. Suppose you've already read seq. Now for some reason kernel updates data in page seq was read from.
> 3. Kernel will enter critical section during update. seq changes along with other data without app knowing about it.
>     If you want nitty gritty details consult kernel sources.

Look, I don't have to beg you to answer these questions.
In fact, I expect the library author to document all such subtle things
clearly either in the PG, or in source code comments (ideally in both).
If not, then from my perspective the patch is not ready and
shouldn't be accepted.
I don't know whether a compiler barrier is enough here or not, but I think
it is definitely worth a clear explanation in the docs.
I suppose I wouldn't be the only one who gets confused here.
So please make an effort and document clearly why you believe there
is no race condition.

> 4. app resumes and has some stale data but *WILL* read new seq. Code loops again because values do not match.

If the kernel always executes updates to this page in the same
thread context, then yes, user code will always notice the difference
after resume.
But why can't it happen that your user thread reads this page on one
CPU while some kernel code on another CPU updates it simultaneously?
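
For illustration only (this is not code from the patch): a minimal sketch
of a sequence-count reader written for the cross-CPU case asked about
above. 'struct demo_page' and demo_read() are hypothetical stand-ins for
the mmapped page and its reader. If updates only ever happen on the CPU
that runs the reader, as claimed earlier in the thread, a compiler barrier
should be sufficient; the acquire ordering below is what would be needed
if a writer could run on another CPU.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields of the mmapped page used here. */
struct demo_page {
	_Atomic uint32_t lock;   /* sequence count, changed around every update */
	_Atomic uint64_t offset; /* data protected by the sequence count */
};

/* Re-read until the sequence count is unchanged across the data read. */
static inline uint64_t
demo_read(struct demo_page *pc)
{
	uint32_t seq;
	uint64_t offset;

	do {
		/* acquire: the data load below cannot move before this load */
		seq = atomic_load_explicit(&pc->lock, memory_order_acquire);
		offset = atomic_load_explicit(&pc->offset, memory_order_relaxed);
		/* acquire fence: the re-check below cannot move before the data load */
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&pc->lock, memory_order_relaxed) != seq);

	return offset;
}
```

With only rte_compiler_barrier() between the loads, the compiler keeps the
order but a weakly ordered CPU is still free to reorder them, which is the
case the question is about.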


> 5. Otherwise seq values match and data is valid.
> 
>> Also there was another question below, which you probably  missed, so I copied it here:
>> Another question - do we really need  to have __rte_pmu_read_userpage() and rte_pmu_read() as
>> static inline functions in public header?
>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>> definitions also public.
>>
> 
> These functions need to be inlined otherwise performance takes a hit.

I understand that performance might be affected, but how big is the hit?
I expect the actual PMU read will not be free anyway, right?
If the difference is small, it might be worth making this change:
removing unneeded structures from public headers would help a lot in
the future in terms of ABI/API stability.



>>>
>>>>>>>
>>>>>>>>> +		index = pc->index;
>>>>>>>>> +		offset = pc->offset;
>>>>>>>>> +		width = pc->pmc_width;
>>>>>>>>> +
>>>>>>>>> +		/* index set to 0 means that particular counter cannot be used */
>>>>>>>>> +		if (likely(pc->cap_user_rdpmc && index)) {
>>>>>>>>> +			pmc = rte_pmu_pmc_read(index - 1);
>>>>>>>>> +			pmc <<= 64 - width;
>>>>>>>>> +			pmc >>= 64 - width;
>>>>>>>>> +			offset += pmc;
>>>>>>>>> +		}
>>>>>>>>> +
>>>>>>>>> +		rte_compiler_barrier();
>>>>>>>>> +
>>>>>>>>> +		if (likely(pc->lock == seq))
>>>>>>>>> +			return offset;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Enable group of events on the calling lcore.
>>>>>>>>> + *
>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +int
>>>>>>>>> +__rte_pmu_enable_group(void);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Initialize PMU library.
>>>>>>>>> + *
>>>>>>>>> + * @warning This should be not called directly.
>>>>>>>>> + *
>>>>>>>>> + * @return
>>>>>>>>> + *   0 in case of success, negative value otherwise.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +int
>>>>>>>>> +rte_pmu_init(void);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Finalize PMU library. This should be called after PMU counters are no
>>>>>>>>> + * longer being read.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +void
>>>>>>>>> +rte_pmu_fini(void);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Add event to the group of enabled events.
>>>>>>>>> + *
>>>>>>>>> + * @param name
>>>>>>>>> + *   Name of an event listed under /sys/bus/event_source/devices/pmu/events.
>>>>>>>>> + * @return
>>>>>>>>> + *   Event index in case of success, negative value otherwise.
>>>>>>>>> + */
>>>>>>>>> +__rte_experimental
>>>>>>>>> +int
>>>>>>>>> +rte_pmu_add_event(const char *name);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * @warning
>>>>>>>>> + * @b EXPERIMENTAL: this API may change without prior notice
>>>>>>>>> + *
>>>>>>>>> + * Read hardware counter configured to count occurrences of an event.
>>>>>>>>> + *
>>>>>>>>> + * @param index
>>>>>>>>> + *   Index of an event to be read.
>>>>>>>>> + * @return
>>>>>>>>> + *   Event value read from register. In case of errors or lack of support
>>>>>>>>> + *   0 is returned. In other words, stream of zeros in a trace file
>>>>>>>>> + *   indicates problem with reading particular PMU event register.
>>>>>>>>> + */
>>>>
>>>> Another question - do we really need  to have
>>>> __rte_pmu_read_userpage() and rte_pmu_read() as static inline functions in public header?
>>>> As I understand, because of that we also have to make 'struct rte_pmu_*'
>>>> definitions also public.
>>>>
>>>>>>>>> +__rte_experimental
>>>>>>>>> +static __rte_always_inline uint64_t
>>>>>>>>> +rte_pmu_read(unsigned int index)
>>>>>>>>> +{
>>>>>>>>> +	struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
>>>>>>>>> +	int ret;
>>>>>>>>> +
>>>>>>>>> +	if (unlikely(!rte_pmu.initialized))
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	if (unlikely(!group->enabled)) {
>>>>>>>>> +		ret = __rte_pmu_enable_group();
>>>>>>>>> +		if (ret)
>>>>>>>>> +			return 0;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	if (unlikely(index >= rte_pmu.num_group_events))
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	return __rte_pmu_read_userpage(group->mmap_pages[index]);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +#ifdef __cplusplus
>>>>>>>>> +}
>>>>>>>>> +#endif
>>>>>>>>> +
> 


^ permalink raw reply	[relevance 3%]

* [PATCH v8 21/22] hash: move rte_hash_set_alg out header
  2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
@ 2023-02-20 23:35  3%   ` Stephen Hemminger
  2023-02-21 15:02  0%     ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-20 23:35 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The code for setting algorithm for hash is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. But also, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* [PATCH v8 00/22] Convert static logtypes in libraries
                     ` (3 preceding siblings ...)
  2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
@ 2023-02-20 23:35  3% ` Stephen Hemminger
  2023-02-20 23:35  3%   ` [PATCH v8 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-20 23:35 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

There are several options on how to treat the old static types:
	- leave them there
	- mark the definitions as deprecated
	- remove them
This version removes them since there is no guarantee in current
DPDK policies that says they can't be removed.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v8 - rebase and fix CI issues on Arm
     simplify the mempool logtype patch

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  4 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  2 +
 lib/mempool/rte_mempool.h         |  8 ++++
 lib/mempool/version.map           |  3 ++
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 73 files changed, 383 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v1 04/13] graph: add get/set graph worker model APIs
  @ 2023-02-20 13:50  3%   ` Jerin Jacob
  2023-02-24  6:31  0%     ` Yan, Zhirun
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-02-20 13:50 UTC (permalink / raw)
  To: Zhirun Yan
  Cc: dev, jerinj, kirankumark, ndabilpuram, cunming.liang, haiyue.wang

On Thu, Nov 17, 2022 at 10:40 AM Zhirun Yan <zhirun.yan@intel.com> wrote:
>
> Add new get/set APIs to configure graph worker model which is used to
> determine which model will be chosen.
>
> Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
> Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> Signed-off-by: Zhirun Yan <zhirun.yan@intel.com>
> ---
>  lib/graph/rte_graph_worker.h        | 51 +++++++++++++++++++++++++++++
>  lib/graph/rte_graph_worker_common.h | 13 ++++++++
>  lib/graph/version.map               |  3 ++
>  3 files changed, 67 insertions(+)
>
> diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
> index 54d1390786..a0ea0df153 100644
> --- a/lib/graph/rte_graph_worker.h
> +++ b/lib/graph/rte_graph_worker.h
> @@ -1,5 +1,56 @@
>  #include "rte_graph_model_rtc.h"
>
> +static enum rte_graph_worker_model worker_model = RTE_GRAPH_MODEL_DEFAULT;

This will break multi-process support.

> +
> +/** Graph worker models */
> +enum rte_graph_worker_model {
> +#define WORKER_MODEL_DEFAULT "default"

Why are the strings needed?
Also, every symbol in a public header file should start with RTE_ to
avoid namespace conflicts.

> +       RTE_GRAPH_MODEL_DEFAULT = 0,
> +#define WORKER_MODEL_RTC "rtc"
> +       RTE_GRAPH_MODEL_RTC,

Why not RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT in the enum itself?

> +#define WORKER_MODEL_GENERIC "generic"

Generic is a very overloaded term. Use pipeline here, i.e.
RTE_GRAPH_MODEL_PIPELINE.


> +       RTE_GRAPH_MODEL_GENERIC,
> +       RTE_GRAPH_MODEL_MAX,

No need for MAX; it will break the ABI in the future. See other subsystems
such as cryptodev.

> +};
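
Taken together, the comments above could look roughly like the sketch
below. This is only an illustration: the DEMO_ names and
demo_graph_model_is_valid() are hypothetical and not part of the patch or
of the graph library. It aliases the default to RTC inside the enum, uses
"pipeline" rather than "generic", and has no MAX enumerator, so adding a
model later does not change a value that applications may have compiled in.

```c
#include <stdbool.h>

/* Hypothetical names; a real public header would use the RTE_ prefix. */
enum demo_graph_worker_model {
	DEMO_GRAPH_MODEL_RTC = 0,
	/* the default is an alias, not a separate value */
	DEMO_GRAPH_MODEL_DEFAULT = DEMO_GRAPH_MODEL_RTC,
	DEMO_GRAPH_MODEL_PIPELINE,
	/* no MAX enumerator: new models can be appended later */
};

/* Validate without MAX by listing the known models explicitly. */
static inline bool
demo_graph_model_is_valid(int model)
{
	switch (model) {
	case DEMO_GRAPH_MODEL_RTC:
	case DEMO_GRAPH_MODEL_PIPELINE:
		return true;
	default:
		return false;
	}
}
```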

>

^ permalink raw reply	[relevance 3%]

* [PATCH v2 3/3] doc: add Corigine information to nfp documentation
  @ 2023-02-20  8:41  8%   ` Chaoyong He
  0 siblings, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-20  8:41 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, Walter Heymans, Chaoyong He

From: Walter Heymans <walter.heymans@corigine.com>

Add Corigine information to the nfp documentation. The Network Flow
Processor (NFP) PMD is used by products from both Netronome and
Corigine.

Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
---
 doc/guides/nics/nfp.rst | 78 +++++++++++++++++++++++++----------------
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
index d133b6385c..f102238a28 100644
--- a/doc/guides/nics/nfp.rst
+++ b/doc/guides/nics/nfp.rst
@@ -1,19 +1,18 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
-    All rights reserved.
+    Copyright(c) 2021 Corigine, Inc. All rights reserved.
 
 NFP poll mode driver library
 ============================
 
-Netronome's sixth generation of flow processors pack 216 programmable
-cores and over 100 hardware accelerators that uniquely combine packet,
-flow, security and content processing in a single device that scales
+Netronome and Corigine's sixth generation of flow processors pack 216
+programmable cores and over 100 hardware accelerators that uniquely combine
+packet, flow, security and content processing in a single device that scales
 up to 400-Gb/s.
 
-This document explains how to use DPDK with the Netronome Poll Mode
-Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
-(NFP-6xxx), Netronome's Network Flow Processor 4xxx (NFP-4xxx) and
-Netronome's Network Flow Processor 38xx (NFP-38xx).
+This document explains how to use DPDK with the Network Flow Processor (NFP)
+Poll Mode Driver (PMD) supporting Netronome and Corigine's NFP-6xxx, NFP-4xxx
+and NFP-38xx product lines.
 
 NFP is a SR-IOV capable device and the PMD supports the physical
 function (PF) and the virtual functions (VFs).
@@ -21,15 +20,16 @@ function (PF) and the virtual functions (VFs).
 Dependencies
 ------------
 
-Before using the Netronome's DPDK PMD some NFP configuration,
+Before using the NFP DPDK PMD some NFP configuration,
 which is not related to DPDK, is required. The system requires
-installation of **Netronome's BSP (Board Support Package)** along
-with a specific NFP firmware application. Netronome's NSP ABI
+installation of the **nfp-bsp (Board Support Package)** along
+with a specific NFP firmware application. The NSP ABI
 version should be 0.20 or higher.
 
-If you have a NFP device you should already have the code and
-documentation for this configuration. Contact
-**support@netronome.com** to obtain the latest available firmware.
+If you have a NFP device you should already have the documentation to perform
+this configuration. Contact **support@netronome.com** (for Netronome products)
+or **smartnic-support@corigine.com** (for Corigine products) to obtain the
+latest available firmware.
 
 The NFP Linux netdev kernel driver for VFs has been a part of the
 vanilla kernel since kernel version 4.5, and support for the PF
@@ -44,9 +44,9 @@ Linux kernel driver.
 Building the software
 ---------------------
 
-Netronome's PMD code is provided in the **drivers/net/nfp** directory.
-Although NFP PMD has Netronome´s BSP dependencies, it is possible to
-compile it along with other DPDK PMDs even if no BSP was installed previously.
+The NFP PMD code is provided in the **drivers/net/nfp** directory. Although
+NFP PMD has BSP dependencies, it is possible to compile it along with other
+DPDK PMDs even if no BSP was installed previously.
 Of course, a DPDK app will require such a BSP installed for using the
 NFP PMD, along with a specific NFP firmware application.
 
@@ -68,9 +68,9 @@ like uploading the firmware and configure the Link state properly when starting
 or stopping a PF port. Since DPDK 18.05 the firmware upload happens when
 a PF is initialized, which was not always true with older DPDK versions.
 
-Depending on the Netronome product installed in the system, firmware files
-should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
-PF looks for a firmware file in this order:
+Depending on the product installed in the system, firmware files should be
+available under ``/lib/firmware/netronome``. DPDK PMD supporting the PF looks
+for a firmware file in this order:
 
 	1) First try to find a firmware image specific for this device using the
 	   NFP serial number:
@@ -85,19 +85,22 @@ PF looks for a firmware file in this order:
 
 		nic_AMDA0099-0001_2x25.nffw
 
-Netronome's software packages install firmware files under
-``/lib/firmware/netronome`` to support all the Netronome's SmartNICs and
-different firmware applications. This is usually done using file names based on
-SmartNIC type and media and with a directory per firmware application. Options
-1 and 2 for firmware filenames allow more than one SmartNIC, same type of
-SmartNIC or different ones, and to upload a different firmware to each
+Netronome and Corigine's software packages install firmware files under
+``/lib/firmware/netronome`` to support all the Netronome and Corigine SmartNICs
+and different firmware applications. This is usually done using file names
+based on SmartNIC type and media and with a directory per firmware application.
+Options 1 and 2 for firmware filenames allow more than one SmartNIC, same type
+of SmartNIC or different ones, and to upload a different firmware to each
 SmartNIC.
 
    .. Note::
       Currently the NFP PMD supports using the PF with Agilio Firmware with
       NFD3 and Agilio Firmware with NFDk. See
-      https://help.netronome.com/support/solutions for more information on the
-      various firmwares supported by the Netronome Agilio CX smartNIC.
+      `Netronome Support <https://help.netronome.com/support/solutions>`_.
+      for more information on the various firmwares supported by the Netronome
+      Agilio SmartNIC range, or
+      `Corigine Support <https://www.corigine.com/productsOverviewList-30.html>`_.
+      for more information about Corigine's range.
 
 PF multiport support
 --------------------
@@ -164,6 +167,12 @@ System configuration
 
       lspci -d 19ee:
 
+   and on Corigine SmartNICs using:
+
+   .. code-block:: console
+
+      lspci -d 1da8:
+
    Now, for example, to configure two virtual functions on a NFP device
    whose PCI system identity is "0000:03:00.0":
 
@@ -171,12 +180,19 @@ System configuration
 
       echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
 
-   The result of this command may be shown using lspci again:
+   The result of this command may be shown using lspci again on Netronome
+   SmartNICs:
 
    .. code-block:: console
 
       lspci -kd 19ee:
 
+   and on Corigine SmartNICs:
+
+   .. code-block:: console
+
+      lspci -kd 1da8:
+
    Two new PCI devices should appear in the output of the above command. The
    -k option shows the device driver, if any, that the devices are bound to.
    Depending on the modules loaded, at this point the new PCI devices may be
@@ -186,8 +202,8 @@ System configuration
 Flow offload
 ------------
 
-Use the flower firmware application, some type of Netronome's SmartNICs can
-offload the flow into cards.
+Using the flower firmware application, some types of Netronome or Corigine
+SmartNICs can offload the flows onto the cards.
 
 The flower firmware application requires the PMD running two services:
 
-- 
2.29.3


^ permalink raw reply	[relevance 8%]

* Re: [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-16 11:09  3%   ` [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev Bruce Richardson
@ 2023-02-16 11:42  0%     ` fengchengwen
  0 siblings, 0 replies; 200+ results
From: fengchengwen @ 2023-02-16 11:42 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Kevin Laatz

Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2023/2/16 19:09, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
> 
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Kevin Laatz <kevin.laatz@intel.com>

...
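
For reference, a minimal sketch of the "either behaviour is fine" check
described in the commit message. demo_next_id_ok() is a hypothetical
helper, not a function in test_dmadev.c. It relies on uint16_t wraparound:
a driver that resets its ids reports -1 (0xFFFF) as the last completed id,
so the next id is 0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept a device that continues its job ids across
 * stop/start as well as one that resets them to 0.
 */
static bool
demo_next_id_ok(uint16_t last_completed, uint16_t expected_next)
{
	/* 0xFFFF + 1 wraps to 0 when truncated back to 16 bits */
	uint16_t next = (uint16_t)(last_completed + 1);

	return next == expected_next || next == 0;
}
```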

^ permalink raw reply	[relevance 0%]

* [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev
  @ 2023-02-16 11:09  3%   ` Bruce Richardson
  2023-02-16 11:42  0%     ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-02-16 11:09 UTC (permalink / raw)
  To: dev; +Cc: fengchengwen, Bruce Richardson, Kevin Laatz

Validate device operation when a device is stopped or restarted.

The only complication - and gap in the dmadev ABI specification - is
what happens to the job ids on restart. Some drivers reset them to 0,
while others continue where things left off. Take account of both
possibilities in the test case.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>
---
 app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
index 0296c52d2a..0736ff2a18 100644
--- a/app/test/test_dmadev.c
+++ b/app/test/test_dmadev.c
@@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
 			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
 }
 
+static int
+test_stop_start(int16_t dev_id, uint16_t vchan)
+{
+	/* device is already started on input, should be (re)started on output */
+
+	uint16_t id = 0;
+	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
+
+	/* - test stopping a device works ok,
+	 * - then do a start-stop without doing a copy
+	 * - finally restart the device
+	 * checking for errors at each stage, and validating we can still copy at the end.
+	 */
+	if (rte_dma_stop(dev_id) < 0)
+		ERR_RETURN("Error stopping device\n");
+
+	if (rte_dma_start(dev_id) < 0)
+		ERR_RETURN("Error restarting device\n");
+	if (rte_dma_stop(dev_id) < 0)
+		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
+
+	if (rte_dma_start(dev_id) < 0)
+		ERR_RETURN("Error restarting device after multiple stop-starts\n");
+
+	/* before doing a copy, we need to know what the next id will be it should
+	 * either be:
+	 * - the last completed job before start if driver does not reset id on stop
+	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
+	 */
+	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
+		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
+	id += 1; /* id_count is next job id */
+	if (id != id_count && id != 0)
+		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
+				id, id_count);
+
+	id_count = id;
+	if (test_single_copy(dev_id, vchan) < 0)
+		ERR_RETURN("Error performing copy after device restart\n");
+	return 0;
+}
+
 /* Failure handling test cases - global macros and variables for those tests*/
 #define COMP_BURST_SZ	16
 #define OPT_FENCE(idx) ((fence && idx == 8) ? RTE_DMA_OP_FLAG_FENCE : 0)
@@ -819,6 +861,10 @@ test_dmadev_instance(int16_t dev_id)
 	if (runtest("copy", test_enqueue_copies, 640, dev_id, vchan, CHECK_ERRS) < 0)
 		goto err;
 
+	/* run tests stopping/starting devices and check jobs still work after restart */
+	if (runtest("stop-start", test_stop_start, 1, dev_id, vchan, CHECK_ERRS) < 0)
+		goto err;
+
 	/* run some burst capacity tests */
 	if (rte_dma_burst_capacity(dev_id, vchan) < 64)
 		printf("DMA Dev %u: insufficient burst capacity (64 required), skipping tests\n",
-- 
2.37.2


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-16  1:24  0%         ` fengchengwen
@ 2023-02-16  9:24  0%           ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2023-02-16  9:24 UTC (permalink / raw)
  To: fengchengwen; +Cc: dev, Kevin Laatz

On Thu, Feb 16, 2023 at 09:24:38AM +0800, fengchengwen wrote:
> On 2023/2/15 19:57, Bruce Richardson wrote:
> > On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
> >> On 2023/1/17 1:37, Bruce Richardson wrote:
> >>> Validate device operation when a device is stopped or restarted.
> >>>
> >>> The only complication - and gap in the dmadev ABI specification - is
> >>> what happens to the job ids on restart. Some drivers reset them to 0,
> >>> while others continue where things left off. Take account of both
> >>> possibilities in the test case.
> >>>
> >>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >>> ---
> >>>  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 46 insertions(+)
> >>>
> >>> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
> >>> index de787c14e2..8fb73a41e2 100644
> >>> --- a/app/test/test_dmadev.c
> >>> +++ b/app/test/test_dmadev.c
> >>> @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
> >>>  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
> >>>  }
> >>>
> >>> +static int
> >>> +test_stop_start(int16_t dev_id, uint16_t vchan)
> >>> +{
> >>> +	/* device is already started on input, should be (re)started on output */
> >>> +
> >>> +	uint16_t id = 0;
> >>> +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
> >>> +
> >>> +	/* - test stopping a device works ok,
> >>> +	 * - then do a start-stop without doing a copy
> >>> +	 * - finally restart the device
> >>> +	 * checking for errors at each stage, and validating we can still copy at the end.
> >>> +	 */
> >>> +	if (rte_dma_stop(dev_id) < 0)
> >>> +		ERR_RETURN("Error stopping device\n");
> >>> +
> >>> +	if (rte_dma_start(dev_id) < 0)
> >>> +		ERR_RETURN("Error restarting device\n");
> >>> +	if (rte_dma_stop(dev_id) < 0)
> >>> +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
> >>> +
> >>> +	if (rte_dma_start(dev_id) < 0)
> >>> +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
> >>> +
> >>> +	/* before doing a copy, we need to know what the next id will be it should
> >>> +	 * either be:
> >>> +	 * - the last completed job before start if driver does not reset id on stop
> >>> +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
> >>> +	 */
> >>> +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
> >>> +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
> >>> +	id += 1; /* id_count is next job id */
> >>> +	if (id != id_count && id != 0)
> >>> +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
> >>> +				id, id_count);
> >>
> >> Hi Bruce,
> >>
> >> I suggest adding a warning log to flag when the id was not reset to zero, so
> >> that a new driver could detect that it breaks the ABI specification.
> >>
> > Not sure that that is necessary. Neither the actual ABI nor the doxygen docs
> > specify what happens to the values on doing stop and then start. My
> > thinking was that it should continue numbering as it would be equivalent to
> > suspend and resume, but other drivers appear to treat it as a "reset". I
> > suspect there are advantages and disadvantages to both schemes. Until we
> > decide on what the correct behaviour should be - or decide to allow both -
> > I don't think warning is the right thing to do here.
> 
> On this point, I agree to upstream this patch first, and then discuss what the
> correct behavior should be for the restart scenario.
> 
+1. Thanks.

With this patch in place we will also be better able to help drivers
enforce the correct behaviour once we define it.
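
For reference, the acceptance rule the test applies after a stop/start can be
written as a standalone check. This is only a minimal sketch:
next_id_is_valid() and check_restart_id_examples() are hypothetical helper
names, not part of the dmadev API or of the patch, and the sketch assumes job
ids are 16-bit values that wrap, as in the quoted test code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): given the last completed id
 * reported after a stop/start, accept either restart convention.
 */
bool
next_id_is_valid(uint16_t last_completed, uint16_t expected_next)
{
	/* 16-bit wrap: a last id of 0xFFFF ("-1") means the next job id is 0 */
	uint16_t next = (uint16_t)(last_completed + 1);

	/* continue numbering (suspend/resume style) or restart from 0 (reset style) */
	return next == expected_next || next == 0;
}

/* Small self-check exercising both conventions and one failure case. */
void
check_restart_id_examples(void)
{
	assert(next_id_is_valid(41, 42));         /* driver kept counting */
	assert(next_id_is_valid(UINT16_MAX, 42)); /* driver reset its ids */
	assert(!next_id_is_valid(7, 42));         /* matches neither convention */
}
```

Once the expected behaviour is defined, a check like this could be narrowed to
a single branch.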

I'll do v3 keeping this as-is for now.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-15 11:57  3%       ` Bruce Richardson
@ 2023-02-16  1:24  0%         ` fengchengwen
  2023-02-16  9:24  0%           ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2023-02-16  1:24 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Kevin Laatz

On 2023/2/15 19:57, Bruce Richardson wrote:
> On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
>> On 2023/1/17 1:37, Bruce Richardson wrote:
>>> Validate device operation when a device is stopped or restarted.
>>>
>>> The only complication - and gap in the dmadev ABI specification - is
>>> what happens to the job ids on restart. Some drivers reset them to 0,
>>> while others continue where things left off. Take account of both
>>> possibilities in the test case.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>>  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 46 insertions(+)
>>>
>>> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
>>> index de787c14e2..8fb73a41e2 100644
>>> --- a/app/test/test_dmadev.c
>>> +++ b/app/test/test_dmadev.c
>>> @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
>>>  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
>>>  }
>>>  
>>> +static int
>>> +test_stop_start(int16_t dev_id, uint16_t vchan)
>>> +{
>>> +	/* device is already started on input, should be (re)started on output */
>>> +
>>> +	uint16_t id = 0;
>>> +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
>>> +
>>> +	/* - test stopping a device works ok,
>>> +	 * - then do a start-stop without doing a copy
>>> +	 * - finally restart the device
>>> +	 * checking for errors at each stage, and validating we can still copy at the end.
>>> +	 */
>>> +	if (rte_dma_stop(dev_id) < 0)
>>> +		ERR_RETURN("Error stopping device\n");
>>> +
>>> +	if (rte_dma_start(dev_id) < 0)
>>> +		ERR_RETURN("Error restarting device\n");
>>> +	if (rte_dma_stop(dev_id) < 0)
>>> +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
>>> +
>>> +	if (rte_dma_start(dev_id) < 0)
>>> +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
>>> +
>>> +	/* before doing a copy, we need to know what the next id will be it should
>>> +	 * either be:
>>> +	 * - the last completed job before start if driver does not reset id on stop
>>> +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
>>> +	 */
>>> +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
>>> +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
>>> +	id += 1; /* id_count is next job id */
>>> +	if (id != id_count && id != 0)
>>> +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
>>> +				id, id_count);
>>
>> Hi Bruce,
>>
>> I suggest adding a warning log to flag when the id was not reset to zero, so
>> that a new driver could detect that it breaks the ABI specification.
>>
> Not sure that that is necessary. Neither the actual ABI nor the doxygen docs
> specify what happens to the values on doing stop and then start. My
> thinking was that it should continue numbering as it would be equivalent to
> suspend and resume, but other drivers appear to treat it as a "reset". I
> suspect there are advantages and disadvantages to both schemes. Until we
> decide on what the correct behaviour should be - or decide to allow both -
> I don't think warning is the right thing to do here.

On this point, I agree to upstream this patch first, and then discuss what the
correct behavior should be for the restart scenario.

> 
> /Bruce
> 
> .
> 

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: update NFP documentation with Corigine information
  2023-02-15 13:37  0% ` Ferruh Yigit
@ 2023-02-15 17:58  0%   ` Niklas Söderlund
  0 siblings, 0 replies; 200+ results
From: Niklas Söderlund @ 2023-02-15 17:58 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Chaoyong He, dev, oss-drivers, Walter Heymans

Hello Ferruh,

Thanks for your feedback.

On 2023-02-15 13:37:05 +0000, Ferruh Yigit wrote:
> On 2/3/2023 8:08 AM, Chaoyong He wrote:
> > From: Walter Heymans <walter.heymans@corigine.com>
> > 
> > The NFP PMD documentation is updated to include information about
> > Corigine and their new vendor device ID.
> > 
> > Outdated information regarding the use of the PMD is also updated.
> > 
> > While making major changes to the document, the maximum number of
> > characters per line is updated to 80 characters to improve the
> > readability in raw format.
> > 
> 
> There are three groups of changes made to the documentation, as explained in
> the three paragraphs above.
> 
> To help review, is it possible to separate this patch into three
> patches? Later they can be squashed and merged as a single patch.
> But as it is, it is easy to miss content changes among formatting changes.
> 
> (You can include simple grammar updates (that don't change either
> content or Corigine-related information) in the formatting update patch.)

We will break this patch into three as you suggest, address the 
comments below and post a v2.

> 
> 
> > Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
> > Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
> > Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> > ---
> >  doc/guides/nics/nfp.rst | 168 +++++++++++++++++++++-------------------
> >  1 file changed, 90 insertions(+), 78 deletions(-)
> > 
> > diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
> > index a085d7d9ae..6fea280411 100644
> > --- a/doc/guides/nics/nfp.rst
> > +++ b/doc/guides/nics/nfp.rst
> > @@ -1,35 +1,34 @@
> >  ..  SPDX-License-Identifier: BSD-3-Clause
> >      Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
> > -    All rights reserved.
> > +    Copyright(c) 2021 Corigine, Inc. All rights reserved.
> >  
> >  NFP poll mode driver library
> >  ============================
> >  
> > -Netronome's sixth generation of flow processors pack 216 programmable
> > -cores and over 100 hardware accelerators that uniquely combine packet,
> > -flow, security and content processing in a single device that scales
> > +Netronome and Corigine's sixth generation of flow processors pack 216
> > +programmable cores and over 100 hardware accelerators that uniquely combine
> > +packet, flow, security and content processing in a single device that scales
> >  up to 400-Gb/s.
> >  
> > -This document explains how to use DPDK with the Netronome Poll Mode
> > -Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
> > -(NFP-6xxx), Netronome's Network Flow Processor 4xxx (NFP-4xxx) and
> > -Netronome's Network Flow Processor 38xx (NFP-38xx).
> > +This document explains how to use DPDK with the Network Flow Processor (NFP)
> > +Poll Mode Driver (PMD) supporting Netronome and Corigine's NFP-6xxx, NFP-4xxx
> > +and NFP-38xx product lines.
> >  
> > -NFP is a SRIOV capable device and the PMD supports the physical
> > -function (PF) and the virtual functions (VFs).
> > +NFP is a SR-IOV capable device and the PMD supports the physical function (PF)
> > +and the virtual functions (VFs).
> >  
> >  Dependencies
> >  ------------
> >  
> > -Before using the Netronome's DPDK PMD some NFP configuration,
> > -which is not related to DPDK, is required. The system requires
> > -installation of **Netronome's BSP (Board Support Package)** along
> > -with a specific NFP firmware application. Netronome's NSP ABI
> > -version should be 0.20 or higher.
> > +Before using the NFP DPDK PMD some NFP configuration, which is not related to
> > +DPDK, is required. The system requires installation of
> > +**NFP-BSP (Board Support Package)** along with a specific NFP firmware
> > +application. The NSP ABI version should be 0.20 or higher.
> >  
> > -If you have a NFP device you should already have the code and
> > -documentation for this configuration. Contact
> > -**support@netronome.com** to obtain the latest available firmware.
> > +If you have a NFP device you should already have the documentation to perform
> > +this configuration. Contact **support@netronome.com** (for Netronome products)
> > +or **smartnic-support@corigine.com** (for Corigine products) to obtain the
> > +latest available firmware.
> >  
> >  The NFP Linux netdev kernel driver for VFs has been a part of the
> >  vanilla kernel since kernel version 4.5, and support for the PF
> > @@ -44,11 +43,11 @@ Linux kernel driver.
> >  Building the software
> >  ---------------------
> >  
> > -Netronome's PMD code is provided in the **drivers/net/nfp** directory.
> > -Although NFP PMD has Netronome´s BSP dependencies, it is possible to
> > -compile it along with other DPDK PMDs even if no BSP was installed previously.
> > -Of course, a DPDK app will require such a BSP installed for using the
> > -NFP PMD, along with a specific NFP firmware application.
> > +The NFP PMD code is provided in the **drivers/net/nfp** directory. Although
> > +NFP PMD has BSP dependencies, it is possible to compile it along with other
> > +DPDK PMDs even if no BSP was installed previously. Of course, a DPDK app will
> > +require such a BSP installed for using the NFP PMD, along with a specific NFP
> > +firmware application.
> >  
> >  Once the DPDK is built all the DPDK apps and examples include support for
> >  the NFP PMD.
> > @@ -57,27 +56,20 @@ the NFP PMD.
> >  Driver compilation and testing
> >  ------------------------------
> >  
> > -Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
> > -for details.
> > +Refer to the document
> > +:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` for details.
> >  
> >  Using the PF
> >  ------------
> >  
> > -NFP PMD supports using the NFP PF as another DPDK port, but it does not
> > -have any functionality for controlling VFs. In fact, it is not possible to use
> > -the PMD with the VFs if the PF is being used by DPDK, that is, with the NFP PF
> > -bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK versions will
> > -have a PMD able to work with the PF and VFs at the same time and with the PF
> > -implementing VF management along with other PF-only functionalities/offloads.
> > -
> 
> Why is this paragraph removed? Is it because it is no longer correct,
> or just because of a change in document organization?
> 
> >  The PMD PF has extra work to do which will delay the DPDK app initialization
> > -like uploading the firmware and configure the Link state properly when starting or
> > -stopping a PF port. Since DPDK 18.05 the firmware upload happens when
> > +like uploading the firmware and configure the Link state properly when starting
> > +or stopping a PF port. Since DPDK 18.05 the firmware upload happens when
> >  a PF is initialized, which was not always true with older DPDK versions.
> >  
> > -Depending on the Netronome product installed in the system, firmware files
> > -should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
> > -PF looks for a firmware file in this order:
> > +Depending on the product installed in the system, firmware files should be
> > +available under ``/lib/firmware/netronome``. DPDK PMD supporting the PF looks
> > +for a firmware file in this order:
> >  
> >  	1) First try to find a firmware image specific for this device using the
> >  	   NFP serial number:
> > @@ -92,18 +84,21 @@ PF looks for a firmware file in this order:
> >  
> >  		nic_AMDA0099-0001_2x25.nffw
> >  
> > -Netronome's software packages install firmware files under ``/lib/firmware/netronome``
> > -to support all the Netronome's SmartNICs and different firmware applications.
> > -This is usually done using file names based on SmartNIC type and media and with a
> > -directory per firmware application. Options 1 and 2 for firmware filenames allow
> > -more than one SmartNIC, same type of SmartNIC or different ones, and to upload a
> > -different firmware to each SmartNIC.
> > +Netronome and Corigine's software packages install firmware files under
> > +``/lib/firmware/netronome`` to support all the SmartNICs and different firmware
> > +applications. This is usually done using file names based on SmartNIC type and
> > +media and with a directory per firmware application. Options 1 and 2 for
> > +firmware filenames allow more than one SmartNIC, same type of SmartNIC or
> > +different ones, and to upload a different firmware to each SmartNIC.
> >  
> >     .. Note::
> > -      Currently the NFP PMD supports using the PF with Agilio Firmware with NFD3
> > -      and Agilio Firmware with NFDk. See https://help.netronome.com/support/solutions
> > +      Currently the NFP PMD supports using the PF with Agilio Firmware with
> > +      NFD3 and Agilio Firmware with NFDk. See
> > +      `Netronome Support <https://help.netronome.com/support/solutions>`_.
> >        for more information on the various firmwares supported by the Netronome
> > -      Agilio CX smartNIC.
> > +      Agilio SmartNICs range, or
> > +      `Corigine Support <https://www.corigine.com/productsOverviewList-30.html>`_.
> > +      for more information about Corigine's range.
> >  
> >  PF multiport support
> >  --------------------
> > @@ -118,7 +113,7 @@ this particular configuration requires the PMD to create ports in a special way,
> >  although once they are created, DPDK apps should be able to use them as normal
> >  PCI ports.
> >  
> > -NFP ports belonging to same PF can be seen inside PMD initialization with a
> > +NFP ports belonging to the same PF can be seen inside PMD initialization with a
> >  suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
> >  0000:03:00.0 and four ports is seen by the PMD code as:
> >  
> > @@ -137,50 +132,67 @@ suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
> >  PF multiprocess support
> >  -----------------------
> >  
> > -Due to how the driver needs to access the NFP through a CPP interface, which implies
> > -to use specific registers inside the chip, the number of secondary processes with PF
> > -ports is limited to only one.
> > +Due to how the driver needs to access the NFP through a CPP interface, which
> > +implies to use specific registers inside the chip, the number of secondary
> > +processes with PF ports is limited to only one.
> >  
> > -This limitation will be solved in future versions but having basic multiprocess support
> > -is important for allowing development and debugging through the PF using a secondary
> > -process which will create a CPP bridge for user space tools accessing the NFP.
> > +This limitation will be solved in future versions, but having basic
> > +multiprocess support is important for allowing development and debugging
> > +through the PF using a secondary process, which will create a CPP bridge
> > +for user space tools accessing the NFP.
> >  
> >  
> >  System configuration
> >  --------------------
> >  
> >  #. **Enable SR-IOV on the NFP device:** The current NFP PMD supports the PF and
> > -   the VFs on a NFP device. However, it is not possible to work with both at the
> > -   same time because the VFs require the PF being bound to the NFP PF Linux
> > -   netdev driver.  Make sure you are working with a kernel with NFP PF support or
> > -   get the drivers from the above Github repository and follow the instructions
> > -   for building and installing it.
> > +   the VFs on a NFP device. However, it is not possible to work with both at
> > +   the same time when using the netdev NFP Linux netdev driver.
> 
> The old and new text don't say the same thing.
> The old one says: "For DPDK to support VFs, the PF needs to be bound to the kernel driver."
> 
> Has this changed, or is it just a wording mistake?
> 
> 
> >     It is possible
> > +   to bind the PF to the ``vfio-pci`` kernel module, and create VFs afterwards.
> > +   This requires loading the ``vfio-pci`` module with the following parameters:
> > +
> > +   .. code-block:: console
> > +
> > +      modprobe vfio-pci enable_sriov=1 disable_idle_d3=1
> > +
> > +   VFs need to be enabled before they can be used with the PMD. Before enabling
> > +   the VFs it is useful to obtain information about the current NFP PCI device
> > +   detected by the system. This can be done on Netronome SmartNICs using:
> > +
> > +   .. code-block:: console
> > +
> > +      lspci -d 19ee:
> >  
> 
> What I understand is that two things are required for DPDK to support VFs:
> 1) The ability to create VFs; this can be done either with the device's
> kernel driver or with 'vfio-pci'.
> 2) The PF driver should support managing VFs.
> 
> The lines above document item (1) and how 'vfio-pci' is used for it.
> 
> But the old documentation's coverage of item (2) is missing. Why was that
> part removed? Is it no longer valid? I mean, is the "PF -> kernel, VF -> DPDK"
> combination supported now?
> 
> 
> > -   VFs need to be enabled before they can be used with the PMD.
> > -   Before enabling the VFs it is useful to obtain information about the
> > -   current NFP PCI device detected by the system:
> > +   and on Corigine SmartNICs using:
> >  
> >     .. code-block:: console
> >  
> > -      lspci -d19ee:
> > +      lspci -d 1da8:
> >  
> > -   Now, for example, configure two virtual functions on a NFP-6xxx device
> > +   Now, for example, to configure two virtual functions on a NFP device
> >     whose PCI system identity is "0000:03:00.0":
> >  
> >     .. code-block:: console
> >  
> >        echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
> >  
> > -   The result of this command may be shown using lspci again:
> > +   The result of this command may be shown using lspci again on Netronome
> > +   SmartNICs:
> > +
> > +   .. code-block:: console
> > +
> > +      lspci -d 19ee: -k
> > +
> > +   and on Corigine SmartNICs:
> >  
> >     .. code-block:: console
> >  
> > -      lspci -d19ee: -k
> > +      lspci -d 1da8: -k
> >  
> >     Two new PCI devices should appear in the output of the above command. The
> > -   -k option shows the device driver, if any, that devices are bound to.
> > -   Depending on the modules loaded at this point the new PCI devices may be
> > -   bound to nfp_netvf driver.
> > +   -k option shows the device driver, if any, that the devices are bound to.
> > +   Depending on the modules loaded, at this point the new PCI devices may be
> > +   bound to the ``nfp`` kernel driver or ``vfio-pci``.
> >  
> >  
> >  Flow offload
> > @@ -193,13 +205,13 @@ The flower firmware application requires the PMD running two services:
> >  
> >  	* PF vNIC service: handling the feedback traffic.
> >  	* ctrl vNIC service: communicate between PMD and firmware through
> > -	  control message.
> > +	  control messages.
> >  
> >  To achieve the offload of flow, the representor ports are exposed to OVS.
> > -The flower firmware application support representor port for VF and physical
> > -port. There will always exist a representor port for each physical port,
> > -and the number of the representor port for VF is specified by the user through
> > -parameter.
> > +The flower firmware application supports VF, PF, and physical port representor
> > +ports. 
> 
> Again, the old document and the new one do not say the same thing. Is it intentional?
> 
> The old one says: "Having representor ports for both VF and PF is supported."
> 
> The new one says: "FW supports representor port, VF and PF."
> 
> > There will always exist a representor port for a PF and each physical
> > +port. The number of the representor ports for VFs are specified by the user
> > +through a parameter.
> >  
> >  In the Rx direction, the flower firmware application will prepend the input
> >  port information into metadata for each packet which can't offloaded. The PF
> > @@ -207,12 +219,12 @@ vNIC service will keep polling packets from the firmware, and multiplex them
> >  to the corresponding representor port.
> >  
> >  In the Tx direction, the representor port will prepend the output port
> > -information into metadata for each packet, and then send it to firmware through
> > -PF vNIC.
> > +information into metadata for each packet, and then send it to the firmware
> > +through the PF vNIC.
> >  
> > -The ctrl vNIC service handling various control message, like the creation and
> > -configuration of representor port, the pattern and action of flow rules, the
> > -statistics of flow rules, and so on.
> > +The ctrl vNIC service handles various control messages, for example, the
> > +creation and configuration of a representor port, the pattern and action of
> > +flow rules, the statistics of flow rules, etc.
> >  
> >  Metadata Format
> >  ---------------
> 

-- 
Kind Regards,
Niklas Söderlund

^ permalink raw reply	[relevance 0%]

* [PATCH v7 21/22] hash: move rte_hash_set_alg out header
  2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
@ 2023-02-15 17:23  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-15 17:23 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code for setting the hash algorithm is not at all performance sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. Second, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.
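
To illustrate the "initialization gets done multiple times" point, here is a
minimal standalone sketch of the out-of-line layout. All names (hash_set_alg,
ALG_SW, ALG_HW, init_calls) are hypothetical stand-ins rather than the real
rte_hash_crc symbols, and the sketch assumes the selected algorithm is kept in
the same source file as the setter. The constructor attribute is the GCC/Clang
extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define ALG_SW 0x0
#define ALG_HW 0x1

/* With an out-of-line setter, this state exists once, in one .c file. */
static uint8_t selected_alg = ALG_SW;
static unsigned int init_calls;

/* Declared in the header, defined exactly once here. */
void
hash_set_alg(uint8_t alg)
{
	selected_alg = (alg & ALG_HW) ? ALG_HW : ALG_SW;
}

uint8_t
hash_get_alg(void)
{
	return selected_alg;
}

/*
 * Constructor defined in a single translation unit: it runs once before
 * main(). Had it been a static function in a header, every file including
 * that header would carry its own copy and its own initialization.
 */
static void __attribute__((constructor))
hash_init_alg(void)
{
	init_calls++;
	hash_set_alg(ALG_HW);
}

/* Self-check: one initialization, and the setter acts on shared state. */
void
check_single_init(void)
{
	assert(init_calls == 1);
	assert(hash_get_alg() == ALG_HW);
}
```

The design point is only that a single definition gives a single initializer;
the real selection logic is in the diff below.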

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* [PATCH v7 00/22] Replace use of static logtypes in libraries
                     ` (2 preceding siblings ...)
  2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
@ 2023-02-15 17:23  3% ` Stephen Hemminger
  2023-02-15 17:23  3%   ` [PATCH v7 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-15 17:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static logtypes in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.
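
As a rough illustration of the difference, a dynamic logtype is an id handed
out by a registry at run time under a name, instead of a fixed number compiled
into a shared header. The sketch below is a toy registry, not DPDK's rte_log
implementation: toy_log_register(), toy_log_name(), and the constants are
hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOGTYPES     64
#define FIRST_DYNAMIC_ID 32 /* ids below this stand in for fixed, static types */

static const char *logtype_names[MAX_LOGTYPES];
static int next_dynamic_id = FIRST_DYNAMIC_ID;

/*
 * Register a name at run time and return its id.
 * Registering the same name again returns the existing id.
 * Returns -1 if the table is full.
 */
int
toy_log_register(const char *name)
{
	for (int i = FIRST_DYNAMIC_ID; i < next_dynamic_id; i++)
		if (strcmp(logtype_names[i], name) == 0)
			return i;

	if (next_dynamic_id >= MAX_LOGTYPES)
		return -1;

	logtype_names[next_dynamic_id] = name;
	return next_dynamic_id++;
}

/* Look up the name of a previously registered dynamic id. */
const char *
toy_log_name(int id)
{
	assert(id >= FIRST_DYNAMIC_ID && id < next_dynamic_id);
	return logtype_names[id];
}
```

With this shape, each library keeps its id in a private variable set at
startup, so adding a library no longer means reserving a number in a common
header.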

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v7 - fix commit message typo
     add error to gso_segment function doc
     fix missing cpuflags.h on arm

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/gso/rte_gso.h                 |  1 +
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  5 +++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 74 files changed, 378 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1



* Re: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
  2023-02-15  7:26  3%     ` Hu, Jiayu
@ 2023-02-15 17:12  0%       ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-15 17:12 UTC (permalink / raw)
  To: Hu, Jiayu; +Cc: dev, Konstantin Ananyev, Mark Kavanagh

On Wed, 15 Feb 2023 07:26:22 +0000
"Hu, Jiayu" <jiayu.hu@intel.com> wrote:

> > -----Original Message-----
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > Sent: Wednesday, February 15, 2023 6:47 AM
> > To: dev@dpdk.org
> > Cc: Stephen Hemminger <stephen@networkplumber.org>; Hu, Jiayu
> > <jiayu.hu@intel.com>; Konstantin Ananyev
> > <konstantin.v.ananyev@yandex.ru>; Mark Kavanagh
> > <mark.b.kavanagh@intel.com>
> > Subject: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
> > 
> > If a large packet of an unknown protocol is passed into the GSO routines, the
> > library would log a message.
> > Better to tell the application instead of logging.
> > 
> > Fixes: 119583797b6a ("gso: support TCP/IPv4 GSO")
> > Cc: jiayu.hu@intel.com
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  lib/gso/rte_gso.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/lib/gso/rte_gso.c b/lib/gso/rte_gso.c index
> > 4b59217c16ee..c8e67c2d4b48 100644
> > --- a/lib/gso/rte_gso.c
> > +++ b/lib/gso/rte_gso.c
> > @@ -80,9 +80,8 @@ rte_gso_segment(struct rte_mbuf *pkt,
> >  		ret = gso_udp4_segment(pkt, gso_size, direct_pool,
> >  				indirect_pool, pkts_out, nb_pkts_out);
> >  	} else {
> > -		/* unsupported packet, skip */
> > -		RTE_LOG(DEBUG, GSO, "Unsupported packet type\n");
> > -		ret = 0;
> > +		ret = -ENOTSUP;	/* only UDP or TCP allowed */
> > +  
> 
> The function signature annotation in rte_gso.h also needs an update for ENOTSUP.
> In addition, will it break the ABI?

Not really, if anybody hits this error case, nothing good would have
been happening.
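
For illustration, a caller that wants to cope with the new return value could
look roughly like the sketch below. The names fake_gso_segment and
packets_to_send are hypothetical stand-ins, not the rte_gso API, and falling
back to sending the original packet is only one possible policy.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a GSO entry point: returns the number of
 * segments produced, or a negative errno value on failure.
 */
int fake_gso_segment(int is_tcp_or_udp, int nb_segs)
{
	if (!is_tcp_or_udp)
		return -ENOTSUP;	/* only UDP or TCP allowed */
	return nb_segs;
}

/*
 * Hypothetical caller: decides how many packets end up on the wire.
 * Assumed policy: on -ENOTSUP send the original packet unsegmented,
 * on any other error drop it.
 */
int packets_to_send(int is_tcp_or_udp, int nb_segs)
{
	int ret = fake_gso_segment(is_tcp_or_udp, nb_segs);

	if (ret == -ENOTSUP)
		return 1;	/* fall back to the original packet */
	if (ret < 0)
		return 0;	/* other failure: drop */
	return ret;
}
```

With an explicit error code the application makes this decision itself
instead of having to notice a debug log message.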


* Re: [PATCH] doc: update NFP documentation with Corigine information
  @ 2023-02-15 13:37  0% ` Ferruh Yigit
  2023-02-15 17:58  0%   ` Niklas Söderlund
    1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-02-15 13:37 UTC (permalink / raw)
  To: Chaoyong He, dev; +Cc: oss-drivers, niklas.soderlund, Walter Heymans

On 2/3/2023 8:08 AM, Chaoyong He wrote:
> From: Walter Heymans <walter.heymans@corigine.com>
> 
> The NFP PMD documentation is updated to include information about
> Corigine and their new vendor device ID.
> 
> Outdated information regarding the use of the PMD is also updated.
> 
> While making major changes to the document, the maximum number of
> characters per line is updated to 80 characters to improve the
> readability in raw format.
> 

There are three groups of changes made to the documentation, as explained in
the three paragraphs above.

To help review, is it possible to separate this patch into three
patches? Later they can be squashed and merged as a single patch.
But as it is, it is easy to miss content changes among the formatting changes.

(You can include simple grammar updates (those that don't change either the
content or Corigine-related information) in the formatting update patch.)


> Signed-off-by: Walter Heymans <walter.heymans@corigine.com>
> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
> Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
> ---
>  doc/guides/nics/nfp.rst | 168 +++++++++++++++++++++-------------------
>  1 file changed, 90 insertions(+), 78 deletions(-)
> 
> diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
> index a085d7d9ae..6fea280411 100644
> --- a/doc/guides/nics/nfp.rst
> +++ b/doc/guides/nics/nfp.rst
> @@ -1,35 +1,34 @@
>  ..  SPDX-License-Identifier: BSD-3-Clause
>      Copyright(c) 2015-2017 Netronome Systems, Inc. All rights reserved.
> -    All rights reserved.
> +    Copyright(c) 2021 Corigine, Inc. All rights reserved.
>  
>  NFP poll mode driver library
>  ============================
>  
> -Netronome's sixth generation of flow processors pack 216 programmable
> -cores and over 100 hardware accelerators that uniquely combine packet,
> -flow, security and content processing in a single device that scales
> +Netronome and Corigine's sixth generation of flow processors pack 216
> +programmable cores and over 100 hardware accelerators that uniquely combine
> +packet, flow, security and content processing in a single device that scales
>  up to 400-Gb/s.
>  
> -This document explains how to use DPDK with the Netronome Poll Mode
> -Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
> -(NFP-6xxx), Netronome's Network Flow Processor 4xxx (NFP-4xxx) and
> -Netronome's Network Flow Processor 38xx (NFP-38xx).
> +This document explains how to use DPDK with the Network Flow Processor (NFP)
> +Poll Mode Driver (PMD) supporting Netronome and Corigine's NFP-6xxx, NFP-4xxx
> +and NFP-38xx product lines.
>  
> -NFP is a SRIOV capable device and the PMD supports the physical
> -function (PF) and the virtual functions (VFs).
> +NFP is a SR-IOV capable device and the PMD supports the physical function (PF)
> +and the virtual functions (VFs).
>  
>  Dependencies
>  ------------
>  
> -Before using the Netronome's DPDK PMD some NFP configuration,
> -which is not related to DPDK, is required. The system requires
> -installation of **Netronome's BSP (Board Support Package)** along
> -with a specific NFP firmware application. Netronome's NSP ABI
> -version should be 0.20 or higher.
> +Before using the NFP DPDK PMD some NFP configuration, which is not related to
> +DPDK, is required. The system requires installation of
> +**NFP-BSP (Board Support Package)** along with a specific NFP firmware
> +application. The NSP ABI version should be 0.20 or higher.
>  
> -If you have a NFP device you should already have the code and
> -documentation for this configuration. Contact
> -**support@netronome.com** to obtain the latest available firmware.
> +If you have a NFP device you should already have the documentation to perform
> +this configuration. Contact **support@netronome.com** (for Netronome products)
> +or **smartnic-support@corigine.com** (for Corigine products) to obtain the
> +latest available firmware.
>  
>  The NFP Linux netdev kernel driver for VFs has been a part of the
>  vanilla kernel since kernel version 4.5, and support for the PF
> @@ -44,11 +43,11 @@ Linux kernel driver.
>  Building the software
>  ---------------------
>  
> -Netronome's PMD code is provided in the **drivers/net/nfp** directory.
> -Although NFP PMD has Netronome´s BSP dependencies, it is possible to
> -compile it along with other DPDK PMDs even if no BSP was installed previously.
> -Of course, a DPDK app will require such a BSP installed for using the
> -NFP PMD, along with a specific NFP firmware application.
> +The NFP PMD code is provided in the **drivers/net/nfp** directory. Although
> +NFP PMD has BSP dependencies, it is possible to compile it along with other
> +DPDK PMDs even if no BSP was installed previously. Of course, a DPDK app will
> +require such a BSP installed for using the NFP PMD, along with a specific NFP
> +firmware application.
>  
>  Once the DPDK is built all the DPDK apps and examples include support for
>  the NFP PMD.
> @@ -57,27 +56,20 @@ the NFP PMD.
>  Driver compilation and testing
>  ------------------------------
>  
> -Refer to the document :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`
> -for details.
> +Refer to the document
> +:ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>` for details.
>  
>  Using the PF
>  ------------
>  
> -NFP PMD supports using the NFP PF as another DPDK port, but it does not
> -have any functionality for controlling VFs. In fact, it is not possible to use
> -the PMD with the VFs if the PF is being used by DPDK, that is, with the NFP PF
> -bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK versions will
> -have a PMD able to work with the PF and VFs at the same time and with the PF
> -implementing VF management along with other PF-only functionalities/offloads.
> -

Why is this paragraph removed? Is it because it is no longer correct,
or just because of the document reorganization?

>  The PMD PF has extra work to do which will delay the DPDK app initialization
> -like uploading the firmware and configure the Link state properly when starting or
> -stopping a PF port. Since DPDK 18.05 the firmware upload happens when
> +like uploading the firmware and configure the Link state properly when starting
> +or stopping a PF port. Since DPDK 18.05 the firmware upload happens when
>  a PF is initialized, which was not always true with older DPDK versions.
>  
> -Depending on the Netronome product installed in the system, firmware files
> -should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
> -PF looks for a firmware file in this order:
> +Depending on the product installed in the system, firmware files should be
> +available under ``/lib/firmware/netronome``. DPDK PMD supporting the PF looks
> +for a firmware file in this order:
>  
>  	1) First try to find a firmware image specific for this device using the
>  	   NFP serial number:
> @@ -92,18 +84,21 @@ PF looks for a firmware file in this order:
>  
>  		nic_AMDA0099-0001_2x25.nffw
>  
> -Netronome's software packages install firmware files under ``/lib/firmware/netronome``
> -to support all the Netronome's SmartNICs and different firmware applications.
> -This is usually done using file names based on SmartNIC type and media and with a
> -directory per firmware application. Options 1 and 2 for firmware filenames allow
> -more than one SmartNIC, same type of SmartNIC or different ones, and to upload a
> -different firmware to each SmartNIC.
> +Netronome and Corigine's software packages install firmware files under
> +``/lib/firmware/netronome`` to support all the SmartNICs and different firmware
> +applications. This is usually done using file names based on SmartNIC type and
> +media and with a directory per firmware application. Options 1 and 2 for
> +firmware filenames allow more than one SmartNIC, same type of SmartNIC or
> +different ones, and to upload a different firmware to each SmartNIC.
>  
>     .. Note::
> -      Currently the NFP PMD supports using the PF with Agilio Firmware with NFD3
> -      and Agilio Firmware with NFDk. See https://help.netronome.com/support/solutions
> +      Currently the NFP PMD supports using the PF with Agilio Firmware with
> +      NFD3 and Agilio Firmware with NFDk. See
> +      `Netronome Support <https://help.netronome.com/support/solutions>`_.
>        for more information on the various firmwares supported by the Netronome
> -      Agilio CX smartNIC.
> +      Agilio SmartNICs range, or
> +      `Corigine Support <https://www.corigine.com/productsOverviewList-30.html>`_.
> +      for more information about Corigine's range.
>  
>  PF multiport support
>  --------------------
> @@ -118,7 +113,7 @@ this particular configuration requires the PMD to create ports in a special way,
>  although once they are created, DPDK apps should be able to use them as normal
>  PCI ports.
>  
> -NFP ports belonging to same PF can be seen inside PMD initialization with a
> +NFP ports belonging to the same PF can be seen inside PMD initialization with a
>  suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
>  0000:03:00.0 and four ports is seen by the PMD code as:
>  
> @@ -137,50 +132,67 @@ suffix added to the PCI ID: wwww:xx:yy.z_portn. For example, a PF with PCI ID
>  PF multiprocess support
>  -----------------------
>  
> -Due to how the driver needs to access the NFP through a CPP interface, which implies
> -to use specific registers inside the chip, the number of secondary processes with PF
> -ports is limited to only one.
> +Due to how the driver needs to access the NFP through a CPP interface, which
> +implies to use specific registers inside the chip, the number of secondary
> +processes with PF ports is limited to only one.
>  
> -This limitation will be solved in future versions but having basic multiprocess support
> -is important for allowing development and debugging through the PF using a secondary
> -process which will create a CPP bridge for user space tools accessing the NFP.
> +This limitation will be solved in future versions, but having basic
> +multiprocess support is important for allowing development and debugging
> +through the PF using a secondary process, which will create a CPP bridge
> +for user space tools accessing the NFP.
>  
>  
>  System configuration
>  --------------------
>  
>  #. **Enable SR-IOV on the NFP device:** The current NFP PMD supports the PF and
> -   the VFs on a NFP device. However, it is not possible to work with both at the
> -   same time because the VFs require the PF being bound to the NFP PF Linux
> -   netdev driver.  Make sure you are working with a kernel with NFP PF support or
> -   get the drivers from the above Github repository and follow the instructions
> -   for building and installing it.
> +   the VFs on a NFP device. However, it is not possible to work with both at
> +   the same time when using the netdev NFP Linux netdev driver.

The old and new text don't say the same thing.
The old one says: "For DPDK to support VFs, the PF needs to be bound to the kernel driver."

Has this changed, or is it just a wording mistake?


>     It is possible
> +   to bind the PF to the ``vfio-pci`` kernel module, and create VFs afterwards.
> +   This requires loading the ``vfio-pci`` module with the following parameters:
> +
> +   .. code-block:: console
> +
> +      modprobe vfio-pci enable_sriov=1 disable_idle_d3=1
> +
> +   VFs need to be enabled before they can be used with the PMD. Before enabling
> +   the VFs it is useful to obtain information about the current NFP PCI device
> +   detected by the system. This can be done on Netronome SmartNICs using:
> +
> +   .. code-block:: console
> +
> +      lspci -d 19ee:
>  

What I understand is that for DPDK to support VFs, two things are required:
1) The ability to create VFs; this can be done either by using the device's
kernel driver or 'vfio-pci'.
2) The PF driver should support managing VFs.

The lines above document item (1) and how 'vfio-pci' is used for it.

But the old documentation's mention of item (2) is missing. Why was that part
removed, isn't it valid anymore? I mean, is the "PF -> kernel, VF -> DPDK"
combination supported now?


> -   VFs need to be enabled before they can be used with the PMD.
> -   Before enabling the VFs it is useful to obtain information about the
> -   current NFP PCI device detected by the system:
> +   and on Corigine SmartNICs using:
>  
>     .. code-block:: console
>  
> -      lspci -d19ee:
> +      lspci -d 1da8:
>  
> -   Now, for example, configure two virtual functions on a NFP-6xxx device
> +   Now, for example, to configure two virtual functions on a NFP device
>     whose PCI system identity is "0000:03:00.0":
>  
>     .. code-block:: console
>  
>        echo 2 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
>  
> -   The result of this command may be shown using lspci again:
> +   The result of this command may be shown using lspci again on Netronome
> +   SmartNICs:
> +
> +   .. code-block:: console
> +
> +      lspci -d 19ee: -k
> +
> +   and on Corigine SmartNICs:
>  
>     .. code-block:: console
>  
> -      lspci -d19ee: -k
> +      lspci -d 1da8: -k
>  
>     Two new PCI devices should appear in the output of the above command. The
> -   -k option shows the device driver, if any, that devices are bound to.
> -   Depending on the modules loaded at this point the new PCI devices may be
> -   bound to nfp_netvf driver.
> +   -k option shows the device driver, if any, that the devices are bound to.
> +   Depending on the modules loaded, at this point the new PCI devices may be
> +   bound to the ``nfp`` kernel driver or ``vfio-pci``.
>  
>  
>  Flow offload
> @@ -193,13 +205,13 @@ The flower firmware application requires the PMD running two services:
>  
>  	* PF vNIC service: handling the feedback traffic.
>  	* ctrl vNIC service: communicate between PMD and firmware through
> -	  control message.
> +	  control messages.
>  
>  To achieve the offload of flow, the representor ports are exposed to OVS.
> -The flower firmware application support representor port for VF and physical
> -port. There will always exist a representor port for each physical port,
> -and the number of the representor port for VF is specified by the user through
> -parameter.
> +The flower firmware application supports VF, PF, and physical port representor
> +ports. 

Again, the old document and the new one are not saying the same thing; is it
intentional?

The old one says: "Having representor ports for both VF and PF is supported."

The new one says: "FW supports representor port, VF and PF."

> There will always exist a representor port for a PF and each physical
> +port. The number of the representor ports for VFs are specified by the user
> +through a parameter.
>  
>  In the Rx direction, the flower firmware application will prepend the input
>  port information into metadata for each packet which can't offloaded. The PF
> @@ -207,12 +219,12 @@ vNIC service will keep polling packets from the firmware, and multiplex them
>  to the corresponding representor port.
>  
>  In the Tx direction, the representor port will prepend the output port
> -information into metadata for each packet, and then send it to firmware through
> -PF vNIC.
> +information into metadata for each packet, and then send it to the firmware
> +through the PF vNIC.
>  
> -The ctrl vNIC service handling various control message, like the creation and
> -configuration of representor port, the pattern and action of flow rules, the
> -statistics of flow rules, and so on.
> +The ctrl vNIC service handles various control messages, for example, the
> +creation and configuration of a representor port, the pattern and action of
> +flow rules, the statistics of flow rules, etc.
>  
>  Metadata Format
>  ---------------



* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  2023-02-15  1:59  3%     ` fengchengwen
@ 2023-02-15 11:57  3%       ` Bruce Richardson
  2023-02-16  1:24  0%         ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2023-02-15 11:57 UTC (permalink / raw)
  To: fengchengwen; +Cc: dev, Kevin Laatz

On Wed, Feb 15, 2023 at 09:59:06AM +0800, fengchengwen wrote:
> On 2023/1/17 1:37, Bruce Richardson wrote:
> > Validate device operation when a device is stopped or restarted.
> > 
> > The only complication - and gap in the dmadev ABI specification - is
> > what happens to the job ids on restart. Some drivers reset them to 0,
> > while others continue where things left off. Take account of both
> > possibilities in the test case.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> >  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 46 insertions(+)
> > 
> > diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
> > index de787c14e2..8fb73a41e2 100644
> > --- a/app/test/test_dmadev.c
> > +++ b/app/test/test_dmadev.c
> > @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
> >  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
> >  }
> >  
> > +static int
> > +test_stop_start(int16_t dev_id, uint16_t vchan)
> > +{
> > +	/* device is already started on input, should be (re)started on output */
> > +
> > +	uint16_t id = 0;
> > +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
> > +
> > +	/* - test stopping a device works ok,
> > +	 * - then do a start-stop without doing a copy
> > +	 * - finally restart the device
> > +	 * checking for errors at each stage, and validating we can still copy at the end.
> > +	 */
> > +	if (rte_dma_stop(dev_id) < 0)
> > +		ERR_RETURN("Error stopping device\n");
> > +
> > +	if (rte_dma_start(dev_id) < 0)
> > +		ERR_RETURN("Error restarting device\n");
> > +	if (rte_dma_stop(dev_id) < 0)
> > +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
> > +
> > +	if (rte_dma_start(dev_id) < 0)
> > +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
> > +
> > +	/* before doing a copy, we need to know what the next id will be it should
> > +	 * either be:
> > +	 * - the last completed job before start if driver does not reset id on stop
> > +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
> > +	 */
> > +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
> > +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
> > +	id += 1; /* id_count is next job id */
> > +	if (id != id_count && id != 0)
> > +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
> > +				id, id_count);
> 
> Hi Bruce,
> 
> Suggest adding a warning log to flag when the id was not reset to zero, so
> that a new driver can detect that it breaks the ABI specification.
> 
Not sure that that is necessary. Neither the actual ABI nor the doxygen docs
specify what happens to the values on doing stop and then start. My
thinking was that it should continue numbering, as it would be equivalent to
suspend and resume, but other drivers appear to treat it as a "reset". I
suspect there are advantages and disadvantages to both schemes. Until we
decide on what the correct behaviour should be - or decide to allow both -
I don't think warning is the right thing to do here.
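
To spell out what the test accepts today, here is a minimal standalone
sketch of the check. The helper name next_id_is_acceptable is hypothetical
and not part of the patch; it only mirrors the "id += 1" comparison against
id_count or 0, relying on uint16_t wrap-around for the reset case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * last_completed: id reported by the driver right after the restart.
 * expected_next:  id the test would use next if numbering simply continues.
 *
 * Both schemes are accepted:
 *  - numbering continues: last_completed + 1 == expected_next
 *  - ids were reset:      last_completed is (uint16_t)-1, so the next id is 0
 */
bool next_id_is_acceptable(uint16_t last_completed, uint16_t expected_next)
{
	uint16_t next = (uint16_t)(last_completed + 1);	/* 0xFFFF wraps to 0 */

	return next == expected_next || next == 0;
}
```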

/Bruce


* RE: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
  @ 2023-02-15  7:26  3%     ` Hu, Jiayu
  2023-02-15 17:12  0%       ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Hu, Jiayu @ 2023-02-15  7:26 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Konstantin Ananyev, Mark Kavanagh



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, February 15, 2023 6:47 AM
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Hu, Jiayu
> <jiayu.hu@intel.com>; Konstantin Ananyev
> <konstantin.v.ananyev@yandex.ru>; Mark Kavanagh
> <mark.b.kavanagh@intel.com>
> Subject: [PATCH v6 01/22] gso: don't log message on non TCP/UDP
> 
> If a large packet of an unknown protocol is passed into the GSO routines, the
> library would log a message.
> Better to tell the application instead of logging.
> 
> Fixes: 119583797b6a ("gso: support TCP/IPv4 GSO")
> Cc: jiayu.hu@intel.com
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/gso/rte_gso.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/gso/rte_gso.c b/lib/gso/rte_gso.c index
> 4b59217c16ee..c8e67c2d4b48 100644
> --- a/lib/gso/rte_gso.c
> +++ b/lib/gso/rte_gso.c
> @@ -80,9 +80,8 @@ rte_gso_segment(struct rte_mbuf *pkt,
>  		ret = gso_udp4_segment(pkt, gso_size, direct_pool,
>  				indirect_pool, pkts_out, nb_pkts_out);
>  	} else {
> -		/* unsupported packet, skip */
> -		RTE_LOG(DEBUG, GSO, "Unsupported packet type\n");
> -		ret = 0;
> +		ret = -ENOTSUP;	/* only UDP or TCP allowed */
> +

The function signature annotation in rte_gso.h also needs an update for ENOTSUP.
In addition, will it break the ABI?

Thanks,
Jiayu
>  	}
> 
>  	if (ret < 0) {
> --
> 2.39.1



* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
    2023-02-14 16:04  0%     ` Kevin Laatz
@ 2023-02-15  1:59  3%     ` fengchengwen
  2023-02-15 11:57  3%       ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: fengchengwen @ 2023-02-15  1:59 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Kevin Laatz

On 2023/1/17 1:37, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
> 
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/app/test/test_dmadev.c b/app/test/test_dmadev.c
> index de787c14e2..8fb73a41e2 100644
> --- a/app/test/test_dmadev.c
> +++ b/app/test/test_dmadev.c
> @@ -304,6 +304,48 @@ test_enqueue_copies(int16_t dev_id, uint16_t vchan)
>  			|| do_multi_copies(dev_id, vchan, 0, 0, 1);
>  }
>  
> +static int
> +test_stop_start(int16_t dev_id, uint16_t vchan)
> +{
> +	/* device is already started on input, should be (re)started on output */
> +
> +	uint16_t id = 0;
> +	enum rte_dma_status_code status = RTE_DMA_STATUS_SUCCESSFUL;
> +
> +	/* - test stopping a device works ok,
> +	 * - then do a start-stop without doing a copy
> +	 * - finally restart the device
> +	 * checking for errors at each stage, and validating we can still copy at the end.
> +	 */
> +	if (rte_dma_stop(dev_id) < 0)
> +		ERR_RETURN("Error stopping device\n");
> +
> +	if (rte_dma_start(dev_id) < 0)
> +		ERR_RETURN("Error restarting device\n");
> +	if (rte_dma_stop(dev_id) < 0)
> +		ERR_RETURN("Error stopping device after restart (no jobs executed)\n");
> +
> +	if (rte_dma_start(dev_id) < 0)
> +		ERR_RETURN("Error restarting device after multiple stop-starts\n");
> +
> +	/* before doing a copy, we need to know what the next id will be it should
> +	 * either be:
> +	 * - the last completed job before start if driver does not reset id on stop
> +	 * - or -1 i.e. next job is 0, if driver does reset the job ids on stop
> +	 */
> +	if (rte_dma_completed_status(dev_id, vchan, 1, &id, &status) != 0)
> +		ERR_RETURN("Error with rte_dma_completed_status when no job done\n");
> +	id += 1; /* id_count is next job id */
> +	if (id != id_count && id != 0)
> +		ERR_RETURN("Unexpected next id from device after stop-start. Got %u, expected %u or 0\n",
> +				id, id_count);

Hi Bruce,

Suggest adding a warning log to flag when the id was not reset to zero, so
that a new driver can detect that it breaks the ABI specification.

Thanks.



* [PATCH v6 21/22] hash: move rte_hash_set_alg out header
  2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
  @ 2023-02-14 22:47  3%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-14 22:47 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code for setting the hash algorithm is not at all perf sensitive,
and doing it inline has a couple of problems. First, it means that if
multiple files include the header, then the initialization gets done
multiple times. It also makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code; therefore both old and new code
will work the same.
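
As a rough standalone sketch of the x86 selection logic being moved (not the
library code itself): the flag values below are assumed to match the bit
flags in rte_hash_crc.h, and the function name choose_crc32_alg_x86 and its
has_64bit parameter are hypothetical, standing in for the
RTE_CPUFLAG_EM64T check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the flags in rte_hash_crc.h. */
#define CRC32_SW        (1U << 0)
#define CRC32_SSE42     (1U << 1)
#define CRC32_x64       (1U << 2)
#define CRC32_SSE42_x64 (CRC32_x64 | CRC32_SSE42)

/*
 * Hypothetical pure-function version of the x86 branch: the CPU
 * capability is passed in instead of being queried at run time.
 */
uint8_t choose_crc32_alg_x86(uint8_t alg, bool has_64bit)
{
	if (alg == CRC32_SW)
		return CRC32_SW;	/* software fallback requested */

	if (!has_64bit || alg == CRC32_SSE42)
		return CRC32_SSE42;	/* 32-bit intrinsic only */

	return CRC32_SSE42_x64;		/* 64-bit intrinsic */
}
```

Nothing in this selection runs per packet, which is why an out-of-line
function costs nothing in the data path.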

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1
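
For readers following the move, the x86 branch of rte_hash_crc_set_alg()
can be modelled without DPDK roughly as below. This is only a sketch under
stated assumptions: the flag values are assumed here, and
select_crc32_alg_x86() with its cpu_has_em64t parameter is a hypothetical
stand-in for the real function and for
rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T). The warning logs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names follow the patch; the numeric values are assumptions for this sketch. */
#define CRC32_SW        (1U << 0)
#define CRC32_SSE42     (1U << 1)
#define CRC32_x64       (1U << 2)
#define CRC32_SSE42_x64 (CRC32_x64 | CRC32_SSE42)

/*
 * Hypothetical model of the x86 branch: software is used only when asked
 * for explicitly, the 32-bit intrinsic is used when the CPU lacks EM64T or
 * when exactly CRC32_SSE42 was requested, otherwise the 64-bit variant.
 */
static uint8_t
select_crc32_alg_x86(uint8_t requested, bool cpu_has_em64t)
{
	if (requested == CRC32_SW)
		return CRC32_SW;
	if (!cpu_has_em64t || requested == CRC32_SSE42)
		return CRC32_SSE42;
	return CRC32_SSE42_x64;
}

/* Small self-check covering the three possible outcomes. */
static void
select_crc32_alg_x86_check(void)
{
	assert(select_crc32_alg_x86(CRC32_SW, true) == CRC32_SW);
	assert(select_crc32_alg_x86(CRC32_SSE42, true) == CRC32_SSE42);
	assert(select_crc32_alg_x86(CRC32_SSE42_x64, true) == CRC32_SSE42_x64);
	assert(select_crc32_alg_x86(CRC32_SSE42_x64, false) == CRC32_SSE42);
}
```

Passing the CPU capability in as a parameter is a choice made for the
sketch so the logic can be exercised on any machine; the code in the patch
queries the CPU directly.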


^ permalink raw reply	[relevance 3%]

* [PATCH v6 00/22] Replace use of static logtypes in libraries
    2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
  2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace us of static logtypes Stephen Hemminger
@ 2023-02-14 22:47  3% ` Stephen Hemminger
    2023-02-14 22:47  3%   ` [PATCH v6 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 200+ results
From: Stephen Hemminger @ 2023-02-14 22:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v6 - fix typo in kni port 

v5 - fix use of LOGTYPE PORT and POWER in examples

v4 - use simpler/shorter method for setting local LOGTYPE
     split up steps of some of the changes

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  3 ++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 73 files changed, 375 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev
  @ 2023-02-14 16:04  0%     ` Kevin Laatz
  2023-02-15  1:59  3%     ` fengchengwen
  1 sibling, 0 replies; 200+ results
From: Kevin Laatz @ 2023-02-14 16:04 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Chengwen Feng

On 16/01/2023 17:37, Bruce Richardson wrote:
> Validate device operation when a device is stopped or restarted.
>
> The only complication - and gap in the dmadev ABI specification - is
> what happens to the job ids on restart. Some drivers reset them to 0,
> while others continue where things left off. Take account of both
> possibilities in the test case.
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>   app/test/test_dmadev.c | 46 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 46 insertions(+)
>
Acked-by: Kevin Laatz <kevin.laatz@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-14  9:38  0%       ` Jiawei(Jonny) Wang
@ 2023-02-14 10:01  0%         ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2023-02-14 10:01 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang
  Cc: dev, Raslan Darawsheh

On 2/14/2023 9:38 AM, Jiawei(Jonny) Wang wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, February 10, 2023 3:45 AM
>> To: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>; Slava Ovsiienko
>> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
>> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
>> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
>> Yuying Zhang <yuying.zhang@intel.com>
>> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
>> Subject: Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
>> API
>>
>> On 2/3/2023 1:33 PM, Jiawei Wang wrote:
>>> When multiple physical ports are connected to a single DPDK port,
>>> (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to
>>> know which physical port is used for Rx and Tx.
>>>
>>
>> I assume "kernel bonding" is out of context, but this patch concerns DPDK
>> bonding, failsafe or softnic. (I will refer to them as virtual bonding
>> devices.)
>>
>> Using specific queues of the virtual bonding device may interfere with the
>> logic of these devices, like bonding modes or RSS of the underlying devices. I
>> can see the feature focuses on a very specific use case, but I am not sure all
>> possible side effects have been taken into consideration.
>>
>>
>> And although the feature is only relevant to virtual bonding devices, core
>> ethdev structures are updated for this. Most use cases won't need these, so is
>> there a way to reduce the scope of the changes to virtual bonding devices?
>>
>>
>> There are a few very core ethdev APIs, like:
>> rte_eth_dev_configure()
>> rte_eth_tx_queue_setup()
>> rte_eth_rx_queue_setup()
>> rte_eth_dev_start()
>> rte_eth_dev_info_get()
>>
>> Almost every user of ethdev uses these APIs. Since they are so fundamental, I
>> am for being a little more conservative with these APIs.
>>
>> Every eccentric feature targets these APIs first because they are
>> common and extending them gives an easy solution, but in the long run this makes
>> these APIs more complex, harder to maintain and harder for PMDs to support
>> correctly. So I am for not updating them unless it is a generic use case.
>>
>>
>> Also, as we talked about PMDs supporting them, I assume your coming PMD
>> patch will implement the 'tx_phy_affinity' config option only for mlx drivers.
>> What will happen for other NICs? Will they silently ignore the config option
>> from the user? This is a problem for DPDK application portability.
>>
>>
>>
>> As far as I understand, the target is the application controlling which sub-device
>> is used under the virtual bonding device. Can you please give more information on
>> why this is required? Perhaps it can help to provide a better/different solution,
>> like adding the ability to use both the bonding device and the sub-device for the
>> data path, so the application can use whichever it wants. (This is just the first
>> solution I came up with, I am not suggesting it as a replacement, but if you can
>> describe the problem more I am sure other people can come up with better solutions.)
>>
>> And isn't this against the application being transparent to the underlying device
>> being a bonding device or an actual device?
>>
>>
> 
> OK, I will send a new version with separate functions in the ethdev layer
> to map a Tx queue to a port and to get the number of ports.
> These functions work through device ops callbacks; other NICs will report
> "not supported" when the ops callback is NULL.
> 

OK, thanks Jonny, at least this separates the feature into its own APIs,
which reduces the impact on applications and drivers that are not using
this feature.
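
To make the "separate API backed by a device ops callback" idea concrete,
here is a minimal sketch in plain C. It is not the ethdev implementation:
every name in it (demo_dev, demo_dev_ops, demo_map_txq_affinity, ...) is
hypothetical, and the -ENOTSUP return for a NULL callback is an assumption
about how the new version would report an unsupported NIC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver callback: bind one Tx queue to one physical port. */
typedef int (*demo_map_txq_t)(uint16_t queue_id, uint8_t affinity);

/* Hypothetical ops table; a driver without the feature leaves the slot NULL. */
struct demo_dev_ops {
	demo_map_txq_t map_txq;
};

struct demo_dev {
	const struct demo_dev_ops *ops;
};

/* Generic-layer wrapper: report "not supported" instead of silently ignoring. */
static int
demo_map_txq_affinity(const struct demo_dev *dev, uint16_t queue_id,
		      uint8_t affinity)
{
	if (dev == NULL || dev->ops == NULL)
		return -EINVAL;
	if (dev->ops->map_txq == NULL)
		return -ENOTSUP;
	return dev->ops->map_txq(queue_id, affinity);
}

/* Toy driver callback that accepts every request. */
static int
demo_accept_any(uint16_t queue_id, uint8_t affinity)
{
	(void)queue_id;
	(void)affinity;
	return 0;
}

/* Self-check: one NIC without the callback, one with it. */
static void
demo_ops_check(void)
{
	const struct demo_dev_ops without = { .map_txq = NULL };
	const struct demo_dev_ops with = { .map_txq = demo_accept_any };
	const struct demo_dev plain_nic = { .ops = &without };
	const struct demo_dev capable_nic = { .ops = &with };

	assert(demo_map_txq_affinity(&plain_nic, 0, 1) == -ENOTSUP);
	assert(demo_map_txq_affinity(&capable_nic, 0, 1) == 0);
	assert(demo_map_txq_affinity(NULL, 0, 1) == -EINVAL);
}
```

With this shape an application gets an explicit error from a NIC that does
not implement the callback, which addresses the portability concern about
the option being silently ignored.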


>>> This patch maps a DPDK Tx queue with a physical port, by adding
>>> tx_phy_affinity setting in Tx queue.
>>> The affinity number is the physical port ID where packets will be
>>> sent.
>>> Value 0 means no affinity and traffic could be routed to any connected
>>> physical ports, this is the default current behavior.
>>>
>>> The number of physical ports is reported with rte_eth_dev_info_get().
>>>
>>> The new tx_phy_affinity field is added into the padding hole of
>>> rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
>>> An ABI check rule needs to be added to avoid false warning.
>>>
>>> Add the testpmd command line:
>>> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
>>>
>>> For example, suppose two physical ports are connected to a single DPDK
>>> port (port id 0), where phy_affinity 1 stands for the first physical port
>>> and phy_affinity 2 stands for the second physical port.
>>> Use the commands below to configure the Tx phy affinity per Tx queue:
>>>         port config 0 txq 0 phy_affinity 1
>>>         port config 0 txq 1 phy_affinity 1
>>>         port config 0 txq 2 phy_affinity 2
>>>         port config 0 txq 3 phy_affinity 2
>>>
>>> These commands configure Tx queue 0 and Tx queue 1 with phy affinity 1,
>>> so packets sent on Tx queue 0 or Tx queue 1 leave from the first
>>> physical port. Likewise, packets sent on Tx queue 2 or Tx queue 3
>>> leave from the second physical port.
>>>
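
The value rules in this commit message (0 means no affinity, 1..N selects a
physical port) can be sketched in standalone C as below. The structure and
function names are hypothetical, DEMO_MAX_TXQ is an arbitrary bound chosen
for the example, and the error codes are assumptions; the checks mirror the
ones in the testpmd handler of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_TXQ 4 /* arbitrary bound for this sketch */

/* Hypothetical per-port state: port count plus one affinity per Tx queue. */
struct demo_port {
	uint8_t nb_phy_ports;
	uint8_t tx_phy_affinity[DEMO_MAX_TXQ];
};

/* Store an affinity after the same range checks the testpmd command does. */
static int
demo_set_tx_phy_affinity(struct demo_port *port, uint16_t qid, uint8_t value)
{
	if (qid >= DEMO_MAX_TXQ)
		return -EINVAL;
	if (port->nb_phy_ports == 0)
		return -ENOTSUP; /* no physical ports reported */
	if (value > port->nb_phy_ports)
		return -EINVAL;  /* beyond the last physical port */
	port->tx_phy_affinity[qid] = value; /* 0 keeps "no affinity" */
	return 0;
}

/* Self-check reproducing the four example commands for two physical ports. */
static void
demo_affinity_check(void)
{
	struct demo_port port = { .nb_phy_ports = 2 };

	assert(demo_set_tx_phy_affinity(&port, 0, 1) == 0);
	assert(demo_set_tx_phy_affinity(&port, 1, 1) == 0);
	assert(demo_set_tx_phy_affinity(&port, 2, 2) == 0);
	assert(demo_set_tx_phy_affinity(&port, 3, 2) == 0);
	assert(demo_set_tx_phy_affinity(&port, 0, 3) == -EINVAL);
	assert(port.tx_phy_affinity[0] == 1 && port.tx_phy_affinity[3] == 2);
}
```

Note that a rejected value leaves the previously stored affinity untouched,
which is why queue 0 still reads 1 after the failed attempt to set 3.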
>>> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
>>> ---
>>>  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
>>>  app/test-pmd/config.c                       |   1 +
>>>  devtools/libabigail.abignore                |   5 +
>>>  doc/guides/rel_notes/release_23_03.rst      |   4 +
>>>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
>>>  lib/ethdev/rte_ethdev.h                     |  10 ++
>>>  6 files changed, 133 insertions(+)
>>>
>>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
>>> cb8c174020..f771fcf8ac 100644
>>> --- a/app/test-pmd/cmdline.c
>>> +++ b/app/test-pmd/cmdline.c
>>> @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void
>>> *parsed_result,
>>>
>>>  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
>>>  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
>>> +
>>> +			"port config (port_id) txq (queue_id) phy_affinity
>> (value)\n"
>>> +			"    Set the physical affinity value "
>>> +			"on a specific Tx queue\n\n"
>>>  		);
>>>  	}
>>>
>>> @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t
>> cmd_show_port_flow_transfer_proxy = {
>>>  	}
>>>  };
>>>
>>> +/* *** configure port txq phy_affinity value *** */ struct
>>> +cmd_config_tx_phy_affinity {
>>> +	cmdline_fixed_string_t port;
>>> +	cmdline_fixed_string_t config;
>>> +	portid_t portid;
>>> +	cmdline_fixed_string_t txq;
>>> +	uint16_t qid;
>>> +	cmdline_fixed_string_t phy_affinity;
>>> +	uint8_t value;
>>> +};
>>> +
>>> +static void
>>> +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
>>> +				  __rte_unused struct cmdline *cl,
>>> +				  __rte_unused void *data)
>>> +{
>>> +	struct cmd_config_tx_phy_affinity *res = parsed_result;
>>> +	struct rte_eth_dev_info dev_info;
>>> +	struct rte_port *port;
>>> +	int ret;
>>> +
>>> +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
>>> +		return;
>>> +
>>> +	if (res->portid == (portid_t)RTE_PORT_ALL) {
>>> +		printf("Invalid port id\n");
>>> +		return;
>>> +	}
>>> +
>>> +	port = &ports[res->portid];
>>> +
>>> +	if (strcmp(res->txq, "txq")) {
>>> +		printf("Unknown parameter\n");
>>> +		return;
>>> +	}
>>> +	if (tx_queue_id_is_invalid(res->qid))
>>> +		return;
>>> +
>>> +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
>>> +	if (ret != 0)
>>> +		return;
>>> +
>>> +	if (dev_info.nb_phy_ports == 0) {
>>> +		printf("Number of physical ports is 0 which is invalid for PHY
>> Affinity\n");
>>> +		return;
>>> +	}
>>> +	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
>>> +	if (dev_info.nb_phy_ports < res->value) {
>>> +		printf("The PHY affinity value %u is Invalid, exceeds the "
>>> +		       "number of physical ports\n", res->value);
>>> +		return;
>>> +	}
>>> +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
>>> +
>>> +	cmd_reconfig_device_queue(res->portid, 0, 1); }
>>> +
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 port, "port");
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 config, "config");
>>> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
>>> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 portid, RTE_UINT16);
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 txq, "txq");
>>> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
>>> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +			      qid, RTE_UINT16);
>>> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
>>> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +				 phy_affinity, "phy_affinity");
>>> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
>>> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
>>> +			      value, RTE_UINT8);
>>> +
>>> +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
>>> +	.f = cmd_config_tx_phy_affinity_parsed,
>>> +	.data = (void *)0,
>>> +	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
>>> +	.tokens = {
>>> +		(void *)&cmd_config_tx_phy_affinity_port,
>>> +		(void *)&cmd_config_tx_phy_affinity_config,
>>> +		(void *)&cmd_config_tx_phy_affinity_portid,
>>> +		(void *)&cmd_config_tx_phy_affinity_txq,
>>> +		(void *)&cmd_config_tx_phy_affinity_qid,
>>> +		(void *)&cmd_config_tx_phy_affinity_hwport,
>>> +		(void *)&cmd_config_tx_phy_affinity_value,
>>> +		NULL,
>>> +	},
>>> +};
>>> +
>>>  /*
>>>
>> ****************************************************************
>> ******
>>> ********** */
>>>
>>>  /* list of instructions */
>>> @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>>>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
>>>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
>>>  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
>>> +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
>>>  	NULL,
>>>  };
>>>
>>> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
>>> acccb6b035..b83fb17cfa 100644
>>> --- a/app/test-pmd/config.c
>>> +++ b/app/test-pmd/config.c
>>> @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
>>>  		printf("unknown\n");
>>>  		break;
>>>  	}
>>> +	printf("Current number of physical ports: %u\n",
>>> +dev_info.nb_phy_ports);
>>>  }
>>>
>>>  void
>>> diff --git a/devtools/libabigail.abignore
>>> b/devtools/libabigail.abignore index 7a93de3ba1..ac7d3fb2da 100644
>>> --- a/devtools/libabigail.abignore
>>> +++ b/devtools/libabigail.abignore
>>> @@ -34,3 +34,8 @@
>>>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>>  ; Temporary exceptions till next major ABI version ;
>>> ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>>> +
>>> +; Ignore fields inserted in padding hole of rte_eth_txconf
>>> +[suppress_type]
>>> +        name = rte_eth_txconf
>>> +        has_data_member_inserted_between =
>>> +{offset_of(tx_deferred_start), offset_of(offloads)}
>>> diff --git a/doc/guides/rel_notes/release_23_03.rst
>>> b/doc/guides/rel_notes/release_23_03.rst
>>> index 73f5d94e14..e99bd2dcb6 100644
>>> --- a/doc/guides/rel_notes/release_23_03.rst
>>> +++ b/doc/guides/rel_notes/release_23_03.rst
>>> @@ -55,6 +55,10 @@ New Features
>>>       Also, make sure to start the actual text at the margin.
>>>       =======================================================
>>>
>>> +* **Added affinity for multiple physical ports connected to a single
>>> +DPDK port.**
>>> +
>>> +  * Added Tx affinity in queue setup to map a physical port.
>>> +
>>>  * **Updated AMD axgbe driver.**
>>>
>>>    * Added multi-process support.
>>> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> index 79a1fa9cb7..5c716f7679 100644
>>> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
>>> @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only
>> on a specific Tx queue::
>>>
>>>  This command should be run when the port is stopped, or else it will fail.
>>>
>>> +config per queue Tx physical affinity
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +Configure a per queue physical affinity value only on a specific Tx queue::
>>> +
>>> +   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
>>> +
>>> +* ``phy_affinity``: physical port to use for sending,
>>> +                    when multiple physical ports are connected to
>>> +                    a single DPDK port.
>>> +
>>> +This command should be run when the port is stopped, otherwise it fails.
>>> +
>>>  Config VXLAN Encap outer layers
>>>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
>>> c129ca1eaf..2fd971b7b5 100644
>>> --- a/lib/ethdev/rte_ethdev.h
>>> +++ b/lib/ethdev/rte_ethdev.h
>>> @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
>>>  				      less free descriptors than this value. */
>>>
>>>  	uint8_t tx_deferred_start; /**< Do not start queue with
>>> rte_eth_dev_start(). */
>>> +	/**
>>> +	 * Affinity with one of the multiple physical ports connected to the
>> DPDK port.
>>> +	 * Value 0 means no affinity and traffic could be routed to any
>> connected
>>> +	 * physical port.
>>> +	 * The first physical port is number 1 and so on.
>>> +	 * Number of physical ports is reported by nb_phy_ports in
>> rte_eth_dev_info.
>>> +	 */
>>> +	uint8_t tx_phy_affinity;
>>>  	/**
>>>  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_*
>> flags.
>>>  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa @@
>>> -1744,6 +1752,8 @@ struct rte_eth_dev_info {
>>>  	/** Device redirection table size, the total number of entries. */
>>>  	uint16_t reta_size;
>>>  	uint8_t hash_key_size; /**< Hash key size in bytes */
>>> +	/** Number of physical ports connected with DPDK port. */
>>> +	uint8_t nb_phy_ports;
>>>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>>>  	uint64_t flow_type_rss_offloads;
>>>  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration
>>> */
> 


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-09 19:44  0%     ` Ferruh Yigit
  2023-02-10 14:06  0%       ` Jiawei(Jonny) Wang
@ 2023-02-14  9:38  0%       ` Jiawei(Jonny) Wang
  2023-02-14 10:01  0%         ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-14  9:38 UTC (permalink / raw)
  To: Ferruh Yigit, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang
  Cc: dev, Raslan Darawsheh

Hi,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, February 10, 2023 3:45 AM
> To: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
> Yuying Zhang <yuying.zhang@intel.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
> API
> 
> On 2/3/2023 1:33 PM, Jiawei Wang wrote:
> > When multiple physical ports are connected to a single DPDK port,
> > (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to
> > know which physical port is used for Rx and Tx.
> >
> 
> I assume "kernel bonding" is out of context, but this patch concerns DPDK
> bonding, failsafe or softnic. (I will refer to them as virtual bonding
> devices.)
> 
> Using specific queues of the virtual bonding device may interfere with the
> logic of these devices, like bonding modes or RSS of the underlying devices. I
> can see the feature focuses on a very specific use case, but I am not sure all
> possible side effects have been taken into consideration.
> 
> 
> And although the feature is only relevant to virtual bonding devices, core
> ethdev structures are updated for this. Most use cases won't need these, so is
> there a way to reduce the scope of the changes to virtual bonding devices?
> 
> 
> There are a few very core ethdev APIs, like:
> rte_eth_dev_configure()
> rte_eth_tx_queue_setup()
> rte_eth_rx_queue_setup()
> rte_eth_dev_start()
> rte_eth_dev_info_get()
> 
> Almost every user of ethdev uses these APIs. Since they are so fundamental, I
> am for being a little more conservative with these APIs.
> 
> Every eccentric feature targets these APIs first because they are
> common and extending them gives an easy solution, but in the long run this makes
> these APIs more complex, harder to maintain and harder for PMDs to support
> correctly. So I am for not updating them unless it is a generic use case.
> 
> 
> Also, as we talked about PMDs supporting them, I assume your coming PMD
> patch will implement the 'tx_phy_affinity' config option only for mlx drivers.
> What will happen for other NICs? Will they silently ignore the config option
> from the user? This is a problem for DPDK application portability.
> 
> 
> 
> As far as I understand, the target is the application controlling which sub-device
> is used under the virtual bonding device. Can you please give more information on
> why this is required? Perhaps it can help to provide a better/different solution,
> like adding the ability to use both the bonding device and the sub-device for the
> data path, so the application can use whichever it wants. (This is just the first
> solution I came up with, I am not suggesting it as a replacement, but if you can
> describe the problem more I am sure other people can come up with better solutions.)
> 
> And isn't this against the application being transparent to the underlying device
> being a bonding device or an actual device?
> 
> 

OK, I will send a new version with separate functions in the ethdev layer
to map a Tx queue to a port and to get the number of ports.
These functions work through device ops callbacks; other NICs will report
"not supported" when the ops callback is NULL.

> > This patch maps a DPDK Tx queue with a physical port, by adding
> > tx_phy_affinity setting in Tx queue.
> > The affinity number is the physical port ID where packets will be
> > sent.
> > Value 0 means no affinity and traffic could be routed to any connected
> > physical ports, this is the default current behavior.
> >
> > The number of physical ports is reported with rte_eth_dev_info_get().
> >
> > The new tx_phy_affinity field is added into the padding hole of
> > rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> > An ABI check rule needs to be added to avoid false warning.
> >
> > Add the testpmd command line:
> > testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> >
> > For example, suppose two physical ports are connected to a single DPDK
> > port (port id 0), where phy_affinity 1 stands for the first physical port
> > and phy_affinity 2 stands for the second physical port.
> > Use the commands below to configure the Tx phy affinity per Tx queue:
> >         port config 0 txq 0 phy_affinity 1
> >         port config 0 txq 1 phy_affinity 1
> >         port config 0 txq 2 phy_affinity 2
> >         port config 0 txq 3 phy_affinity 2
> >
> > These commands configure Tx queue 0 and Tx queue 1 with phy affinity 1,
> > so packets sent on Tx queue 0 or Tx queue 1 leave from the first
> > physical port. Likewise, packets sent on Tx queue 2 or Tx queue 3
> > leave from the second physical port.
> >
> > Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> > ---
> >  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
> >  app/test-pmd/config.c                       |   1 +
> >  devtools/libabigail.abignore                |   5 +
> >  doc/guides/rel_notes/release_23_03.rst      |   4 +
> >  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
> >  lib/ethdev/rte_ethdev.h                     |  10 ++
> >  6 files changed, 133 insertions(+)
> >
> > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> > cb8c174020..f771fcf8ac 100644
> > --- a/app/test-pmd/cmdline.c
> > +++ b/app/test-pmd/cmdline.c
> > @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void
> > *parsed_result,
> >
> >  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
> >  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
> > +
> > +			"port config (port_id) txq (queue_id) phy_affinity
> (value)\n"
> > +			"    Set the physical affinity value "
> > +			"on a specific Tx queue\n\n"
> >  		);
> >  	}
> >
> > @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t
> cmd_show_port_flow_transfer_proxy = {
> >  	}
> >  };
> >
> > +/* *** configure port txq phy_affinity value *** */ struct
> > +cmd_config_tx_phy_affinity {
> > +	cmdline_fixed_string_t port;
> > +	cmdline_fixed_string_t config;
> > +	portid_t portid;
> > +	cmdline_fixed_string_t txq;
> > +	uint16_t qid;
> > +	cmdline_fixed_string_t phy_affinity;
> > +	uint8_t value;
> > +};
> > +
> > +static void
> > +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
> > +				  __rte_unused struct cmdline *cl,
> > +				  __rte_unused void *data)
> > +{
> > +	struct cmd_config_tx_phy_affinity *res = parsed_result;
> > +	struct rte_eth_dev_info dev_info;
> > +	struct rte_port *port;
> > +	int ret;
> > +
> > +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
> > +		return;
> > +
> > +	if (res->portid == (portid_t)RTE_PORT_ALL) {
> > +		printf("Invalid port id\n");
> > +		return;
> > +	}
> > +
> > +	port = &ports[res->portid];
> > +
> > +	if (strcmp(res->txq, "txq")) {
> > +		printf("Unknown parameter\n");
> > +		return;
> > +	}
> > +	if (tx_queue_id_is_invalid(res->qid))
> > +		return;
> > +
> > +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
> > +	if (ret != 0)
> > +		return;
> > +
> > +	if (dev_info.nb_phy_ports == 0) {
> > +		printf("Number of physical ports is 0 which is invalid for PHY Affinity\n");
> > +		return;
> > +	}
> > +	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
> > +	if (dev_info.nb_phy_ports < res->value) {
> > +		printf("The PHY affinity value %u is Invalid, exceeds the "
> > +		       "number of physical ports\n", res->value);
> > +		return;
> > +	}
> > +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
> > +
> > +	cmd_reconfig_device_queue(res->portid, 0, 1);
> > +}
> > +
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 port, "port");
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 config, "config");
> > +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
> > +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 portid, RTE_UINT16);
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 txq, "txq");
> > +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
> > +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +			      qid, RTE_UINT16);
> > +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
> > +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +				 phy_affinity, "phy_affinity");
> > +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
> > +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> > +			      value, RTE_UINT8);
> > +
> > +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
> > +	.f = cmd_config_tx_phy_affinity_parsed,
> > +	.data = (void *)0,
> > +	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
> > +	.tokens = {
> > +		(void *)&cmd_config_tx_phy_affinity_port,
> > +		(void *)&cmd_config_tx_phy_affinity_config,
> > +		(void *)&cmd_config_tx_phy_affinity_portid,
> > +		(void *)&cmd_config_tx_phy_affinity_txq,
> > +		(void *)&cmd_config_tx_phy_affinity_qid,
> > +		(void *)&cmd_config_tx_phy_affinity_hwport,
> > +		(void *)&cmd_config_tx_phy_affinity_value,
> > +		NULL,
> > +	},
> > +};
> > +
> >  /* ******************************************************************************** */
> >
> >  /* list of instructions */
> > @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
> >  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
> >  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
> >  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
> > +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
> >  	NULL,
> >  };
> >
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> > index acccb6b035..b83fb17cfa 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
> >  		printf("unknown\n");
> >  		break;
> >  	}
> > +	printf("Current number of physical ports: %u\n", dev_info.nb_phy_ports);
> >  }
> >
> >  void
> > diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> > index 7a93de3ba1..ac7d3fb2da 100644
> > --- a/devtools/libabigail.abignore
> > +++ b/devtools/libabigail.abignore
> > @@ -34,3 +34,8 @@
> >  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >  ; Temporary exceptions till next major ABI version ;
> > ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > +
> > +; Ignore fields inserted in padding hole of rte_eth_txconf
> > +[suppress_type]
> > +        name = rte_eth_txconf
> > +        has_data_member_inserted_between = {offset_of(tx_deferred_start), offset_of(offloads)}
> > diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> > index 73f5d94e14..e99bd2dcb6 100644
> > --- a/doc/guides/rel_notes/release_23_03.rst
> > +++ b/doc/guides/rel_notes/release_23_03.rst
> > @@ -55,6 +55,10 @@ New Features
> >       Also, make sure to start the actual text at the margin.
> >       =======================================================
> >
> > +* **Added affinity for multiple physical ports connected to a single DPDK port.**
> > +
> > +  * Added Tx affinity in queue setup to map a physical port.
> > +
> >  * **Updated AMD axgbe driver.**
> >
> >    * Added multi-process support.
> > diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > index 79a1fa9cb7..5c716f7679 100644
> > --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> > @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
> >
> >  This command should be run when the port is stopped, or else it will fail.
> >
> > +config per queue Tx physical affinity
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Configure a per queue physical affinity value only on a specific Tx queue::
> > +
> > +   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> > +
> > +* ``phy_affinity``: physical port to use for sending,
> > +                    when multiple physical ports are connected to
> > +                    a single DPDK port.
> > +
> > +This command should be run when the port is stopped, otherwise it fails.
> > +
> >  Config VXLAN Encap outer layers
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index c129ca1eaf..2fd971b7b5 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
> >  				      less free descriptors than this value. */
> >
> >  	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> > +	/**
> > +	 * Affinity with one of the multiple physical ports connected to the DPDK port.
> > +	 * Value 0 means no affinity and traffic could be routed to any connected
> > +	 * physical port.
> > +	 * The first physical port is number 1 and so on.
> > +	 * Number of physical ports is reported by nb_phy_ports in rte_eth_dev_info.
> > +	 */
> > +	uint8_t tx_phy_affinity;
> >  	/**
> >  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
> >  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
> > @@ -1744,6 +1752,8 @@ struct rte_eth_dev_info {
> >  	/** Device redirection table size, the total number of entries. */
> >  	uint16_t reta_size;
> >  	uint8_t hash_key_size; /**< Hash key size in bytes */
> > +	/** Number of physical ports connected with DPDK port. */
> > +	uint8_t nb_phy_ports;
> >  	/** Bit mask of RSS offloads, the bit offset also means flow type */
> >  	uint64_t flow_type_rss_offloads;
> >  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration */


^ permalink raw reply	[relevance 0%]

* [PATCH v5 21/22] hash: move rte_hash_set_alg out header
  2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace use of static logtypes Stephen Hemminger
@ 2023-02-14  2:19  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-14  2:19 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code that sets the hash CRC algorithm is not at all performance
sensitive, and keeping it inline in the header has a couple of problems.
First, if multiple files include the header, the initialization is done
once per including file. Second, it makes it harder to fix the usage of
RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.
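
To illustrate the first problem, here is a minimal sketch, not the DPDK
code itself. It assumes a GCC/Clang-style constructor attribute (which is
what RTE_INIT relies on) and uses a hypothetical DEMO_HEADER_BODY macro to
stand in for a header that defines file-local state plus a constructor.
Expanding it twice mimics two source files including that header:

```c
#include <stdint.h>

/* Counts how many times the per-file initializer ran. */
static int demo_init_calls;

/*
 * Hypothetical stand-in for a header body: file-local state plus a
 * constructor, pasted once per "including" file.
 */
#define DEMO_HEADER_BODY(tu)                                         \
	static uint8_t tu##_crc32_alg;                               \
	__attribute__((constructor)) static void tu##_init_alg(void) \
	{                                                            \
		tu##_crc32_alg = 1;                                  \
		demo_init_calls++;                                   \
	}

DEMO_HEADER_BODY(file_a) /* as if file_a.c included the header */
DEMO_HEADER_BODY(file_b) /* as if file_b.c included the header */

/* Number of initializer runs that happened before main(). */
int demo_init_count(void)
{
	return demo_init_calls;
}
```

Each expansion gets its own copy of the state and its own constructor, so
the count grows with the number of including files. With a single
definition in a .c file there is one copy and one initialization.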

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* [PATCH v5 00/22] Replace use of static logtypes
    2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
@ 2023-02-14  2:18  3% ` Stephen Hemminger
  2023-02-14  2:19  3%   ` [PATCH v5 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-14  2:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v5 - fix use of LOGTYPE PORT and POWER in examples

v4 - use simpler/shorter method for setting local LOGTYPE
     split up steps of some of the changes
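
For readers unfamiliar with the pattern, the sketch below shows the general
idea of a dynamic logtype: a library registers a name at startup and keeps
the returned id in a file-scope variable instead of using a fixed,
compile-time constant. This is a self-contained illustration only. All
demo_* names, the table size and the level value are hypothetical; in DPDK
the registration is done through rte_log_register() or the
RTE_LOG_REGISTER_DEFAULT() macro, which this sketch does not call.

```c
#include <string.h>

#define DEMO_MAX_LOGTYPES 8 /* hypothetical table size */

struct demo_logtype {
	const char *name;
	int level;
};

static struct demo_logtype demo_types[DEMO_MAX_LOGTYPES];
static int demo_nb_types;

/* Register a name at run time; return its id, reusing it if already known. */
int demo_log_register(const char *name, int level)
{
	for (int i = 0; i < demo_nb_types; i++)
		if (strcmp(demo_types[i].name, name) == 0)
			return i;
	if (demo_nb_types == DEMO_MAX_LOGTYPES)
		return -1; /* table full */
	demo_types[demo_nb_types].name = name;
	demo_types[demo_nb_types].level = level;
	return demo_nb_types++;
}

/* Each library keeps its own id, filled in by a constructor. */
static int demo_hash_logtype;

__attribute__((constructor)) static void demo_hash_log_init(void)
{
	demo_hash_logtype = demo_log_register("lib.hash", 4);
}

/* Accessor so callers can see which id the library was given. */
int demo_hash_logtype_id(void)
{
	return demo_hash_logtype;
}
```

The id is only known after registration, which is why each converted
library needs a small local header or variable to hold it.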

Stephen Hemminger (22):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  examples/power: replace use of RTE_LOGTYPE_POWER
  examples/l3fwd-power: replace use of RTE_LOGTYPE_POWER
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  examples/ipsecgw: replace RTE_LOGTYPE_PORT
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 examples/distributor/main.c       |  2 +-
 examples/ipsec-secgw/sa.c         |  6 +--
 examples/l3fwd-power/main.c       | 15 +++----
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  3 ++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 73 files changed, 375 insertions(+), 169 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
  2023-02-13 15:28  0%                           ` Ben Magistro
@ 2023-02-13 23:18  0%                           ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-13 23:18 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Mon, Feb 13, 2023 at 05:04:49AM +0000, Honnappa Nagarahalli wrote:
> Hi Tyler,
> 	Few more comments inline. Let us continue to make progress, I will add this topic for Techboard discussion for 22nd Feb.
> 
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Sent: Friday, February 10, 2023 2:30 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Cc: Morten Brørup <mb@smartsharesystems.com>; thomas@monjalon.net;
> > dev@dpdk.org; bruce.richardson@intel.com; david.marchand@redhat.com;
> > jerinj@marvell.com; konstantin.ananyev@huawei.com;
> > ferruh.yigit@amd.com; nd <nd@arm.com>; techboard@dpdk.org
> > Subject: Re: [PATCH] eal: introduce atomics abstraction
> > 
> > On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > > > <snip>
> > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For environments where stdatomics are not supported, we could have a
> > > > > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have to support only
> > > > > > > > > > > _explicit APIs). This allows the code to use stdatomics APIs and when we move
> > > > > > > > > > > to minimum supported standard C11, we just need to get rid of the file in DPDK
> > > > > > > > > > > repo.
> > > > > > > > > >
> > > > > > > > > > my concern with this is that if we provide a stdatomic.h or introduce names
> > > > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > > > >
> > > > > > > > > > references:
> > > > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > > > >  * GNU libc manual
> > > > > > > > > >
> > > > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reser
> > > > > > > > > > ved-
> > > > > > > > > > Names.html
> > > > > > > > > >
> > > > > > > > > > in effect the header, the names and in some instances
> > > > > > > > > > namespaces
> > > > > > > > introduced
> > > > > > > > > > are reserved by the implementation. there are several
> > > > > > > > > > reasons in
> > > > > > > > the GNU libc
> > > > > > > > > Wouldn't this apply only after the particular APIs were
> > introduced?
> > > > > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > > > > >
> > > > > > > > yeah, i agree they're being a bit wishy washy in the
> > > > > > > > wording, but i'm not convinced glibc folks are documenting
> > > > > > > > this as permissive guidance against.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > manual that explain the justification for these
> > > > > > > > > > reservations and if
> > > > > > > > if we think
> > > > > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > > > > >
> > > > > > > > > > i'll also remark that the inter-mingling of names from
> > > > > > > > > > the POSIX
> > > > > > > > standard
> > > > > > > > > > implicitly exposed as a part of the EAL public API has
> > > > > > > > > > been
> > > > > > > > problematic for
> > > > > > > > > > portability.
> > > > > > > > > These should be exposed as EAL APIs only when compiled
> > > > > > > > > with a
> > > > > > > > compiler that does not support stdatomics.
> > > > > > > >
> > > > > > > > you don't necessarily compile dpdk, the application or its
> > > > > > > > other dynamically linked dependencies with the same compiler
> > > > > > > > at the same time.
> > > > > > > > i.e. basically the model of any dpdk-dev package on any
> > > > > > > > linux distribution.
> > > > > > > >
> > > > > > > > if dpdk is built without real stdatomic types but the
> > > > > > > > application has to interoperate with a different kit or
> > > > > > > > library that does they would be forced to dance around dpdk
> > > > > > > > with their own version of a shim to hide our faked up stdatomics.
> > > > > > > >
> > > > > > >
> > > > > > > So basically, if we want a binary DPDK distribution to be
> > > > > > > compatible with a
> > > > > > separate application build environment, they both have to
> > > > > > implement atomics the same way, i.e. agree on the ABI for atomics.
> > > > > > >
> > > > > > > Summing up, this leaves us with only two realistic options:
> > > > > > >
> > > > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > > > build
> > > > > > environment to support C11 stdatomics.
> > > > > > > 2. Provide our own DPDK atomics library.
> > > > > > >
> > > > > > > (As mentioned by Tyler, the third option - using C11
> > > > > > > stdatomics inside DPDK, and requiring a build environment
> > > > > > > without C11 stdatomics to implement a shim - is not
> > > > > > > realistic!)
> > > > > > >
> > > > > > > I strongly want atomics to be available for use across inline
> > > > > > > and compiled
> > > > > > code; i.e. it must be possible for both compiled DPDK functions
> > > > > > and inline functions to perform atomic transactions on the same
> > atomic variable.
> > > > > >
> > > > > > i consider it a mandatory requirement. i don't see practically
> > > > > > how we could withdraw existing use and even if we had clean way
> > > > > > i don't see why we would want to. so this item is definitely
> > > > > > settled if you were concerned.
> > > > > I think I agree here.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > So either we upgrade the DPDK build requirements to support
> > > > > > > C11 (including
> > > > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > > > >
> > > > > > i think the issue of requiring a toolchain conformant to a
> > > > > > specific standard is a separate matter because any adoption of
> > > > > > C11 standard atomics is a potential abi break from the current use of
> > intrinsics.
> > > > > I am not sure why you are calling it as ABI break. Referring to
> > > > > [1], I just see
> > > > wrappers around intrinsics (though [2] does not use the intrinsics).
> > > > >
> > > > > [1]
> > > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatom
> > > > > ic.h
> > > > > [2]
> > > > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdat
> > > > > omic
> > > > > .h
> > > >
> > > > it's a potential abi break because atomic types are not the same
> > > > types as their corresponding integer types etc.. (or at least are
> > > > not guaranteed to be by all implementations of c as an abstract language).
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     6.2.5 (27)
> > > >     Further, there is the _Atomic qualifier. The presence of the _Atomic
> > > >     qualifier designates an atomic type. The size, representation, and alignment
> > > >     of an atomic type need not be the same as those of the corresponding
> > > >     unqualified type.
> > > >
> > > >     7.17.6 (3)
> > > >     NOTE The representation of atomic integer types need not have the same size
> > > >     as their corresponding regular types. They should have the same size whenever
> > > >     possible, as it eases effort required to port existing code.
> > > >
> > > > i use the term `potential abi break' with intent because for me to
> > > > assert in absolute terms i would have to evaluate the implementation
> > > > of every current and potential future compilers atomic vs non-atomic
> > > > types. this as i'm sure you understand is not practical, it would
> > > > also defeat the purpose of moving to a standard. therefore i rely on
> > > > the specification prescribed by the standard not the detail of a specific
> > > > implementation.
> > > Can we say that the platforms 'supported' by DPDK today do not have this
> > problem? Any future platforms that will come to DPDK have to evaluate this.
> > 
> > sadly i don't think we can. i believe in an earlier post i linked a bug filed on
> > gcc that shows that clang / gcc were producing different layout than the
> > equivalent non-atomic type.
> I looked at that bug again, it is to do with structure.

just to be clear, you're saying you aren't concerned because we don't
have in our public api struct objects to which we apply atomic
operations?

if that guarantee is absolute and stays true in our public api then i am
satisfied and we can drop the issue.
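
to make the struct concern concrete, here is a small sketch (an
illustration, not something taken from dpdk). it assumes a c11 compiler
with _Atomic support; struct demo_rgb and the DEMO_/demo_ names are
hypothetical. it only compares size and alignment of a type with its
atomic counterpart, and what it reports is up to the implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* A 3-byte struct: not a size that maps onto a single atomic access. */
struct demo_rgb {
	uint8_t r, g, b;
};

/* Evaluates to 1 when T and _Atomic(T) agree on size and alignment. */
#define DEMO_SAME_LAYOUT(T) \
	(sizeof(_Atomic(T)) == sizeof(T) && _Alignof(_Atomic(T)) == _Alignof(T))

/* Layout check for a plain 32-bit integer. */
int demo_u32_same_layout(void)
{
	return DEMO_SAME_LAYOUT(uint32_t);
}

/* Layout check for the small struct; the result is implementation-defined. */
int demo_rgb_same_layout(void)
{
	return DEMO_SAME_LAYOUT(struct demo_rgb);
}

size_t demo_rgb_plain_size(void)
{
	return sizeof(struct demo_rgb);
}

size_t demo_rgb_atomic_size(void)
{
	return sizeof(_Atomic(struct demo_rgb));
}
```

if demo_rgb_same_layout() returns 0 on a given toolchain, a public struct
declared once with and once without _Atomic would not be layout compatible
there, which is the potential abi break being discussed.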

hypothetically if we make this assumption are you proposing that all
platform/toolchain combinations that support std=c11 and optional
stdatomic should adopt them as default on?

there are other implications to doing this, let's dig into the details
at the next technical board meeting.

> 
> > 
> > >
> > > >
> > > >
> > > > > > the abstraction (whatever namespace it resides) allows the
> > > > > > existing toolchain/platform combinations to maintain
> > > > > > compatibility by defaulting to current non-standard intrinsics.
> > > > > How about using the intrinsics (__atomic_xxx) name space for
> > abstraction?
> > > > This covers the GCC and Clang compilers.
> > 
> > i haven't investigated fully but there are usages of these intrinsics that
> > indicate there may be undesirable difference between clang and gcc versions.
> > the hint is there seems to be conditionally compiled code under __clang__
> > when using some __atomic's.
> I sent an RFC to address this [1]. I think the size specific intrinsics are not necessary.
> 
> [1] http://patches.dpdk.org/project/dpdk/patch/20230211015622.408487-1-honnappa.nagarahalli@arm.com/

yep, looks good to me. i acked the change.

thank you.

> 
> > 
> > for the purpose of this discussion clang just tries to look like gcc so i don't
> > regard them as being different compilers for the purpose of this discussion.
> > 
> > > >
> > > > the namespace starting with `__` is also reserved for the implementation.
> > > > this is why compilers gcc/clang/msvc name their intrinsic and
> > > > builtin functions starting with __ to explicitly avoid collision
> > > > with the application namespace.
> > 
> > > Agreed. But, here we are considering '__atomic_' specifically (i.e.
> > > not just '__')
> > 
> > i don't understand the confusion __atomic is within the __ namespace that is
> > reserved.
> What I mean is, we are not formulating a policy/rule to allow for any name space that starts with '__'.

understood, but we appear to be trying to formulate a policy allowing a name
within that space, which the standard reserves for the implementation and
which gcc has claimed.

anyway, let's discuss further at the meeting.

> 
> > 
> > let me ask this another way, what benefit do you see to trying to overlap with
> > the standard namespace? the only benefit i can see is that at some point in
> > the future it avoids having to perform a mechanical change to eventually
> > retire the abstraction once all platform/toolchains support standard atomics.
> > i.e. basically s/rte_atomic/atomic/g
> > 
> > is there another benefit i'm missing?
> The abstraction you have proposed solves the problem for the long term. The proposed abstraction stops us from thinking about moving to stdatomics.

i think this is where you've got me a bit confused. i'd like to
understand how it stops us thinking about moving to stdatomics.

> IMO, the problem is short term. Using the __atomic_ name space does not have any practical issues with the platforms DPDK supports (unless msvc has a problem with this, more questions below).

oh, sorry for not answering this previously. msvc (and as it happens
clang) both use __c11_atomic_xxx as a namespace. i'm only aware of gcc
and potentially compilers that try to look like gcc using __atomic_xxx.

so if you're asking if that selection would interfere with msvc, it
wouldn't. i'm only concerned with mingling in a namespace that gcc has
claimed.
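
as a rough sketch of what an abstraction in a project-owned namespace could
look like (the demo_ prefix and the DEMO_USE_STDATOMIC switch are
hypothetical, this is not the api from the patch): the same call sites map
either to c11 stdatomic or to the gcc/clang __atomic builtins, and no
identifier is introduced in a reserved namespace.

```c
#include <stdint.h>

#ifdef DEMO_USE_STDATOMIC
/* Standard C11 atomics behind the project-owned names. */
#include <stdatomic.h>
#define demo_atomic(T) _Atomic(T)
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
	atomic_load_explicit(ptr, mo)
#else
/* Default: GCC/Clang builtins operating on the plain integer type. */
#define demo_atomic(T) T
#define demo_memory_order_relaxed __ATOMIC_RELAXED
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
	__atomic_load_n(ptr, mo)
#endif

static demo_atomic(uint32_t) demo_counter;

/* Add n to the shared counter and return the value it held before. */
uint32_t demo_counter_add(uint32_t n)
{
	return demo_atomic_fetch_add_explicit(&demo_counter, n,
					      demo_memory_order_relaxed);
}

/* Read the current counter value. */
uint32_t demo_counter_read(void)
{
	return demo_atomic_load_explicit(&demo_counter,
					 demo_memory_order_relaxed);
}
```

retiring such a layer later is the mechanical rename mentioned above, and
until then the default branch keeps today's types and intrinsics unchanged.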

> 
> > 
> > >
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     7.1.3 (1)
> > > >     All identifiers that begin with an underscore and either an uppercase
> > > >     letter or another underscore are always reserved for any use.
> > > >
> > > >     ...
> > > >
> > > > > If there is another platform that uses the same name space for
> > > > > something
> > > > else, I think DPDK should not be supporting that platform.
> > > >
> > > > that's effectively a statement excluding windows platform and all
> > > > non-gcc compilers from ever supporting dpdk.
> > > Apologies, I did not understand your comment on windows platform. Do
> > you mean to say a compiler for windows platform uses '__atomic_xxx' name
> > space to provide some other functionality (and hence it would get excluded)?
> > 
> > i mean dpdk can never fully be supported without msvc except for statically
> > linked builds which are niche and limit it too severely for many consumers to
> > practically use dpdk. there are also many application developers who would
> > like to integrate dpdk but can't and telling them their only choice is to re-port
> > their entire application to clang isn't feasible.
> > 
> > i can see no technical reason why we should be excluding a major compiler in
> > broad use if it is capable of building dpdk. msvc arguably has some of the
> > most sophisticated security features in the industry and the use of those
> > features is mandated by many of the customers who might deploy dpdk
> > applications on windows.
> I did not mean DPDK should not support msvc (may be my sentence below was misunderstood).
> Does msvc provide '__atomic_xxx' intrinsics?

msvc provides stdatomic (behind stdatomic there are intrinsics)

> 
> > 
> > > Clang supports these intrinsics. I am not sure about the merit of supporting
> > other non-gcc compilers. May be a topic Techboard discussion.
> > >
> > > >
> > > > > What problems do you see?
> > > >
> > > > i'm fairly certain at least one other compiler uses the __atomic
> > > > namespace but
> > > Do you mean __atomic namespace is used for some other purpose?
> > >
> > > > it would take me time to check, the most notable potential issue
> > > > that comes to mind is if such an intrinsic with the same name is
> > > > provided in a different implementation and has either regressive
> > > > code generation or different semantics it would be bad because it is
> > > > intrinsic you can't just hack around it with #undef __atomic to shim in a
> > semantically correct version.
> > > I do not think we should worry about regressive code generation problem. It
> > should be fixed by that compiler.
> > > Different semantics is something we need to worry about. It would be good
> > to find out more about a compiler that does this.
> > 
> > again, this is about portability it's about potential not that we can find an
> > example.
> > 
> > >
> > > >
> > > > how about this, is there another possible namespace you might
> > > > suggest that conforms or doesn't conflict with the the rules defined
> > > > in ISO/IEC 9899:2011
> > > > 7.1.3 i think if there were that would satisfy all of my concerns
> > > > related to namespaces.
> > > >
> > > > keep in mind the point of moving to a standard is to achieve
> > > > portability so if we do things that will regress us back to being
> > > > dependent on an implementation we haven't succeeded. that's all i'm
> > trying to guarantee here.
> > > Agree. We are trying to solve a problem that is temporary. I am trying to
> > keep the problem scope narrow which might help us push to adopt the
> > standard sooner.
> > 
> > i do wish we could just target the standard but unless we are willing to draw a
> > line and say no more non std=c11 and also we potentially break the abi we
> > are talking years. i don't think it is reasonable to block progress for years, so
> > i'm offering a transitional path. it's an evolution over time that we have to
> > manage.
> Apologies if I am sounding like I am blocking progress. Rest assured, we will find a way. It is just about which solution we are going to pick.

no problems, i really appreciate any help.

> Also, is there are any information on how long before we move to C11?

we need to clear all long term compatibility promises for gcc/linux
platforms that don't support -std=c11 and implement stdatomic option. the
last discussion was that it was years i believe.
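
as a rough sketch of what a "stdatomic option" could key off, the check
below uses only macros defined by the C standard: __STDC_VERSION__ and
__STDC_NO_ATOMICS__. the SKETCH_HAVE_STDATOMIC name and the helper
function are hypothetical and exist only for this illustration; a real
build might well make this decision in the build system instead.

```c
/*
 * C11 makes <stdatomic.h> optional: a conforming implementation that
 * does not provide it defines __STDC_NO_ATOMICS__.
 */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
	!defined(__STDC_NO_ATOMICS__)
#define SKETCH_HAVE_STDATOMIC 1
#else
#define SKETCH_HAVE_STDATOMIC 0
#endif

/* Report which backend an abstraction layer would select. */
const char *
sketch_atomics_backend(void)
{
	return SKETCH_HAVE_STDATOMIC ? "stdatomic" : "intrinsics";
}
```

a toolchain built with -std=c99 would take the second branch, which is
why the compatibility promises have to be cleared first.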

Bruce has a patch series and another thread going talking about moving
to -std=c99 which is good but of course doesn't get us to std=c11.

> 
> > 
> > >
> > > >
> > > > i feel like we are really close on this discussion, if we can just
> > > > iron this issue out we can probably get going on the actual changes.
> > > >
> > > > thanks for the consideration.
> > > >
> > > > >
> > > > > >
> > > > > > once in place it provides an opportunity to introduce new
> > > > > > toolchain/platform combinations and enables an opt-in capability
> > > > > > to use stdatomics on existing toolchain/platform combinations
> > > > > > subject to community discussion on how/if/when.
> > > > > >
> > > > > > it would be good to get more participants into the discussion so
> > > > > > i'll cc techboard for some attention. i feel like the only area
> > > > > > that isn't decided is to do or not do this in rte_ namespace.
> > > > > >
> > > > > > i'm strongly in favor of rte_ namespace after discussion, mainly
> > > > > > due to to disadvantages of trying to overlap with the standard
> > > > > > namespace while not providing a compatible api/abi and because
> > > > > > it provides clear disambiguation of that difference in semantics
> > > > > > and compatibility with
> > > > the standard api.
> > > > > >
> > > > > > so far i've noted the following
> > > > > >
> > > > > > * we will not provide the non-explicit apis.
> > > > > +1
> > > > >
> > > > > > * we will make no attempt to support operate on struct/union atomics
> > > > > >   with our apis.
> > > > > +1
> > > > >
> > > > > > * we will mirror the standard api potentially in the rte_ namespace to
> > > > > >   - reference the standard api documentation.
> > > > > >   - assume compatible semantics (sans exceptions from first 2 points).
> > > > > >
> > > > > > my vote is to remove 'potentially' from the last point above for
> > > > > > reasons previously discussed in postings to the mail thread.
> > > > > >
> > > > > > thanks all for the discussion, i'll send up a patch removing
> > > > > > non-explicit apis for viewing.
> > > > > >
> > > > > > ty

^ permalink raw reply	[relevance 0%]

* [PATCH v4 18/19] hash: move rte_hash_set_alg out header
  2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
@ 2023-02-13 19:55  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-13 19:55 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin

The code for setting the hash algorithm is not at all performance
sensitive, and doing it inline has a couple of problems. First, it means
that if multiple files include the header, then the initialization gets
done multiple times. Second, it makes it harder to fix usage of RTE_LOG().

Despite what the checking script says, this is not an ABI change: the
previous version inlined the same code, so both old and new code
will work the same.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/meson.build    |  1 +
 lib/hash/rte_hash_crc.c | 63 +++++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h | 46 ++----------------------------
 lib/hash/version.map    |  1 +
 4 files changed, 67 insertions(+), 44 deletions(-)
 create mode 100644 lib/hash/rte_hash_crc.c

diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index e56ee8572564..c345c6f561fc 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -19,6 +19,7 @@ indirect_headers += files(
 
 sources = files(
     'rte_cuckoo_hash.c',
+    'rte_hash_crc.c',
     'rte_fbk_hash.c',
     'rte_thash.c',
     'rte_thash_gfni.c'
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
new file mode 100644
index 000000000000..c59eebccb1eb
--- /dev/null
+++ b/lib/hash/rte_hash_crc.c
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation
+ */
+
+#include <rte_cpuflags.h>
+#include <rte_log.h>
+
+#include "rte_hash_crc.h"
+
+/**
+ * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * calculation.
+ *
+ * @param alg
+ *   An OR of following flags:
+ *   - (CRC32_SW) Don't use SSE4.2/ARMv8 intrinsics (default non-[x86/ARMv8])
+ *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
+ *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
+ *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *
+ */
+void
+rte_hash_crc_set_alg(uint8_t alg)
+{
+	crc32_alg = CRC32_SW;
+
+	if (alg == CRC32_SW)
+		return;
+
+#if defined RTE_ARCH_X86
+	if (!(alg & CRC32_SSE42_x64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
+	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
+		crc32_alg = CRC32_SSE42;
+	else
+		crc32_alg = CRC32_SSE42_x64;
+#endif
+
+#if defined RTE_ARCH_ARM64
+	if (!(alg & CRC32_ARM64))
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
+		crc32_alg = CRC32_ARM64;
+#endif
+
+	if (crc32_alg == CRC32_SW)
+		RTE_LOG(WARNING, HASH,
+			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
+}
+
+/* Setting the best available algorithm */
+RTE_INIT(rte_hash_crc_init_alg)
+{
+#if defined(RTE_ARCH_X86)
+	rte_hash_crc_set_alg(CRC32_SSE42_x64);
+#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
+	rte_hash_crc_set_alg(CRC32_ARM64);
+#else
+	rte_hash_crc_set_alg(CRC32_SW);
+#endif
+}
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 0249ad16c5b6..e4acd99a0c81 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -20,8 +20,6 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_common.h>
 #include <rte_config.h>
-#include <rte_cpuflags.h>
-#include <rte_log.h>
 
 #include "rte_crc_sw.h"
 
@@ -53,48 +51,8 @@ static uint8_t crc32_alg = CRC32_SW;
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
  *
  */
-static inline void
-rte_hash_crc_set_alg(uint8_t alg)
-{
-	crc32_alg = CRC32_SW;
-
-	if (alg == CRC32_SW)
-		return;
-
-#if defined RTE_ARCH_X86
-	if (!(alg & CRC32_SSE42_x64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_x64/CRC32_SSE42\n");
-	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_EM64T) || alg == CRC32_SSE42)
-		crc32_alg = CRC32_SSE42;
-	else
-		crc32_alg = CRC32_SSE42_x64;
-#endif
-
-#if defined RTE_ARCH_ARM64
-	if (!(alg & CRC32_ARM64))
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_ARM64\n");
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
-		crc32_alg = CRC32_ARM64;
-#endif
-
-	if (crc32_alg == CRC32_SW)
-		RTE_LOG(WARNING, HASH,
-			"Unsupported CRC32 algorithm requested using CRC32_SW\n");
-}
-
-/* Setting the best available algorithm */
-RTE_INIT(rte_hash_crc_init_alg)
-{
-#if defined(RTE_ARCH_X86)
-	rte_hash_crc_set_alg(CRC32_SSE42_x64);
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
-	rte_hash_crc_set_alg(CRC32_ARM64);
-#else
-	rte_hash_crc_set_alg(CRC32_SW);
-#endif
-}
+void
+rte_hash_crc_set_alg(uint8_t alg);
 
 #ifdef __DOXYGEN__
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index f03b047b2eec..a1d81835399c 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -9,6 +9,7 @@ DPDK_23 {
 	rte_hash_add_key_with_hash;
 	rte_hash_add_key_with_hash_data;
 	rte_hash_count;
+	rte_hash_crc_set_alg;
 	rte_hash_create;
 	rte_hash_del_key;
 	rte_hash_del_key_with_hash;
-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* [PATCH v4 00/19] Replace use of static logtypes
  @ 2023-02-13 19:55  3% ` Stephen Hemminger
  2023-02-13 19:55  3%   ` [PATCH v4 18/19] hash: move rte_hash_set_alg out header Stephen Hemminger
  2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace us of static logtypes Stephen Hemminger
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-13 19:55 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patchset removes the main uses of static LOGTYPEs in DPDK
libraries. It starts with the easy ones and goes on to the more complex ones.

Note: there is one patch in this series that will get
flagged incorrectly as an ABI change.

v4 - use simpler/shorter method for setting local LOGTYPE
     split up steps of some of the changes

Stephen Hemminger (19):
  gso: don't log message on non TCP/UDP
  eal: drop no longer used GSO logtype
  log: drop unused RTE_LOGTYPE_TIMER
  efd: replace RTE_LOGTYPE_EFD with dynamic type
  mbuf: replace RTE_LOGTYPE_MBUF with dynamic type
  acl: replace LOGTYPE_ACL with dynamic type
  power: replace RTE_LOGTYPE_POWER with dynamic type
  ring: replace RTE_LOGTYPE_RING with dynamic type
  mempool: replace RTE_LOGTYPE_MEMPOOL with dynamic type
  lpm: replace RTE_LOGTYPE_LPM with dynamic types
  kni: replace RTE_LOGTYPE_KNI with dynamic type
  sched: replace RTE_LOGTYPE_SCHED with dynamic type
  port: replace RTE_LOGTYPE_PORT with dynamic type
  table: convert RTE_LOGTYPE_TABLE to dynamic logtype
  app/test: remove use of RTE_LOGTYPE_PIPELINE
  pipeline: replace RTE_LOGTYPE_PIPELINE with dynamic type
  hash: move rte_thash_gfni stubs out of header file
  hash: move rte_hash_set_alg out header
  hash: convert RTE_LOGTYPE_HASH to dynamic type

 app/test/test_acl.c               |  3 +-
 app/test/test_table_acl.c         | 50 +++++++++++------------
 app/test/test_table_pipeline.c    | 40 +++++++++----------
 lib/acl/acl_bld.c                 |  1 +
 lib/acl/acl_gen.c                 |  1 +
 lib/acl/acl_log.h                 |  4 ++
 lib/acl/rte_acl.c                 |  4 ++
 lib/acl/tb_mem.c                  |  3 +-
 lib/eal/common/eal_common_log.c   | 17 --------
 lib/eal/include/rte_log.h         | 34 ++++++++--------
 lib/efd/rte_efd.c                 |  3 ++
 lib/fib/fib_log.h                 |  4 ++
 lib/fib/rte_fib.c                 |  3 ++
 lib/fib/rte_fib6.c                |  2 +
 lib/gso/rte_gso.c                 |  5 +--
 lib/hash/meson.build              |  9 ++++-
 lib/hash/rte_cuckoo_hash.c        |  5 +++
 lib/hash/rte_fbk_hash.c           |  3 ++
 lib/hash/rte_hash_crc.c           | 66 +++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.h           | 46 +--------------------
 lib/hash/rte_thash.c              |  3 ++
 lib/hash/rte_thash_gfni.c         | 46 +++++++++++++++++++++
 lib/hash/rte_thash_gfni.h         | 28 +++----------
 lib/hash/version.map              |  5 +++
 lib/kni/rte_kni.c                 |  3 ++
 lib/lpm/lpm_log.h                 |  4 ++
 lib/lpm/rte_lpm.c                 |  3 ++
 lib/lpm/rte_lpm6.c                |  1 +
 lib/mbuf/mbuf_log.h               |  4 ++
 lib/mbuf/rte_mbuf.c               |  4 ++
 lib/mbuf/rte_mbuf_dyn.c           |  2 +
 lib/mbuf/rte_mbuf_pool_ops.c      |  2 +
 lib/mempool/rte_mempool.c         |  3 ++
 lib/mempool/rte_mempool_log.h     |  4 ++
 lib/mempool/rte_mempool_ops.c     |  1 +
 lib/pipeline/rte_pipeline.c       |  3 ++
 lib/port/rte_port_ethdev.c        |  3 ++
 lib/port/rte_port_eventdev.c      |  4 ++
 lib/port/rte_port_fd.c            |  3 ++
 lib/port/rte_port_frag.c          |  3 ++
 lib/port/rte_port_kni.c           |  3 ++
 lib/port/rte_port_ras.c           |  3 ++
 lib/port/rte_port_ring.c          |  3 ++
 lib/port/rte_port_sched.c         |  3 ++
 lib/port/rte_port_source_sink.c   |  3 ++
 lib/port/rte_port_sym_crypto.c    |  3 ++
 lib/power/guest_channel.c         |  3 +-
 lib/power/power_common.c          |  2 +
 lib/power/power_common.h          |  3 +-
 lib/power/power_kvm_vm.c          |  1 +
 lib/power/rte_power.c             |  1 +
 lib/power/rte_power_empty_poll.c  |  1 +
 lib/rib/rib_log.h                 |  4 ++
 lib/rib/rte_rib.c                 |  3 ++
 lib/rib/rte_rib6.c                |  3 ++
 lib/ring/rte_ring.c               |  3 ++
 lib/sched/rte_pie.c               |  1 +
 lib/sched/rte_sched.c             |  5 +++
 lib/sched/rte_sched_log.h         |  4 ++
 lib/table/rte_table_acl.c         |  3 ++
 lib/table/rte_table_array.c       |  3 ++
 lib/table/rte_table_hash_cuckoo.c |  3 ++
 lib/table/rte_table_hash_ext.c    |  3 ++
 lib/table/rte_table_hash_key16.c  |  3 ++
 lib/table/rte_table_hash_key32.c  |  5 ++-
 lib/table/rte_table_hash_key8.c   |  5 ++-
 lib/table/rte_table_hash_lru.c    |  3 ++
 lib/table/rte_table_lpm.c         |  3 ++
 lib/table/rte_table_lpm_ipv6.c    |  3 ++
 lib/table/rte_table_stub.c        |  3 ++
 70 files changed, 363 insertions(+), 158 deletions(-)
 create mode 100644 lib/acl/acl_log.h
 create mode 100644 lib/fib/fib_log.h
 create mode 100644 lib/hash/rte_hash_crc.c
 create mode 100644 lib/hash/rte_thash_gfni.c
 create mode 100644 lib/lpm/lpm_log.h
 create mode 100644 lib/mbuf/mbuf_log.h
 create mode 100644 lib/mempool/rte_mempool_log.h
 create mode 100644 lib/rib/rib_log.h
 create mode 100644 lib/sched/rte_sched_log.h

-- 
2.39.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
@ 2023-02-13 15:28  0%                           ` Ben Magistro
  2023-02-13 23:18  0%                           ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Ben Magistro @ 2023-02-13 15:28 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Tyler Retzlaff, Morten Brørup, thomas, dev,
	bruce.richardson, david.marchand, jerinj, konstantin.ananyev,
	ferruh.yigit, nd, techboard

[-- Attachment #1: Type: text/plain, Size: 17850 bytes --]

There is a thread discussing a change to the standard [1] but I have not
seen anything explicit yet about moving to C11.  I am personally in favor
of making the jump to C11 now as part of the 23.x branch and provided my
thoughts in the linked thread (what other projects using DPDK have as
minimum compiler requirements, CentOS 7 EOL dates).

Is the long term plan to backport this change set to the existing LTS
release or is this meant to be something introduced for use in 23.x and
going forward?  I think I was (probably naively) assuming this would be a
new feature in the 23.x going forward only.

[1] http://mails.dpdk.org/archives/dev/2023-February/262188.html

On Mon, Feb 13, 2023 at 12:05 AM Honnappa Nagarahalli <
Honnappa.Nagarahalli@arm.com> wrote:

> Hi Tyler,
>         Few more comments inline. Let us continue to make progress, I will
> add this topic for Techboard discussion for 22nd Feb.
>
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Sent: Friday, February 10, 2023 2:30 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Cc: Morten Brørup <mb@smartsharesystems.com>; thomas@monjalon.net;
> > dev@dpdk.org; bruce.richardson@intel.com; david.marchand@redhat.com;
> > jerinj@marvell.com; konstantin.ananyev@huawei.com;
> > ferruh.yigit@amd.com; nd <nd@arm.com>; techboard@dpdk.org
> > Subject: Re: [PATCH] eal: introduce atomics abstraction
> >
> > On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > > > <snip>
> > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For environments where stdatomics are not supported,
> > > > > > > > > > > we could
> > > > > > > > have a
> > > > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have
> > > > > > > > > > to support
> > > > > > > > only
> > > > > > > > > > _explicit APIs). This allows the code to use stdatomics
> > > > > > > > > > APIs and
> > > > > > > > when we move
> > > > > > > > > > to minimum supported standard C11, we just need to get
> > > > > > > > > > rid of the
> > > > > > > > file in DPDK
> > > > > > > > > > repo.
> > > > > > > > > >
> > > > > > > > > > my concern with this is that if we provide a stdatomic.h
> > > > > > > > > > or
> > > > > > > > introduce names
> > > > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > > > >
> > > > > > > > > > references:
> > > > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > > > >  * GNU libc manual
> > > > > > > > > >
> > > > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reser
> > > > > > > > > > ved-
> > > > > > > > > > Names.html
> > > > > > > > > >
> > > > > > > > > > in effect the header, the names and in some instances
> > > > > > > > > > namespaces
> > > > > > > > introduced
> > > > > > > > > > are reserved by the implementation. there are several
> > > > > > > > > > reasons in
> > > > > > > > the GNU libc
> > > > > > > > > Wouldn't this apply only after the particular APIs were
> > introduced?
> > > > > > > > i.e. it should not apply if the compiler does not support
> stdatomics.
> > > > > > > >
> > > > > > > > yeah, i agree they're being a bit wishy washy in the
> > > > > > > > wording, but i'm not convinced glibc folks are documenting
> > > > > > > > this as permissive guidance against.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > manual that explain the justification for these
> > > > > > > > > > reservations and if
> > > > > > > > if we think
> > > > > > > > > > about ODR and ABI compatibility we can conceive of
> others.
> > > > > > > > > >
> > > > > > > > > > i'll also remark that the inter-mingling of names from
> > > > > > > > > > the POSIX
> > > > > > > > standard
> > > > > > > > > > implicitly exposed as a part of the EAL public API has
> > > > > > > > > > been
> > > > > > > > problematic for
> > > > > > > > > > portability.
> > > > > > > > > These should be exposed as EAL APIs only when compiled
> > > > > > > > > with a
> > > > > > > > compiler that does not support stdatomics.
> > > > > > > >
> > > > > > > > you don't necessarily compile dpdk, the application or its
> > > > > > > > other dynamically linked dependencies with the same compiler
> > > > > > > > at the same time.
> > > > > > > > i.e. basically the model of any dpdk-dev package on any
> > > > > > > > linux distribution.
> > > > > > > >
> > > > > > > > if dpdk is built without real stdatomic types but the
> > > > > > > > application has to interoperate with a different kit or
> > > > > > > > library that does they would be forced to dance around dpdk
> > > > > > > > with their own version of a shim to hide our faked up
> stdatomics.
> > > > > > > >
> > > > > > >
> > > > > > > So basically, if we want a binary DPDK distribution to be
> > > > > > > compatible with a
> > > > > > separate application build environment, they both have to
> > > > > > implement atomics the same way, i.e. agree on the ABI for
> atomics.
> > > > > > >
> > > > > > > Summing up, this leaves us with only two realistic options:
> > > > > > >
> > > > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > > > build
> > > > > > environment to support C11 stdatomics.
> > > > > > > 2. Provide our own DPDK atomics library.
> > > > > > >
> > > > > > > (As mentioned by Tyler, the third option - using C11
> > > > > > > stdatomics inside DPDK, and requiring a build environment
> > > > > > > without C11 stdatomics to implement a shim - is not
> > > > > > > realistic!)
> > > > > > >
> > > > > > > I strongly want atomics to be available for use across inline
> > > > > > > and compiled
> > > > > > code; i.e. it must be possible for both compiled DPDK functions
> > > > > > and inline functions to perform atomic transactions on the same
> > atomic variable.
> > > > > >
> > > > > > i consider it a mandatory requirement. i don't see practically
> > > > > > how we could withdraw existing use and even if we had clean way
> > > > > > i don't see why we would want to. so this item is defintely
> > > > > > settled if you were
> > > > concerned.
> > > > > I think I agree here.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > So either we upgrade the DPDK build requirements to support
> > > > > > > C11 (including
> > > > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > > > >
> > > > > > i think the issue of requiring a toolchain conformant to a
> > > > > > specific standard is a separate matter because any adoption of
> > > > > > C11 standard atomics is a potential abi break from the current
> use of
> > intrinsics.
> > > > > I am not sure why you are calling it as ABI break. Referring to
> > > > > [1], I just see
> > > > wrappers around intrinsics (though [2] does not use the intrinsics).
> > > > >
> > > > > [1]
> > > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatom
> > > > > ic.h
> > > > > [2]
> > > > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdat
> > > > > omic
> > > > > .h
> > > >
> > > > it's a potential abi break because atomic types are not the same
> > > > types as their corresponding integer types etc.. (or at least are
> > > > not guaranteed to be by all implementations of c as an abstract
> language).
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     6.2.5 (27)
> > > >     Further, there is the _Atomic qualifier. The presence of the
> _Atomic
> > > >     qualifier designates an atomic type. The size, representation,
> and
> > alignment
> > > >     of an atomic type need not be the same as those of the
> corresponding
> > > >     unqualified type.
> > > >
> > > >     7.17.6 (3)
> > > >     NOTE The representation of atomic integer types need not have the
> > same size
> > > >     as their corresponding regular types. They should have the same
> > > > size whenever
> > > >     possible, as it eases effort required to port existing code.
> > > >
> > > > i use the term `potential abi break' with intent because for me to
> > > > assert in absolute terms i would have to evaluate the implementation
> > > > of every current and potential future compilers atomic vs non-atomic
> > > > types. this as i'm sure you understand is not practical, it would
> > > > also defeat the purpose of moving to a standard. therefore i rely on
> > > > the specification prescribed by the standard not the detail of a
> specific
> > implementation.
> > > Can we say that the platforms 'supported' by DPDK today do not have
> this
> > problem? Any future platforms that will come to DPDK have to evaluate
> this.
> >
> > sadly i don't think we can. i believe in an earlier post i linked a bug
> filed on
> > gcc that shows that clang / gcc were producing different layout than the
> > equivalent non-atomic type.
> I looked at that bug again, it is to do with structure.
>
> >
> > >
> > > >
> > > >
> > > > > > the abstraction (whatever namespace it resides) allows the
> > > > > > existing toolchain/platform combinations to maintain
> > > > > > compatibility by defaulting to current non-standard intrinsics.
> > > > > How about using the intrinsics (__atomic_xxx) name space for
> > abstraction?
> > > > This covers the GCC and Clang compilers.
> >
> > i haven't investigated fully but there are usages of these intrinsics
> that
> > indicate there may be undesirable difference between clang and gcc
> versions.
> > the hint is there seems to be conditionally compiled code under __clang__
> > when using some __atomic's.
> I sent an RFC to address this [1]. I think the size specific intrinsics
> are not necessary.
>
> [1]
> http://patches.dpdk.org/project/dpdk/patch/20230211015622.408487-1-honnappa.nagarahalli@arm.com/
>
> >
> > for the purpose of this discussion clang just tries to look like gcc so
> i don't
> > regard them as being different compilers for the purpose of this
> discussion.
> >
> > > >
> > > > the namespace starting with `__` is also reserved for the
> implementation.
> > > > this is why compilers gcc/clang/msvc place name their intrinsic and
> > > > builtin functions starting with __ to explicitly avoid collision
> > > > with the application namespace.
> >
> > > Agreed. But, here we are considering '__atomic_' specifically (i.e.
> > > not just '__')
> >
> > i don't understand the confusion __atomic is within the __ namespace
> that is
> > reserved.
> What I mean is, we are not formulating a policy/rule to allow for any name
> space that starts with '__'.
>
> >
> > let me ask this another way, what benefit do you see to trying to
> overlap with
> > the standard namespace? the only benefit i can see is that at some point
> in
> > the future it avoids having to perform a mechanical change to eventually
> > retire the abstraction once all platform/toolchains support standard
> atomics.
> > i.e. basically s/rte_atomic/atomic/g
> >
> > is there another benefit i'm missing?
> The abstraction you have proposed solves the problem for the long term.
> The proposed abstraction stops us from thinking about moving to stdatomics.
> IMO, the problem is short term. Using the __atomic_ name space does not
> have any practical issues with the platforms DPDK supports (unless msvc has
> a problem with this, more questions below).
>
> >
> > >
> > > >
> > > >     ISO/IEC 9899:2011
> > > >
> > > >     7.1.3 (1)
> > > >     All identifiers that begin with an underscore and either an
> uppercase
> > > >     letter or another underscore are always reserved for any use.
> > > >
> > > >     ...
> > > >
> > > > > If there is another platform that uses the same name space for
> > > > > something
> > > > else, I think DPDK should not be supporting that platform.
> > > >
> > > > that's effectively a statement excluding windows platform and all
> > > > non-gcc compilers from ever supporting dpdk.
> > > Apologies, I did not understand your comment on windows platform. Do
> > you mean to say a compiler for windows platform uses '__atomic_xxx' name
> > space to provide some other functionality (and hence it would get
> excluded)?
> >
> > i mean dpdk can never fully be supported without msvc except for
> statically
> > linked builds which are niche and limit it too severely for many
> consumers to
> > practically use dpdk. there are also many application developers who
> would
> > like to integrate dpdk but can't and telling them their only choice is
> to re-port
> > their entire application to clang isn't feasible.
> >
> > i can see no technical reason why we should be excluding a major
> compiler in
> > broad use if it is capable of building dpdk. msvc arguably has some of
> the
> > most sophisticated security features in the industry and the use of those
> > features is mandated by many of the customers who might deploy dpdk
> > applications on windows.
> I did not mean DPDK should not support msvc (may be my sentence below was
> misunderstood).
> Does msvc provide '__atomic_xxx' intrinsics?
>
> >
> > > Clang supports these intrinsics. I am not sure about the merit of
> supporting
> > other non-gcc compilers. May be a topic Techboard discussion.
> > >
> > > >
> > > > > What problems do you see?
> > > >
> > > > i'm fairly certain at least one other compiler uses the __atomic
> > > > namespace but
> > > Do you mean __atomic namespace is used for some other purpose?
> > >
> > > > it would take me time to check, the most notable potential issue
> > > > that comes to mind is if such an intrinsic with the same name is
> > > > provided in a different implementation and has either regressive
> > > > code generation or different semantics it would be bad because it is
> > > > intrinsic you can't just hack around it with #undef __atomic to shim
> in a
> > semantically correct version.
> > > I do not think we should worry about regressive code generation
> problem. It
> > should be fixed by that compiler.
> > > Different semantics is something we need to worry about. It would be
> good
> > to find out more about a compiler that does this.
> >
> > again, this is about portability it's about potential not that we can
> find an
> > example.
> >
> > >
> > > >
> > > > how about this, is there another possible namespace you might
> > > > suggest that conforms or doesn't conflict with the the rules defined
> > > > in ISO/IEC 9899:2011
> > > > 7.1.3 i think if there were that would satisfy all of my concerns
> > > > related to namespaces.
> > > >
> > > > keep in mind the point of moving to a standard is to achieve
> > > > portability so if we do things that will regress us back to being
> > > > dependent on an implementation we haven't succeeded. that's all i'm
> > trying to guarantee here.
> > > Agree. We are trying to solve a problem that is temporary. I am trying
> to
> > keep the problem scope narrow which might help us push to adopt the
> > standard sooner.
> >
> > i do wish we could just target the standard but unless we are willing to
> draw a
> > line and say no more non std=c11 and also we potentially break the abi we
> > are talking years. i don't think it is reasonable to block progress for
> years, so
> > i'm offering a transitional path. it's an evolution over time that we
> have to
> > manage.
> Apologies if I am sounding like I am blocking progress. Rest assured, we
> will find a way. It is just about which solution we are going to pick.
> Also, is there are any information on how long before we move to C11?
>
> >
> > >
> > > >
> > > > i feel like we are really close on this discussion, if we can just
> > > > iron this issue out we can probably get going on the actual changes.
> > > >
> > > > thanks for the consideration.
> > > >
> > > > >
> > > > > >
> > > > > > once in place it provides an opportunity to introduce new
> > > > > > toolchain/platform combinations and enables an opt-in capability
> > > > > > to use stdatomics on existing toolchain/platform combinations
> > > > > > subject to community discussion on how/if/when.
> > > > > >
> > > > > > it would be good to get more participants into the discussion so
> > > > > > i'll cc techboard for some attention. i feel like the only area
> > > > > > that isn't decided is to do or not do this in rte_ namespace.
> > > > > >
> > > > > > i'm strongly in favor of rte_ namespace after discussion, mainly
> > > > > > due to the disadvantages of trying to overlap with the standard
> > > > > > namespace while not providing a compatible api/abi and because
> > > > > > it provides clear disambiguation of that difference in semantics
> > > > > > and compatibility with the standard api.
> > > > > >
> > > > > > so far i've noted the following
> > > > > >
> > > > > > * we will not provide the non-explicit apis.
> > > > > +1
> > > > >
> > > > > > * we will make no attempt to support operating on struct/union
> > > > > >   atomics with our apis.
> > > > > +1
> > > > >
> > > > > > * we will mirror the standard api potentially in the rte_
> > > > > >   namespace to
> > > > > >   - reference the standard api documentation.
> > > > > >   - assume compatible semantics (sans exceptions from first 2
> > > > > >     points).
> > > > > >
> > > > > > my vote is to remove 'potentially' from the last point above for
> > > > > > reasons previously discussed in postings to the mail thread.
> > > > > >
> > > > > > thanks all for the discussion, i'll send up a patch removing
> > > > > > non-explicit apis for viewing.
> > > > > >
> > > > > > ty
>

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-10 20:30  3%                       ` Tyler Retzlaff
@ 2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
  2023-02-13 15:28  0%                           ` Ben Magistro
  2023-02-13 23:18  0%                           ` Tyler Retzlaff
  0 siblings, 2 replies; 200+ results
From: Honnappa Nagarahalli @ 2023-02-13  5:04 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard, nd

Hi Tyler,
	A few more comments inline. Let us continue to make progress; I will add this topic to the Techboard discussion on 22nd Feb.

> -----Original Message-----
> From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Sent: Friday, February 10, 2023 2:30 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Morten Brørup <mb@smartsharesystems.com>; thomas@monjalon.net;
> dev@dpdk.org; bruce.richardson@intel.com; david.marchand@redhat.com;
> jerinj@marvell.com; konstantin.ananyev@huawei.com;
> ferruh.yigit@amd.com; nd <nd@arm.com>; techboard@dpdk.org
> Subject: Re: [PATCH] eal: introduce atomics abstraction
> 
> On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > > <snip>
> > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > For environments where stdatomics are not supported,
> > > > > > > > > > we could
> > > > > > > have a
> > > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have
> > > > > > > > > to support
> > > > > > > only
> > > > > > > > > _explicit APIs). This allows the code to use stdatomics
> > > > > > > > > APIs and
> > > > > > > when we move
> > > > > > > > > to minimum supported standard C11, we just need to get
> > > > > > > > > rid of the
> > > > > > > file in DPDK
> > > > > > > > > repo.
> > > > > > > > >
> > > > > > > > > my concern with this is that if we provide a stdatomic.h
> > > > > > > > > or
> > > > > > > introduce names
> > > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > > >
> > > > > > > > > references:
> > > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > > >  * GNU libc manual
> > > > > > > > >
> > > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > > > > > >
> > > > > > > > > in effect the header, the names and in some instances
> > > > > > > > > namespaces
> > > > > > > introduced
> > > > > > > > > are reserved by the implementation. there are several
> > > > > > > > > reasons in
> > > > > > > the GNU libc
> > > > > > > > Wouldn't this apply only after the particular APIs were
> introduced?
> > > > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > > > >
> > > > > > > yeah, i agree they're being a bit wishy washy in the
> > > > > > > wording, but i'm not convinced glibc folks are documenting
> > > > > > > this as permissive guidance against.
> > > > > > >
> > > > > > > >
> > > > > > > > > manual that explain the justification for these
> > > > > > > > > reservations and if
> > > > > > > if we think
> > > > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > > > >
> > > > > > > > > i'll also remark that the inter-mingling of names from
> > > > > > > > > the POSIX
> > > > > > > standard
> > > > > > > > > implicitly exposed as a part of the EAL public API has
> > > > > > > > > been
> > > > > > > problematic for
> > > > > > > > > portability.
> > > > > > > > These should be exposed as EAL APIs only when compiled
> > > > > > > > with a
> > > > > > > compiler that does not support stdatomics.
> > > > > > >
> > > > > > > you don't necessarily compile dpdk, the application or its
> > > > > > > other dynamically linked dependencies with the same compiler
> > > > > > > at the same time.
> > > > > > > i.e. basically the model of any dpdk-dev package on any
> > > > > > > linux distribution.
> > > > > > >
> > > > > > > if dpdk is built without real stdatomic types but the
> > > > > > > application has to interoperate with a different kit or
> > > > > > > library that does they would be forced to dance around dpdk
> > > > > > > with their own version of a shim to hide our faked up stdatomics.
> > > > > > >
> > > > > >
> > > > > > So basically, if we want a binary DPDK distribution to be
> > > > > > compatible with a
> > > > > separate application build environment, they both have to
> > > > > implement atomics the same way, i.e. agree on the ABI for atomics.
> > > > > >
> > > > > > Summing up, this leaves us with only two realistic options:
> > > > > >
> > > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > > build
> > > > > environment to support C11 stdatomics.
> > > > > > 2. Provide our own DPDK atomics library.
> > > > > >
> > > > > > (As mentioned by Tyler, the third option - using C11
> > > > > > stdatomics inside DPDK, and requiring a build environment
> > > > > > without C11 stdatomics to implement a shim - is not
> > > > > > realistic!)
> > > > > >
> > > > > > I strongly want atomics to be available for use across inline
> > > > > > and compiled
> > > > > code; i.e. it must be possible for both compiled DPDK functions
> > > > > and inline functions to perform atomic transactions on the same
> atomic variable.
> > > > >
> > > > > i consider it a mandatory requirement. i don't see practically
> > > > > how we could withdraw existing use and even if we had a clean way
> > > > > i don't see why we would want to. so this item is definitely
> > > > > settled if you were concerned.
> > > > I think I agree here.
> > > >
> > > > >
> > > > > >
> > > > > > So either we upgrade the DPDK build requirements to support
> > > > > > C11 (including
> > > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > > >
> > > > > i think the issue of requiring a toolchain conformant to a
> > > > > specific standard is a separate matter because any adoption of
> > > > > C11 standard atomics is a potential abi break from the current use of
> intrinsics.
> > > > I am not sure why you are calling it an ABI break. Referring to
> > > > [1], I just see wrappers around intrinsics (though [2] does not use
> > > > the intrinsics).
> > > >
> > > > [1]
> > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> > > > [2]
> > > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic.h
> > >
> > > it's a potential abi break because atomic types are not the same
> > > types as their corresponding integer types etc.. (or at least are
> > > not guaranteed to be by all implementations of c as an abstract language).
> > >
> > >     ISO/IEC 9899:2011
> > >
> > >     6.2.5 (27)
> > >     Further, there is the _Atomic qualifier. The presence of the _Atomic
> > >     qualifier designates an atomic type. The size, representation, and
> alignment
> > >     of an atomic type need not be the same as those of the corresponding
> > >     unqualified type.
> > >
> > >     7.17.6 (3)
> > >     NOTE The representation of atomic integer types need not have the
> same size
> > >     as their corresponding regular types. They should have the same
> > > size whenever
> > >     possible, as it eases effort required to port existing code.
> > >
> > > i use the term `potential abi break' with intent because for me to
> > > assert in absolute terms i would have to evaluate the implementation
> > > of every current and potential future compilers atomic vs non-atomic
> > > types. this as i'm sure you understand is not practical, it would
> > > also defeat the purpose of moving to a standard. therefore i rely on
> > > the specification prescribed by the standard not the detail of a specific
> implementation.
> > Can we say that the platforms 'supported' by DPDK today do not have this
> problem? Any future platforms that will come to DPDK have to evaluate this.
> 
> sadly i don't think we can. i believe in an earlier post i linked a bug filed on
> gcc that shows that clang / gcc were producing different layout than the
> equivalent non-atomic type.
I looked at that bug again; it has to do with structures.
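
To make the structure case concrete, here is a minimal sketch that compares
a small structure with its _Atomic-qualified counterpart. The demo_* names
are made up for illustration and are not DPDK code, and the sketch assumes
a C11 compiler with atomics support (for example GCC or Clang with
-std=c11). Whether the two layouts match is implementation-defined, so the
functions only report what the current toolchain does.

```c
#include <stddef.h>

/* Hypothetical 3-byte payload; odd sizes are where padding may appear. */
struct demo_payload {
    char bytes[3];
};

/* Size of the atomic-qualified structure on this toolchain. */
size_t
demo_atomic_payload_size(void)
{
    return sizeof(_Atomic struct demo_payload);
}

/*
 * Return 1 if the atomic-qualified structure has the same size and
 * alignment as the plain one, 0 otherwise. C11 6.2.5 (27) allows
 * either answer.
 */
int
demo_struct_layout_matches(void)
{
    return sizeof(_Atomic struct demo_payload) ==
               sizeof(struct demo_payload) &&
           _Alignof(_Atomic struct demo_payload) ==
               _Alignof(struct demo_payload);
}
```

Building this with two toolchains and comparing the return values is one
way to see whether a given structure is affected.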

> 
> >
> > >
> > >
> > > > > the abstraction (whatever namespace it resides) allows the
> > > > > existing toolchain/platform combinations to maintain
> > > > > compatibility by defaulting to current non-standard intrinsics.
> > > > How about using the intrinsics (__atomic_xxx) name space for
> abstraction?
> > > This covers the GCC and Clang compilers.
> 
> i haven't investigated fully but there are usages of these intrinsics that
> indicate there may be undesirable difference between clang and gcc versions.
> the hint is there seems to be conditionally compiled code under __clang__
> when using some __atomic's.
I sent an RFC to address this [1]. I think the size-specific intrinsics are not necessary.

[1] http://patches.dpdk.org/project/dpdk/patch/20230211015622.408487-1-honnappa.nagarahalli@arm.com/

> 
> for the purpose of this discussion clang just tries to look like gcc so i don't
> regard them as being different compilers for the purpose of this discussion.
> 
> > >
> > > the namespace starting with `__` is also reserved for the implementation.
> > > this is why compilers gcc/clang/msvc place name their intrinsic and
> > > builtin functions starting with __ to explicitly avoid collision
> > > with the application namespace.
> 
> > Agreed. But, here we are considering '__atomic_' specifically (i.e.
> > not just '__')
> 
> i don't understand the confusion __atomic is within the __ namespace that is
> reserved.
What I mean is, we are not formulating a policy/rule to allow any namespace that starts with '__'.

> 
> let me ask this another way, what benefit do you see to trying to overlap with
> the standard namespace? the only benefit i can see is that at some point in
> the future it avoids having to perform a mechanical change to eventually
> retire the abstraction once all platform/toolchains support standard atomics.
> i.e. basically s/rte_atomic/atomic/g
> 
> is there another benefit i'm missing?
The abstraction you have proposed solves the problem for the long term, but it also stops us from thinking about moving to stdatomics.
IMO, the problem is short term. Using the __atomic_ namespace does not cause any practical issues on the platforms DPDK supports (unless msvc has a problem with this; more questions below).
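
As a rough sketch of what a separately named abstraction could look like
(not the actual patch): the demo_ prefix stands in for whatever namespace
is chosen, and DEMO_USE_STDATOMIC is a hypothetical build-time switch.
Only the __atomic_* builtins and the <stdatomic.h> functions are real; all
other names are assumptions made for this example.

```c
#include <stdint.h>

#ifdef DEMO_USE_STDATOMIC
/* Opt-in path: map the wrapper names onto C11 standard atomics. */
#include <stdatomic.h>
#define DEMO_ATOMIC(type) _Atomic(type)
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
    atomic_fetch_add_explicit(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
    atomic_load_explicit(ptr, mo)
#else
/* Default path: plain integer type plus the GCC/Clang builtins. */
#define DEMO_ATOMIC(type) type
#define demo_memory_order_relaxed __ATOMIC_RELAXED
#define demo_atomic_fetch_add_explicit(ptr, val, mo) \
    __atomic_fetch_add(ptr, val, mo)
#define demo_atomic_load_explicit(ptr, mo) \
    __atomic_load_n(ptr, mo)
#endif

/* Hypothetical counter shared by inline and compiled code. */
static DEMO_ATOMIC(uint32_t) demo_counter;

/* Atomically add one; returns the value held before the increment. */
uint32_t
demo_count_event(void)
{
    return demo_atomic_fetch_add_explicit(&demo_counter, 1,
                                          demo_memory_order_relaxed);
}

/* Atomically read the current counter value. */
uint32_t
demo_read_counter(void)
{
    return demo_atomic_load_explicit(&demo_counter,
                                     demo_memory_order_relaxed);
}
```

Callers only ever see the demo_ names, so retiring the wrapper later would
be the mechanical rename mentioned earlier in the thread.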

> 
> >
> > >
> > >     ISO/IEC 9899:2011
> > >
> > >     7.1.3 (1)
> > >     All identifiers that begin with an underscore and either an uppercase
> > >     letter or another underscore are always reserved for any use.
> > >
> > >     ...
> > >
> > > > If there is another platform that uses the same name space for
> > > > something
> > > else, I think DPDK should not be supporting that platform.
> > >
> > > that's effectively a statement excluding windows platform and all
> > > non-gcc compilers from ever supporting dpdk.
> > Apologies, I did not understand your comment on windows platform. Do
> you mean to say a compiler for windows platform uses '__atomic_xxx' name
> space to provide some other functionality (and hence it would get excluded)?
> 
> i mean dpdk can never fully be supported without msvc except for statically
> linked builds which are niche and limit it too severely for many consumers to
> practically use dpdk. there are also many application developers who would
> like to integrate dpdk but can't and telling them their only choice is to re-port
> their entire application to clang isn't feasible.
> 
> i can see no technical reason why we should be excluding a major compiler in
> broad use if it is capable of building dpdk. msvc arguably has some of the
> most sophisticated security features in the industry and the use of those
> features is mandated by many of the customers who might deploy dpdk
> applications on windows.
I did not mean DPDK should not support msvc (maybe my sentence below was misunderstood).
Does msvc provide '__atomic_xxx' intrinsics?

> 
> > Clang supports these intrinsics. I am not sure about the merit of supporting
> other non-gcc compilers. May be a topic Techboard discussion.
> >
> > >
> > > > What problems do you see?
> > >
> > > i'm fairly certain at least one other compiler uses the __atomic
> > > namespace but
> > Do you mean __atomic namespace is used for some other purpose?
> >
> > > it would take me time to check, the most notable potential issue
> > > that comes to mind is if such an intrinsic with the same name is
> > > provided in a different implementation and has either regressive
> > > code generation or different semantics it would be bad because it is
> > > intrinsic you can't just hack around it with #undef __atomic to shim in a
> semantically correct version.
> > I do not think we should worry about regressive code generation problem. It
> should be fixed by that compiler.
> > Different semantics is something we need to worry about. It would be good
> to find out more about a compiler that does this.
> 
> again, this is about portability it's about potential not that we can find an
> example.
> 
> >
> > >
> > > how about this, is there another possible namespace you might
> > > suggest that conforms or doesn't conflict with the rules defined
> > > in ISO/IEC 9899:2011 7.1.3? i think if there were, that would
> > > satisfy all of my concerns related to namespaces.
> > >
> > > keep in mind the point of moving to a standard is to achieve
> > > portability so if we do things that will regress us back to being
> > > dependent on an implementation we haven't succeeded. that's all i'm
> trying to guarantee here.
> > Agree. We are trying to solve a problem that is temporary. I am trying to
> keep the problem scope narrow which might help us push to adopt the
> standard sooner.
> 
> i do wish we could just target the standard but unless we are willing to draw a
> line and say no more non std=c11 and also we potentially break the abi we
> are talking years. i don't think it is reasonable to block progress for years, so
> i'm offering a transitional path. it's an evolution over time that we have to
> manage.
Apologies if I sound like I am blocking progress. Rest assured, we will find a way. It is just a question of which solution we are going to pick.
Also, is there any information on how long it will be before we move to C11?

> 
> >
> > >
> > > i feel like we are really close on this discussion, if we can just
> > > iron this issue out we can probably get going on the actual changes.
> > >
> > > thanks for the consideration.
> > >
> > > >
> > > > >
> > > > > once in place it provides an opportunity to introduce new
> > > > > toolchain/platform combinations and enables an opt-in capability
> > > > > to use stdatomics on existing toolchain/platform combinations
> > > > > subject to community discussion on how/if/when.
> > > > >
> > > > > it would be good to get more participants into the discussion so
> > > > > i'll cc techboard for some attention. i feel like the only area
> > > > > that isn't decided is to do or not do this in rte_ namespace.
> > > > >
> > > > > i'm strongly in favor of rte_ namespace after discussion, mainly
> > > > > due to the disadvantages of trying to overlap with the standard
> > > > > namespace while not providing a compatible api/abi and because
> > > > > it provides clear disambiguation of that difference in semantics
> > > > > and compatibility with the standard api.
> > > > >
> > > > > so far i've noted the following
> > > > >
> > > > > * we will not provide the non-explicit apis.
> > > > +1
> > > >
> > > > > * we will make no attempt to support operate on struct/union atomics
> > > > >   with our apis.
> > > > +1
> > > >
> > > > > * we will mirror the standard api potentially in the rte_ namespace to
> > > > >   - reference the standard api documentation.
> > > > >   - assume compatible semantics (sans exceptions from first 2 points).
> > > > >
> > > > > my vote is to remove 'potentially' from the last point above for
> > > > > reasons previously discussed in postings to the mail thread.
> > > > >
> > > > > thanks all for the discussion, i'll send up a patch removing
> > > > > non-explicit apis for viewing.
> > > > >
> > > > > ty

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v6 1/3] ethdev: skip congestion management configuration
  2023-02-11  0:35  3%   ` Ferruh Yigit
@ 2023-02-11  5:16  0%     ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2023-02-11  5:16 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Rakesh Kudurumalla, Ori Kam, Thomas Monjalon, Andrew Rybchenko,
	jerinj, ndabilpuram, dev, David Marchand

On Sat, Feb 11, 2023 at 6:05 AM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 2/10/2023 8:26 AM, Rakesh Kudurumalla wrote:
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> > index b60987db4b..f4eb4232d4 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -2203,6 +2203,17 @@ enum rte_flow_action_type {
> >        */
> >       RTE_FLOW_ACTION_TYPE_DROP,
> >
> > +     /**
> > +      * Skip congestion management configuration
> > +      *
> > +      * Using rte_eth_cman_config_set() API the application
> > +      * can configure ethdev Rx queue's congestion mechanism.
> > +      * Introducing RTE_FLOW_ACTION_TYPE_SKIP_CMAN flow action to skip the
> > +      * congestion configuration applied to the given ethdev Rx queue.
> > +      *
> > +      */
> > +     RTE_FLOW_ACTION_TYPE_SKIP_CMAN,
> > +
>
> Inserting new enum item in to the middle of the enum upsets the ABI
> checks [1], can it go to the end?

Yes.

>
>
>
>
> [1]
> 1 function with some indirect sub-type change:
>
>   [C] 'function size_t rte_flow_copy(rte_flow_desc*, size_t, const
> rte_flow_attr*, const rte_flow_item*, const rte_flow_action*)' at
> rte_flow.c:1092:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_flow_desc*' has sub-type changes:
>       in pointed to type 'struct rte_flow_desc' at rte_flow.h:4326:1:
>         type size hasn't changed
>         1 data member changes (1 filtered):
>           type of 'rte_flow_action* actions' changed:
>             in pointed to type 'struct rte_flow_action' at
> rte_flow.h:3775:1:
>               type size hasn't changed
>               1 data member change:
>                 type of 'rte_flow_action_type type' changed:
>                   type size hasn't changed
>                   1 enumerator insertion:
>
> 'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_SKIP_CMAN' value '8'
>                   50 enumerator changes:
>                     'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_COUNT'
> from value '8' to '9' at rte_flow.h:2216:1
>                     ...

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v6 1/3] ethdev: skip congestion management configuration
  @ 2023-02-11  0:35  3%   ` Ferruh Yigit
  2023-02-11  5:16  0%     ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-02-11  0:35 UTC (permalink / raw)
  To: Rakesh Kudurumalla, Ori Kam, Thomas Monjalon, Andrew Rybchenko
  Cc: jerinj, ndabilpuram, dev, David Marchand

On 2/10/2023 8:26 AM, Rakesh Kudurumalla wrote:
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index b60987db4b..f4eb4232d4 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -2203,6 +2203,17 @@ enum rte_flow_action_type {
>  	 */
>  	RTE_FLOW_ACTION_TYPE_DROP,
>  
> +	/**
> +	 * Skip congestion management configuration
> +	 *
> +	 * Using rte_eth_cman_config_set() API the application
> +	 * can configure ethdev Rx queue's congestion mechanism.
> +	 * Introducing RTE_FLOW_ACTION_TYPE_SKIP_CMAN flow action to skip the
> +	 * congestion configuration applied to the given ethdev Rx queue.
> +	 *
> +	 */
> +	RTE_FLOW_ACTION_TYPE_SKIP_CMAN,
> +

Inserting new enum item in to the middle of the enum upsets the ABI
checks [1], can it go to the end?




[1]
1 function with some indirect sub-type change:

  [C] 'function size_t rte_flow_copy(rte_flow_desc*, size_t, const
rte_flow_attr*, const rte_flow_item*, const rte_flow_action*)' at
rte_flow.c:1092:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_flow_desc*' has sub-type changes:
      in pointed to type 'struct rte_flow_desc' at rte_flow.h:4326:1:
        type size hasn't changed
        1 data member changes (1 filtered):
          type of 'rte_flow_action* actions' changed:
            in pointed to type 'struct rte_flow_action' at
rte_flow.h:3775:1:
              type size hasn't changed
              1 data member change:
                type of 'rte_flow_action_type type' changed:
                  type size hasn't changed
                  1 enumerator insertion:

'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_SKIP_CMAN' value '8'
                  50 enumerator changes:
                    'rte_flow_action_type::RTE_FLOW_ACTION_TYPE_COUNT'
from value '8' to '9' at rte_flow.h:2216:1
                    ...
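
The value shift in the report above can be reproduced with a small stand-in
enum. The DEMO_* names are illustrative and are not the real rte_flow
definitions; only the values 8 and 9 are taken from the report, and the
starting value 7 is chosen so that they line up.

```c
/* Values an already-built application was compiled against. */
enum demo_action_old {
    DEMO_OLD_DROP = 7,
    DEMO_OLD_COUNT          /* 8 */
};

/* New header with the new enumerator inserted in the middle. */
enum demo_action_mid {
    DEMO_MID_DROP = 7,
    DEMO_MID_SKIP_CMAN,     /* 8: takes over the old COUNT value */
    DEMO_MID_COUNT          /* 9: shifted by one */
};

/* New header with the new enumerator appended at the end. */
enum demo_action_end {
    DEMO_END_DROP = 7,
    DEMO_END_COUNT,         /* 8: unchanged */
    DEMO_END_SKIP_CMAN      /* 9 */
};

/* 1 if an old binary's COUNT still means COUNT after a mid insertion. */
int
demo_mid_keeps_count(void)
{
    return (int)DEMO_OLD_COUNT == (int)DEMO_MID_COUNT;
}

/* 1 if an old binary's COUNT still means COUNT after appending. */
int
demo_end_keeps_count(void)
{
    return (int)DEMO_OLD_COUNT == (int)DEMO_END_COUNT;
}

/* 1 if an old binary's COUNT is now read as SKIP_CMAN by the library. */
int
demo_old_count_aliases_skip_cman(void)
{
    return (int)DEMO_OLD_COUNT == (int)DEMO_MID_SKIP_CMAN;
}
```

With the mid insertion, an application built against the old header would
pass 8 meaning COUNT while a rebuilt library reads it as SKIP_CMAN;
appending keeps every existing value stable.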

^ permalink raw reply	[relevance 3%]

* Re: [RFC PATCH 0/1] Specify C-standard requirement for DPDK builds
  @ 2023-02-10 23:39  4%           ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-10 23:39 UTC (permalink / raw)
  To: Ben Magistro; +Cc: Bruce Richardson, dev, thomas, david.marchand, mb

On Fri, Feb 10, 2023 at 09:52:06AM -0500, Ben Magistro wrote:
> Adding Tyler
> 
> Sort of following along on the RFC: introduce atomics [1] it seems like the
> decision to use 99 vs 11 here could make an impact on the approach taken in
> that thread.

hey Ben thanks for keeping an eye across threads on the topic. the
atomics thread is fairly long but somewhere in it i did provide a
rationale for why we can't just go straight to using C11 even if we
declared that dpdk only supports compilers >= C11.

i wish we could; it would certainly make my life way easier if i could
just -std=c11 and cut & paste my way to completion. the reason why we
can't (aside from not requiring a C11 compiler as a minimum) is that
there is a potential abi compatibility issue: existing applications
pass non-atomic types across the ABI today, and those would suddenly
have to be standard atomic types. this is because _Atomic type and type
are not guaranteed to have the same size, alignment, representation etc.
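
a minimal sketch of the concern, assuming a C11 compiler with atomics
support: the demo_stats_* structs are hypothetical and stand in for any
public structure whose integer field is updated atomically today. the
function reports whether switching that field to an _Atomic type would
leave the structure layout unchanged on the current toolchain and target;
the standard permits either answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical public struct today: plain integer, updated via builtins. */
struct demo_stats_plain {
    uint8_t flag;
    uint64_t packets;
};

/* The same struct if the field became a C11 atomic type. */
struct demo_stats_atomic {
    uint8_t flag;
    _Atomic uint64_t packets;
};

/*
 * Return 1 if size and field offset are identical for both variants,
 * 0 if the atomic variant changes the layout (an ABI break for any
 * already-built application using the plain layout).
 */
int
demo_stats_layout_unchanged(void)
{
    return sizeof(struct demo_stats_atomic) ==
               sizeof(struct demo_stats_plain) &&
           offsetof(struct demo_stats_atomic, packets) ==
               offsetof(struct demo_stats_plain, packets);
}
```

a result of 1 on one toolchain/platform says nothing about another, which
is why the standard text rather than a particular implementation has to be
the reference.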

anyway, i welcome us establishing c99 as a minimum for all
toolchain/platform combinations.

> 
> 1) http://mails.dpdk.org/archives/dev/2023-February/262042.html
> 
> On Fri, Feb 3, 2023 at 1:00 PM Bruce Richardson <bruce.richardson@intel.com>
> wrote:
> 
> > On Fri, Feb 03, 2023 at 11:45:04AM -0500, Ben Magistro wrote:
> > >    In our case we have other libraries that we are using that have
> > >    required us to specify a minimum c++ version (14/17 most recently for
> > >    one) so it doesn't feel like a big ask/issue to us (provided things
> > >    don't start conflicting...hah; not anticipating any issue).  Our
> > >    software is also used internally so we have a fair bit of control over
> > >    how fast we can adopt changes.
> > >    This got me wondering what some other projects in the DPDK ecosystem
> > >    are saying/doing around language standards/gcc versions.  So some
> > quick
> > >    checking of the projects I am aware of/looked at/using...
> > >    * trex: cannot find an obvious minimum gcc requirement
> > >    * tldk: we are running our own public folk with several fixes, need to
> > >    find time to solve the build sys change aspect to continue providing
> > >    patches upstream; I know I have hit some places where it was easier to
> > >    say the new minimum DPDK version is x at which point you just adopt
> > the
> > >    minimum requirements of DPDK
> > >    * ovs: looks to be comfortable with an older gcc still
> > >    * seastar: seems to be the most aggressive with adopting language
> > >    standards/compilers I've seen [1] and are asking for gcc 9+ and cpp17+
> > >    * ans: based on release 19.02 (2019), they are on gcc >= 5.4 [2] and
> > is
> > >    the same on the main README file
> > >    I do understand the concern, but if no one is voicing an
> > >    opinion/objection does that mean they agree with/will not be affected
> > >    by the change....
> > >    1) [1]https://docs.seastar.io/master/md_compatibility.html
> > >    2) [2]https://github.com/ansyun/dpdk-ans/releases
> > >    Cheers
> > >
> > Thanks for the info.
> > I also notice that since gcc 5, the default language version used - if none
> > is explicitly specified - is gnu11 (or higher for later versions). Clang
> > seems to do something similar, but not sure at what point it started
> > defaulting to a standard >=c11.
> >
> > /Bruce
> >

^ permalink raw reply	[relevance 4%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
@ 2023-02-10 20:30  3%                       ` Tyler Retzlaff
  2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-10 20:30 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Fri, Feb 10, 2023 at 05:30:00AM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > For environments where stdatomics are not supported, we
> > > > > > > > > could
> > > > > > have a
> > > > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > > > support
> > > > > > only
> > > > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> > > > > > > > and
> > > > > > when we move
> > > > > > > > to minimum supported standard C11, we just need to get rid
> > > > > > > > of the
> > > > > > file in DPDK
> > > > > > > > repo.
> > > > > > > >
> > > > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > > > introduce names
> > > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > > >
> > > > > > > > references:
> > > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > > >  * GNU libc manual
> > > > > > > >
> > > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > > > > >
> > > > > > > > in effect the header, the names and in some instances
> > > > > > > > namespaces
> > > > > > introduced
> > > > > > > > are reserved by the implementation. there are several
> > > > > > > > reasons in
> > > > > > the GNU libc
> > > > > > > Wouldn't this apply only after the particular APIs were introduced?
> > > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > > >
> > > > > > yeah, i agree they're being a bit wishy washy in the wording,
> > > > > > but i'm not convinced glibc folks are documenting this as
> > > > > > permissive guidance against.
> > > > > >
> > > > > > >
> > > > > > > > manual that explain the justification for these reservations
> > > > > > > > and if
> > > > > > if we think
> > > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > > >
> > > > > > > > i'll also remark that the inter-mingling of names from the
> > > > > > > > POSIX
> > > > > > standard
> > > > > > > > implicitly exposed as a part of the EAL public API has been
> > > > > > problematic for
> > > > > > > > portability.
> > > > > > > These should be exposed as EAL APIs only when compiled with a
> > > > > > compiler that does not support stdatomics.
> > > > > >
> > > > > > you don't necessarily compile dpdk, the application or its other
> > > > > > dynamically linked dependencies with the same compiler at the
> > > > > > same time.
> > > > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > > > distribution.
> > > > > >
> > > > > > if dpdk is built without real stdatomic types but the
> > > > > > application has to interoperate with a different kit or library
> > > > > > that does they would be forced to dance around dpdk with their
> > > > > > own version of a shim to hide our faked up stdatomics.
> > > > > >
> > > > >
> > > > > So basically, if we want a binary DPDK distribution to be
> > > > > compatible with a
> > > > separate application build environment, they both have to implement
> > > > atomics the same way, i.e. agree on the ABI for atomics.
> > > > >
> > > > > Summing up, this leaves us with only two realistic options:
> > > > >
> > > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > > build
> > > > environment to support C11 stdatomics.
> > > > > 2. Provide our own DPDK atomics library.
> > > > >
> > > > > (As mentioned by Tyler, the third option - using C11 stdatomics
> > > > > inside DPDK, and requiring a build environment without C11
> > > > > stdatomics to implement a shim - is not realistic!)
> > > > >
> > > > > I strongly want atomics to be available for use across inline and
> > > > > compiled
> > > > code; i.e. it must be possible for both compiled DPDK functions and
> > > > inline functions to perform atomic transactions on the same atomic variable.
> > > >
> > > > i consider it a mandatory requirement. i don't see practically how
> > > > we could withdraw existing use and even if we had a clean way i don't
> > > > see why we would want to. so this item is definitely settled if you
> > > > were concerned.
> > > I think I agree here.
> > >
> > > >
> > > > >
> > > > > So either we upgrade the DPDK build requirements to support C11
> > > > > (including
> > > > the optional stdatomics), or we provide our own DPDK atomics.
> > > >
> > > > i think the issue of requiring a toolchain conformant to a specific
> > > > standard is a separate matter because any adoption of C11 standard
> > > > atomics is a potential abi break from the current use of intrinsics.
> > > I am not sure why you are calling it as ABI break. Referring to [1], I just see
> > wrappers around intrinsics (though [2] does not use the intrinsics).
> > >
> > > [1]
> > > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> > > [2]
> > > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic
> > > .h
> > 
> > it's a potential abi break because atomic types are not the same types as their
> > corresponding integer types etc.. (or at least are not guaranteed to be by all
> > implementations of c as an abstract language).
> > 
> >     ISO/IEC 9899:2011
> > 
> >     6.2.5 (27)
> >     Further, there is the _Atomic qualifier. The presence of the _Atomic
> >     qualifier designates an atomic type. The size, representation, and alignment
> >     of an atomic type need not be the same as those of the corresponding
> >     unqualified type.
> > 
> >     7.17.6 (3)
> >     NOTE The representation of atomic integer types need not have the same size
> >     as their corresponding regular types. They should have the same size
> > whenever
> >     possible, as it eases effort required to port existing code.
> > 
> > i use the term `potential abi break' with intent because for me to assert in
> > absolute terms i would have to evaluate the implementation of every current
> > and potential future compilers atomic vs non-atomic types. this as i'm sure you
> > understand is not practical, it would also defeat the purpose of moving to a
> > standard. therefore i rely on the specification prescribed by the standard not
> > the detail of a specific implementation.
> Can we say that the platforms 'supported' by DPDK today do not have this problem? Any future platforms that will come to DPDK have to evaluate this.

sadly i don't think we can. i believe in an earlier post i linked a bug
filed on gcc that shows that clang / gcc were producing different
layout than the equivalent non-atomic type.
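
A minimal sketch of how one could check this on a given toolchain. It is not
taken from the patch or from the gcc bug report; the demo_ names are
hypothetical. It only compares the size and alignment of a 64-bit atomic type
with its plain counterpart, and the answer may differ per compiler and target.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the atomic 64-bit type as seen by this compiler/target. */
static size_t
demo_atomic_u64_size(void)
{
	return sizeof(_Atomic(uint64_t));
}

/*
 * Returns 1 when the atomic and the plain 64-bit types share both size
 * and alignment on this toolchain, 0 otherwise. A result of 0 means that
 * switching a public struct field to the atomic type would change layout.
 */
static int
demo_atomic_u64_layout_matches(void)
{
	return sizeof(_Atomic(uint64_t)) == sizeof(uint64_t) &&
	       alignof(_Atomic(uint64_t)) == alignof(uint64_t);
}
```

A result of 1 on one compiler says nothing about another, which is why the
standard text rather than a single implementation is used as the reference
here.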

> 
> > 
> > 
> > > > the abstraction (whatever namespace it resides) allows the existing
> > > > toolchain/platform combinations to maintain compatibility by
> > > > defaulting to current non-standard intrinsics.
> > > How about using the intrinsics (__atomic_xxx) name space for abstraction?
> > This covers the GCC and Clang compilers.

i haven't investigated fully but there are usages of these intrinsics
that indicate there may be undesirable difference between clang and gcc
versions. the hint is there seems to be conditionally compiled code
under __clang__ when using some __atomic's.

for the purpose of this discussion clang just tries to look like gcc so
i don't regard them as being different compilers for the purpose of this
discussion.

> > 
> > the namespace starting with `__` is also reserved for the implementation.
> > this is why compilers gcc/clang/msvc name their intrinsic and builtin
> > functions starting with __ to explicitly avoid collision with the application
> > namespace.

> Agreed. But, here we are considering '__atomic_' specifically (i.e. not just '__')

i don't understand the confusion: __atomic is within the __ namespace
that is reserved.

let me ask this another way, what benefit do you see to trying to
overlap with the standard namespace? the only benefit i can see is that
at some point in the future it avoids having to perform a mechanical
change to eventually retire the abstraction once all platform/toolchains
support standard atomics. i.e. basically s/rte_atomic/atomic/g

is there another benefit i'm missing?
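
To make the transitional path concrete, here is a rough sketch of an
abstraction in its own namespace. It is not the proposed patch: the demo_
prefix and the DEMO_USE_STDATOMIC switch are hypothetical stand-ins, and only
two operations are shown. By default it maps to the gcc/clang __atomic
builtins; with the switch defined it maps to C11 stdatomic.

```c
#include <stdint.h>

#ifdef DEMO_USE_STDATOMIC
/* Opt-in: standard C11 atomics. */
#include <stdatomic.h>
#define demo_atomic(type) _Atomic(type)
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_memory_order_seq_cst memory_order_seq_cst
#define demo_atomic_fetch_add_explicit(obj, val, order) \
	atomic_fetch_add_explicit(obj, val, order)
#define demo_atomic_load_explicit(obj, order) \
	atomic_load_explicit(obj, order)
#else
/* Default: existing gcc/clang builtins on plain types, layout unchanged. */
#define demo_atomic(type) type
#define demo_memory_order_relaxed __ATOMIC_RELAXED
#define demo_memory_order_seq_cst __ATOMIC_SEQ_CST
#define demo_atomic_fetch_add_explicit(obj, val, order) \
	__atomic_fetch_add(obj, val, order)
#define demo_atomic_load_explicit(obj, order) \
	__atomic_load_n(obj, order)
#endif

/* Zero-initialized counter shared by all callers. */
static demo_atomic(uint32_t) demo_counter;

/* Atomically count one event and return the counter value read afterwards. */
static uint32_t
demo_count_event(void)
{
	demo_atomic_fetch_add_explicit(&demo_counter, 1,
				       demo_memory_order_relaxed);
	return demo_atomic_load_explicit(&demo_counter,
					 demo_memory_order_seq_cst);
}
```

Callers only ever see the demo_ names, so retiring the abstraction later is
the mechanical rename described above.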

> 
> > 
> >     ISO/IEC 9899:2011
> > 
> >     7.1.3 (1)
> >     All identifiers that begin with an underscore and either an uppercase
> >     letter or another underscore are always reserved for any use.
> > 
> >     ...
> > 
> > > If there is another platform that uses the same name space for something
> > else, I think DPDK should not be supporting that platform.
> > 
> > that's effectively a statement excluding windows platform and all non-gcc
> > compilers from ever supporting dpdk.
> Apologies, I did not understand your comment on windows platform. Do you mean to say a compiler for windows platform uses '__atomic_xxx' name space to provide some other functionality (and hence it would get excluded)? 

i mean dpdk can never fully be supported without msvc except for
statically linked builds which are niche and limit it too severely for
many consumers to practically use dpdk. there are also many application
developers who would like to integrate dpdk but can't and telling them
their only choice is to re-port their entire application to clang isn't
feasible.

i can see no technical reason why we should be excluding a major
compiler in broad use if it is capable of building dpdk. msvc arguably
has some of the most sophisticated security features in the industry
and the use of those features is mandated by many of the customers who
might deploy dpdk applications on windows.

> Clang supports these intrinsics. I am not sure about the merit of supporting other non-gcc compilers. Maybe a topic for Techboard discussion.
> 
> > 
> > > What problems do you see?
> > 
> > i'm fairly certain at least one other compiler uses the __atomic namespace but
> Do you mean __atomic namespace is used for some other purpose?
> 
> > it would take me time to check, the most notable potential issue that comes to
> > mind is if such an intrinsic with the same name is provided in a different
> > implementation and has either regressive code generation or different
> > semantics it would be bad because it is intrinsic you can't just hack around it
> > with #undef __atomic to shim in a semantically correct version.
> I do not think we should worry about regressive code generation problem. It should be fixed by that compiler.
> Different semantics is something we need to worry about. It would be good to find out more about a compiler that does this.

again, this is about portability. it's about the potential for a conflict,
not whether we can find an example today.

> 
> > 
> > how about this, is there another possible namespace you might suggest that
> > conforms or doesn't conflict with the rules defined in ISO/IEC 9899:2011
> > 7.1.3 i think if there were that would satisfy all of my concerns related to
> > namespaces.
> > 
> > keep in mind the point of moving to a standard is to achieve portability so if we
> > do things that will regress us back to being dependent on an implementation
> > we haven't succeeded. that's all i'm trying to guarantee here.
> Agree. We are trying to solve a problem that is temporary. I am trying to keep the problem scope narrow which might help us push to adopt the standard sooner.

i do wish we could just target the standard but unless we are willing to
draw a line and say no more non std=c11 and also we potentially break
the abi we are talking years. i don't think it is reasonable to block
progress for years, so i'm offering a transitional path. it's an
evolution over time that we have to manage.

> 
> > 
> > i feel like we are really close on this discussion, if we can just iron this issue out
> > we can probably get going on the actual changes.
> > 
> > thanks for the consideration.
> > 
> > >
> > > >
> > > > once in place it provides an opportunity to introduce new
> > > > toolchain/platform combinations and enables an opt-in capability to
> > > > use stdatomics on existing toolchain/platform combinations subject
> > > > to community discussion on how/if/when.
> > > >
> > > > it would be good to get more participants into the discussion so
> > > > i'll cc techboard for some attention. i feel like the only area that
> > > > isn't decided is to do or not do this in rte_ namespace.
> > > >
> > > > i'm strongly in favor of rte_ namespace after discussion, mainly due
> > > > to the disadvantages of trying to overlap with the standard namespace
> > > > while not providing a compatible api/abi and because it provides
> > > > clear disambiguation of that difference in semantics and compatibility with
> > the standard api.
> > > >
> > > > so far i've noted the following
> > > >
> > > > * we will not provide the non-explicit apis.
> > > +1
> > >
> > > > * we will make no attempt to support operating on struct/union atomics
> > > >   with our apis.
> > > +1
> > >
> > > > * we will mirror the standard api potentially in the rte_ namespace to
> > > >   - reference the standard api documentation.
> > > >   - assume compatible semantics (sans exceptions from first 2 points).
> > > >
> > > > my vote is to remove 'potentially' from the last point above for
> > > > reasons previously discussed in postings to the mail thread.
> > > >
> > > > thanks all for the discussion, i'll send up a patch removing
> > > > non-explicit apis for viewing.
> > > >
> > > > ty

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ethdev: improve link speed to string
  @ 2023-02-10 14:41  3%               ` Ferruh Yigit
  2023-03-23 14:40  3%                 ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2023-02-10 14:41 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Min Hu (Connor), Andrew Rybchenko, thomas, dev

On 1/19/2023 4:45 PM, Stephen Hemminger wrote:
> On Thu, 19 Jan 2023 11:41:12 +0000
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>>>>>> Nothing good will happen if you try to use the function to
>>>>>> print two different link speeds in one log message.  
>>>>> You are right.
>>>>> And use malloc for "name" will result in memory leakage, which is also
>>>>> not a good option.
>>>>>
>>>>> BTW, do you think if we need to modify the function
>>>>> "rte_eth_link_speed_to_str"?  
>>>>
>>>> IMHO it would be more pain than gain in this case.
>>>>
>>>> .
>>>>  
>>> Agree with you. Thanks Andrew
>>>  
>>
>> It can be an option to update the API as follows in the next ABI break release:
>>
>> const char *
>> rte_eth_link_speed_to_str(uint32_t link_speed, char *buf, size_t buf_size);
>>
>> For this a deprecation notice needs to be sent and approved, not sure
>> though if it is worth it.
>>
>>
>> Meanwhile, what do you think to update string 'Invalid' to something
>> like 'Irregular' or 'Erratic', does this help to convey the right message?
> 
> 
> API versioning is possible here.


Agree, ABI versioning can be used here.

@Connor, what do you think?
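
For reference, a minimal sketch of what a caller-provided buffer variant could
look like. This is an illustration only, not the ethdev implementation: the
demo_ name, the two named speeds and the assumption that link_speed is given
in Mbps are all placeholders. The point is that irregular speeds are formatted
into the caller's buffer instead of a shared static one.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical caller-buffer variant: well-known speeds return a string
 * literal, anything else is formatted into buf (assumed to be in Mbps).
 */
static const char *
demo_link_speed_to_str(uint32_t link_speed, char *buf, size_t buf_size)
{
	switch (link_speed) {
	case 0:
		return "None";
	case 10000:
		return "10 Gbps";
	default:
		snprintf(buf, buf_size, "%" PRIu32 " Mbps", link_speed);
		return buf;
	}
}
```

With one buffer per call, two different irregular speeds can be printed in the
same log message without one overwriting the other, and nothing needs to be
freed.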

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  2023-02-09 19:44  0%     ` Ferruh Yigit
@ 2023-02-10 14:06  0%       ` Jiawei(Jonny) Wang
  2023-02-14  9:38  0%       ` Jiawei(Jonny) Wang
  1 sibling, 0 replies; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-10 14:06 UTC (permalink / raw)
  To: Ferruh Yigit, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang
  Cc: dev, Raslan Darawsheh

Hi,

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, February 10, 2023 3:45 AM
> To: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>; NBU-Contact-
> Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
> Yuying Zhang <yuying.zhang@intel.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
> API
> 
> On 2/3/2023 1:33 PM, Jiawei Wang wrote:
> > When multiple physical ports are connected to a single DPDK port,
> > (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to
> > know which physical port is used for Rx and Tx.
> >
> 
> I assume "kernel bonding" is out of context, but this patch concerns DPDK
> bonding, failsafe or softnic. (I will refer them as virtual bonding
> device.)
> 

"kernel bonding" can be thought of as Linux bonding.

> To use specific queues of the virtual bonding device may interfere with the
> logic of these devices, like bonding modes or RSS of the underlying devices. I
> can see feature focuses on a very specific use case, but not sure if all possible
> side effects taken into consideration.
> 
> 
> And although the feature is only relevant to virtual bonding device, core
> ethdev structures are updated for this. Most use cases won't need these, so is
> there a way to reduce the scope of the changes to virtual bonding devices?
> 
> 
> There are a few very core ethdev APIs, like:
> rte_eth_dev_configure()
> rte_eth_tx_queue_setup()
> rte_eth_rx_queue_setup()
> rte_eth_dev_start()
> rte_eth_dev_info_get()
> 
> Almost every user of ethdev uses these APIs, since these are so fundamental I
> am for being a little more conservative on these APIs.
> 
> Every eccentric feature targets these APIs first because they are
> common and extending them gives an easy solution, but in the long run this
> makes these APIs more complex, harder to maintain and harder for PMDs to
> support correctly. So I am for not updating them unless it is a generic use case.
> 
> 
> Also as we talked about PMDs supporting them, I assume your coming PMD
> patch will be implementing 'tx_phy_affinity' config option only for mlx drivers.
> What will happen for other NICs? Will they silently ignore the config option
> from user? So this is a problem for the DPDK application portability.
> 

Yes, the PMD patch is for net/mlx5 only; 'tx_phy_affinity' lets the HW
map a queue to a physical port.

Other NICs ignore this new configuration for now. Or should we add a check in queue setup?

> 
> 
> As far as I understand target is application controlling which sub-device is used
> under the virtual bonding device, can you please give more information why
> this is required, perhaps it can help to provide a better/different solution.
> Like adding the ability to use both bonding device and sub-device for data path,
> this way application can use whichever it wants. (this is just first solution I
> come with, I am not suggesting as replacement solution, but if you can describe
> the problem more I am sure other people can come with better solutions.)
> 

For example:
There are two physical ports (assume device interfaces eth2 and eth3), bonded
into one interface (assume bond0).
The DPDK application probes/attaches bond0 only (DPDK port id 0). When sending traffic from the DPDK port,
we want to know which physical port (eth2 or eth3) the packet is sent to.

With the new configuration, a queue can be bound to an underlying device,
so the DPDK application can send traffic on the queue that reaches the desired port.

Adding all devices into DPDK would mean creating multiple Rx/Tx queue resources on each of them.


> And isn't this against the application being transparent to whether the
> underlying device is a bonding device or an actual device?
> 
> 
> > This patch maps a DPDK Tx queue with a physical port, by adding
> > tx_phy_affinity setting in Tx queue.
> > The affinity number is the physical port ID where packets will be
> > sent.
> > Value 0 means no affinity and traffic could be routed to any connected
> > physical ports, this is the default current behavior.
> >
> > The number of physical ports is reported with rte_eth_dev_info_get().
> >
> > The new tx_phy_affinity field is added into the padding hole of
> > rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> > An ABI check rule needs to be added to avoid false warning.
> >
> > Add the testpmd command line:
> > testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> >
> > For example, there're two physical ports connected to a single DPDK
> > port (port id 0), and phy_affinity 1 stood for the first physical port
> > and phy_affinity 2 stood for the second physical port.
> > Use the below commands to config tx phy affinity for per Tx Queue:
> >         port config 0 txq 0 phy_affinity 1
> >         port config 0 txq 1 phy_affinity 1
> >         port config 0 txq 2 phy_affinity 2
> >         port config 0 txq 3 phy_affinity 2
> >
> > These commands config the Tx Queue index 0 and Tx Queue index 1 with
> > phy affinity 1, uses Tx Queue 0 or Tx Queue 1 send packets, these
> > packets will be sent from the first physical port, and similar with
> > the second physical port if sending packets with Tx Queue 2 or Tx
> > Queue 3.
> >
> > Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> > ---
snip
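
To illustrate the semantics described in the commit message, here is a small
sketch of how an affinity value could be resolved to a physical port. It is
not the net/mlx5 implementation: the struct, the function and the flow_hash
fallback are hypothetical. It also shows one possible answer to the open
question above, rejecting an out-of-range affinity instead of silently
ignoring it.

```c
#include <stdint.h>

/* Hypothetical per-queue setting; 0 means "no affinity". */
struct demo_txq_conf {
	uint8_t tx_phy_affinity;
};

/*
 * Return the 1-based physical port used for a packet sent on this queue,
 * or -1 when the affinity exceeds the number of physical ports.
 * flow_hash stands in for whatever the bonding logic would otherwise use.
 */
static int
demo_select_phy_port(const struct demo_txq_conf *conf, uint8_t nb_phy_ports,
		     uint32_t flow_hash)
{
	if (nb_phy_ports == 0 || conf->tx_phy_affinity > nb_phy_ports)
		return -1;
	if (conf->tx_phy_affinity == 0)
		return (int)(flow_hash % nb_phy_ports) + 1;
	return conf->tx_phy_affinity;
}
```

With two physical ports, queues configured with affinity 1 always resolve to
the first port and queues with affinity 2 to the second, matching the testpmd
example in the commit message.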

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-09 17:30  4%                   ` Tyler Retzlaff
@ 2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
  2023-02-10 20:30  3%                       ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2023-02-10  5:30 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard, nd

<snip>

> On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > > > > >
> > > > > > > >
> > > > > > > > For environments where stdatomics are not supported, we
> > > > > > > > could
> > > > > have a
> > > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > > support
> > > > > only
> > > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> > > > > > > and
> > > > > when we move
> > > > > > > to minimum supported standard C11, we just need to get rid
> > > > > > > of the
> > > > > file in DPDK
> > > > > > > repo.
> > > > > > >
> > > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > > introduce names
> > > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > > >
> > > > > > > references:
> > > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > > >  * GNU libc manual
> > > > > > >
> > > > > > > https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > > > Names.html
> > > > > > >
> > > > > > > in effect the header, the names and in some instances
> > > > > > > namespaces
> > > > > introduced
> > > > > > > are reserved by the implementation. there are several
> > > > > > > reasons in
> > > > > the GNU libc
> > > > > > Wouldn't this apply only after the particular APIs were introduced?
> > > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > > >
> > > > > yeah, i agree they're being a bit wishy washy in the wording,
> > > > > but i'm not convinced glibc folks are documenting this as
> > > > > permissive guidance against.
> > > > >
> > > > > >
> > > > > > > manual that explain the justification for these reservations
> > > > > > > > and
> > > > > if we think
> > > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > > >
> > > > > > > i'll also remark that the inter-mingling of names from the
> > > > > > > POSIX
> > > > > standard
> > > > > > > implicitly exposed as a part of the EAL public API has been
> > > > > problematic for
> > > > > > > portability.
> > > > > > These should be exposed as EAL APIs only when compiled with a
> > > > > compiler that does not support stdatomics.
> > > > >
> > > > > you don't necessarily compile dpdk, the application or its other
> > > > > dynamically linked dependencies with the same compiler at the
> > > > > same time.
> > > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > > distribution.
> > > > >
> > > > > if dpdk is built without real stdatomic types but the
> > > > > application has to interoperate with a different kit or library
> > > > > that does they would be forced to dance around dpdk with their
> > > > > own version of a shim to hide our faked up stdatomics.
> > > > >
> > > >
> > > > So basically, if we want a binary DPDK distribution to be
> > > > compatible with a
> > > separate application build environment, they both have to implement
> > > atomics the same way, i.e. agree on the ABI for atomics.
> > > >
> > > > Summing up, this leaves us with only two realistic options:
> > > >
> > > > 1. Go all in on C11 stdatomics, also requiring the application
> > > > build
> > > environment to support C11 stdatomics.
> > > > 2. Provide our own DPDK atomics library.
> > > >
> > > > (As mentioned by Tyler, the third option - using C11 stdatomics
> > > > inside DPDK, and requiring a build environment without C11
> > > > stdatomics to implement a shim - is not realistic!)
> > > >
> > > > I strongly want atomics to be available for use across inline and
> > > > compiled
> > > code; i.e. it must be possible for both compiled DPDK functions and
> > > inline functions to perform atomic transactions on the same atomic variable.
> > >
> > > i consider it a mandatory requirement. i don't see practically how
> > > we could withdraw existing use and even if we had clean way i don't
> > > see why we would want to. so this item is definitely settled if you were
> concerned.
> > I think I agree here.
> >
> > >
> > > >
> > > > So either we upgrade the DPDK build requirements to support C11
> > > > (including
> > > the optional stdatomics), or we provide our own DPDK atomics.
> > >
> > > i think the issue of requiring a toolchain conformant to a specific
> > > standard is a separate matter because any adoption of C11 standard
> > > atomics is a potential abi break from the current use of intrinsics.
> > I am not sure why you are calling it as ABI break. Referring to [1], I just see
> wrappers around intrinsics (though [2] does not use the intrinsics).
> >
> > [1]
> > https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> > [2]
> > https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic
> > .h
> 
> it's a potential abi break because atomic types are not the same types as their
> corresponding integer types etc.. (or at least are not guaranteed to be by all
> implementations of c as an abstract language).
> 
>     ISO/IEC 9899:2011
> 
>     6.2.5 (27)
>     Further, there is the _Atomic qualifier. The presence of the _Atomic
>     qualifier designates an atomic type. The size, representation, and alignment
>     of an atomic type need not be the same as those of the corresponding
>     unqualified type.
> 
>     7.17.6 (3)
>     NOTE The representation of atomic integer types need not have the same size
>     as their corresponding regular types. They should have the same size
> whenever
>     possible, as it eases effort required to port existing code.
> 
> i use the term `potential abi break' with intent because for me to assert in
> absolute terms i would have to evaluate the implementation of every current
> and potential future compilers atomic vs non-atomic types. this as i'm sure you
> understand is not practical, it would also defeat the purpose of moving to a
> standard. therefore i rely on the specification prescribed by the standard not
> the detail of a specific implementation.
Can we say that the platforms 'supported' by DPDK today do not have this problem? Any future platforms that will come to DPDK have to evaluate this.

> 
> 
> > > the abstraction (whatever namespace it resides) allows the existing
> > > toolchain/platform combinations to maintain compatibility by
> > > defaulting to current non-standard intrinsics.
> > How about using the intrinsics (__atomic_xxx) name space for abstraction?
> This covers the GCC and Clang compilers.
> 
> the namespace starting with `__` is also reserved for the implementation.
> this is why compilers gcc/clang/msvc name their intrinsic and builtin
> functions starting with __ to explicitly avoid collision with the application
> namespace.
Agreed. But, here we are considering '__atomic_' specifically (i.e. not just '__')

> 
>     ISO/IEC 9899:2011
> 
>     7.1.3 (1)
>     All identifiers that begin with an underscore and either an uppercase
>     letter or another underscore are always reserved for any use.
> 
>     ...
> 
> > If there is another platform that uses the same name space for something
> else, I think DPDK should not be supporting that platform.
> 
> that's effectively a statement excluding windows platform and all non-gcc
> compilers from ever supporting dpdk.
Apologies, I did not understand your comment on windows platform. Do you mean to say a compiler for windows platform uses '__atomic_xxx' name space to provide some other functionality (and hence it would get excluded)? 
Clang supports these intrinsics. I am not sure about the merit of supporting other non-gcc compilers. Maybe a topic for Techboard discussion.

> 
> > What problems do you see?
> 
> i'm fairly certain at least one other compiler uses the __atomic namespace but
Do you mean __atomic namespace is used for some other purpose?

> it would take me time to check, the most notable potential issue that comes to
> mind is if such an intrinsic with the same name is provided in a different
> implementation and has either regressive code generation or different
> semantics it would be bad because it is intrinsic you can't just hack around it
> with #undef __atomic to shim in a semantically correct version.
I do not think we should worry about regressive code generation problem. It should be fixed by that compiler.
Different semantics is something we need to worry about. It would be good to find out more about a compiler that does this.

> 
> how about this, is there another possible namespace you might suggest that
> conforms or doesn't conflict with the rules defined in ISO/IEC 9899:2011
> 7.1.3 i think if there were that would satisfy all of my concerns related to
> namespaces.
> 
> keep in mind the point of moving to a standard is to achieve portability so if we
> do things that will regress us back to being dependent on an implementation
> we haven't succeeded. that's all i'm trying to guarantee here.
Agree. We are trying to solve a problem that is temporary. I am trying to keep the problem scope narrow which might help us push to adopt the standard sooner.

> 
> i feel like we are really close on this discussion, if we can just iron this issue out
> we can probably get going on the actual changes.
> 
> thanks for the consideration.
> 
> >
> > >
> > > once in place it provides an opportunity to introduce new
> > > toolchain/platform combinations and enables an opt-in capability to
> > > use stdatomics on existing toolchain/platform combinations subject
> > > to community discussion on how/if/when.
> > >
> > > it would be good to get more participants into the discussion so
> > > i'll cc techboard for some attention. i feel like the only area that
> > > isn't decided is to do or not do this in rte_ namespace.
> > >
> > > i'm strongly in favor of rte_ namespace after discussion, mainly due
> > > to the disadvantages of trying to overlap with the standard namespace
> > > while not providing a compatible api/abi and because it provides
> > > clear disambiguation of that difference in semantics and compatibility with
> the standard api.
> > >
> > > so far i've noted the following
> > >
> > > * we will not provide the non-explicit apis.
> > +1
> >
> > > * we will make no attempt to support operating on struct/union atomics
> > >   with our apis.
> > +1
> >
> > > * we will mirror the standard api potentially in the rte_ namespace to
> > >   - reference the standard api documentation.
> > >   - assume compatible semantics (sans exceptions from first 2 points).
> > >
> > > my vote is to remove 'potentially' from the last point above for
> > > reasons previously discussed in postings to the mail thread.
> > >
> > > thanks all for the discussion, i'll send up a patch removing
> > > non-explicit apis for viewing.
> > >
> > > ty

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] eal: introduce atomics abstraction
  2023-02-09 19:19  0%         ` Morten Brørup
@ 2023-02-09 22:04  0%           ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-09 22:04 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, david.marchand, thomas, Honnappa.Nagarahalli, bruce.richardson

On Thu, Feb 09, 2023 at 08:19:14PM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Thursday, 9 February 2023 19.15
> > 
> > On Thu, Feb 09, 2023 at 09:05:46AM +0100, Morten Brørup wrote:
> > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > Sent: Wednesday, 8 February 2023 22.44
> > > >
> > > > Introduce atomics abstraction that permits optional use of standard
> > C11
> > > > atomics when meson is provided the new enable_stdatomics=true
> > option.
> > > >
> > > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > > ---
> > >
> > > Looks good. A few minor suggestions about implementation only.
> > >
> > > With or without suggested modifications,
> > >
> > > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> [...]
> 
> > > > diff --git a/lib/eal/include/generic/rte_atomic.h
> > > > b/lib/eal/include/generic/rte_atomic.h
> > > > index f5c49a9..392d928 100644
> > > > --- a/lib/eal/include/generic/rte_atomic.h
> > > > +++ b/lib/eal/include/generic/rte_atomic.h
> > > > @@ -110,6 +110,100 @@
> > > >
> > > >  #endif /* __DOXYGEN__ */
> > > >
> > > > +#ifdef RTE_STDC_ATOMICS
> > > > +
> > > > +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L ||
> > > > defined(__STDC_NO_ATOMICS__)
> > > > +#error compiler does not support C11 standard atomics
> > > > +#else
> > > > +#include <stdatomic.h>
> > > > +#endif
> > > > +
> > > > +#define __rte_atomic _Atomic
> > > > +
> > > > +typedef int rte_memory_order;
> > >
> > > I would prefer enum for rte_memory_order:
> > >
> > > typedef enum {
> > >     rte_memory_order_relaxed = memory_order_relaxed,
> > >     rte_memory_order_consume = memory_order_consume,
> > >     rte_memory_order_acquire = memory_order_acquire,
> > >     rte_memory_order_release = memory_order_release,
> > >     rte_memory_order_acq_rel = memory_order_acq_rel,
> > >     rte_memory_order_seq_cst = memory_order_seq_cst
> > > } rte_memory_order;
> > 
> > the reason for not using enum type is abi related. the c standard has
> > this little gem.
> > 
> >     ISO/IEC 9899:2011
> > 
> >     6.7.2.2 (4)
> >     Each enumerated type shall be compatible with char, a signed
> > integer
> >     type, or an unsigned integer type. The choice of type is
> >     implementation-defined, 128) but shall be capable of representing
> > the
> >     values of all the members of the enumeration.
> > 
> >     128) An implementation may delay the choice of which integer type
> > until
> >     all enumeration constants have been seen.
> > 
> > so i'm just being overly protective of maintaining the forward
> > compatibility of the abi.
> > 
> > probably i'm being super unnecessarily cautious in this case since i
> > think
> > in practice even if an implementation chose sizeof(char) i doubt very
> > much
> > that enough enumerated values would get added to this enumeration
> > within
> > the lifetime of the API to suddenly cause the compiler to choose >
> > sizeof(char).
> 
> I am under the impression that compilers usually instantiate enum as int, and you can make it use a smaller size by decorating it with the "packed" attribute - I have benefited from that in the past.

generally i think most implementations choose int as a default but i
think there is potential for char to get used if the compiler is asked
to optimize for size. not a likely optimization preference when using dpdk.
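
a small sketch of the trade-off, assuming gcc/clang (the __ATOMIC_* constants
are their predefined macros; the demo_ names are hypothetical and not from the
patch). the order type stays a plain int so its size is fixed, and the
range concern raised above is handled by an explicit check instead of by the
enum type.

```c
/* Plain int keeps the parameter size independent of how enums are sized. */
typedef int demo_memory_order;

/*
 * Return 1 if order is one of the six memory orders known to the
 * gcc/clang builtins, 0 for any other value.
 */
static int
demo_order_is_valid(demo_memory_order order)
{
	switch (order) {
	case __ATOMIC_RELAXED:
	case __ATOMIC_CONSUME:
	case __ATOMIC_ACQUIRE:
	case __ATOMIC_RELEASE:
	case __ATOMIC_ACQ_REL:
	case __ATOMIC_SEQ_CST:
		return 1;
	default:
		return 0;
	}
}
```

such a check would most likely live in debug builds or tests only, since the
order argument is normally a compile-time constant.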

> 
> The only risk I am effectively trying to avoid is someone calling an rte_atomic() function with "order" being another value than one of these values. Probably not ever going to happen.
> 
> Your solution also addresses an academic risk (of the compiler using another type than int for the enum), which will have unwanted side effects - especially if the "order" parameter to the rte_atomic() functions becomes char instead of int.
> 
> I can now conclude that your proposed type (int) is stronger/safer than the type (enum) I suggested. So please keep what you have.
> 
> > 
> > incidentally this is also why you can't forward declare enums in c.
> > 
> 
> [...]
> 
> > > > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected, desired, success, fail) \
> > > > +	__atomic_compare_exchange_n(obj, expected, desired, 0, success, fail)
> > >
> > > The type of the "weak" parameter to __atomic_compare_exchange_n() is
> > > bool, not int, so use "false" instead of 0. There is probably no
> > > practical difference, so I'll leave it up to you.
> > >
> > > You might need to include <stdbool.h> for this... I haven't checked.
> > 
> > strictly speaking you are correct the intrinsic does take bool according
> > to documentation.
> > 
> >     ISO/IEC 9899:2011
> > 
> >     7.18 Boolean type and values <stdbool.h>
> >     (1) The header <stdbool.h> defines four macros.
> >     (2) The macro bool expands to _Bool.
> >     (3) The remaining three macros are suitable for use in #if preprocessing
> > 	directives. They are `true' which expands to the integer constant 1,
> > 	`false' which expands to the integer constant 0, and
> > 	__bool_true_false_are_defined which expands to the integer constant 1.
> 
> Thank you for this reference. I wasn't aware that the two boolean values explicitly expanded to those two integer constants. I had never thought about it, but simply assumed that the constant "true" had the same meaning as "not false", like "if (123)" evaluates "123" as true.
> 
> So I learned something new today.
> 
> > 
> > so i could include the header, to expand a macro which expands to
> > integer constant 0 or 1 as appropriate for weak vs strong. do you think
> > i should? (serious question) if you answer yes, i'll make the change.
> 
> Then no. Please keep the 1's and 0's.

thanks! will leave it as is.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
    2023-02-06 15:29  0%     ` Jiawei(Jonny) Wang
  2023-02-07  9:40  0%     ` Ori Kam
@ 2023-02-09 19:44  0%     ` Ferruh Yigit
  2023-02-10 14:06  0%       ` Jiawei(Jonny) Wang
  2023-02-14  9:38  0%       ` Jiawei(Jonny) Wang
  2 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2023-02-09 19:44 UTC (permalink / raw)
  To: Jiawei Wang, viacheslavo, orika, thomas, andrew.rybchenko,
	Aman Singh, Yuying Zhang
  Cc: dev, rasland

On 2/3/2023 1:33 PM, Jiawei Wang wrote:
> When multiple physical ports are connected to a single DPDK port,
> (example: kernel bonding, DPDK bonding, failsafe, etc.),
> we want to know which physical port is used for Rx and Tx.
> 

I assume "kernel bonding" is out of context, but this patch concerns
DPDK bonding, failsafe or softnic. (I will refer to them as virtual
bonding devices.)

Using specific queues of the virtual bonding device may interfere with
the logic of these devices, like bonding modes or RSS of the underlying
devices. I can see the feature focuses on a very specific use case, but I
am not sure all possible side effects have been taken into consideration.


And although the feature is only relevant to virtual bonding devices,
core ethdev structures are updated for this. Most use cases won't need
these, so is there a way to reduce the scope of the changes to virtual
bonding devices?


There are a few very core ethdev APIs, like:
rte_eth_dev_configure()
rte_eth_tx_queue_setup()
rte_eth_rx_queue_setup()
rte_eth_dev_start()
rte_eth_dev_info_get()

Almost every user of ethdev uses these APIs, and since they are so
fundamental I am for being a little more conservative with them.

Every eccentric feature targets these APIs first because they are
common and extending them gives an easy solution, but in the long run
this makes these APIs more complex, harder to maintain and harder for
PMDs to support correctly. So I am for not updating them unless it is a
generic use case.


Also, as we talked about PMDs supporting them, I assume your coming PMD
patch will implement the 'tx_phy_affinity' config option only for mlx
drivers. What will happen for other NICs? Will they silently ignore the
config option from the user? That is a problem for DPDK application
portability.



As far as I understand, the target is the application controlling which
sub-device is used under the virtual bonding device. Can you please give
more information on why this is required? Perhaps it can help to provide
a better/different solution.
Like adding the ability to use both the bonding device and the sub-device
for the data path, so the application can use whichever it wants. (This
is just the first solution I came up with; I am not suggesting it as a
replacement, but if you can describe the problem more I am sure other
people can come up with better solutions.)

And isn't this against keeping the application agnostic of whether the
underlying device is a bonding device or an actual device?
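
To make the portability concern concrete, here is a minimal sketch of
the check an application would need before relying on the proposed
field. The struct and function names are hypothetical stand-ins; only
the field names nb_phy_ports and tx_phy_affinity come from the quoted
patch, and the assumption that a driver without support reports
nb_phy_ports as 0 is mine, not something the patch guarantees.

```c
#include <stdint.h>

/* Hypothetical stand-ins mirroring only the fields the patch proposes. */
struct example_dev_info {
	uint8_t nb_phy_ports;    /* number of physical ports behind the DPDK port */
};

struct example_txconf {
	uint8_t tx_phy_affinity; /* 0 = no affinity, 1..nb_phy_ports = port index */
};

/* Returns 0 if the affinity was stored, -1 if it cannot be honored. */
static int
example_set_tx_affinity(const struct example_dev_info *info,
			struct example_txconf *conf, uint8_t affinity)
{
	/* 0 always means "no affinity", the default behavior. */
	if (affinity == 0) {
		conf->tx_phy_affinity = 0;
		return 0;
	}
	/* Assumed: a driver without support reports no physical ports. */
	if (info->nb_phy_ports == 0)
		return -1;
	/* Physical ports are numbered from 1 in the proposal. */
	if (affinity > info->nb_phy_ports)
		return -1;
	conf->tx_phy_affinity = affinity;
	return 0;
}
```

This mirrors the validation in the testpmd command from the patch; an
application that skips it has no way to tell whether a driver honored
or ignored the setting.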


> This patch maps a DPDK Tx queue with a physical port,
> by adding tx_phy_affinity setting in Tx queue.
> The affinity number is the physical port ID where packets will be
> sent.
> Value 0 means no affinity and traffic could be routed to any
> connected physical ports, this is the default current behavior.
> 
> The number of physical ports is reported with rte_eth_dev_info_get().
> 
> The new tx_phy_affinity field is added into the padding hole of
> rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> An ABI check rule needs to be added to avoid false warning.
> 
> Add the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> 
> For example, there are two physical ports connected to
> a single DPDK port (port id 0); phy_affinity 1 stands for
> the first physical port and phy_affinity 2 stands for the second
> physical port.
> Use the commands below to configure the Tx phy affinity per Tx queue:
>         port config 0 txq 0 phy_affinity 1
>         port config 0 txq 1 phy_affinity 1
>         port config 0 txq 2 phy_affinity 2
>         port config 0 txq 3 phy_affinity 2
> 
> These commands configure Tx Queue index 0 and Tx Queue index 1 with
> phy affinity 1. Packets sent with Tx Queue 0 or Tx Queue 1 leave
> from the first physical port, and likewise packets sent with
> Tx Queue 2 or Tx Queue 3 leave from the second physical port.
> 
> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> ---
>  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
>  app/test-pmd/config.c                       |   1 +
>  devtools/libabigail.abignore                |   5 +
>  doc/guides/rel_notes/release_23_03.rst      |   4 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
>  lib/ethdev/rte_ethdev.h                     |  10 ++
>  6 files changed, 133 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index cb8c174020..f771fcf8ac 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void *parsed_result,
>  
>  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
>  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
> +
> +			"port config (port_id) txq (queue_id) phy_affinity (value)\n"
> +			"    Set the physical affinity value "
> +			"on a specific Tx queue\n\n"
>  		);
>  	}
>  
> @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t cmd_show_port_flow_transfer_proxy = {
>  	}
>  };
>  
> +/* *** configure port txq phy_affinity value *** */
> +struct cmd_config_tx_phy_affinity {
> +	cmdline_fixed_string_t port;
> +	cmdline_fixed_string_t config;
> +	portid_t portid;
> +	cmdline_fixed_string_t txq;
> +	uint16_t qid;
> +	cmdline_fixed_string_t phy_affinity;
> +	uint8_t value;
> +};
> +
> +static void
> +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
> +				  __rte_unused struct cmdline *cl,
> +				  __rte_unused void *data)
> +{
> +	struct cmd_config_tx_phy_affinity *res = parsed_result;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_port *port;
> +	int ret;
> +
> +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
> +		return;
> +
> +	if (res->portid == (portid_t)RTE_PORT_ALL) {
> +		printf("Invalid port id\n");
> +		return;
> +	}
> +
> +	port = &ports[res->portid];
> +
> +	if (strcmp(res->txq, "txq")) {
> +		printf("Unknown parameter\n");
> +		return;
> +	}
> +	if (tx_queue_id_is_invalid(res->qid))
> +		return;
> +
> +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
> +	if (ret != 0)
> +		return;
> +
> +	if (dev_info.nb_phy_ports == 0) {
> +		printf("Number of physical ports is 0 which is invalid for PHY Affinity\n");
> +		return;
> +	}
> +	printf("The number of physical ports is %u\n", dev_info.nb_phy_ports);
> +	if (dev_info.nb_phy_ports < res->value) {
> +		printf("The PHY affinity value %u is Invalid, exceeds the "
> +		       "number of physical ports\n", res->value);
> +		return;
> +	}
> +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
> +
> +	cmd_reconfig_device_queue(res->portid, 0, 1);
> +}
> +
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 port, "port");
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 config, "config");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 portid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 txq, "txq");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      qid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 phy_affinity, "phy_affinity");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      value, RTE_UINT8);
> +
> +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
> +	.f = cmd_config_tx_phy_affinity_parsed,
> +	.data = (void *)0,
> +	.help_str = "port config <port_id> txq <queue_id> phy_affinity <value>",
> +	.tokens = {
> +		(void *)&cmd_config_tx_phy_affinity_port,
> +		(void *)&cmd_config_tx_phy_affinity_config,
> +		(void *)&cmd_config_tx_phy_affinity_portid,
> +		(void *)&cmd_config_tx_phy_affinity_txq,
> +		(void *)&cmd_config_tx_phy_affinity_qid,
> +		(void *)&cmd_config_tx_phy_affinity_hwport,
> +		(void *)&cmd_config_tx_phy_affinity_value,
> +		NULL,
> +	},
> +};
> +
>  /* ******************************************************************************** */
>  
>  /* list of instructions */
> @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
>  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
> +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
>  	NULL,
>  };
>  
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index acccb6b035..b83fb17cfa 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
>  		printf("unknown\n");
>  		break;
>  	}
> +	printf("Current number of physical ports: %u\n", dev_info.nb_phy_ports);
>  }
>  
>  void
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 7a93de3ba1..ac7d3fb2da 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -34,3 +34,8 @@
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till next major ABI version ;
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +; Ignore fields inserted in padding hole of rte_eth_txconf
> +[suppress_type]
> +        name = rte_eth_txconf
> +        has_data_member_inserted_between = {offset_of(tx_deferred_start), offset_of(offloads)}
> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> index 73f5d94e14..e99bd2dcb6 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>  
> +* **Added affinity for multiple physical ports connected to a single DPDK port.**
> +
> +  * Added Tx affinity in queue setup to map a physical port.
> +
>  * **Updated AMD axgbe driver.**
>  
>    * Added multi-process support.
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 79a1fa9cb7..5c716f7679 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only on a specific Tx queue::
>  
>  This command should be run when the port is stopped, or else it will fail.
>  
> +config per queue Tx physical affinity
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Configure a per queue physical affinity value only on a specific Tx queue::
> +
> +   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> +
> +* ``phy_affinity``: physical port to use for sending,
> +                    when multiple physical ports are connected to
> +                    a single DPDK port.
> +
> +This command should be run when the port is stopped, otherwise it fails.
> +
>  Config VXLAN Encap outer layers
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..2fd971b7b5 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
>  				      less free descriptors than this value. */
>  
>  	uint8_t tx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +	/**
> +	 * Affinity with one of the multiple physical ports connected to the DPDK port.
> +	 * Value 0 means no affinity and traffic could be routed to any connected
> +	 * physical port.
> +	 * The first physical port is number 1 and so on.
> +	 * Number of physical ports is reported by nb_phy_ports in rte_eth_dev_info.
> +	 */
> +	uint8_t tx_phy_affinity;
>  	/**
>  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_* flags.
>  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
> @@ -1744,6 +1752,8 @@ struct rte_eth_dev_info {
>  	/** Device redirection table size, the total number of entries. */
>  	uint16_t reta_size;
>  	uint8_t hash_key_size; /**< Hash key size in bytes */
> +	/** Number of physical ports connected with DPDK port. */
> +	uint8_t nb_phy_ports;
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>  	uint64_t flow_type_rss_offloads;
>  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration */


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v2] eal: introduce atomics abstraction
  2023-02-09 18:15  4%       ` Tyler Retzlaff
@ 2023-02-09 19:19  0%         ` Morten Brørup
  2023-02-09 22:04  0%           ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-09 19:19 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, david.marchand, thomas, Honnappa.Nagarahalli, bruce.richardson

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Thursday, 9 February 2023 19.15
> 
> On Thu, Feb 09, 2023 at 09:05:46AM +0100, Morten Brørup wrote:
> > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > Sent: Wednesday, 8 February 2023 22.44
> > >
> > > Introduce atomics abstraction that permits optional use of standard C11
> > > atomics when meson is provided the new enable_stdatomics=true option.
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > ---
> >
> > Looks good. A few minor suggestions about implementation only.
> >
> > With or without suggested modifications,
> >
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>

[...]

> > > diff --git a/lib/eal/include/generic/rte_atomic.h
> > > b/lib/eal/include/generic/rte_atomic.h
> > > index f5c49a9..392d928 100644
> > > --- a/lib/eal/include/generic/rte_atomic.h
> > > +++ b/lib/eal/include/generic/rte_atomic.h
> > > @@ -110,6 +110,100 @@
> > >
> > >  #endif /* __DOXYGEN__ */
> > >
> > > +#ifdef RTE_STDC_ATOMICS
> > > +
> > > +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L ||
> > > defined(__STDC_NO_ATOMICS__)
> > > +#error compiler does not support C11 standard atomics
> > > +#else
> > > +#include <stdatomic.h>
> > > +#endif
> > > +
> > > +#define __rte_atomic _Atomic
> > > +
> > > +typedef int rte_memory_order;
> >
> > I would prefer enum for rte_memory_order:
> >
> > typedef enum {
> >     rte_memory_order_relaxed = memory_order_relaxed,
> >     rte_memory_order_consume = memory_order_consume,
> >     rte_memory_order_acquire = memory_order_acquire,
> >     rte_memory_order_release = memory_order_release,
> >     rte_memory_order_acq_rel = memory_order_acq_rel,
> >     rte_memory_order_seq_cst = memory_order_seq_cst
> > } rte_memory_order;
> 
> the reason for not using enum type is abi related. the c standard has
> this little gem.
> 
>     ISO/IEC 9899:2011
> 
>     6.7.2.2 (4)
>     Each enumerated type shall be compatible with char, a signed integer
>     type, or an unsigned integer type. The choice of type is
>     implementation-defined, 128) but shall be capable of representing the
>     values of all the members of the enumeration.
> 
>     128) An implementation may delay the choice of which integer type until
>     all enumeration constants have been seen.
> 
> so i'm just being overly protective of maintaining the forward
> compatibility of the abi.
> 
> probably i'm being super unnecessarily cautious in this case since i think
> in practice even if an implementation chose sizeof(char) i doubt very much
> that enough enumerated values would get added to this enumeration within
> the lifetime of the API to suddenly cause the compiler to choose > sizeof(char).

I am under the impression that compilers usually instantiate enum as int, and you can make it use a smaller size by decorating it with the "packed" attribute - I have benefited from that in the past.

The only risk I am effectively trying to avoid is someone calling an rte_atomic() function with "order" being another value than one of these values. Probably not ever going to happen.

Your solution also addresses an academic risk (of the compiler using another type than int for the enum), which will have unwanted side effects - especially if the "order" parameter to the rte_atomic() functions becomes char instead of int.

I can now conclude that your proposed type (int) is stronger/safer than the type (enum) I suggested. So please keep what you have.

> 
> incidentally this is also why you can't forward declare enums in c.
> 

[...]

> > > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected, desired, success, fail) \
> > > +	__atomic_compare_exchange_n(obj, expected, desired, 0, success, fail)
> >
> > The type of the "weak" parameter to __atomic_compare_exchange_n() is
> > bool, not int, so use "false" instead of 0. There is probably no
> > practical difference, so I'll leave it up to you.
> >
> > You might need to include <stdbool.h> for this... I haven't checked.
> 
> strictly speaking you are correct the intrinsic does take bool according
> to documentation.
> 
>     ISO/IEC 9899:2011
> 
>     7.18 Boolean type and values <stdbool.h>
>     (1) The header <stdbool.h> defines four macros.
>     (2) The macro bool expands to _Bool.
>     (3) The remaining three macros are suitable for use in #if preprocessing
> 	directives. They are `true' which expands to the integer constant 1,
> 	`false' which expands to the integer constant 0, and
> 	__bool_true_false_are_defined which expands to the integer constant 1.

Thank you for this reference. I wasn't aware that the two boolean values explicitly expanded to those two integer constants. I had never thought about it, but simply assumed that the constant "true" had the same meaning as "not false", like "if (123)" evaluates "123" as true.

So I learned something new today.

> 
> so i could include the header, to expand a macro which expands to
> integer constant 0 or 1 as appropriate for weak vs strong. do you think
> i should? (serious question) if you answer yes, i'll make the change.

Then no. Please keep the 1's and 0's.
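
For reference, a minimal sketch of how the strong and weak variants
with the literal 0 and 1 might be exercised, assuming GCC or Clang
__atomic builtins. The example_* macros and functions are hypothetical
and only modelled on the non-stdatomic branch of the quoted patch.

```c
#include <stdint.h>

/* Hypothetical wrappers: 0 selects the strong variant, 1 the weak one. */
#define example_cas_strong(obj, expected, desired, success, fail) \
	__atomic_compare_exchange_n(obj, expected, desired, 0, success, fail)
#define example_cas_weak(obj, expected, desired, success, fail) \
	__atomic_compare_exchange_n(obj, expected, desired, 1, success, fail)

/* Atomically add delta using a weak CAS loop; returns the previous value. */
static uint32_t
example_fetch_add(uint32_t *counter, uint32_t delta)
{
	uint32_t old = __atomic_load_n(counter, __ATOMIC_RELAXED);

	/* On failure 'old' is refreshed with the current value, so retry. */
	while (!example_cas_weak(counter, &old, old + delta,
				 __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
		;
	return old;
}

/* Move *flag from 0 to 1; returns 1 only for the call that made the change. */
static int
example_claim(uint32_t *flag)
{
	uint32_t expected = 0;

	return example_cas_strong(flag, &expected, 1,
				  __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}
```

The weak form may fail spuriously, so it sits in a retry loop; the
strong form suits a one-shot attempt such as claiming a flag.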


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] eal: introduce atomics abstraction
  @ 2023-02-09 18:15  4%       ` Tyler Retzlaff
  2023-02-09 19:19  0%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-09 18:15 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, david.marchand, thomas, Honnappa.Nagarahalli, bruce.richardson

On Thu, Feb 09, 2023 at 09:05:46AM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 8 February 2023 22.44
> > 
> > Introduce atomics abstraction that permits optional use of standard C11
> > atomics when meson is provided the new enable_stdatomics=true option.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> 
> Looks good. A few minor suggestions about implementation only.
> 
> With or without suggested modifications,
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
> 
> >  config/meson.build                     | 11 ++++
> >  lib/eal/arm/include/rte_atomic_32.h    |  6 ++-
> >  lib/eal/arm/include/rte_atomic_64.h    |  6 ++-
> >  lib/eal/include/generic/rte_atomic.h   | 96
> > +++++++++++++++++++++++++++++++++-
> >  lib/eal/loongarch/include/rte_atomic.h |  6 ++-
> >  lib/eal/ppc/include/rte_atomic.h       |  6 ++-
> >  lib/eal/riscv/include/rte_atomic.h     |  6 ++-
> >  lib/eal/x86/include/rte_atomic.h       |  8 ++-
> >  meson_options.txt                      |  2 +
> >  9 files changed, 139 insertions(+), 8 deletions(-)
> > 
> > diff --git a/config/meson.build b/config/meson.build
> > index 26f3168..25dd628 100644
> > --- a/config/meson.build
> > +++ b/config/meson.build
> > @@ -255,6 +255,17 @@ endif
> >  # add -include rte_config to cflags
> >  add_project_arguments('-include', 'rte_config.h', language: 'c')
> > 
> > +stdc_atomics_enabled = get_option('enable_stdatomics')
> > +dpdk_conf.set('RTE_STDC_ATOMICS', stdc_atomics_enabled)
> > +
> > +if stdc_atomics_enabled
> > +if cc.get_id() == 'gcc' or cc.get_id() == 'clang'
> > +    add_project_arguments('-std=gnu11', language: 'c')
> > +else
> > +    add_project_arguments('-std=c11', language: 'c')
> > +endif
> > +endif
> > +
> >  # enable extra warnings and disable any unwanted warnings
> >  # -Wall is added by default at warning level 1, and -Wextra
> >  # at warning level 2 (DPDK default)
> > diff --git a/lib/eal/arm/include/rte_atomic_32.h
> > b/lib/eal/arm/include/rte_atomic_32.h
> > index c00ab78..7088a12 100644
> > --- a/lib/eal/arm/include/rte_atomic_32.h
> > +++ b/lib/eal/arm/include/rte_atomic_32.h
> > @@ -34,9 +34,13 @@
> >  #define rte_io_rmb() rte_rmb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  #ifdef __cplusplus
> > diff --git a/lib/eal/arm/include/rte_atomic_64.h
> > b/lib/eal/arm/include/rte_atomic_64.h
> > index 6047911..7f02c57 100644
> > --- a/lib/eal/arm/include/rte_atomic_64.h
> > +++ b/lib/eal/arm/include/rte_atomic_64.h
> > @@ -38,9 +38,13 @@
> >  #define rte_io_rmb() rte_rmb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  /*------------------------ 128 bit atomic operations -----------------
> > --------*/
> > diff --git a/lib/eal/include/generic/rte_atomic.h
> > b/lib/eal/include/generic/rte_atomic.h
> > index f5c49a9..392d928 100644
> > --- a/lib/eal/include/generic/rte_atomic.h
> > +++ b/lib/eal/include/generic/rte_atomic.h
> > @@ -110,6 +110,100 @@
> > 
> >  #endif /* __DOXYGEN__ */
> > 
> > +#ifdef RTE_STDC_ATOMICS
> > +
> > +#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L ||
> > defined(__STDC_NO_ATOMICS__)
> > +#error compiler does not support C11 standard atomics
> > +#else
> > +#include <stdatomic.h>
> > +#endif
> > +
> > +#define __rte_atomic _Atomic
> > +
> > +typedef int rte_memory_order;
> 
> I would prefer enum for rte_memory_order:
> 
> typedef enum {
>     rte_memory_order_relaxed = memory_order_relaxed,
>     rte_memory_order_consume = memory_order_consume,
>     rte_memory_order_acquire = memory_order_acquire,
>     rte_memory_order_release = memory_order_release,
>     rte_memory_order_acq_rel = memory_order_acq_rel,
>     rte_memory_order_seq_cst = memory_order_seq_cst
> } rte_memory_order;

the reason for not using enum type is abi related. the c standard has
this little gem.

    ISO/IEC 9899:2011

    6.7.2.2 (4)
    Each enumerated type shall be compatible with char, a signed integer
    type, or an unsigned integer type. The choice of type is
    implementation-defined, 128) but shall be capable of representing the
    values of all the members of the enumeration.

    128) An implementation may delay the choice of which integer type until
    all enumeration constants have been seen.

so i'm just being overly protective of maintaining the forward
compatibility of the abi.

probably i'm being super unnecessarily cautious in this case since i think
in practice even if an implementation chose sizeof(char) i doubt very much
that enough enumerated values would get added to this enumeration within
the lifetime of the API to suddenly cause the compiler to choose > sizeof(char).

incidentally this is also why you can't forward declare enums in c.

> 
> > +
> > +#define rte_memory_order_relaxed memory_order_relaxed
> > +#define rte_memory_order_consume memory_order_consume
> > +#define rte_memory_order_acquire memory_order_acquire
> > +#define rte_memory_order_release memory_order_release
> > +#define rte_memory_order_acq_rel memory_order_acq_rel
> > +#define rte_memory_order_seq_cst memory_order_seq_cst
> > +
> > +#define rte_atomic_store_explicit(obj, desired, order) \
> > +	atomic_store_explicit(obj, desired, order)
> > +
> > +#define rte_atomic_load_explicit(obj, order) \
> > +	atomic_load_explicit(obj, order)
> > +
> > +#define rte_atomic_exchange_explicit(obj, desired, order) \
> > +	atomic_exchange_explicit(obj, desired, order)
> > +
> > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected,
> > desired, success, fail) \
> > +	atomic_compare_exchange_strong_explicit(obj, expected, desired,
> > success, fail)
> > +
> > +#define rte_atomic_compare_exchange_weak_explicit(obj, expected,
> > desired, success, fail) \
> > +	atomic_compare_exchange_weak_explicit(obj, expected, desired,
> > success, fail)
> > +
> > +#define rte_atomic_fetch_add_explicit(obj, arg, order) \
> > +	atomic_fetch_add_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_sub_explicit(obj, arg, order) \
> > +	atomic_fetch_sub_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_or_explicit(obj, arg, order) \
> > +	atomic_fetch_or_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_xor_explicit(obj, arg, order) \
> > +	atomic_fetch_xor_explicit(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_and_explicit(obj, arg, order) \
> > +	atomic_fetch_and_explicit(obj, arg, order)
> > +
> > +#else
> > +
> > +#define __rte_atomic
> > +
> > +typedef int rte_memory_order;
> > +
> > +#define rte_memory_order_relaxed __ATOMIC_RELAXED
> > +#define rte_memory_order_consume __ATOMIC_CONSUME
> > +#define rte_memory_order_acquire __ATOMIC_ACQUIRE
> > +#define rte_memory_order_release __ATOMIC_RELEASE
> > +#define rte_memory_order_acq_rel __ATOMIC_ACQ_REL
> > +#define rte_memory_order_seq_cst __ATOMIC_SEQ_CST
> 
> Prefer enum for rte_memory_order:
> 
> typedef enum {
>     rte_memory_order_relaxed = __ATOMIC_RELAXED,
>     rte_memory_order_consume = __ATOMIC_CONSUME,
>     rte_memory_order_acquire = __ATOMIC_ACQUIRE,
>     rte_memory_order_release = __ATOMIC_RELEASE,
>     rte_memory_order_acq_rel = __ATOMIC_ACQ_REL,
>     rte_memory_order_seq_cst = __ATOMIC_SEQ_CST
> } rte_memory_order;
> 
> > +
> > +#define rte_atomic_store_explicit(obj, desired, order) \
> > +	__atomic_store_n(obj, desired, order)
> > +
> > +#define rte_atomic_load_explicit(obj, order) \
> > +	__atomic_load_n(obj, order)
> > +
> > +#define rte_atomic_exchange_explicit(obj, desired, order) \
> > +	__atomic_exchange_n(obj, desired, order)
> > +
> > +#define rte_atomic_compare_exchange_strong_explicit(obj, expected,
> > desired, success, fail) \
> > +	__atomic_compare_exchange_n(obj, expected, desired, 0, success,
> > fail)
> 
> The type of the "weak" parameter to __atomic_compare_exchange_n() is bool, not int, so use "false" instead of 0. There is probably no practical difference, so I'll leave it up to you.
> 
> You might need to include <stdbool.h> for this... I haven't checked.

strictly speaking you are correct the intrinsic does take bool according
to documentation.

    ISO/IEC 9899:2011

    7.18 Boolean type and values <stdbool.h>
    (1) The header <stdbool.h> defines four macros.
    (2) The macro bool expands to _Bool.
    (3) The remaining three macros are suitable for use in #if preprocessing
	directives. They are `true' which expands to the integer constant 1,
	`false' which expands to the integer constant 0, and
	__bool_true_false_are_defined which expands to the integer constant 1.

so i could include the header, to expand a macro which expands to
integer constant 0 or 1 as appropriate for weak vs strong. do you think
i should? (serious question) if you answer yes, i'll make the change.
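
A small sketch of what the quoted clause implies in practice, assuming a
C11 compiler. The example_* functions are hypothetical; the bool
parameter merely stands in for the "weak" parameter of the intrinsic.

```c
#include <stdbool.h>

/* C11 7.18: true and false are usable in #if and expand to 1 and 0. */
#if true != 1 || false != 0
#error "unexpected <stdbool.h> expansion"
#endif

/* A function taking bool, standing in for the intrinsic's parameter. */
static int
example_weak_as_int(bool weak)
{
	return weak ? 1 : 0;
}

/* Any non-zero value converts to 1 when assigned to bool (C11 6.3.1.2). */
static int
example_bool_from_int(int value)
{
	bool b = value;

	return b;
}
```

Passing 0 or 1 to the bool parameter therefore behaves the same as
passing false or true.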

> 
> > +
> > +#define rte_atomic_compare_exchange_weak_explicit(obj, expected,
> > desired, success, fail) \
> > +	__atomic_compare_exchange_n(obj, expected, desired, 1, success,
> > fail)
> 
> Same as above: Use "true" instead of 1.
> 
> > +
> > +#define rte_atomic_fetch_add_explicit(obj, arg, order) \
> > +	__atomic_fetch_add(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_sub_explicit(obj, arg, order) \
> > +	__atomic_fetch_sub(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_or_explicit(obj, arg, order) \
> > +	__atomic_fetch_or(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_xor_explicit(obj, arg, order) \
> > +	__atomic_fetch_xor(obj, arg, order)
> > +
> > +#define rte_atomic_fetch_and_explicit(obj, arg, order) \
> > +	__atomic_fetch_and(obj, arg, order)
> > +
> > +#endif
> > +
> >  /**
> >   * Compiler barrier.
> >   *
> > @@ -123,7 +217,7 @@
> >  /**
> >   * Synchronization fence between threads based on the specified memory
> > order.
> >   */
> > -static inline void rte_atomic_thread_fence(int memorder);
> > +static inline void rte_atomic_thread_fence(rte_memory_order memorder);
> > 
> >  /*------------------------- 16 bit atomic operations -----------------
> > --------*/
> > 
> > diff --git a/lib/eal/loongarch/include/rte_atomic.h
> > b/lib/eal/loongarch/include/rte_atomic.h
> > index 3c82845..66aa0c8 100644
> > --- a/lib/eal/loongarch/include/rte_atomic.h
> > +++ b/lib/eal/loongarch/include/rte_atomic.h
> > @@ -35,9 +35,13 @@
> >  #define rte_io_rmb()	rte_mb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  #ifdef __cplusplus
> > diff --git a/lib/eal/ppc/include/rte_atomic.h
> > b/lib/eal/ppc/include/rte_atomic.h
> > index 663b4d3..a428a83 100644
> > --- a/lib/eal/ppc/include/rte_atomic.h
> > +++ b/lib/eal/ppc/include/rte_atomic.h
> > @@ -38,9 +38,13 @@
> >  #define rte_io_rmb() rte_rmb()
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  /*------------------------- 16 bit atomic operations -----------------
> > --------*/
> > diff --git a/lib/eal/riscv/include/rte_atomic.h
> > b/lib/eal/riscv/include/rte_atomic.h
> > index 4b4633c..3c203a9 100644
> > --- a/lib/eal/riscv/include/rte_atomic.h
> > +++ b/lib/eal/riscv/include/rte_atomic.h
> > @@ -40,9 +40,13 @@
> >  #define rte_io_rmb()	asm volatile("fence ir, ir" : : : "memory")
> > 
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > +#ifdef RTE_STDC_ATOMICS
> > +	atomic_thread_fence(memorder);
> > +#else
> >  	__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  #ifdef __cplusplus
> > diff --git a/lib/eal/x86/include/rte_atomic.h
> > b/lib/eal/x86/include/rte_atomic.h
> > index f2ee1a9..02d8b12 100644
> > --- a/lib/eal/x86/include/rte_atomic.h
> > +++ b/lib/eal/x86/include/rte_atomic.h
> > @@ -87,12 +87,16 @@
> >   * used instead.
> >   */
> >  static __rte_always_inline void
> > -rte_atomic_thread_fence(int memorder)
> > +rte_atomic_thread_fence(rte_memory_order memorder)
> >  {
> > -	if (memorder == __ATOMIC_SEQ_CST)
> > +	if (memorder == rte_memory_order_seq_cst)
> >  		rte_smp_mb();
> >  	else
> > +#ifdef RTE_STDC_ATOMICS
> > +		atomic_thread_fence(memorder);
> > +#else
> >  		__atomic_thread_fence(memorder);
> > +#endif
> >  }
> > 
> >  /*------------------------- 16 bit atomic operations -----------------
> > --------*/
> > diff --git a/meson_options.txt b/meson_options.txt
> > index 0852849..acbcbb8 100644
> > --- a/meson_options.txt
> > +++ b/meson_options.txt
> > @@ -46,6 +46,8 @@ option('mbuf_refcnt_atomic', type: 'boolean', value:
> > true, description:
> >         'Atomically access the mbuf refcnt.')
> >  option('platform', type: 'string', value: 'native', description:
> >         'Platform to build, either "native", "generic" or a SoC. Please
> > refer to the Linux build guide for more information.')
> > +option('enable_stdatomics', type: 'boolean', value: false,
> > description:
> > +       'enable use of standard C11 atomics.')
> >  option('enable_trace_fp', type: 'boolean', value: false, description:
> >         'enable fast path trace points.')
> >  option('tests', type: 'boolean', value: true, description:
> > --
> > 1.8.3.1
> > 

^ permalink raw reply	[relevance 4%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
  2023-02-09  8:34  4%                   ` Morten Brørup
@ 2023-02-09 17:30  4%                   ` Tyler Retzlaff
  2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
  1 sibling, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-09 17:30 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Morten Brørup, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Thu, Feb 09, 2023 at 12:16:38AM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > > > > >
> > > > > > >
> > > > > > > For environments where stdatomics are not supported, we could
> > > > have a
> > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > support
> > > > only
> > > > > > _explicit APIs). This allows the code to use stdatomics APIs and
> > > > when we move
> > > > > > to minimum supported standard C11, we just need to get rid of
> > > > > > the
> > > > file in DPDK
> > > > > > repo.
> > > > > >
> > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > introduce names
> > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > >
> > > > > > references:
> > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > >  * GNU libc manual
> > > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > > Names.html
> > > > > >
> > > > > > in effect the header, the names and in some instances namespaces
> > > > introduced
> > > > > > are reserved by the implementation. there are several reasons in
> > > > the GNU libc
> > > > > Wouldn't this apply only after the particular APIs were introduced?
> > > > i.e. it should not apply if the compiler does not support stdatomics.
> > > >
> > > > yeah, i agree they're being a bit wishy washy in the wording, but
> > > > i'm not convinced glibc folks are documenting this as permissive
> > > > guidance against.
> > > >
> > > > >
> > > > > > manual that explain the justification for these reservations and
> > > > > > if
> > > > if we think
> > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > >
> > > > > > i'll also remark that the inter-mingling of names from the POSIX
> > > > standard
> > > > > > implicitly exposed as a part of the EAL public API has been
> > > > problematic for
> > > > > > portability.
> > > > > These should be exposed as EAL APIs only when compiled with a
> > > > compiler that does not support stdatomics.
> > > >
> > > > you don't necessarily compile dpdk, the application or its other
> > > > dynamically linked dependencies with the same compiler at the same
> > > > time.
> > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > distribution.
> > > >
> > > > if dpdk is built without real stdatomic types but the application
> > > > has to interoperate with a different kit or library that does they
> > > > would be forced to dance around dpdk with their own version of a
> > > > shim to hide our faked up stdatomics.
> > > >
> > >
> > > So basically, if we want a binary DPDK distribution to be compatible with a
> > separate application build environment, they both have to implement atomics
> > the same way, i.e. agree on the ABI for atomics.
> > >
> > > Summing up, this leaves us with only two realistic options:
> > >
> > > 1. Go all in on C11 stdatomics, also requiring the application build
> > environment to support C11 stdatomics.
> > > 2. Provide our own DPDK atomics library.
> > >
> > > (As mentioned by Tyler, the third option - using C11 stdatomics inside
> > > DPDK, and requiring a build environment without C11 stdatomics to
> > > implement a shim - is not realistic!)
> > >
> > > I strongly want atomics to be available for use across inline and compiled
> > code; i.e. it must be possible for both compiled DPDK functions and inline
> > functions to perform atomic transactions on the same atomic variable.
> > 
> > i consider it a mandatory requirement. i don't see practically how we could
> > withdraw existing use and even if we had clean way i don't see why we would
> > want to. so this item is definitely settled if you were concerned.
> I think I agree here.
> 
> > 
> > >
> > > So either we upgrade the DPDK build requirements to support C11 (including
> > the optional stdatomics), or we provide our own DPDK atomics.
> > 
> > i think the issue of requiring a toolchain conformant to a specific standard is a
> > separate matter because any adoption of C11 standard atomics is a potential
> > abi break from the current use of intrinsics.
> I am not sure why you are calling it as ABI break. Referring to [1], I just see wrappers around intrinsics (though [2] does not use the intrinsics).
> 
> [1] https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> [2] https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic.h

it's a potential abi break because atomic types are not the same types as
their corresponding integer types etc. (or at least are not guaranteed to
be by all implementations of c as an abstract language).

    ISO/IEC 9899:2011

    6.2.5 (27)
    Further, there is the _Atomic qualifier. The presence of the _Atomic
    qualifier designates an atomic type. The size, representation, and alignment
    of an atomic type need not be the same as those of the corresponding
    unqualified type.

    7.17.6 (3)
    NOTE The representation of atomic integer types need not have the same size
    as their corresponding regular types. They should have the same size whenever
    possible, as it eases effort required to port existing code.

i use the term `potential abi break' with intent because for me to assert
in absolute terms i would have to evaluate the implementation of atomic vs
non-atomic types in every current and potential future compiler. this,
as i'm sure you understand, is not practical; it would also defeat the
purpose of moving to a standard. therefore i rely on the specification
prescribed by the standard, not the detail of a specific implementation.
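
a minimal sketch of how that assumption could be checked on one particular toolchain rather than taken for granted. DEMO_SAME_LAYOUT and demo_count_layout_mismatches are hypothetical names; the code assumes a C11 compiler that supports _Atomic, and the result says nothing about any other implementation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* hypothetical helper: true when the atomic and plain type share
 * size and alignment on the toolchain compiling this file. */
#define DEMO_SAME_LAYOUT(type) \
	(sizeof(_Atomic(type)) == sizeof(type) && \
	 alignof(_Atomic(type)) == alignof(type))

/* make one assumption explicit at build time instead of relying on it. */
static_assert(DEMO_SAME_LAYOUT(uint32_t),
	"_Atomic(uint32_t) does not share the layout of uint32_t");

/* count how many of the checked integer types differ on this toolchain. */
static inline int
demo_count_layout_mismatches(void)
{
	return !DEMO_SAME_LAYOUT(uint8_t) + !DEMO_SAME_LAYOUT(uint16_t) +
	       !DEMO_SAME_LAYOUT(uint32_t) + !DEMO_SAME_LAYOUT(uint64_t);
}
```

a check like this only documents what one compiler does; it does not turn the layout into a guarantee, which is why the standard text above is the thing to rely on.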


> > the abstraction (whatever namespace it resides) allows the existing
> > toolchain/platform combinations to maintain compatibility by defaulting to
> > current non-standard intrinsics.
> How about using the intrinsics (__atomic_xxx) name space for abstraction? This covers the GCC and Clang compilers.

the namespace starting with `__` is also reserved for the implementation.
this is why compilers (gcc/clang/msvc) name their intrinsic and
builtin functions starting with __, to explicitly avoid collision with the
application namespace.

    ISO/IEC 9899:2011

    7.1.3 (1)
    All identifiers that begin with an underscore and either an uppercase
    letter or another underscore are always reserved for any use.

    ...

> If there is another platform that uses the same name space for something else, I think DPDK should not be supporting that platform.

that's effectively a statement excluding windows platform and all
non-gcc compilers from ever supporting dpdk.

> What problems do you see?

i'm fairly certain at least one other compiler uses the __atomic
namespace, but it would take me time to check. the most notable potential
issue that comes to mind: if an intrinsic with the same name is
provided by a different implementation and has either regressive code
generation or different semantics, it would be bad, because it is an
intrinsic and you can't just hack around it with #undef __atomic to shim in
a semantically correct version.

how about this, is there another possible namespace you might suggest
that conforms to, or doesn't conflict with, the rules defined in
ISO/IEC 9899:2011 7.1.3? i think if there were, that would satisfy all of
my concerns related to namespaces.

keep in mind the point of moving to a standard is to achieve portability
so if we do things that will regress us back to being dependent on an
implementation we haven't succeeded. that's all i'm trying to guarantee
here.
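
to make the namespace point concrete, a minimal sketch of an abstraction that lives entirely in the application namespace and maps to either standard atomics or the gcc/clang builtins. the demo_ prefix and the DEMO_USE_STDC_ATOMICS switch are hypothetical, chosen only so nothing here collides with reserved identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef DEMO_USE_STDC_ATOMICS /* hypothetical build-time switch */
#include <stdatomic.h>
#define demo_atomic(type) _Atomic(type)
#define demo_memory_order_relaxed memory_order_relaxed
#define demo_memory_order_seq_cst memory_order_seq_cst
#define demo_atomic_fetch_add_explicit(obj, arg, order) \
	atomic_fetch_add_explicit(obj, arg, order)
#define demo_atomic_load_explicit(obj, order) \
	atomic_load_explicit(obj, order)
#else
#define demo_atomic(type) type
#define demo_memory_order_relaxed __ATOMIC_RELAXED
#define demo_memory_order_seq_cst __ATOMIC_SEQ_CST
#define demo_atomic_fetch_add_explicit(obj, arg, order) \
	__atomic_fetch_add(obj, arg, order)
#define demo_atomic_load_explicit(obj, order) \
	__atomic_load_n(obj, order)
#endif

/* callers only ever see demo_ names, whichever backend is selected. */
struct demo_stats {
	demo_atomic(uint64_t) packets;
};

/* add n to the counter and return the new total. */
static inline uint64_t
demo_stats_add(struct demo_stats *s, uint64_t n)
{
	assert(s != NULL);
	return demo_atomic_fetch_add_explicit(&s->packets, n,
			demo_memory_order_relaxed) + n;
}
```

the caller's source is identical for both backends; only the type behind demo_atomic() changes, which is exactly where the potential abi difference sits.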

i feel like we are really close on this discussion, if we can just iron
this issue out we can probably get going on the actual changes.

thanks for the consideration.

> 
> > 
> > once in place it provides an opportunity to introduce new toolchain/platform
> > combinations and enables an opt-in capability to use stdatomics on existing
> > toolchain/platform combinations subject to community discussion on
> > how/if/when.
> > 
> > it would be good to get more participants into the discussion so i'll cc techboard
> > for some attention. i feel like the only area that isn't decided is to do or not do
> > this in rte_ namespace.
> > 
> > i'm strongly in favor of rte_ namespace after discussion, mainly due to the
> > disadvantages of trying to overlap with the standard namespace while not
> > providing a compatible api/abi and because it provides clear disambiguation of
> > that difference in semantics and compatibility with the standard api.
> > 
> > so far i've noted the following
> > 
> > * we will not provide the non-explicit apis.
> +1
> 
> > * we will make no attempt to support operate on struct/union atomics
> >   with our apis.
> +1
> 
> > * we will mirror the standard api potentially in the rte_ namespace to
> >   - reference the standard api documentation.
> >   - assume compatible semantics (sans exceptions from first 2 points).
> > 
> > my vote is to remove 'potentially' from the last point above for reasons
> > previously discussed in postings to the mail thread.
> > 
> > thanks all for the discussion, i'll send up a patch removing non-explicit apis for
> > viewing.
> > 
> > ty

^ permalink raw reply	[relevance 4%]

* RE: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-02-06 16:38  0%                           ` Jerin Jacob
@ 2023-02-09 17:00  0%                             ` Naga Harish K, S V
  0 siblings, 0 replies; 200+ results
From: Naga Harish K, S V @ 2023-02-09 17:00 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, February 6, 2023 10:08 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Mon, Feb 6, 2023 at 11:52 AM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Friday, February 3, 2023 3:15 PM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Thu, Feb 2, 2023 at 9:42 PM Naga Harish K, S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, January 30, 2023 8:13 PM
> > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > <jay.jayatheerthan@intel.com>
> > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get
> > > > > APIs
> > > > >
> > > > > On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > > Sent: Saturday, January 28, 2023 4:24 PM
> > > > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan,
> > > > > > > Jay <jay.jayatheerthan@intel.com>
> > > > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params
> > > > > > > set/get APIs
> > > > > > >
> > > > > > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > +        */
> > > > > > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > > > > > >
> > > > > > > > > > > > > Introduce
> > > > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > > > > > to
> > > > > > > > > make
> > > > > > > > > > > > > sure rsvd is zero.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The reserved fields are not used by the adapter or
> > > application.
> > > > > > > > > > > > Not sure Is it necessary to Introduce a new API to
> > > > > > > > > > > > clear reserved
> > > > > > > fields.
> > > > > > > > > > >
> > > > > > > > > > > > When adapter starts using new fields (when we add new
> > > > > > > > > > > > fields in future), the old application which is not
> > > > > > > > > > > using
> > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may
> > > > > > > > > > > have
> > > > > junk
> > > > > > > > > > > value and then adapter implementation will behave bad.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > does it mean, the application doesn't re-compile for
> > > > > > > > > > the new
> > > DPDK?
> > > > > > > > >
> > > > > > > > > Yes. No need recompile if ABI not breaking.
> > > > > > > > >
> > > > > > > > > > When some of the reserved fields are used in the
> > > > > > > > > > future, the application
> > > > > > > > > also may need to be recompiled along with DPDK right?
> > > > > > > > > > As the application also may need to use the newly
> > > > > > > > > > consumed reserved
> > > > > > > > > fields?
> > > > > > > > >
> > > > > > > > > The problematic case is:
> > > > > > > > >
> > > > > > > > > Adapter implementation of 23.07(Assuming there is change
> > > > > > > > > params) field needs to work with application of 23.03.
> > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > > > > > > >
> > > > > > > >
> > > > > > > > As rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > initializes only
> > > > > > > reserved fields to zero,  it may not solve the issue in this case.
> > > > > > >
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero
> > > > > > > all fields, not just reserved field.
> > > > > > > The application calling sequence  is
> > > > > > >
> > > > > > > struct my_config c;
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init(&c)
> > > > > > > c.interseted_filed_to_be_updated = val;
> > > > > > >
> > > > > > Can it be done like
> > > > > >         struct my_config c = {0};
> > > > > >         c.interseted_filed_to_be_updated = val; and update
> > > > > > Doxygen comments to recommend above usage to reset all fields?
> > > > > > This way,  rte_event_eth_rx_adapter_runtime_params_init() can
> > > > > > be
> > > > > avoided.
> > > > >
> > > > > Better to have a function for documentation clarity. Similar
> > > > > scheme already there in DPDK. See rte_eth_cman_config_init()
> > > > >
> > > > >
> > > >
> > > >
> > > > The reference function rte_eth_cman_config_init() is resetting the
> > > > params
> > > struct and initializing the required params with default values in the pmd
> cb.
> > >
> > > No need for PMD cb.
> > >
> > > > The proposed rte_event_eth_rx_adapter_runtime_params_init () API
> > > > just
> > > needs to reset the params struct. There are no pmd CBs involved.
> > > > Having an API just to reset the struct seems overkill. What do you
> think?
> > >
> > > It is slow path API. Keeping it as function is better. Also, it
> > > helps the documentations of config parm in
> > > rte_event_eth_rx_adapter_runtime_params_config()
> > > like, This structure must be initialized with
> > > rte_event_eth_rx_adapter_runtime_params_init() or so.
> > >
> > >
> >
> > Are there any other reasons to have this API (*params_init()) other than
> documentation?
> 
> Initialization code is segregated for tracking.
> 

The changes discussed above are included in the v3 patchset.

> >
> > >
> > > >
> > > > > >
> > > > > > > Let me share an example and you can tell where is the issue
> > > > > > >
> > > > > > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > > > > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all
> 64B.
> > > > > > > 3)There is an application written based on 22.03 which using
> > > > > > > only 8B after calling
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > 4)Assume, in 22.07 another 8B added to structure.
> > > > > > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > > > > > application is calling
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > and 9 to 15B are zero, the implementation will not go bad.
> > > > > > >
> > > > > > > > The old application only tries to set/get previous valid
> > > > > > > > fields and the newly
> > > > > > > used fields may still contain junk value.
> > > > > > > > If the application wants to make use of any the newly used
> > > > > > > > params, the
> > > > > > > application changes are required anyway.
> > > > > > >
> > > > > > > Yes. If application wants to make use of newly added features.
> > > > > > > No need to change if new features are not needed for old
> application.

^ permalink raw reply	[relevance 0%]

* [PATCH v7 2/3] graph: pcap capture for graph nodes
  2023-02-09 10:24  4%   ` [PATCH v7 1/3] " Amit Prakash Shukla
@ 2023-02-09 10:24  2%     ` Amit Prakash Shukla
  0 siblings, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09 10:24 UTC (permalink / raw)
  To: Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Anatoly Burakov
  Cc: dev, david.marchand, Amit Prakash Shukla

Add support for capturing packets at each graph node, along with packet
metadata and the node name.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix
 
v7:
 - Resending the patch
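
For readers of the series, a minimal sketch of the interposition pattern the patch relies on: when capture is enabled, the node's process callback is replaced by a dispatcher that records packets and then calls the saved original callback. All demo_ names are hypothetical, the counters are kept per node here for brevity (the patch keeps them on the graph), and the actual packet copy and write are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_node;
typedef uint16_t (*demo_process_t)(struct demo_node *node, void **objs,
				   uint16_t nb_objs);

/* Hypothetical, simplified node: not the rte_node layout. */
struct demo_node {
	demo_process_t process;          /* callback invoked by the walker */
	demo_process_t original_process; /* real handler while capturing */
	uint64_t nb_captured;
	uint64_t nb_to_capture;
};

/* Sample handler that simply reports every object as processed. */
static inline uint16_t
demo_passthrough(struct demo_node *node, void **objs, uint16_t nb_objs)
{
	(void)node;
	(void)objs;
	return nb_objs;
}

/* Account for captured packets up to the limit, then run the real handler. */
static inline uint16_t
demo_capture_dispatch(struct demo_node *node, void **objs, uint16_t nb_objs)
{
	if (nb_objs != 0 && node->nb_captured < node->nb_to_capture) {
		uint64_t n = node->nb_to_capture - node->nb_captured;

		if (n > nb_objs)
			n = nb_objs;
		/* A real implementation would copy and write n packets here. */
		node->nb_captured += n;
	}
	return node->original_process(node, objs, nb_objs);
}

/* Install either the real handler or the capturing dispatcher. */
static inline void
demo_node_setup(struct demo_node *node, demo_process_t process,
		int capture_enable, uint64_t nb_to_capture)
{
	assert(node != NULL && process != NULL);
	node->nb_captured = 0;
	node->nb_to_capture = nb_to_capture;
	node->original_process = process;
	node->process = capture_enable ? demo_capture_dispatch : process;
}
```

With capture disabled the walker calls the real handler directly, so the fast path pays nothing for the feature.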

 app/test/test_graph_perf.c             |   2 +-
 doc/guides/rel_notes/release_23_03.rst |   7 +
 lib/graph/graph.c                      |  17 +-
 lib/graph/graph_pcap.c                 | 216 +++++++++++++++++++++++++
 lib/graph/graph_pcap_private.h         | 116 +++++++++++++
 lib/graph/graph_populate.c             |  12 +-
 lib/graph/graph_private.h              |   5 +
 lib/graph/meson.build                  |   3 +-
 lib/graph/rte_graph.h                  |   5 +
 lib/graph/rte_graph_worker.h           |   9 ++
 10 files changed, 388 insertions(+), 4 deletions(-)
 create mode 100644 lib/graph/graph_pcap.c
 create mode 100644 lib/graph/graph_pcap_private.h

diff --git a/app/test/test_graph_perf.c b/app/test/test_graph_perf.c
index 1d065438a6..c5b463f700 100644
--- a/app/test/test_graph_perf.c
+++ b/app/test/test_graph_perf.c
@@ -324,7 +324,7 @@ graph_init(const char *gname, uint8_t nb_srcs, uint8_t nb_sinks,
 	char nname[RTE_NODE_NAMESIZE / 2];
 	struct test_node_data *node_data;
 	char *ename[nodes_per_stage];
-	struct rte_graph_param gconf;
+	struct rte_graph_param gconf = {0};
 	const struct rte_memzone *mz;
 	uint8_t total_percent = 0;
 	rte_node_t *src_nodes;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index bb435dde32..328dfd3009 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -87,6 +87,10 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added pcap trace support in graph library.**
+
+  * Added support to capture packets at each graph node with packet metadata and
+    node name.
 
 Removed Items
 -------------
@@ -119,6 +123,9 @@ API Changes
 * Experimental function ``rte_pcapng_copy`` was updated to support comment
   section in enhanced packet block in pcapng library.
 
+* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
+  ``struct graph`` were updated to support pcap trace in graph library.
+
 ABI Changes
 -----------
 
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a839a2803b 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -15,6 +15,7 @@
 #include <rte_string_fns.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
 static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
@@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
 		node_db = node_from_name(name);
 		if (node_db == NULL)
 			SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
-		node->process = node_db->process;
+
+		if (graph->pcap_enable) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = node_db->process;
+		} else
+			node->process = node_db->process;
 	}
 
 	return graph;
@@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return graph;
 
+	if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
+		graph_pcap_exit(graph);
+
 	return graph_mem_fixup_node_ctx(graph);
 }
 
@@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	if (graph_has_isolated_node(graph))
 		goto graph_cleanup;
 
+	/* Initialize pcap config. */
+	graph_pcap_enable(prm->pcap_enable);
+
 	/* Initialize graph object */
 	graph->socket = prm->socket_id;
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
+	if (prm->pcap_filename)
+		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
new file mode 100644
index 0000000000..9cbd1b8fdb
--- /dev/null
+++ b/lib/graph/graph_pcap.c
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#include <errno.h>
+#include <pwd.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+
+#include "rte_graph_worker.h"
+
+#include "graph_pcap_private.h"
+
+#define GRAPH_PCAP_BUF_SZ	128
+#define GRAPH_PCAP_NUM_PACKETS	1024
+#define GRAPH_PCAP_PKT_POOL	"graph_pcap_pkt_pool"
+#define GRAPH_PCAP_FILE_NAME	"dpdk_graph_pcap_capture_XXXXXX.pcapng"
+
+/* For multi-process, packets are captured in separate files. */
+static rte_pcapng_t *pcapng_fd;
+static bool pcap_enable;
+struct rte_mempool *pkt_mp;
+
+void
+graph_pcap_enable(bool val)
+{
+	pcap_enable = val;
+}
+
+int
+graph_pcap_is_enable(void)
+{
+	return pcap_enable;
+}
+
+void
+graph_pcap_exit(struct rte_graph *graph)
+{
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		if (pkt_mp)
+			rte_mempool_free(pkt_mp);
+
+	if (pcapng_fd) {
+		rte_pcapng_close(pcapng_fd);
+		pcapng_fd = NULL;
+	}
+
+	/* Disable pcap. */
+	graph->pcap_enable = 0;
+	graph_pcap_enable(0);
+}
+
+static int
+graph_pcap_default_path_get(char **dir_path)
+{
+	struct passwd *pwd;
+	char *home_dir;
+
+	/* First check for shell environment variable */
+	home_dir = getenv("HOME");
+	if (home_dir == NULL) {
+		graph_warn("Home env not preset.");
+		/* Fallback to password file entry */
+		pwd = getpwuid(getuid());
+		if (pwd == NULL)
+			return -EINVAL;
+
+		home_dir = pwd->pw_dir;
+	}
+
+	/* Append default pcap file to directory */
+	if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int
+graph_pcap_file_open(const char *filename)
+{
+	int fd;
+	char file_name[RTE_GRAPH_PCAP_FILE_SZ];
+	char *pcap_dir;
+
+	if (pcapng_fd)
+		goto done;
+
+	if (!filename || filename[0] == '\0') {
+		if (graph_pcap_default_path_get(&pcap_dir) < 0)
+			return -1;
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
+		free(pcap_dir);
+	} else {
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
+			 filename);
+	}
+
+	fd = mkstemps(file_name, strlen(".pcapng"));
+	if (fd < 0) {
+		graph_err("mkstemps() failure");
+		return -1;
+	}
+
+	graph_info("pcap filename: %s", file_name);
+
+	/* Open a capture file */
+	pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
+				      NULL);
+	if (pcapng_fd == NULL) {
+		graph_err("Graph rte_pcapng_fdopen failed.");
+		close(fd);
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_mp_init(void)
+{
+	pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
+	if (pkt_mp)
+		goto done;
+
+	/* Make a pool for cloned packets */
+	pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
+			IOV_MAX + RTE_GRAPH_BURST_SIZE,	0, 0,
+			rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
+			SOCKET_ID_ANY, "ring_mp_mc");
+	if (pkt_mp == NULL) {
+		graph_err("Cannot create mempool for graph pcap capture.");
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_init(struct graph *graph)
+{
+	struct rte_graph *graph_data = graph->graph;
+
+	if (graph_pcap_file_open(graph->pcap_filename) < 0)
+		goto error;
+
+	if (graph_pcap_mp_init() < 0)
+		goto error;
+
+	/* User configured number of packets to capture. */
+	if (graph->num_pkt_to_capture)
+		graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
+	else
+		graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
+
+	/* All good. Now populate data for secondary process. */
+	rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
+	graph_data->pcap_enable = 1;
+
+	return 0;
+
+error:
+	graph_pcap_exit(graph_data);
+	graph_pcap_enable(0);
+	graph_err("Graph pcap initialization failed. Disabling pcap trace.");
+	return -1;
+}
+
+uint16_t
+graph_pcap_dispatch(struct rte_graph *graph,
+			      struct rte_node *node, void **objs,
+			      uint16_t nb_objs)
+{
+	struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
+	char buffer[GRAPH_PCAP_BUF_SZ];
+	uint64_t i, num_packets;
+	struct rte_mbuf *mbuf;
+	ssize_t len;
+
+	if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
+		goto done;
+
+	num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
+	/* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
+	if (num_packets > nb_objs)
+		num_packets = nb_objs;
+
+	snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
+
+	for (i = 0; i < num_packets; i++) {
+		struct rte_mbuf *mc;
+		mbuf = (struct rte_mbuf *)objs[i];
+
+		mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
+				     rte_get_tsc_cycles(), 0, buffer);
+		if (mc == NULL)
+			break;
+
+		mbuf_clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
+	rte_pktmbuf_free_bulk(mbuf_clones, i);
+	if (len <= 0)
+		goto done;
+
+	graph->nb_pkt_captured += i;
+
+done:
+	return node->original_process(graph, node, objs, nb_objs);
+}
diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
new file mode 100644
index 0000000000..2ec772072c
--- /dev/null
+++ b/lib/graph/graph_pcap_private.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
+#define _RTE_GRAPH_PCAP_PRIVATE_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "graph_private.h"
+
+/**
+ * @internal
+ *
+ * Pcap trace enable/disable function.
+ *
+ * The function is called to enable/disable graph pcap trace functionality.
+ *
+ * @param val
+ *   Value to be set to enable/disable graph pcap trace.
+ */
+void graph_pcap_enable(bool val);
+
+/**
+ * @internal
+ *
+ * Check whether graph pcap trace is enabled or disabled.
+ *
+ * The function is called to check if the graph pcap trace is enabled/disabled.
+ *
+ * @return
+ *   - 1: Enable
+ *   - 0: Disable
+ */
+int graph_pcap_is_enable(void);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked to allocate the mempool.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_mp_init(void);
+
+/**
+ * @internal
+ *
+ * Open the graph pcap trace file.
+ *
+ * The function is invoked to open the pcap file.
+ *
+ * @param filename
+ *   Pcap filename.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_file_open(const char *filename);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked when the graph pcap trace is enabled. This function
+ * opens the pcap file and allocates the mempool. Information needed by the
+ * secondary process is populated.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_init(struct graph *graph);
+
+/**
+ * @internal
+ *
+ * Exit graph pcap trace functionality.
+ *
+ * The function is called to exit graph pcap trace, close open fds and
+ * free up memory. Pcap trace is also disabled.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ */
+void graph_pcap_exit(struct rte_graph *graph);
+
+/**
+ * @internal
+ *
+ * Capture mbuf metadata and node metadata to a pcap file.
+ *
+ * When graph pcap trace is enabled, this function is invoked prior to each
+ * node; mbuf and node metadata is parsed and captured in a pcap file.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param objs
+ *   Pointer to an array of objects to be processed.
+ * @param nb_objs
+ *   Number of objects in the array.
+ */
+uint16_t graph_pcap_dispatch(struct rte_graph *graph,
+				   struct rte_node *node, void **objs,
+				   uint16_t nb_objs);
+
+#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..2c0844ce92 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -9,6 +9,7 @@
 #include <rte_memzone.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static size_t
 graph_fp_mem_calc_size(struct graph *graph)
@@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
 		memset(node, 0, sizeof(*node));
 		node->fence = RTE_GRAPH_FENCE;
 		node->off = off;
-		node->process = graph_node->node->process;
+		if (graph_pcap_is_enable()) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = graph_node->node->process;
+		} else
+			node->process = graph_node->node->process;
 		memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
 		pid = graph_node->node->parent_id;
 		if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
@@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
 	int rc;
 
 	graph_header_popluate(graph);
+	if (graph_pcap_is_enable())
+		graph_pcap_init(graph);
 	graph_nodes_populate(graph);
 	rc = graph_node_nexts_populate(graph);
 	rc |= graph_src_nodes_populate(graph);
@@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
 int
 graph_fp_mem_destroy(struct graph *graph)
 {
+	if (graph_pcap_is_enable())
+		graph_pcap_exit(graph->graph);
+
 	graph_nodes_mem_destroy(graph->graph);
 	return rte_memzone_free(graph->mz);
 }
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -22,6 +22,7 @@ extern int rte_graph_logtype;
 			__func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
 
 #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
+#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
 #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
 #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
 
@@ -100,6 +101,10 @@ struct graph {
 	/**< Memory size of the graph. */
 	int socket;
 	/**< Socket identifier where memory is allocated. */
+	uint64_t num_pkt_to_capture;
+	/**< Number of packets to be captured per core. */
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
+	/**< pcap file name/path. */
 	STAILQ_HEAD(gnode_list, graph_node) node_list;
 	/**< Nodes in a graph. */
 };
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,7 +14,8 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'graph_pcap.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..c9a77297fc 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -35,6 +35,7 @@ extern "C" {
 
 #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
 #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
+#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
 #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
 #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
 #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
@@ -164,6 +165,10 @@ struct rte_graph_param {
 	uint16_t nb_node_patterns;  /**< Number of node patterns. */
 	const char **node_patterns;
 	/**< Array of node patterns based on shell pattern. */
+
+	bool pcap_enable; /**< Pcap enable. */
+	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
+	char *pcap_filename; /**< Filename in which packets to be captured.*/
 };
 
 /**
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index fc6fee48c8..438595b15c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -44,6 +44,12 @@ struct rte_graph {
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
 	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	bool pcap_enable;	        /**< Pcap trace enabled. */
+	/** Number of packets captured per core. */
+	uint64_t nb_pkt_captured;
+	/** Number of packets to capture per core. */
+	uint64_t nb_pkt_to_capture;
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
 	uint64_t fence;			/**< Fence. */
 } __rte_cache_aligned;
 
@@ -64,6 +70,9 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/** Original process function when pcap is enabled. */
+	rte_node_process_t original_process;
+
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1


^ permalink raw reply	[relevance 2%]

* [PATCH v7 1/3] pcapng: comment option support for epb
  2023-02-09  9:56  4% ` [PATCH v6 1/4] " Amit Prakash Shukla
  2023-02-09  9:56  2%   ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
  2023-02-09 10:03  0%   ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
@ 2023-02-09 10:24  4%   ` Amit Prakash Shukla
  2023-02-09 10:24  2%     ` [PATCH v7 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
  2 siblings, 1 reply; 200+ results
From: Amit Prakash Shukla @ 2023-02-09 10:24 UTC (permalink / raw)
  To: Reshma Pattan, Stephen Hemminger
  Cc: dev, jerinj, david.marchand, Amit Prakash Shukla

This change enhances rte_pcapng_copy to accept an optional comment
string, which is stored as a comment option in the enhanced packet block.
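
As a rough illustration of what the comment costs in the capture file,
here is a minimal standalone sketch of how a pcapng option is laid out:
a 16-bit code, a 16-bit value length, then the value padded to a 32-bit
boundary. It does not use the DPDK helpers; opt_hdr, opt_total_len and
opt_write are hypothetical names for this example only, and the header
is written in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_COMMENT 1u /* opt_comment option code in the pcapng format */

/* Option header: 16-bit code, 16-bit value length (padding excluded). */
struct opt_hdr {
	uint16_t code;
	uint16_t length;
};

/* Bytes one option occupies: header plus value rounded up to 32 bits. */
static size_t
opt_total_len(size_t value_len)
{
	return sizeof(struct opt_hdr) + ((value_len + 3u) & ~(size_t)3u);
}

/* Write one option at buf and return the number of bytes written.
 * The caller must provide at least opt_total_len(value_len) bytes.
 */
static size_t
opt_write(uint8_t *buf, uint16_t code, const void *value, uint16_t value_len)
{
	struct opt_hdr hdr = { .code = code, .length = value_len };
	size_t total = opt_total_len(value_len);

	memset(buf, 0, total);                       /* zero the padding */
	memcpy(buf, &hdr, sizeof(hdr));              /* option header */
	memcpy(buf + sizeof(hdr), value, value_len); /* option value */
	return total;
}
```

With this layout an 18-byte comment such as "graph0: ip4_lookup"
occupies 24 bytes: 4 for the header and 20 for the padded value.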

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix
 
v7:
 - Resending the patch

 app/test/test_pcapng.c                 |  4 ++--
 doc/guides/rel_notes/release_23_03.rst |  2 ++
 lib/pcapng/rte_pcapng.c                | 10 +++++++++-
 lib/pcapng/rte_pcapng.h                |  4 +++-
 lib/pdump/rte_pdump.c                  |  2 +-
 5 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index edba46d1fe..b8429a02f1 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -146,7 +146,7 @@ test_write_packets(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
@@ -262,7 +262,7 @@ test_write_over_limit_iov_max(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 1fa101c420..bb435dde32 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -116,6 +116,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* Experimental function ``rte_pcapng_copy`` was updated to support comment
+  section in enhanced packet block in pcapng library.
 
 ABI Changes
 -----------
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index ea004939e6..65c8c77fa4 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *md,
 		struct rte_mempool *mp,
 		uint32_t length, uint64_t cycles,
-		enum rte_pcapng_direction direction)
+		enum rte_pcapng_direction direction,
+		const char *comment)
 {
 	struct pcapng_enhance_packet_block *epb;
 	uint32_t orig_len, data_len, padding, flags;
@@ -527,6 +528,9 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	if (rss_hash)
 		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
 
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+
 	/* reserve trailing options and block length */
 	opt = (struct pcapng_option *)
 		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
@@ -564,6 +568,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 					&hash_opt, sizeof(hash_opt));
 	}
 
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT, comment,
+					strlen(comment));
+
 	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
 
 	/* Add PCAPNG packet header */
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index 86b7996e29..4afdec22ef 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -125,6 +125,8 @@ enum rte_pcapng_direction {
  *   The timestamp in TSC cycles.
  * @param direction
  *   The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ *   Packet comment.
  *
  * @return
  *   - The pointer to the new mbuf formatted for pcapng_write
@@ -136,7 +138,7 @@ struct rte_mbuf *
 rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *m, struct rte_mempool *mp,
 		uint32_t length, uint64_t timestamp,
-		enum rte_pcapng_direction direction);
+		enum rte_pcapng_direction direction, const char *comment);
 
 
 /**
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index a81544cb57..9bc4bab4f2 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
 		if (cbs->ver == V2)
 			p = rte_pcapng_copy(port_id, queue,
 					    pkts[i], mp, cbs->snaplen,
-					    ts, direction);
+					    ts, direction, NULL);
 		else
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
-- 
2.25.1


^ permalink raw reply	[relevance 4%]

* RE: [PATCH v6 1/4] pcapng: comment option support for epb
  2023-02-09  9:56  4% ` [PATCH v6 1/4] " Amit Prakash Shukla
  2023-02-09  9:56  2%   ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
@ 2023-02-09 10:03  0%   ` Amit Prakash Shukla
  2023-02-09 10:24  4%   ` [PATCH v7 1/3] " Amit Prakash Shukla
  2 siblings, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09 10:03 UTC (permalink / raw)
  To: Amit Prakash Shukla, Reshma Pattan, Stephen Hemminger
  Cc: dev, Jerin Jacob Kollanukkaran, david.marchand

Please ignore this version. I will resend the patch.

> -----Original Message-----
> From: Amit Prakash Shukla <amitprakashs@marvell.com>
> Sent: Thursday, February 9, 2023 3:26 PM
> To: Reshma Pattan <reshma.pattan@intel.com>; Stephen Hemminger
> <stephen@networkplumber.org>
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> david.marchand@redhat.com; Amit Prakash Shukla
> <amitprakashs@marvell.com>
> Subject: [PATCH v6 1/4] pcapng: comment option support for epb
> 
> This change enhances rte_pcapng_copy to have comment in enhanced
> packet block.
> 
> Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
> ---
> v2:
>  - Fixed code style issue
>  - Fixed CI compilation issue on github-robot
> 
> v3:
>  - Code review suggestion from Stephen
>  - Fixed potential memory leak
> 
> v4:
>  - Code review suggestion from Jerin
> 
> v5:
>  - Code review suggestion from Jerin
> 
> v6:
>  - Squashing test graph param initialize fix
> 
>  app/test/test_pcapng.c                 |  4 ++--
>  doc/guides/rel_notes/release_23_03.rst |  2 ++
>  lib/pcapng/rte_pcapng.c                | 10 +++++++++-
>  lib/pcapng/rte_pcapng.h                |  4 +++-
>  lib/pdump/rte_pdump.c                  |  2 +-
>  5 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c index
> edba46d1fe..b8429a02f1 100644
> --- a/app/test/test_pcapng.c
> +++ b/app/test/test_pcapng.c
> @@ -146,7 +146,7 @@ test_write_packets(void)
>  		struct rte_mbuf *mc;
> 
>  		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
> -				rte_get_tsc_cycles(), 0);
> +				rte_get_tsc_cycles(), 0, NULL);
>  		if (mc == NULL) {
>  			fprintf(stderr, "Cannot copy packet\n");
>  			return -1;
> @@ -262,7 +262,7 @@ test_write_over_limit_iov_max(void)
>  		struct rte_mbuf *mc;
> 
>  		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
> -				rte_get_tsc_cycles(), 0);
> +				rte_get_tsc_cycles(), 0, NULL);
>  		if (mc == NULL) {
>  			fprintf(stderr, "Cannot copy packet\n");
>  			return -1;
> diff --git a/doc/guides/rel_notes/release_23_03.rst
> b/doc/guides/rel_notes/release_23_03.rst
> index 1fa101c420..bb435dde32 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -116,6 +116,8 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =======================================================
> 
> +* Experimental function ``rte_pcapng_copy`` was updated to support
> +comment
> +  section in enhanced packet block in pcapng library.
> 
>  ABI Changes
>  -----------
> diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c index
> ea004939e6..65c8c77fa4 100644
> --- a/lib/pcapng/rte_pcapng.c
> +++ b/lib/pcapng/rte_pcapng.c
> @@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  		const struct rte_mbuf *md,
>  		struct rte_mempool *mp,
>  		uint32_t length, uint64_t cycles,
> -		enum rte_pcapng_direction direction)
> +		enum rte_pcapng_direction direction,
> +		const char *comment)
>  {
>  	struct pcapng_enhance_packet_block *epb;
>  	uint32_t orig_len, data_len, padding, flags; @@ -527,6 +528,9 @@
> rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  	if (rss_hash)
>  		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
> 
> +	if (comment)
> +		optlen += pcapng_optlen(strlen(comment));
> +
>  	/* reserve trailing options and block length */
>  	opt = (struct pcapng_option *)
>  		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t)); @@ -
> 564,6 +568,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  					&hash_opt, sizeof(hash_opt));
>  	}
> 
> +	if (comment)
> +		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
> comment,
> +					strlen(comment));
> +
>  	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
> 
>  	/* Add PCAPNG packet header */
> diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h index
> 86b7996e29..4afdec22ef 100644
> --- a/lib/pcapng/rte_pcapng.h
> +++ b/lib/pcapng/rte_pcapng.h
> @@ -125,6 +125,8 @@ enum rte_pcapng_direction {
>   *   The timestamp in TSC cycles.
>   * @param direction
>   *   The direction of the packer: receive, transmit or unknown.
> + * @param comment
> + *   Packet comment.
>   *
>   * @return
>   *   - The pointer to the new mbuf formatted for pcapng_write
> @@ -136,7 +138,7 @@ struct rte_mbuf *
>  rte_pcapng_copy(uint16_t port_id, uint32_t queue,
>  		const struct rte_mbuf *m, struct rte_mempool *mp,
>  		uint32_t length, uint64_t timestamp,
> -		enum rte_pcapng_direction direction);
> +		enum rte_pcapng_direction direction, const char
> *comment);
> 
> 
>  /**
> diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c index
> a81544cb57..9bc4bab4f2 100644
> --- a/lib/pdump/rte_pdump.c
> +++ b/lib/pdump/rte_pdump.c
> @@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
>  		if (cbs->ver == V2)
>  			p = rte_pcapng_copy(port_id, queue,
>  					    pkts[i], mp, cbs->snaplen,
> -					    ts, direction);
> +					    ts, direction, NULL);
>  		else
>  			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
> 
> --
> 2.25.1


^ permalink raw reply	[relevance 0%]

* [PATCH v6 2/4] graph: pcap capture for graph nodes
  2023-02-09  9:56  4% ` [PATCH v6 1/4] " Amit Prakash Shukla
@ 2023-02-09  9:56  2%   ` Amit Prakash Shukla
  2023-02-09 10:03  0%   ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
  2023-02-09 10:24  4%   ` [PATCH v7 1/3] " Amit Prakash Shukla
  2 siblings, 0 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09  9:56 UTC (permalink / raw)
  To: Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Anatoly Burakov
  Cc: dev, david.marchand, Amit Prakash Shukla

This implementation adds support for capturing packets at each graph
node, along with packet metadata and the node name.
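
The hook-in can be pictured with a small standalone sketch, assuming a
simplified node type: when capture is on, the node's process pointer is
redirected to a wrapper that accounts for the burst and then calls the
saved original handler. The patch itself keeps the counters in struct
rte_graph and performs the pcapng copy and write; here the capture step
is reduced to counting, and struct node, capture_dispatch,
node_set_process and passthrough are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

struct node;
typedef uint16_t (*process_fn)(struct node *n, void **objs, uint16_t nb_objs);

struct node {
	process_fn process;          /* what the graph walker calls */
	process_fn original_process; /* real handler, saved when capturing */
	uint64_t captured;           /* packets captured so far */
	uint64_t to_capture;         /* capture budget */
};

/* Wrapper: account for up to the remaining budget, then run the node. */
static uint16_t
capture_dispatch(struct node *n, void **objs, uint16_t nb_objs)
{
	if (nb_objs != 0 && n->captured < n->to_capture) {
		uint64_t room = n->to_capture - n->captured;

		/* A real tracer would copy and write the packets here. */
		n->captured += room < nb_objs ? room : nb_objs;
	}
	return n->original_process(n, objs, nb_objs);
}

/* Install fn directly, or behind the wrapper when capture is enabled. */
static void
node_set_process(struct node *n, process_fn fn, int capture_on)
{
	if (capture_on) {
		n->process = capture_dispatch;
		n->original_process = fn;
	} else {
		n->process = fn;
		n->original_process = NULL;
	}
}

/* Trivial handler used only to exercise the wrapper. */
static uint16_t
passthrough(struct node *n, void **objs, uint16_t nb_objs)
{
	(void)n;
	(void)objs;
	return nb_objs;
}
```

The wrapper always forwards the full burst, so node behaviour is
unchanged once the capture budget is exhausted.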

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix

 doc/guides/rel_notes/release_23_03.rst |   7 +
 lib/graph/graph.c                      |  17 +-
 lib/graph/graph_pcap.c                 | 216 +++++++++++++++++++++++++
 lib/graph/graph_pcap_private.h         | 116 +++++++++++++
 lib/graph/graph_populate.c             |  12 +-
 lib/graph/graph_private.h              |   5 +
 lib/graph/meson.build                  |   3 +-
 lib/graph/rte_graph.h                  |   5 +
 lib/graph/rte_graph_worker.h           |   9 ++
 9 files changed, 387 insertions(+), 3 deletions(-)
 create mode 100644 lib/graph/graph_pcap.c
 create mode 100644 lib/graph/graph_pcap_private.h

diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index bb435dde32..328dfd3009 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -87,6 +87,10 @@ New Features
     ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter
     required for eth_rx, eth_tx, crypto and timer eventdev adapters.
 
+* **Added pcap trace support in graph library.**
+
+  * Added support to capture packets at each graph node with packet metadata and
+    node name.
 
 Removed Items
 -------------
@@ -119,6 +123,9 @@ API Changes
 * Experimental function ``rte_pcapng_copy`` was updated to support comment
   section in enhanced packet block in pcapng library.
 
+* Experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and
+  ``struct graph`` were updated to support pcap trace in graph library.
+
 ABI Changes
 -----------
 
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 3a617cc369..a839a2803b 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -15,6 +15,7 @@
 #include <rte_string_fns.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
 static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
@@ -228,7 +229,12 @@ graph_mem_fixup_node_ctx(struct rte_graph *graph)
 		node_db = node_from_name(name);
 		if (node_db == NULL)
 			SET_ERR_JMP(ENOLINK, fail, "Node %s not found", name);
-		node->process = node_db->process;
+
+		if (graph->pcap_enable) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = node_db->process;
+		} else
+			node->process = node_db->process;
 	}
 
 	return graph;
@@ -242,6 +248,9 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
 	if (graph == NULL || rte_eal_process_type() == RTE_PROC_PRIMARY)
 		return graph;
 
+	if (graph_pcap_file_open(graph->pcap_filename) || graph_pcap_mp_init())
+		graph_pcap_exit(graph);
+
 	return graph_mem_fixup_node_ctx(graph);
 }
 
@@ -323,11 +332,17 @@ rte_graph_create(const char *name, struct rte_graph_param *prm)
 	if (graph_has_isolated_node(graph))
 		goto graph_cleanup;
 
+	/* Initialize pcap config. */
+	graph_pcap_enable(prm->pcap_enable);
+
 	/* Initialize graph object */
 	graph->socket = prm->socket_id;
 	graph->src_node_count = src_node_count;
 	graph->node_count = graph_nodes_count(graph);
 	graph->id = graph_id;
+	graph->num_pkt_to_capture = prm->num_pkt_to_capture;
+	if (prm->pcap_filename)
+		rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
 
 	/* Allocate the Graph fast path memory and populate the data */
 	if (graph_fp_mem_create(graph))
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
new file mode 100644
index 0000000000..9cbd1b8fdb
--- /dev/null
+++ b/lib/graph/graph_pcap.c
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#include <errno.h>
+#include <pwd.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+
+#include "rte_graph_worker.h"
+
+#include "graph_pcap_private.h"
+
+#define GRAPH_PCAP_BUF_SZ	128
+#define GRAPH_PCAP_NUM_PACKETS	1024
+#define GRAPH_PCAP_PKT_POOL	"graph_pcap_pkt_pool"
+#define GRAPH_PCAP_FILE_NAME	"dpdk_graph_pcap_capture_XXXXXX.pcapng"
+
+/* For multi-process, packets are captured in separate files. */
+static rte_pcapng_t *pcapng_fd;
+static bool pcap_enable;
+struct rte_mempool *pkt_mp;
+
+void
+graph_pcap_enable(bool val)
+{
+	pcap_enable = val;
+}
+
+int
+graph_pcap_is_enable(void)
+{
+	return pcap_enable;
+}
+
+void
+graph_pcap_exit(struct rte_graph *graph)
+{
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		if (pkt_mp)
+			rte_mempool_free(pkt_mp);
+
+	if (pcapng_fd) {
+		rte_pcapng_close(pcapng_fd);
+		pcapng_fd = NULL;
+	}
+
+	/* Disable pcap. */
+	graph->pcap_enable = 0;
+	graph_pcap_enable(0);
+}
+
+static int
+graph_pcap_default_path_get(char **dir_path)
+{
+	struct passwd *pwd;
+	char *home_dir;
+
+	/* First check for shell environment variable */
+	home_dir = getenv("HOME");
+	if (home_dir == NULL) {
+		graph_warn("HOME environment variable not set.");
+		/* Fallback to password file entry */
+		pwd = getpwuid(getuid());
+		if (pwd == NULL)
+			return -EINVAL;
+
+		home_dir = pwd->pw_dir;
+	}
+
+	/* Append default pcap file to directory */
+	if (asprintf(dir_path, "%s/%s", home_dir, GRAPH_PCAP_FILE_NAME) == -1)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int
+graph_pcap_file_open(const char *filename)
+{
+	int fd;
+	char file_name[RTE_GRAPH_PCAP_FILE_SZ];
+	char *pcap_dir;
+
+	if (pcapng_fd)
+		goto done;
+
+	if (!filename || filename[0] == '\0') {
+		if (graph_pcap_default_path_get(&pcap_dir) < 0)
+			return -1;
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s", pcap_dir);
+		free(pcap_dir);
+	} else {
+		snprintf(file_name, RTE_GRAPH_PCAP_FILE_SZ, "%s_XXXXXX.pcapng",
+			 filename);
+	}
+
+	fd = mkstemps(file_name, strlen(".pcapng"));
+	if (fd < 0) {
+		graph_err("mkstemps() failure");
+		return -1;
+	}
+
+	graph_info("pcap filename: %s", file_name);
+
+	/* Open a capture file */
+	pcapng_fd = rte_pcapng_fdopen(fd, NULL, NULL, "Graph pcap tracer",
+				      NULL);
+	if (pcapng_fd == NULL) {
+		graph_err("Graph rte_pcapng_fdopen failed.");
+		close(fd);
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_mp_init(void)
+{
+	pkt_mp = rte_mempool_lookup(GRAPH_PCAP_PKT_POOL);
+	if (pkt_mp)
+		goto done;
+
+	/* Make a pool for cloned packets */
+	pkt_mp = rte_pktmbuf_pool_create_by_ops(GRAPH_PCAP_PKT_POOL,
+			IOV_MAX + RTE_GRAPH_BURST_SIZE,	0, 0,
+			rte_pcapng_mbuf_size(RTE_MBUF_DEFAULT_BUF_SIZE),
+			SOCKET_ID_ANY, "ring_mp_mc");
+	if (pkt_mp == NULL) {
+		graph_err("Cannot create mempool for graph pcap capture.");
+		return -1;
+	}
+
+done:
+	return 0;
+}
+
+int
+graph_pcap_init(struct graph *graph)
+{
+	struct rte_graph *graph_data = graph->graph;
+
+	if (graph_pcap_file_open(graph->pcap_filename) < 0)
+		goto error;
+
+	if (graph_pcap_mp_init() < 0)
+		goto error;
+
+	/* User configured number of packets to capture. */
+	if (graph->num_pkt_to_capture)
+		graph_data->nb_pkt_to_capture = graph->num_pkt_to_capture;
+	else
+		graph_data->nb_pkt_to_capture = GRAPH_PCAP_NUM_PACKETS;
+
+	/* All good. Now populate data for secondary process. */
+	rte_strscpy(graph_data->pcap_filename, graph->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ);
+	graph_data->pcap_enable = 1;
+
+	return 0;
+
+error:
+	graph_pcap_exit(graph_data);
+	graph_pcap_enable(0);
+	graph_err("Graph pcap initialization failed. Disabling pcap trace.");
+	return -1;
+}
+
+uint16_t
+graph_pcap_dispatch(struct rte_graph *graph,
+			      struct rte_node *node, void **objs,
+			      uint16_t nb_objs)
+{
+	struct rte_mbuf *mbuf_clones[RTE_GRAPH_BURST_SIZE];
+	char buffer[GRAPH_PCAP_BUF_SZ];
+	uint64_t i, num_packets;
+	struct rte_mbuf *mbuf;
+	ssize_t len;
+
+	if (!nb_objs || (graph->nb_pkt_captured >= graph->nb_pkt_to_capture))
+		goto done;
+
+	num_packets = graph->nb_pkt_to_capture - graph->nb_pkt_captured;
+	/* nb_objs will never be greater than RTE_GRAPH_BURST_SIZE */
+	if (num_packets > nb_objs)
+		num_packets = nb_objs;
+
+	snprintf(buffer, GRAPH_PCAP_BUF_SZ, "%s: %s", graph->name, node->name);
+
+	for (i = 0; i < num_packets; i++) {
+		struct rte_mbuf *mc;
+		mbuf = (struct rte_mbuf *)objs[i];
+
+		mc = rte_pcapng_copy(mbuf->port, 0, mbuf, pkt_mp, mbuf->pkt_len,
+				     rte_get_tsc_cycles(), 0, buffer);
+		if (mc == NULL)
+			break;
+
+		mbuf_clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng_fd, mbuf_clones, i);
+	rte_pktmbuf_free_bulk(mbuf_clones, i);
+	if (len <= 0)
+		goto done;
+
+	graph->nb_pkt_captured += i;
+
+done:
+	return node->original_process(graph, node, objs, nb_objs);
+}
diff --git a/lib/graph/graph_pcap_private.h b/lib/graph/graph_pcap_private.h
new file mode 100644
index 0000000000..2ec772072c
--- /dev/null
+++ b/lib/graph/graph_pcap_private.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell International Ltd.
+ */
+
+#ifndef _RTE_GRAPH_PCAP_PRIVATE_H_
+#define _RTE_GRAPH_PCAP_PRIVATE_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "graph_private.h"
+
+/**
+ * @internal
+ *
+ * Pcap trace enable/disable function.
+ *
+ * The function is called to enable/disable graph pcap trace functionality.
+ *
+ * @param val
+ *   Value to be set to enable/disable graph pcap trace.
+ */
+void graph_pcap_enable(bool val);
+
+/**
+ * @internal
+ *
+ * Check whether graph pcap trace is enabled or disabled.
+ *
+ * The function is called to check if the graph pcap trace is enabled/disabled.
+ *
+ * @return
+ *   - 1: Enabled
+ *   - 0: Disabled
+ */
+int graph_pcap_is_enable(void);
+
+/**
+ * @internal
+ *
+ * Initialise the graph pcap trace mempool.
+ *
+ * The function is invoked to allocate the mempool for cloned packets.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_mp_init(void);
+
+/**
+ * @internal
+ *
+ * Open the graph pcap trace file.
+ *
+ * The function is invoked to open the pcap file.
+ *
+ * @param filename
+ *   Pcap filename.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_file_open(const char *filename);
+
+/**
+ * @internal
+ *
+ * Initialise graph pcap trace functionality.
+ *
+ * The function is invoked when the graph pcap trace is enabled. This function
+ * opens the pcap file and allocates the mempool. Information needed by the
+ * secondary process is populated.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ *
+ * @return
+ *   0 on success and -1 on failure.
+ */
+int graph_pcap_init(struct graph *graph);
+
+/**
+ * @internal
+ *
+ * Exit graph pcap trace functionality.
+ *
+ * The function is called to exit graph pcap trace, close open fds and
+ * free up memory. Pcap trace is also disabled.
+ *
+ * @param graph
+ *   Pointer to graph structure.
+ */
+void graph_pcap_exit(struct rte_graph *graph);
+
+/**
+ * @internal
+ *
+ * Capture mbuf metadata and node metadata to a pcap file.
+ *
+ * When graph pcap trace is enabled, this function is invoked prior to each
+ * node; mbuf and node metadata is parsed and captured in a pcap file.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param objs
+ *   Pointer to an array of objects to be processed.
+ * @param nb_objs
+ *   Number of objects in the array.
+ */
+uint16_t graph_pcap_dispatch(struct rte_graph *graph,
+				   struct rte_node *node, void **objs,
+				   uint16_t nb_objs);
+
+#endif /* _RTE_GRAPH_PCAP_PRIVATE_H_ */
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 102fd6c29b..2c0844ce92 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -9,6 +9,7 @@
 #include <rte_memzone.h>
 
 #include "graph_private.h"
+#include "graph_pcap_private.h"
 
 static size_t
 graph_fp_mem_calc_size(struct graph *graph)
@@ -75,7 +76,11 @@ graph_nodes_populate(struct graph *_graph)
 		memset(node, 0, sizeof(*node));
 		node->fence = RTE_GRAPH_FENCE;
 		node->off = off;
-		node->process = graph_node->node->process;
+		if (graph_pcap_is_enable()) {
+			node->process = graph_pcap_dispatch;
+			node->original_process = graph_node->node->process;
+		} else
+			node->process = graph_node->node->process;
 		memcpy(node->name, graph_node->node->name, RTE_GRAPH_NAMESIZE);
 		pid = graph_node->node->parent_id;
 		if (pid != RTE_NODE_ID_INVALID) { /* Cloned node */
@@ -183,6 +188,8 @@ graph_fp_mem_populate(struct graph *graph)
 	int rc;
 
 	graph_header_popluate(graph);
+	if (graph_pcap_is_enable())
+		graph_pcap_init(graph);
 	graph_nodes_populate(graph);
 	rc = graph_node_nexts_populate(graph);
 	rc |= graph_src_nodes_populate(graph);
@@ -227,6 +234,9 @@ graph_nodes_mem_destroy(struct rte_graph *graph)
 int
 graph_fp_mem_destroy(struct graph *graph)
 {
+	if (graph_pcap_is_enable())
+		graph_pcap_exit(graph->graph);
+
 	graph_nodes_mem_destroy(graph->graph);
 	return rte_memzone_free(graph->mz);
 }
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index f9a85c8926..7d1b30b8ac 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -22,6 +22,7 @@ extern int rte_graph_logtype;
 			__func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__, )))
 
 #define graph_err(...) GRAPH_LOG(ERR, __VA_ARGS__)
+#define graph_warn(...) GRAPH_LOG(WARNING, __VA_ARGS__)
 #define graph_info(...) GRAPH_LOG(INFO, __VA_ARGS__)
 #define graph_dbg(...) GRAPH_LOG(DEBUG, __VA_ARGS__)
 
@@ -100,6 +101,10 @@ struct graph {
 	/**< Memory size of the graph. */
 	int socket;
 	/**< Socket identifier where memory is allocated. */
+	uint64_t num_pkt_to_capture;
+	/**< Number of packets to be captured per core. */
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];
+	/**< pcap file name/path. */
 	STAILQ_HEAD(gnode_list, graph_node) node_list;
 	/**< Nodes in a graph. */
 };
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index c7327549e8..3526d1b5d4 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -14,7 +14,8 @@ sources = files(
         'graph_debug.c',
         'graph_stats.c',
         'graph_populate.c',
+        'graph_pcap.c',
 )
 headers = files('rte_graph.h', 'rte_graph_worker.h')
 
-deps += ['eal']
+deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index b32c4bc217..c9a77297fc 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -35,6 +35,7 @@ extern "C" {
 
 #define RTE_GRAPH_NAMESIZE 64 /**< Max length of graph name. */
 #define RTE_NODE_NAMESIZE 64  /**< Max length of node name. */
+#define RTE_GRAPH_PCAP_FILE_SZ 64 /**< Max length of pcap file name. */
 #define RTE_GRAPH_OFF_INVALID UINT32_MAX /**< Invalid graph offset. */
 #define RTE_NODE_ID_INVALID UINT32_MAX   /**< Invalid node id. */
 #define RTE_EDGE_ID_INVALID UINT16_MAX   /**< Invalid edge id. */
@@ -164,6 +165,10 @@ struct rte_graph_param {
 	uint16_t nb_node_patterns;  /**< Number of node patterns. */
 	const char **node_patterns;
 	/**< Array of node patterns based on shell pattern. */
+
+	bool pcap_enable; /**< Pcap enable. */
+	uint64_t num_pkt_to_capture; /**< Number of packets to capture. */
+	char *pcap_filename; /**< Filename in which packets to be captured.*/
 };
 
 /**
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index fc6fee48c8..438595b15c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -44,6 +44,12 @@ struct rte_graph {
 	rte_graph_t id;	/**< Graph identifier. */
 	int socket;	/**< Socket ID where memory is allocated. */
 	char name[RTE_GRAPH_NAMESIZE];	/**< Name of the graph. */
+	bool pcap_enable;	        /**< Pcap trace enabled. */
+	/** Number of packets captured per core. */
+	uint64_t nb_pkt_captured;
+	/** Number of packets to capture per core. */
+	uint64_t nb_pkt_to_capture;
+	char pcap_filename[RTE_GRAPH_PCAP_FILE_SZ];  /**< Pcap filename. */
 	uint64_t fence;			/**< Fence. */
 } __rte_cache_aligned;
 
@@ -64,6 +70,9 @@ struct rte_node {
 	char parent[RTE_NODE_NAMESIZE];	/**< Parent node name. */
 	char name[RTE_NODE_NAMESIZE];	/**< Name of the node. */
 
+	/** Original process function when pcap is enabled. */
+	rte_node_process_t original_process;
+
 	/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
 	uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */
-- 
2.25.1
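
The interposition this patch performs in graph_nodes_populate() can be
illustrated outside DPDK. The following is a minimal, self-contained sketch,
not the patch's code: every sketch_ name is hypothetical, and a packet counter
stands in for the actual pcapng write. When capture is enabled, the node's
process pointer is redirected to a dispatcher, which records the burst and
then calls the saved original handler.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_node;

/* Hypothetical stand-in for rte_node_process_t. */
typedef uint16_t (*sketch_process_t)(struct sketch_node *node, void **objs,
				     uint16_t nb_objs);

struct sketch_node {
	sketch_process_t process;          /* what the graph walker calls */
	sketch_process_t original_process; /* real handler, saved when capturing */
	uint64_t captured;                 /* stand-in for writing to a pcapng file */
};

/* Dispatcher: record the burst, then hand it to the real handler. */
static uint16_t
sketch_capture_dispatch(struct sketch_node *node, void **objs, uint16_t nb_objs)
{
	node->captured += nb_objs;
	return node->original_process(node, objs, nb_objs);
}

/* Mirrors the if/else added to the populate step in the patch. */
static void
sketch_node_populate(struct sketch_node *node, sketch_process_t real,
		     int capture_enabled)
{
	node->captured = 0;
	if (capture_enabled) {
		node->process = sketch_capture_dispatch;
		node->original_process = real;
	} else {
		node->process = real;
		node->original_process = NULL;
	}
}

/* Trivial "real" handler used only to exercise the sketch. */
static uint16_t
sketch_real_process(struct sketch_node *node, void **objs, uint16_t nb_objs)
{
	(void)node;
	(void)objs;
	return nb_objs;
}
```

With capture disabled the walker calls the real handler directly, so the extra
indirection is only paid when capture is requested.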



* [PATCH v6 1/4] pcapng: comment option support for epb
  @ 2023-02-09  9:56  4% ` Amit Prakash Shukla
  2023-02-09  9:56  2%   ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
                     ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Amit Prakash Shukla @ 2023-02-09  9:56 UTC (permalink / raw)
  To: Reshma Pattan, Stephen Hemminger
  Cc: dev, jerinj, david.marchand, Amit Prakash Shukla

This change extends rte_pcapng_copy to accept an optional comment that
is written to the enhanced packet block.

Signed-off-by: Amit Prakash Shukla <amitprakashs@marvell.com>
---
v2:
 - Fixed code style issue
 - Fixed CI compilation issue on github-robot

v3:
 - Code review suggestion from Stephen
 - Fixed potential memory leak
 
v4:
 - Code review suggestion from Jerin

v5:
 - Code review suggestion from Jerin

v6:
 - Squashing test graph param initialize fix

 app/test/test_pcapng.c                 |  4 ++--
 doc/guides/rel_notes/release_23_03.rst |  2 ++
 lib/pcapng/rte_pcapng.c                | 10 +++++++++-
 lib/pcapng/rte_pcapng.h                |  4 +++-
 lib/pdump/rte_pdump.c                  |  2 +-
 5 files changed, 17 insertions(+), 5 deletions(-)
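
For readers unfamiliar with how the comment ends up in the block: a pcapng
option is a 16-bit code and a 16-bit length followed by the value, padded to
a 32-bit boundary, and the comment option uses code 1. The sketch below
reproduces only the length arithmetic that the optlen accounting in this
patch relies on. The sketch_ names are hypothetical, and the helper is an
assumption about what pcapng_optlen() computes, not a copy of it.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_OPT_COMMENT 1 /* opt_comment code in the pcapng format */

/* Option header: code and value length, both 16 bits. */
struct sketch_option {
	uint16_t code;
	uint16_t length; /* length of the value, excluding padding */
};

/* Space taken by one option: header plus value, rounded up to 4 bytes. */
static uint32_t
sketch_optlen(uint16_t len)
{
	uint32_t raw = (uint32_t)sizeof(struct sketch_option) + len;

	return (raw + 3u) & ~3u;
}

/* Extra option space for an optional comment; NULL means no option. */
static uint32_t
sketch_comment_optlen(const char *comment)
{
	if (comment == NULL)
		return 0;
	/* Assumption: lengths are 16 bits, so a very long comment would truncate. */
	return sketch_optlen((uint16_t)strlen(comment));
}
```

With this rounding, a 5-byte comment such as "hello" accounts for 12 bytes of
option space (4 header + 5 value + 3 padding), and a NULL comment adds nothing.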

diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
index edba46d1fe..b8429a02f1 100644
--- a/app/test/test_pcapng.c
+++ b/app/test/test_pcapng.c
@@ -146,7 +146,7 @@ test_write_packets(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
@@ -262,7 +262,7 @@ test_write_over_limit_iov_max(void)
 		struct rte_mbuf *mc;
 
 		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
-				rte_get_tsc_cycles(), 0);
+				rte_get_tsc_cycles(), 0, NULL);
 		if (mc == NULL) {
 			fprintf(stderr, "Cannot copy packet\n");
 			return -1;
diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
index 1fa101c420..bb435dde32 100644
--- a/doc/guides/rel_notes/release_23_03.rst
+++ b/doc/guides/rel_notes/release_23_03.rst
@@ -116,6 +116,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* Experimental function ``rte_pcapng_copy`` was updated to support comment
+  section in enhanced packet block in pcapng library.
 
 ABI Changes
 -----------
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index ea004939e6..65c8c77fa4 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -466,7 +466,8 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *md,
 		struct rte_mempool *mp,
 		uint32_t length, uint64_t cycles,
-		enum rte_pcapng_direction direction)
+		enum rte_pcapng_direction direction,
+		const char *comment)
 {
 	struct pcapng_enhance_packet_block *epb;
 	uint32_t orig_len, data_len, padding, flags;
@@ -527,6 +528,9 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	if (rss_hash)
 		optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
 
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+
 	/* reserve trailing options and block length */
 	opt = (struct pcapng_option *)
 		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
@@ -564,6 +568,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 					&hash_opt, sizeof(hash_opt));
 	}
 
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT, comment,
+					strlen(comment));
+
 	/* Note: END_OPT necessary here. Wireshark doesn't do it. */
 
 	/* Add PCAPNG packet header */
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index 86b7996e29..4afdec22ef 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -125,6 +125,8 @@ enum rte_pcapng_direction {
  *   The timestamp in TSC cycles.
  * @param direction
  *   The direction of the packer: receive, transmit or unknown.
+ * @param comment
+ *   Packet comment.
  *
  * @return
  *   - The pointer to the new mbuf formatted for pcapng_write
@@ -136,7 +138,7 @@ struct rte_mbuf *
 rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *m, struct rte_mempool *mp,
 		uint32_t length, uint64_t timestamp,
-		enum rte_pcapng_direction direction);
+		enum rte_pcapng_direction direction, const char *comment);
 
 
 /**
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index a81544cb57..9bc4bab4f2 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -122,7 +122,7 @@ pdump_copy(uint16_t port_id, uint16_t queue,
 		if (cbs->ver == V2)
 			p = rte_pcapng_copy(port_id, queue,
 					    pkts[i], mp, cbs->snaplen,
-					    ts, direction);
+					    ts, direction, NULL);
 		else
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
-- 
2.25.1



* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
@ 2023-02-09  8:34  4%                   ` Morten Brørup
  2023-02-09 17:30  4%                   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2023-02-09  8:34 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Tyler Retzlaff
  Cc: thomas, dev, bruce.richardson, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd, techboard, nd

> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Thursday, 9 February 2023 01.17
> 
> <snip>
> 
> > > > > >
> > > > > > >
> > > > > > > For environments where stdatomics are not supported, we
> could
> > > > have a
> > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > > support
> > > > only
> > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> and
> > > > when we move
> > > > > > to minimum supported standard C11, we just need to get rid of
> > > > > > the
> > > > file in DPDK
> > > > > > repo.
> > > > > >
> > > > > > my concern with this is that if we provide a stdatomic.h or
> > > > introduce names
> > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > >
> > > > > > references:
> > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > >  * GNU libc manual
> > > > > >
> https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > > Names.html
> > > > > >
> > > > > > in effect the header, the names and in some instances
> namespaces
> > > > introduced
> > > > > > are reserved by the implementation. there are several reasons
> in
> > > > the GNU libc
> > > > > Wouldn't this apply only after the particular APIs were
> introduced?
> > > > i.e. it should not apply if the compiler does not support
> stdatomics.
> > > >
> > > > yeah, i agree they're being a bit wishy washy in the wording, but
> > > > i'm not convinced glibc folks are documenting this as permissive
> > > > guidance against.
> > > >
> > > > >
> > > > > > manual that explain the justification for these reservations
> and
> > > > if we think
> > > > > > about ODR and ABI compatibility we can conceive of others.
> > > > > >
> > > > > > i'll also remark that the inter-mingling of names from the
> POSIX
> > > > standard
> > > > > > implicitly exposed as a part of the EAL public API has been
> > > > problematic for
> > > > > > portability.
> > > > > These should be exposed as EAL APIs only when compiled with a
> > > > compiler that does not support stdatomics.
> > > >
> > > > you don't necessarily compile dpdk, the application or its other
> > > > dynamically linked dependencies with the same compiler at the
> same
> > > > time.
> > > > i.e. basically the model of any dpdk-dev package on any linux
> > > > distribution.
> > > >
> > > > if dpdk is built without real stdatomic types but the application
> > > > has to interoperate with a different kit or library that does
> they
> > > > would be forced to dance around dpdk with their own version of a
> > > > shim to hide our faked up stdatomics.
> > > >
> > >
> > > So basically, if we want a binary DPDK distribution to be
> compatible with a
> > separate application build environment, they both have to implement
> atomics
> > the same way, i.e. agree on the ABI for atomics.
> > >
> > > Summing up, this leaves us with only two realistic options:
> > >
> > > 1. Go all in on C11 stdatomics, also requiring the application
> build
> > environment to support C11 stdatomics.
> > > 2. Provide our own DPDK atomics library.
> > >
> > > (As mentioned by Tyler, the third option - using C11 stdatomics
> inside
> > > DPDK, and requiring a build environment without C11 stdatomics to
> > > implement a shim - is not realistic!)
> > >
> > > I strongly want atomics to be available for use across inline and
> compiled
> > code; i.e. it must be possible for both compiled DPDK functions and
> inline
> > functions to perform atomic transactions on the same atomic variable.
> >
> > i consider it a mandatory requirement. i don't see practically how we
> could
> > withdraw existing use and even if we had clean way i don't see why we
> would
> > want to. so this item is definitely settled if you were concerned.
> I think I agree here.
> 
> >
> > >
> > > So either we upgrade the DPDK build requirements to support C11
> (including
> > the optional stdatomics), or we provide our own DPDK atomics.
> >
> > i think the issue of requiring a toolchain conformant to a specific
> standard is a
> > separate matter because any adoption of C11 standard atomics is a
> potential
> > abi break from the current use of intrinsics.
> I am not sure why you are calling it an ABI break. Referring to [1], I
> just see wrappers around intrinsics (though [2] does not use the
> intrinsics).
> 
> [1] https://github.com/gcc-
> mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
> [2] https://github.com/llvm-
> mirror/clang/blob/master/lib/Headers/stdatomic.h

Good input, Honnappa.

This means that the ABI break is purely academic, and there is no ABI breakage in reality.

Since the underlying implementation is the same, it is perfectly OK to mix C11 and intrinsic atomics, even when DPDK and the application are built in different environments (with and without C11 atomics, or vice versa).

This eliminates my only remaining practical concern about this approach.

> 
> >
> > the abstraction (whatever namespace it resides) allows the existing
> > toolchain/platform combinations to maintain compatibility by
> defaulting to
> > current non-standard intrinsics.
> How about using the intrinsics (__atomic_xxx) name space for
> abstraction? This covers the GCC and Clang compilers.
> If there is another platform that uses the same name space for
> something else, I think DPDK should not be supporting that platform.
> What problems do you see?
> 
> >
> > once in place it provides an opportunity to introduce new
> toolchain/platform
> > combinations and enables an opt-in capability to use stdatomics on
> existing
> > toolchain/platform combinations subject to community discussion on
> > how/if/when.
> >
> > it would be good to get more participants into the discussion so i'll
> cc techboard
> > for some attention. i feel like the only area that isn't decided is
> to do or not do
> > this in rte_ namespace.
> >
> > i'm strongly in favor of rte_ namespace after discussion, mainly due
> to
> > disadvantages of trying to overlap with the standard namespace while
> not
> > providing a compatible api/abi and because it provides clear
> disambiguation of
> > that difference in semantics and compatibility with the standard api.
> >
> > so far i've noted the following
> >
> > * we will not provide the non-explicit apis.
> +1
> 
> > * we will make no attempt to support operating on struct/union atomics
> >   with our apis.
> +1
> 
> > * we will mirror the standard api potentially in the rte_ namespace
> to
> >   - reference the standard api documentation.
> >   - assume compatible semantics (sans exceptions from first 2
> points).
> >
> > my vote is to remove 'potentially' from the last point above for
> reasons
> > previously discussed in postings to the mail thread.
> >
> > thanks all for the discussion, i'll send up a patch removing non-
> explicit apis for
> > viewing.
> >
> > ty



* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-08 16:35  4%               ` Tyler Retzlaff
@ 2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
  2023-02-09  8:34  4%                   ` Morten Brørup
  2023-02-09 17:30  4%                   ` Tyler Retzlaff
  0 siblings, 2 replies; 200+ results
From: Honnappa Nagarahalli @ 2023-02-09  0:16 UTC (permalink / raw)
  To: Tyler Retzlaff, Morten Brørup
  Cc: thomas, dev, bruce.richardson, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd, techboard, nd

<snip>

> > > > >
> > > > > >
> > > > > > For environments where stdatomics are not supported, we could
> > > have a
> > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > > > > support
> > > only
> > > > > _explicit APIs). This allows the code to use stdatomics APIs and
> > > when we move
> > > > > to minimum supported standard C11, we just need to get rid of
> > > > > the
> > > file in DPDK
> > > > > repo.
> > > > >
> > > > > my concern with this is that if we provide a stdatomic.h or
> > > introduce names
> > > > > from stdatomic.h it's a violation of the C standard.
> > > > >
> > > > > references:
> > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > >  * GNU libc manual
> > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > > Names.html
> > > > >
> > > > > in effect the header, the names and in some instances namespaces
> > > introduced
> > > > > are reserved by the implementation. there are several reasons in
> > > the GNU libc
> > > > Wouldn't this apply only after the particular APIs were introduced?
> > > i.e. it should not apply if the compiler does not support stdatomics.
> > >
> > > yeah, i agree they're being a bit wishy washy in the wording, but
> > > i'm not convinced glibc folks are documenting this as permissive
> > > guidance against.
> > >
> > > >
> > > > > manual that explain the justification for these reservations and
> > > if we think
> > > > > about ODR and ABI compatibility we can conceive of others.
> > > > >
> > > > > i'll also remark that the inter-mingling of names from the POSIX
> > > standard
> > > > > implicitly exposed as a part of the EAL public API has been
> > > problematic for
> > > > > portability.
> > > > These should be exposed as EAL APIs only when compiled with a
> > > compiler that does not support stdatomics.
> > >
> > > you don't necessarily compile dpdk, the application or its other
> > > dynamically linked dependencies with the same compiler at the same
> > > time.
> > > i.e. basically the model of any dpdk-dev package on any linux
> > > distribution.
> > >
> > > if dpdk is built without real stdatomic types but the application
> > > has to interoperate with a different kit or library that does they
> > > would be forced to dance around dpdk with their own version of a
> > > shim to hide our faked up stdatomics.
> > >
> >
> > So basically, if we want a binary DPDK distribution to be compatible with a
> separate application build environment, they both have to implement atomics
> the same way, i.e. agree on the ABI for atomics.
> >
> > Summing up, this leaves us with only two realistic options:
> >
> > 1. Go all in on C11 stdatomics, also requiring the application build
> environment to support C11 stdatomics.
> > 2. Provide our own DPDK atomics library.
> >
> > (As mentioned by Tyler, the third option - using C11 stdatomics inside
> > DPDK, and requiring a build environment without C11 stdatomics to
> > implement a shim - is not realistic!)
> >
> > I strongly want atomics to be available for use across inline and compiled
> code; i.e. it must be possible for both compiled DPDK functions and inline
> functions to perform atomic transactions on the same atomic variable.
> 
> i consider it a mandatory requirement. i don't see practically how we could
> withdraw existing use and even if we had clean way i don't see why we would
> want to. so this item is definitely settled if you were concerned.
I think I agree here.

> 
> >
> > So either we upgrade the DPDK build requirements to support C11 (including
> the optional stdatomics), or we provide our own DPDK atomics.
> 
> i think the issue of requiring a toolchain conformant to a specific standard is a
> separate matter because any adoption of C11 standard atomics is a potential
> abi break from the current use of intrinsics.
I am not sure why you are calling it an ABI break. Referring to [1], I just see wrappers around intrinsics (though [2] does not use the intrinsics).

[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/ginclude/stdatomic.h
[2] https://github.com/llvm-mirror/clang/blob/master/lib/Headers/stdatomic.h

> 
> the abstraction (whatever namespace it resides) allows the existing
> toolchain/platform combinations to maintain compatibility by defaulting to
> current non-standard intrinsics.
How about using the intrinsics (__atomic_xxx) name space for abstraction? This covers the GCC and Clang compilers.
If there is another platform that uses the same name space for something else, I think DPDK should not be supporting that platform.
What problems do you see?

> 
> once in place it provides an opportunity to introduce new toolchain/platform
> combinations and enables an opt-in capability to use stdatomics on existing
> toolchain/platform combinations subject to community discussion on
> how/if/when.
> 
> it would be good to get more participants into the discussion so i'll cc techboard
> for some attention. i feel like the only area that isn't decided is to do or not do
> this in rte_ namespace.
> 
> i'm strongly in favor of rte_ namespace after discussion, mainly due to
> disadvantages of trying to overlap with the standard namespace while not
> providing a compatible api/abi and because it provides clear disambiguation of
> that difference in semantics and compatibility with the standard api.
> 
> so far i've noted the following
> 
> * we will not provide the non-explicit apis.
+1

> * we will make no attempt to support operating on struct/union atomics
>   with our apis.
+1

> * we will mirror the standard api potentially in the rte_ namespace to
>   - reference the standard api documentation.
>   - assume compatible semantics (sans exceptions from first 2 points).
> 
> my vote is to remove 'potentially' from the last point above for reasons
> previously discussed in postings to the mail thread.
> 
> thanks all for the discussion, i'll send up a patch removing non-explicit apis for
> viewing.
> 
> ty


* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-08  8:31  3%             ` Morten Brørup
@ 2023-02-08 16:35  4%               ` Tyler Retzlaff
  2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-08 16:35 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Honnappa Nagarahalli, thomas, dev, bruce.richardson,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd,
	techboard

On Wed, Feb 08, 2023 at 09:31:32AM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 8 February 2023 02.21
> > 
> > On Tue, Feb 07, 2023 at 11:34:14PM +0000, Honnappa Nagarahalli wrote:
> > > <snip>
> > >
> > > > > >
> > > > > > Honnappa, please could you give your view on the future of
> > atomics in
> > > > DPDK?
> > > > > Thanks Thomas, apologies it has taken me a while to get to this
> > discussion.
> > > > >
> > > > > IMO, we do not need DPDK's own abstractions. APIs from
> > stdatomic.h
> > > > (stdatomics as is called here) already serve the purpose. These
> > APIs are well
> > > > understood and documented.
> > > >
> > > > i agree that whatever atomics APIs we advocate for should align
> > with the
> > > > standard C atomics for the reasons you state including implied
> > semantics.
> > > Another point I want to make is, we need 'xxx_explicit' APIs only, as
> > we want memory ordering explicitly provided at each call site. (This
> > can be discussed later).
> > 
> > i don't have any issue with removing the non-explicit versions. they're
> > > just a convenience for seq_cst anyway. if people don't want them we
> > don't have to have them.
> 
> I agree with Honnappa on this point.
> 
> The non-explicit versions are for lazy (or not so experienced) developers, and might impact performance if used instead of the correct explicit versions.
> 
> I'm working on porting some of our application code from DPDK's rte_atomic32 operations to modern atomics, and I'm temporarily using acq_rel with a FIXME comment on each operation until I have the overview to determine if another memory order is better for each operation. And if I don't get around to fixing the memory order, it is still a step in the right direction to get rid of the old __sync based atomics; and the FIXMEs remain to be fixed in a later release.
> 
> So here's an idea: Alternatively to omitting the non-explicit versions, we could include them for application developers, but document them as placeholders for "memory order to be determined later" and emit a warning when used. It might speed up the transition away from old atomic operations. Alternatively, we risk thoughtless use of seq_cst with the explicit versions, which might be difficult to detect in code reviews.

i think it may be cleaner to just remove the non-explicit versions. if we
are publishing api in the rte_xxx namespace then there are no
pre-existing expectations that they are present.

it also reduces the api surface that eventually gets retired ~years from
now when all ports and compilers in the matrix are std=C11.

i'll update the patch accordingly just so we have a visual.

>  
> Either way, with or without non-explicit versions, is fine with me.
> 
> > 
> > >
> > > >
> > > > >
> > > > > For environments where stdatomics are not supported, we could
> > have a
> > > > stdatomic.h in DPDK implementing the same APIs (we have to support
> > only
> > > > _explicit APIs). This allows the code to use stdatomics APIs and
> > when we move
> > > > to minimum supported standard C11, we just need to get rid of the
> > file in DPDK
> > > > repo.
> > > >
> > > > my concern with this is that if we provide a stdatomic.h or
> > introduce names
> > > > from stdatomic.h it's a violation of the C standard.
> > > >
> > > > references:
> > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > >  * GNU libc manual
> > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-
> > > > Names.html
> > > >
> > > > in effect the header, the names and in some instances namespaces
> > introduced
> > > > are reserved by the implementation. there are several reasons in
> > the GNU libc
> > > Wouldn't this apply only after the particular APIs were introduced?
> > i.e. it should not apply if the compiler does not support stdatomics.
> > 
> > yeah, i agree they're being a bit wishy washy in the wording, but i'm
> > not convinced glibc folks are documenting this as permissive guidance
> > against.
> > 
> > >
> > > > manual that explain the justification for these reservations and
> > if we think
> > > > about ODR and ABI compatibility we can conceive of others.
> > > >
> > > > i'll also remark that the inter-mingling of names from the POSIX
> > standard
> > > > implicitly exposed as a part of the EAL public API has been
> > problematic for
> > > > portability.
> > > These should be exposed as EAL APIs only when compiled with a
> > compiler that does not support stdatomics.
> > 
> > you don't necessarily compile dpdk, the application or its other
> > dynamically linked dependencies with the same compiler at the same
> > time.
> > i.e. basically the model of any dpdk-dev package on any linux
> > distribution.
> > 
> > if dpdk is built without real stdatomic types but the application has
> > to
> > interoperate with a different kit or library that does they would be
> > forced
> > to dance around dpdk with their own version of a shim to hide our
> > faked up stdatomics.
> > 
> 
> So basically, if we want a binary DPDK distribution to be compatible with a separate application build environment, they both have to implement atomics the same way, i.e. agree on the ABI for atomics.
> 
> Summing up, this leaves us with only two realistic options:
> 
> 1. Go all in on C11 stdatomics, also requiring the application build environment to support C11 stdatomics.
> 2. Provide our own DPDK atomics library.
> 
> (As mentioned by Tyler, the third option - using C11 stdatomics inside DPDK, and requiring a build environment without C11 stdatomics to implement a shim - is not realistic!)
> 
> I strongly want atomics to be available for use across inline and compiled code; i.e. it must be possible for both compiled DPDK functions and inline functions to perform atomic transactions on the same atomic variable.

i consider it a mandatory requirement. i don't see practically how we
could withdraw existing use and even if we had a clean way i don't see why
we would want to. so this item is definitely settled if you were
concerned.

> 
> So either we upgrade the DPDK build requirements to support C11 (including the optional stdatomics), or we provide our own DPDK atomics.

i think the issue of requiring a toolchain conformant to a specific
standard is a separate matter because any adoption of C11 standard
atomics is a potential abi break from the current use of intrinsics.

the abstraction (whatever namespace it resides in) allows the existing
toolchain/platform combinations to maintain compatibility by defaulting
to current non-standard intrinsics.

once in place it provides an opportunity to introduce new toolchain/platform
combinations and enables an opt-in capability to use stdatomics on
existing toolchain/platform combinations subject to community discussion
on how/if/when.

it would be good to get more participants into the discussion so i'll cc
techboard for some attention. i feel like the only area that isn't
decided is to do or not do this in rte_ namespace.

i'm strongly in favor of rte_ namespace after discussion, mainly due to
the disadvantages of trying to overlap with the standard namespace while not
providing a compatible api/abi and because it provides clear
disambiguation of that difference in semantics and compatibility with
the standard api.

so far i've noted the following

* we will not provide the non-explicit apis.
* we will make no attempt to support operating on struct/union atomics
  with our apis.
* we will mirror the standard api potentially in the rte_ namespace to
  - reference the standard api documentation.
  - assume compatible semantics (sans exceptions from first 2 points).

my vote is to remove 'potentially' from the last point above for reasons
previously discussed in postings to the mail thread.
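
To make the three points concrete, here is a minimal sketch of an
explicit-only wrapper layer that defaults to the GCC/Clang intrinsics. The
sketch_ prefix is a hypothetical stand-in for whatever rte_ names the patch
ends up with; nothing here is taken from the patch itself. Only scalar
operations with an explicit memory order are offered, and the argument order
follows the corresponding stdatomic.h _explicit functions so that the
standard documentation applies.

```c
#include <stdint.h>

/* Hypothetical memory-order names, mapped onto the compiler's constants. */
#define sketch_memory_order_relaxed __ATOMIC_RELAXED
#define sketch_memory_order_acquire __ATOMIC_ACQUIRE
#define sketch_memory_order_release __ATOMIC_RELEASE
#define sketch_memory_order_acq_rel __ATOMIC_ACQ_REL
#define sketch_memory_order_seq_cst __ATOMIC_SEQ_CST

/*
 * Explicit-only operations on scalar objects. There are deliberately no
 * variants without a memory-order argument and none for struct/union types.
 */
#define sketch_atomic_load_explicit(ptr, mo) \
	__atomic_load_n((ptr), (mo))

#define sketch_atomic_store_explicit(ptr, val, mo) \
	__atomic_store_n((ptr), (val), (mo))

#define sketch_atomic_fetch_add_explicit(ptr, val, mo) \
	__atomic_fetch_add((ptr), (val), (mo))
```

A later opt-in build could, in principle, point the same macros at the
stdatomic.h functions without touching call sites; whether and when to do
that is the open question for the community discussion mentioned above.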

thanks all for the discussion, i'll send up a patch removing
non-explicit apis for viewing.

ty


* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-08  1:20  0%           ` Tyler Retzlaff
@ 2023-02-08  8:31  3%             ` Morten Brørup
  2023-02-08 16:35  4%               ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-08  8:31 UTC (permalink / raw)
  To: Tyler Retzlaff, Honnappa Nagarahalli
  Cc: thomas, dev, bruce.richardson, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Wednesday, 8 February 2023 02.21
> 
> On Tue, Feb 07, 2023 at 11:34:14PM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > > >
> > > > > Honnappa, please could you give your view on the future of
> atomics in
> > > DPDK?
> > > > Thanks Thomas, apologies it has taken me a while to get to this
> discussion.
> > > >
> > > > IMO, we do not need DPDK's own abstractions. APIs from
> stdatomic.h
> > > (stdatomics as is called here) already serve the purpose. These
> APIs are well
> > > understood and documented.
> > >
> > > i agree that whatever atomics APIs we advocate for should align
> with the
> > > standard C atomics for the reasons you state including implied
> semantics.
> > Another point I want to make is, we need 'xxx_explicit' APIs only, as
> we want memory ordering explicitly provided at each call site. (This
> can be discussed later).
> 
> i don't have any issue with removing the non-explicit versions. they're
> just a convenience for seq_cst anyway. if people don't want them we
> don't have to have them.

I agree with Honnappa on this point.

The non-explicit versions are for lazy (or not so experienced) developers, and might impact performance if used instead of the correct explicit versions.

I'm working on porting some of our application code from DPDK's rte_atomic32 operations to modern atomics, and I'm temporarily using acq_rel with a FIXME comment on each operation until I have the overview to determine if another memory order is better for each operation. And if I don't get around to fixing the memory order, it is still a step in the right direction to get rid of the old __sync based atomics; and the FIXMEs remain to be fixed in a later release.
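
As a rough illustration of that interim approach (a sketch only, with
hypothetical sketch_ names, using the GCC/Clang __atomic builtins rather than
any DPDK API), a reference counter ported away from the legacy rte_atomic32
calls might temporarily look like this:

```c
#include <stdint.h>

/* Increment and return the new value. */
static inline uint32_t
sketch_refcnt_inc(uint32_t *cnt)
{
	/* FIXME: acq_rel is a placeholder; pick the real memory order later. */
	return __atomic_add_fetch(cnt, 1, __ATOMIC_ACQ_REL);
}

/* Decrement and return the new value. */
static inline uint32_t
sketch_refcnt_dec(uint32_t *cnt)
{
	/* FIXME: acq_rel is a placeholder; pick the real memory order later. */
	return __atomic_sub_fetch(cnt, 1, __ATOMIC_ACQ_REL);
}
```

Note that acq_rel is only valid for read-modify-write operations like these;
a plain load or store ported the same way would need acquire or release
respectively as its placeholder.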

So here's an idea: As an alternative to omitting the non-explicit versions, we could include them for application developers, but document them as placeholders for "memory order to be determined later" and emit a warning when used. It might speed up the transition away from old atomic operations. Otherwise, we risk thoughtless use of seq_cst with the explicit versions, which might be difficult to detect in code reviews.
 
Either way, with or without non-explicit versions, is fine with me.

> 
> >
> > >
> > > >
> > > > For environments where stdatomics are not supported, we could
> > > > have a stdatomic.h in DPDK implementing the same APIs (we have to
> > > > support only _explicit APIs). This allows the code to use
> > > > stdatomics APIs and when we move to minimum supported standard C11,
> > > > we just need to get rid of the file in DPDK repo.
> > >
> > > my concern with this is that if we provide a stdatomic.h or
> > > introduce names from stdatomic.h it's a violation of the C standard.
> > >
> > > references:
> > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > >  * GNU libc manual
> > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > >
> > > in effect the header, the names and in some instances namespaces
> > > introduced are reserved by the implementation. there are several
> > > reasons in the GNU libc
> > Wouldn't this apply only after the particular APIs were introduced?
> > i.e. it should not apply if the compiler does not support stdatomics.
> 
> yeah, i agree they're being a bit wishy washy in the wording, but i'm
> not convinced glibc folks are documenting this as permissive guidance
> against.
> 
> >
> > > manual that explain the justification for these reservations and if
> > > we think about ODR and ABI compatibility we can conceive of others.
> > >
> > > i'll also remark that the inter-mingling of names from the POSIX
> > > standard implicitly exposed as a part of the EAL public API has been
> > > problematic for portability.
> > These should be exposed as EAL APIs only when compiled with a
> > compiler that does not support stdatomics.
>
> you don't necessarily compile dpdk, the application or its other
> dynamically linked dependencies with the same compiler at the same time.
> i.e. basically the model of any dpdk-dev package on any linux distribution.
>
> if dpdk is built without real stdatomic types but the application has to
> interoperate with a different kit or library that does they would be forced
> to dance around dpdk with their own version of a shim to hide our
> faked up stdatomics.
> 

So basically, if we want a binary DPDK distribution to be compatible with a separate application build environment, they both have to implement atomics the same way, i.e. agree on the ABI for atomics.

Summing up, this leaves us with only two realistic options:

1. Go all in on C11 stdatomics, also requiring the application build environment to support C11 stdatomics.
2. Provide our own DPDK atomics library.

(As mentioned by Tyler, the third option - using C11 stdatomics inside DPDK, and requiring a build environment without C11 stdatomics to implement a shim - is not realistic!)

I strongly want atomics to be available for use across inline and compiled code; i.e. it must be possible for both compiled DPDK functions and inline functions to perform atomic transactions on the same atomic variable.

So either we upgrade the DPDK build requirements to support C11 (including the optional stdatomics), or we provide our own DPDK atomics.

> >
> > >
> > > let's discuss this from here. if there's still overwhelming desire
> > > to go this route then we'll just do our best.
> > >
> > > ty


^ permalink raw reply	[relevance 3%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-07 23:34  0%         ` Honnappa Nagarahalli
@ 2023-02-08  1:20  0%           ` Tyler Retzlaff
  2023-02-08  8:31  3%             ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-08  1:20 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: thomas, dev, bruce.richardson, mb, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

On Tue, Feb 07, 2023 at 11:34:14PM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > > >
> > > > Honnappa, please could you give your view on the future of atomics in
> > DPDK?
> > > Thanks Thomas, apologies it has taken me a while to get to this discussion.
> > >
> > > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> > (stdatomics as is called here) already serve the purpose. These APIs are well
> > understood and documented.
> > 
> > i agree that whatever atomics APIs we advocate for should align with the
> > standard C atomics for the reasons you state including implied semantics.
> Another point I want to make is, we need 'xxx_explicit' APIs only, as we want memory ordering explicitly provided at each call site. (This can be discussed later).

i don't have any issue with removing the non-explicit versions. they're
just a convenience for seq_cst anyway. if people don't want them we
don't have to have them.

> 
> > 
> > >
> > > For environments where stdatomics are not supported, we could have a
> > stdatomic.h in DPDK implementing the same APIs (we have to support only
> > _explicit APIs). This allows the code to use stdatomics APIs and when we move
> > to minimum supported standard C11, we just need to get rid of the file in DPDK
> > repo.
> > 
> > my concern with this is that if we provide a stdatomic.h or introduce names
> > from stdatomic.h it's a violation of the C standard.
> > 
> > references:
> >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> >  * GNU libc manual
> >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > 
> > in effect the header, the names and in some instances namespaces introduced
> > are reserved by the implementation. there are several reasons in the GNU libc
> Wouldn't this apply only after the particular APIs were introduced? i.e. it should not apply if the compiler does not support stdatomics.

yeah, i agree they're being a bit wishy washy in the wording, but i'm
not convinced glibc folks are documenting this as permissive guidance
against.

> 
> > manual that explain the justification for these reservations and if we think
> > about ODR and ABI compatibility we can conceive of others.
> > 
> > i'll also remark that the inter-mingling of names from the POSIX standard
> > implicitly exposed as a part of the EAL public API has been problematic for
> > portability.
> These should be exposed as EAL APIs only when compiled with a compiler that does not support stdatomics.

you don't necessarily compile dpdk, the application or its other
dynamically linked dependencies with the same compiler at the same time.
i.e. basically the model of any dpdk-dev package on any linux distribution.

if dpdk is built without real stdatomic types but the application has to
interoperate with a different kit or library that does they would be forced
to dance around dpdk with their own version of a shim to hide our
faked up stdatomics.

> 
> > 
> > let's discuss this from here. if there's still overwhelming desire to go this route
> > then we'll just do our best.
> > 
> > ty

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] eal: introduce atomics abstraction
    @ 2023-02-07 23:34  0%         ` Honnappa Nagarahalli
  2023-02-08  1:20  0%           ` Tyler Retzlaff
  1 sibling, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2023-02-07 23:34 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: thomas, dev, bruce.richardson, mb, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd, nd

<snip>

> > >
> > > Honnappa, please could you give your view on the future of atomics in
> DPDK?
> > Thanks Thomas, apologies it has taken me a while to get to this discussion.
> >
> > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> (stdatomics as is called here) already serve the purpose. These APIs are well
> understood and documented.
> 
> i agree that whatever atomics APIs we advocate for should align with the
> standard C atomics for the reasons you state including implied semantics.
Another point I want to make is, we need 'xxx_explicit' APIs only, as we want memory ordering explicitly provided at each call site. (This can be discussed later).

> 
> >
> > For environments where stdatomics are not supported, we could have a
> stdatomic.h in DPDK implementing the same APIs (we have to support only
> _explicit APIs). This allows the code to use stdatomics APIs and when we move
> to minimum supported standard C11, we just need to get rid of the file in DPDK
> repo.
> 
> my concern with this is that if we provide a stdatomic.h or introduce names
> from stdatomic.h it's a violation of the C standard.
> 
> references:
>  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
>  * GNU libc manual
>    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> 
> in effect the header, the names and in some instances namespaces introduced
> are reserved by the implementation. there are several reasons in the GNU libc
Wouldn't this apply only after the particular APIs were introduced? i.e. it should not apply if the compiler does not support stdatomics.

> manual that explain the justification for these reservations and if we think
> about ODR and ABI compatibility we can conceive of others.
> 
> i'll also remark that the inter-mingling of names from the POSIX standard
> implicitly exposed as a part of the EAL public API has been problematic for
> portability.
These should be exposed as EAL APIs only when compiled with a compiler that does not support stdatomics.

> 
> let's discuss this from here. if there's still overwhelming desire to go this route
> then we'll just do our best.
> 
> ty

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] eal: introduce atomics abstraction
  2023-02-07 15:16  0%                 ` Morten Brørup
@ 2023-02-07 21:58  0%                   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2023-02-07 21:58 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Bruce Richardson, Honnappa Nagarahalli, thomas, dev,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

On Tue, Feb 07, 2023 at 04:16:58PM +0100, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Friday, 3 February 2023 21.49
> > 
> > On Fri, Feb 03, 2023 at 12:19:13PM +0000, Bruce Richardson wrote:
> > > On Thu, Feb 02, 2023 at 11:00:23AM -0800, Tyler Retzlaff wrote:
> > > > On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > > > Sent: Wednesday, 1 February 2023 22.41
> > > > > >
> > > > > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli
> > wrote:
> > > > > > >
> > > > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > > > > >
> > > > > > > > Honnappa, please could you give your view on the future of
> > atomics
> > > > > > in DPDK?
> > > > > > > Thanks Thomas, apologies it has taken me a while to get to
> > this
> > > > > > discussion.
> > > > > > >
> > > > > > > IMO, we do not need DPDK's own abstractions. APIs from
> > stdatomic.h
> > > > > > (stdatomics as is called here) already serve the purpose. These
> > APIs
> > > > > > are well understood and documented.
> > > > > >
> > > > > > i agree that whatever atomics APIs we advocate for should align
> > with
> > > > > > the
> > > > > > standard C atomics for the reasons you state including implied
> > > > > > semantics.
> > > > > >
> > > > > > >
> > > > > > > For environments where stdatomics are not supported, we could
> > have a
> > > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> > support only
> > > > > > _explicit APIs). This allows the code to use stdatomics APIs
> > and when
> > > > > > we move to minimum supported standard C11, we just need to get
> > rid of
> > > > > > the file in DPDK repo.
> > > > >
> > > > > Perhaps we can use something already existing, such as this:
> > > > > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > > > >
> > > > > >
> > > > > > my concern with this is that if we provide a stdatomic.h or
> > introduce
> > > > > > names
> > > > > > from stdatomic.h it's a violation of the C standard.
> > > > > >
> > > > > > references:
> > > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > > >  * GNU libc manual
> > > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > > >
> > > > > > in effect the header, the names and in some instances
> > namespaces
> > > > > > introduced
> > > > > > are reserved by the implementation. there are several reasons
> > in the
> > > > > > GNU libc
> > > > > > manual that explain the justification for these reservations
> > > > > > and if we think about ODR and ABI compatibility we can conceive
> > > > > > of others.
> > > > >
> > > > > If we are going to move to C11 soon, I consider the shim interim,
> > > > > and am inclined to ignore these warning factors.
> > > > >
> > > > > If we are not moving to C11 soon, I would consider these
> > disadvantages more seriously.
> > > >
> > > > I think it's reasonable to assume that we are talking years here.
> > > >
> > > > We've had a few discussions about minimum C standard. I think my
> > first
> > > > mailing list exchanges about C99 was almost 2 years ago. Given that
> > we
> > > > still aren't on C99 now (though i know Bruce has a series up)
> > indicates
> > > > that progression to C11 isn't going to happen any time soon and
> > even if
> > > > it was the baseline we still can't just use it (reasons described
> > > > later).
> > > >
> > > > Also, i'll point out that we seem to have accepted moving to C99
> > with
> > > > one of the holdback compilers technically being non-conformant but
> > it
> > > > isn't blocking us because it provides the subset of C99 features
> > without
> > > > being conforming that we happen to be using.
> > > >
> > > What compiler is this? As far as I know, all our currently supported
> > > compilers claim to support C99 fully. All should support C11 also,
> > > except for GCC 4.8 on RHEL/CentOS 7. Once we drop support for Centos
> > 7, I
> > > think we can require at minimum a c11 compiler for building DPDK
> > itself.
> > > I'm still a little uncertain about requiring that users build their
> > own
> > > code with -std=c11, though.
> > 
> > perhaps i'm mistaken but it was my understanding that the gcc version
> > on
> > RHEL 7 did not fully conform to C99? maybe i read C99 when it was
> > actually
> > C11.
> 
> RHEL 7 does support C99; it's C11 that it doesn't support [1].
> 
> [1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D8762F@smartserver.smartshare.dk/
> 
> > 
> > regardless, even if every supported compiler for dpdk was C11
> > conformant
> > including stdatomics which are optional we can't just move from
> > intrinsic/builtins to standard C atomics (because of the compatibility
> > and performance issues mentioned previously).
> 
> For example, with C11, you can make structures atomic. And an atomic instance of a type can have a different size than the non-atomic type.
> 
> If we do make a shim, it will have some limitations compared to C11 atomics, e.g. it cannot handle atomic structures.

right, so it "looks" like standard but then doesn't work like standard.

> 
> Either we accept these limitations of the shim, or we use our own namespace. If we accept the limitations, we risk that someone with a C11 build environment uses them anyway, and it will not work in non-C11 build environments. So a shim is not a rose without thorns.

the standard says even integer types can have different alignment and size
so it isn't strictly portable to replace any integer type with the similar
_Atomic type. here is an example of something i would prefer not to have
to navigate which using a shim would sign us up for.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146
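
To make the layout concern concrete, the following sketch compares a struct holding a plain 64-bit integer with one holding the `_Atomic` counterpart. The struct and function names are hypothetical, and the result is target dependent: many 64-bit targets report identical layouts, while some 32-bit ABIs may not, so no particular output is claimed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same fields, except that the second struct marks the integer _Atomic. */
struct plain_layout  { char tag; uint64_t value; };
struct atomic_layout { char tag; _Atomic uint64_t value; };

/*
 * Return 1 when replacing uint64_t with _Atomic uint64_t leaves the struct
 * layout unchanged on this target, 0 otherwise.
 */
static int
layouts_match(void)
{
	return sizeof(struct plain_layout) == sizeof(struct atomic_layout) &&
	       offsetof(struct plain_layout, value) ==
	       offsetof(struct atomic_layout, value);
}

/* Print the numbers an ABI review would look at. */
static void
print_layouts(void)
{
	printf("alignof: plain %zu, atomic %zu\n",
	       (size_t)_Alignof(uint64_t), (size_t)_Alignof(_Atomic uint64_t));
	printf("offsetof(value): plain %zu, atomic %zu\n",
	       offsetof(struct plain_layout, value),
	       offsetof(struct atomic_layout, value));
	printf("sizeof: plain %zu, atomic %zu\n",
	       sizeof(struct plain_layout), sizeof(struct atomic_layout));
	/* Expected on mainstream compilers; the standard does not promise it. */
	assert(_Alignof(_Atomic uint64_t) >= _Alignof(uint64_t));
}
```

If layouts_match() returns 0 for a target, a public struct whose field was switched to the `_Atomic` type would change its ABI there.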

> > 
> > so just re-orienting this discussion, the purpose of this abstraction
> > is
> > to allow the optional use of standard C atomics when a conformant
> > compiler
> > is available and satisfactory code is generated for the desired target.
> 
> I think it is more important to get this feature into DPDK than to use the C11 stdatomic.h API for atomics in DPDK.
> 
> I don't feel strongly about this API, and will accept either the proposed patch series (with the C11-like API, but rte_ prefixed namespace), or a C11 stdatomic.h API shim.

let's just stay out of the standard namespace, it doesn't buy us the
forward compatibility we want and being explicit in the namespace makes
it obvious that we aren't.
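
A rough sketch of what an explicitly namespaced wrapper could look like follows. The demo_ prefix and the DEMO_USE_STDATOMIC switch are hypothetical stand-ins and are not the names used in the patch; the fallback branch assumes the GCC/Clang __atomic builtins.

```c
#include <assert.h>
#include <stdint.h>

#ifdef DEMO_USE_STDATOMIC /* hypothetical build-time switch */
#include <stdatomic.h>
#define demo_atomic(type)			_Atomic(type)
#define demo_memory_order_relaxed		memory_order_relaxed
#define demo_memory_order_acquire		memory_order_acquire
#define demo_atomic_load_explicit(p, mo)	atomic_load_explicit(p, mo)
#define demo_atomic_fetch_add_explicit(p, v, mo) \
	atomic_fetch_add_explicit(p, v, mo)
#else /* fall back to compiler builtins on ordinary integer types */
#define demo_atomic(type)			type
#define demo_memory_order_relaxed		__ATOMIC_RELAXED
#define demo_memory_order_acquire		__ATOMIC_ACQUIRE
#define demo_atomic_load_explicit(p, mo)	__atomic_load_n(p, mo)
#define demo_atomic_fetch_add_explicit(p, v, mo) \
	__atomic_fetch_add(p, v, mo)
#endif

/* The declaration uses the wrapper type, never _Atomic directly. */
static demo_atomic(uint32_t) demo_refcnt;

/* Take a reference and return the new count. */
static uint32_t
demo_ref_get(void)
{
	return demo_atomic_fetch_add_explicit(&demo_refcnt, 1,
					      demo_memory_order_relaxed) + 1;
}

/* Read the current count. */
static uint32_t
demo_ref_read(void)
{
	return demo_atomic_load_explicit(&demo_refcnt,
					 demo_memory_order_acquire);
}

/* Usage: callers look the same whichever branch was selected. */
static void
demo_ref_usage(void)
{
	uint32_t before = demo_ref_read();

	assert(demo_ref_get() == before + 1);
}
```

Because no standard names are defined, an application that includes the real <stdatomic.h> alongside such a header sees no conflict.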

ty

^ permalink raw reply	[relevance 0%]

* RE: [PATCH] eal: introduce atomics abstraction
  2023-02-03 20:49  0%               ` Tyler Retzlaff
@ 2023-02-07 15:16  0%                 ` Morten Brørup
  2023-02-07 21:58  0%                   ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-07 15:16 UTC (permalink / raw)
  To: Tyler Retzlaff, Bruce Richardson
  Cc: Honnappa Nagarahalli, thomas, dev, david.marchand, jerinj,
	konstantin.ananyev, ferruh.yigit, nd

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Friday, 3 February 2023 21.49
> 
> On Fri, Feb 03, 2023 at 12:19:13PM +0000, Bruce Richardson wrote:
> > On Thu, Feb 02, 2023 at 11:00:23AM -0800, Tyler Retzlaff wrote:
> > > On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > > Sent: Wednesday, 1 February 2023 22.41
> > > > >
> > > > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli
> wrote:
> > > > > >
> > > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > > > >
> > > > > > > Honnappa, please could you give your view on the future of
> atomics
> > > > > in DPDK?
> > > > > > Thanks Thomas, apologies it has taken me a while to get to
> this
> > > > > discussion.
> > > > > >
> > > > > > IMO, we do not need DPDK's own abstractions. APIs from
> stdatomic.h
> > > > > (stdatomics as is called here) already serve the purpose. These
> APIs
> > > > > are well understood and documented.
> > > > >
> > > > > i agree that whatever atomics APIs we advocate for should align
> with
> > > > > the
> > > > > standard C atomics for the reasons you state including implied
> > > > > semantics.
> > > > >
> > > > > >
> > > > > > For environments where stdatomics are not supported, we could
> have a
> > > > > stdatomic.h in DPDK implementing the same APIs (we have to
> support only
> > > > > _explicit APIs). This allows the code to use stdatomics APIs
> and when
> > > > > we move to minimum supported standard C11, we just need to get
> rid of
> > > > > the file in DPDK repo.
> > > >
> > > > Perhaps we can use something already existing, such as this:
> > > > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > > >
> > > > >
> > > > > my concern with this is that if we provide a stdatomic.h or
> introduce
> > > > > names
> > > > > from stdatomic.h it's a violation of the C standard.
> > > > >
> > > > > references:
> > > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > > >  * GNU libc manual
> > > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > >
> > > > > in effect the header, the names and in some instances
> namespaces
> > > > > introduced
> > > > > are reserved by the implementation. there are several reasons
> in the
> > > > > GNU libc
> > > > > manual that explain the justification for these reservations
> > > > > and if we think about ODR and ABI compatibility we can conceive
> > > > > of others.
> > > >
> > > > If we are going to move to C11 soon, I consider the shim interim,
> > > > and am inclined to ignore these warning factors.
> > > >
> > > > If we are not moving to C11 soon, I would consider these
> disadvantages more seriously.
> > >
> > > I think it's reasonable to assume that we are talking years here.
> > >
> > > We've had a few discussions about minimum C standard. I think my
> first
> > > mailing list exchanges about C99 was almost 2 years ago. Given that
> we
> > > still aren't on C99 now (though i know Bruce has a series up)
> indicates
> > > that progression to C11 isn't going to happen any time soon and
> even if
> > > it was the baseline we still can't just use it (reasons described
> > > later).
> > >
> > > Also, i'll point out that we seem to have accepted moving to C99
> with
> > > one of the holdback compilers technically being non-conformant but
> it
> > > isn't blocking us because it provides the subset of C99 features
> without
> > > being conforming that we happen to be using.
> > >
> > What compiler is this? As far as I know, all our currently supported
> > compilers claim to support C99 fully. All should support C11 also,
> > except for GCC 4.8 on RHEL/CentOS 7. Once we drop support for Centos
> 7, I
> > think we can require at minimum a c11 compiler for building DPDK
> itself.
> > I'm still a little uncertain about requiring that users build their
> own
> > code with -std=c11, though.
> 
> perhaps i'm mistaken but it was my understanding that the gcc version
> on
> RHEL 7 did not fully conform to C99? maybe i read C99 when it was
> actually
> C11.

RHEL 7 does support C99; it's C11 that it doesn't support [1].

[1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D8762F@smartserver.smartshare.dk/

> 
> regardless, even if every supported compiler for dpdk was C11
> conformant
> including stdatomics which are optional we can't just move from
> intrinsic/builtins to standard C atomics (because of the compatibility
> and performance issues mentioned previously).

For example, with C11, you can make structures atomic. And an atomic instance of a type can have a different size than the non-atomic type.

If we do make a shim, it will have some limitations compared to C11 atomics, e.g. it cannot handle atomic structures.

Either we accept these limitations of the shim, or we use our own namespace. If we accept the limitations, we risk that someone with a C11 build environment uses them anyway, and it will not work in non-C11 build environments. So a shim is not a rose without thorns.
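
The size point can be checked with a few lines of C11. This is a sketch with hypothetical type names; whether padding is added depends on the compiler and target, so no specific numbers are claimed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A 12-byte structure: not a size that maps onto a native atomic width. */
struct ring_idx {
	uint32_t head;
	uint32_t tail;
	uint32_t gen;
};

typedef struct ring_idx plain_idx_t;
typedef _Atomic struct ring_idx atomic_idx_t; /* valid in C11 */

/* Number of bytes the atomic version adds on this target (possibly 0). */
static size_t
atomic_idx_padding(void)
{
	return sizeof(atomic_idx_t) - sizeof(plain_idx_t);
}

/* Print both sizes so the difference, if any, is visible. */
static void
print_idx_sizes(void)
{
	/* Expected on mainstream compilers; the standard does not promise it. */
	assert(sizeof(atomic_idx_t) >= sizeof(plain_idx_t));
	printf("plain %zu bytes, atomic %zu bytes, padding %zu bytes\n",
	       sizeof(plain_idx_t), sizeof(atomic_idx_t),
	       atomic_idx_padding());
}
```

A shim built on integer-only builtins has no equivalent of atomic_idx_t, which is the limitation described above.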

> 
> so just re-orienting this discussion, the purpose of this abstraction
> is
> to allow the optional use of standard C atomics when a conformant
> compiler
> is available and satisfactory code is generated for the desired target.

I think it is more important to get this feature into DPDK than to use the C11 stdatomic.h API for atomics in DPDK.

I don't feel strongly about this API, and will accept either the proposed patch series (with the C11-like API, but rte_ prefixed namespace), or a C11 stdatomic.h API shim.


^ permalink raw reply	[relevance 0%]

* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
    2023-02-06 15:29  0%     ` Jiawei(Jonny) Wang
@ 2023-02-07  9:40  0%     ` Ori Kam
  2023-02-09 19:44  0%     ` Ferruh Yigit
  2 siblings, 0 replies; 200+ results
From: Ori Kam @ 2023-02-07  9:40 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Slava Ovsiienko,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang, Ferruh Yigit
  Cc: dev, Raslan Darawsheh

Hi Jiawei,


> -----Original Message-----
> From: Jiawei(Jonny) Wang <jiaweiw@nvidia.com>
> Sent: Friday, 3 February 2023 15:34
> 
> When multiple physical ports are connected to a single DPDK port,
> (example: kernel bonding, DPDK bonding, failsafe, etc.),
> we want to know which physical port is used for Rx and Tx.
> 
> This patch maps a DPDK Tx queue with a physical port,
> by adding tx_phy_affinity setting in Tx queue.
> The affinity number is the physical port ID where packets will be
> sent.
> Value 0 means no affinity and traffic could be routed to any
> connected physical ports, this is the default current behavior.
> 
> The number of physical ports is reported with rte_eth_dev_info_get().
> 
> The new tx_phy_affinity field is added into the padding hole of
> rte_eth_txconf structure, the size of rte_eth_txconf keeps the same.
> An ABI check rule needs to be added to avoid false warning.
> 
> Add the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> 
> For example, when two physical ports are connected to
> a single DPDK port (port id 0), phy_affinity 1 stands for
> the first physical port and phy_affinity 2 stands for the second
> physical port.
> Use the commands below to configure the Tx PHY affinity per Tx queue:
>         port config 0 txq 0 phy_affinity 1
>         port config 0 txq 1 phy_affinity 1
>         port config 0 txq 2 phy_affinity 2
>         port config 0 txq 3 phy_affinity 2
> 
> These commands configure Tx queue 0 and Tx queue 1 with
> phy affinity 1. Packets sent on Tx queue 0 or Tx queue 1
> leave through the first physical port; likewise, packets sent on
> Tx queue 2 or Tx queue 3 leave through the second physical port.
> 
> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> ---
>  app/test-pmd/cmdline.c                      | 100 ++++++++++++++++++++
>  app/test-pmd/config.c                       |   1 +
>  devtools/libabigail.abignore                |   5 +
>  doc/guides/rel_notes/release_23_03.rst      |   4 +
>  doc/guides/testpmd_app_ug/testpmd_funcs.rst |  13 +++
>  lib/ethdev/rte_ethdev.h                     |  10 ++
>  6 files changed, 133 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index cb8c174020..f771fcf8ac 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -776,6 +776,10 @@ static void cmd_help_long_parsed(void
> *parsed_result,
> 
>  			"port cleanup (port_id) txq (queue_id) (free_cnt)\n"
>  			"    Cleanup txq mbufs for a specific Tx queue\n\n"
> +
> +			"port config (port_id) txq (queue_id) phy_affinity
> (value)\n"
> +			"    Set the physical affinity value "
> +			"on a specific Tx queue\n\n"
>  		);
>  	}
> 
> @@ -12633,6 +12637,101 @@ static cmdline_parse_inst_t
> cmd_show_port_flow_transfer_proxy = {
>  	}
>  };
> 
> +/* *** configure port txq phy_affinity value *** */
> +struct cmd_config_tx_phy_affinity {
> +	cmdline_fixed_string_t port;
> +	cmdline_fixed_string_t config;
> +	portid_t portid;
> +	cmdline_fixed_string_t txq;
> +	uint16_t qid;
> +	cmdline_fixed_string_t phy_affinity;
> +	uint8_t value;
> +};
> +
> +static void
> +cmd_config_tx_phy_affinity_parsed(void *parsed_result,
> +				  __rte_unused struct cmdline *cl,
> +				  __rte_unused void *data)
> +{
> +	struct cmd_config_tx_phy_affinity *res = parsed_result;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_port *port;
> +	int ret;
> +
> +	if (port_id_is_invalid(res->portid, ENABLED_WARN))
> +		return;
> +
> +	if (res->portid == (portid_t)RTE_PORT_ALL) {
> +		printf("Invalid port id\n");
> +		return;
> +	}
> +
> +	port = &ports[res->portid];
> +
> +	if (strcmp(res->txq, "txq")) {
> +		printf("Unknown parameter\n");
> +		return;
> +	}
> +	if (tx_queue_id_is_invalid(res->qid))
> +		return;
> +
> +	ret = eth_dev_info_get_print_err(res->portid, &dev_info);
> +	if (ret != 0)
> +		return;
> +
> +	if (dev_info.nb_phy_ports == 0) {
> +		printf("Number of physical ports is 0 which is invalid for PHY
> Affinity\n");
> +		return;
> +	}
> +	printf("The number of physical ports is %u\n",
> dev_info.nb_phy_ports);
> +	if (dev_info.nb_phy_ports < res->value) {
> +		printf("The PHY affinity value %u is Invalid, exceeds the "
> +		       "number of physical ports\n", res->value);
> +		return;
> +	}
> +	port->txq[res->qid].conf.tx_phy_affinity = res->value;
> +
> +	cmd_reconfig_device_queue(res->portid, 0, 1);
> +}
> +
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_port =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 port, "port");
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_config =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 config, "config");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_portid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 portid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_txq =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 txq, "txq");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_qid =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      qid, RTE_UINT16);
> +cmdline_parse_token_string_t cmd_config_tx_phy_affinity_hwport =
> +	TOKEN_STRING_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +				 phy_affinity, "phy_affinity");
> +cmdline_parse_token_num_t cmd_config_tx_phy_affinity_value =
> +	TOKEN_NUM_INITIALIZER(struct cmd_config_tx_phy_affinity,
> +			      value, RTE_UINT8);
> +
> +static cmdline_parse_inst_t cmd_config_tx_phy_affinity = {
> +	.f = cmd_config_tx_phy_affinity_parsed,
> +	.data = (void *)0,
> +	.help_str = "port config <port_id> txq <queue_id> phy_affinity
> <value>",
> +	.tokens = {
> +		(void *)&cmd_config_tx_phy_affinity_port,
> +		(void *)&cmd_config_tx_phy_affinity_config,
> +		(void *)&cmd_config_tx_phy_affinity_portid,
> +		(void *)&cmd_config_tx_phy_affinity_txq,
> +		(void *)&cmd_config_tx_phy_affinity_qid,
> +		(void *)&cmd_config_tx_phy_affinity_hwport,
> +		(void *)&cmd_config_tx_phy_affinity_value,
> +		NULL,
> +	},
> +};
> +
>  /*
> ****************************************************************
> **************** */
> 
>  /* list of instructions */
> @@ -12866,6 +12965,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_capa,
>  	(cmdline_parse_inst_t *)&cmd_show_port_cman_config,
>  	(cmdline_parse_inst_t *)&cmd_set_port_cman_config,
> +	(cmdline_parse_inst_t *)&cmd_config_tx_phy_affinity,
>  	NULL,
>  };
> 
> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
> index acccb6b035..b83fb17cfa 100644
> --- a/app/test-pmd/config.c
> +++ b/app/test-pmd/config.c
> @@ -936,6 +936,7 @@ port_infos_display(portid_t port_id)
>  		printf("unknown\n");
>  		break;
>  	}
> +	printf("Current number of physical ports: %u\n",
> dev_info.nb_phy_ports);
>  }
> 
>  void
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 7a93de3ba1..ac7d3fb2da 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -34,3 +34,8 @@
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till next major ABI version ;
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +
> +; Ignore fields inserted in padding hole of rte_eth_txconf
> +[suppress_type]
> +        name = rte_eth_txconf
> +        has_data_member_inserted_between = {offset_of(tx_deferred_start),
> offset_of(offloads)}
> diff --git a/doc/guides/rel_notes/release_23_03.rst
> b/doc/guides/rel_notes/release_23_03.rst
> index 73f5d94e14..e99bd2dcb6 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Added affinity for multiple physical ports connected to a single DPDK
> port.**
> +
> +  * Added Tx affinity in queue setup to map a physical port.
> +
>  * **Updated AMD axgbe driver.**
> 
>    * Added multi-process support.
> diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> index 79a1fa9cb7..5c716f7679 100644
> --- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> +++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
> @@ -1605,6 +1605,19 @@ Enable or disable a per queue Tx offloading only
> on a specific Tx queue::
> 
>  This command should be run when the port is stopped, or else it will fail.
> 
> +config per queue Tx physical affinity
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Configure a per queue physical affinity value only on a specific Tx queue::
> +
> +   testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> +
> +* ``phy_affinity``: physical port to use for sending,
> +                    when multiple physical ports are connected to
> +                    a single DPDK port.
> +
> +This command should be run when the port is stopped, otherwise it fails.
> +
>  Config VXLAN Encap outer layers
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..2fd971b7b5 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1138,6 +1138,14 @@ struct rte_eth_txconf {
>  				      less free descriptors than this value. */
> 
>  	uint8_t tx_deferred_start; /**< Do not start queue with
> rte_eth_dev_start(). */
> +	/**
> +	 * Affinity with one of the multiple physical ports connected to the
> DPDK port.
> +	 * Value 0 means no affinity and traffic could be routed to any
> connected
> +	 * physical port.
> +	 * The first physical port is number 1 and so on.
> +	 * Number of physical ports is reported by nb_phy_ports in
> rte_eth_dev_info.
> +	 */
> +	uint8_t tx_phy_affinity;
>  	/**
>  	 * Per-queue Tx offloads to be set  using RTE_ETH_TX_OFFLOAD_*
> flags.
>  	 * Only offloads set on tx_queue_offload_capa or tx_offload_capa
> @@ -1744,6 +1752,8 @@ struct rte_eth_dev_info {
>  	/** Device redirection table size, the total number of entries. */
>  	uint16_t reta_size;
>  	uint8_t hash_key_size; /**< Hash key size in bytes */
> +	/** Number of physical ports connected with DPDK port. */
> +	uint8_t nb_phy_ports;
>  	/** Bit mask of RSS offloads, the bit offset also means flow type */
>  	uint64_t flow_type_rss_offloads;
>  	struct rte_eth_rxconf default_rxconf; /**< Default Rx configuration
> */
> --
> 2.18.1

Acked-by: Ori Kam <orika@nvidia.com>
Thanks,
Ori

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  2023-02-06  6:21  0%                         ` Naga Harish K, S V
@ 2023-02-06 16:38  0%                           ` Jerin Jacob
  2023-02-09 17:00  0%                             ` Naga Harish K, S V
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2023-02-06 16:38 UTC (permalink / raw)
  To: Naga Harish K, S V
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan, Jay

On Mon, Feb 6, 2023 at 11:52 AM Naga Harish K, S V
<s.v.naga.harish.k@intel.com> wrote:
>
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, February 3, 2023 3:15 PM
> > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> > Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> > Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> >
> > On Thu, Feb 2, 2023 at 9:42 PM Naga Harish K, S V
> > <s.v.naga.harish.k@intel.com> wrote:
> > >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, January 30, 2023 8:13 PM
> > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > <jay.jayatheerthan@intel.com>
> > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > > >
> > > > On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > > Sent: Saturday, January 28, 2023 4:24 PM
> > > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > > <jay.jayatheerthan@intel.com>
> > > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get
> > > > > > APIs
> > > > > >
> > > > > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > +        */
> > > > > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > > > > >
> > > > > > > > > > > > Introduce
> > > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > > > > to
> > > > > > > > make
> > > > > > > > > > > > sure rsvd is zero.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The reserved fields are not used by the adapter or
> > application.
> > > > > > > > > > > Not sure Is it necessary to Introduce a new API to
> > > > > > > > > > > clear reserved
> > > > > > fields.
> > > > > > > > > >
> > > > > > > > > > When the adapter starts using new fields (when we add new
> > > > > > > > > > fields in future), the old application which is not using
> > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may have
> > > > > > > > > > junk values and then the adapter implementation will
> > > > > > > > > > behave badly.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > does it mean, the application doesn't re-compile for the new
> > DPDK?
> > > > > > > >
> > > > > > > > Yes. No need recompile if ABI not breaking.
> > > > > > > >
> > > > > > > > > When some of the reserved fields are used in the future,
> > > > > > > > > the application
> > > > > > > > also may need to be recompiled along with DPDK right?
> > > > > > > > > As the application also may need to use the newly consumed
> > > > > > > > > reserved
> > > > > > > > fields?
> > > > > > > >
> > > > > > > > The problematic case is:
> > > > > > > >
> > > > > > > > The adapter implementation of 23.07 (assuming there is a
> > > > > > > > change in the params fields) needs to work with an
> > > > > > > > application of 23.03.
> > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > > > > > >
> > > > > > >
> > > > > > > As rte_event_eth_rx_adapter_runtime_params_init() initializes
> > > > > > > only
> > > > > > reserved fields to zero,  it may not solve the issue in this case.
> > > > > >
> > > > > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero all
> > > > > > fields, not just reserved field.
> > > > > > The application calling sequence  is
> > > > > >
> > > > > > struct my_config c;
> > > > > > > rte_event_eth_rx_adapter_runtime_params_init(&c);
> > > > > > > c.interested_field_to_be_updated = val;
> > > > > >
> > > > > Can it be done like
> > > > >         struct my_config c = {0};
> > > > >         c.interseted_filed_to_be_updated = val; and update Doxygen
> > > > > comments to recommend above usage to reset all fields?
> > > > > This way,  rte_event_eth_rx_adapter_runtime_params_init() can be
> > > > avoided.
> > > >
> > > > Better to have a function for documentation clarity. Similar scheme
> > > > already there in DPDK. See rte_eth_cman_config_init()
> > > >
> > > >
> > >
> > >
> > > The reference function rte_eth_cman_config_init() is resetting the params
> > struct and initializing the required params with default values in the pmd cb.
> >
> > No need for PMD cb.
> >
> > > The proposed rte_event_eth_rx_adapter_runtime_params_init () API just
> > needs to reset the params struct. There are no pmd CBs involved.
> > > Having an API just to reset the struct seems overkill. What do you think?
> >
> > It is a slow-path API. Keeping it as a function is better. Also, it helps
> > the documentation of the config param in
> > rte_event_eth_rx_adapter_runtime_params_config(),
> > e.g. "This structure must be initialized with
> > rte_event_eth_rx_adapter_runtime_params_init()" or so.
> >
> >
>
> Are there any other reasons to have this API (*params_init()) other than documentation?

Initialization code is segregated for tracking.

>
> >
> > >
> > > > >
> > > > > > Let me share an example and you can tell where is the issue
> > > > > >
> > > > > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > > > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> > > > > > 3)There is an application written based on 22.03 which using
> > > > > > only 8B after calling
> > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > 4)Assume, in 22.07 another 8B added to structure.
> > > > > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > > > > application is calling
> > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > and 9 to 15B are zero, the implementation will not go bad.
> > > > > >
> > > > > > > The old application only tries to set/get previous valid
> > > > > > > fields and the newly
> > > > > > used fields may still contain junk value.
> > > > > > > If the application wants to make use of any the newly used
> > > > > > > params, the
> > > > > > application changes are required anyway.
> > > > > >
> > > > > > Yes. If application wants to make use of newly added features.
> > > > > > No need to change if new features are not needed for old application.

^ permalink raw reply	[relevance 0%]

* Re: Sign changes through function signatures
  2023-02-04  8:09  0%           ` Morten Brørup
@ 2023-02-06 15:57  3%             ` Ben Magistro
  0 siblings, 0 replies; 200+ results
From: Ben Magistro @ 2023-02-06 15:57 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Tyler Retzlaff, Bruce Richardson, Thomas Monjalon, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

I'm a fan of "just rip the bandaid off" (especially when it's convenient
for me, however it's very possible I will also be the person to bring up
backwards compatibility).  Speaking of backwards compatibility, API/ABI
breakage was semi-recently discussed at the techboard [1].  From the notes
it was not clear to me what level of breakage is going to be acceptable
going forward.  This same question seems likely to apply to discussion
around specifying the C standard [2], though potentially less impactful
based on the recent discussion.  I am also beginning to see how a "-ng"
project happens to simplify making many large breaking changes.

To try and add my thoughts here.  For practical use, I don't believe a
socket or core ID should ever be negative.  I don't believe you should need
more than 4 bits for socket id (personally only aware of 4-socket system
boards), but if we are saying we keep the same memory space (32 bits) there
is no practical reason not to allocate 8 bits, which gives you the remaining
24 bits for flags.  In this thread, we've already identified three(?) that
seem useful: flag_unset (possibly regretting this, but I can see this flag
being overloaded to indicate both unset and error setting), flag_any_okay,
and flag_none (not entirely clear to me yet how this would be used
differently than any_okay).  To me this really sounds like a struct makes
sense to manage the value + flags associated with it as a unit.
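
A rough sketch of what such a struct could look like, assuming the 8-bit
value / 24-bit flags split described above.  Every identifier here is
hypothetical (nothing named demo_* or DEMO_* exists in DPDK), and the flag
set is just the three names floated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits taken from the names floated in this thread;
 * none of these identifiers exist in DPDK. */
#define DEMO_SOCKET_F_UNSET    (UINT32_C(1) << 0)
#define DEMO_SOCKET_F_ANY_OKAY (UINT32_C(1) << 1)
#define DEMO_SOCKET_F_NONE     (UINT32_C(1) << 2)

/* Still 32 bits wide: bits 0-7 hold the socket number, bits 8-31 the flags.
 * Being a struct, it cannot be silently mixed with int or unsigned int. */
struct demo_socket_id {
	uint32_t raw;
};

static inline struct demo_socket_id
demo_socket_id_make(uint8_t num, uint32_t flags)
{
	struct demo_socket_id s;

	/* Keep only the low 24 flag bits so they fit above the number. */
	s.raw = (uint32_t)num | ((flags & UINT32_C(0xffffff)) << 8);
	return s;
}

static inline uint8_t
demo_socket_id_num(struct demo_socket_id s)
{
	return (uint8_t)(s.raw & UINT32_C(0xff));
}

static inline bool
demo_socket_id_has(struct demo_socket_id s, uint32_t flag)
{
	return ((s.raw >> 8) & flag) != 0;
}

/* "Prefer socket 1, but any socket is okay." */
static inline void
demo_socket_id_usage(void)
{
	struct demo_socket_id s = demo_socket_id_make(1, DEMO_SOCKET_F_ANY_OKAY);

	assert(demo_socket_id_num(s) == 1);
	assert(demo_socket_id_has(s, DEMO_SOCKET_F_ANY_OKAY));
}
```

Passing a bare int where a struct demo_socket_id is expected fails to
compile, which is the property Bruce called out; the cost is that callers
need accessors like the ones above.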

We are now venturing into areas I know I don't have enough knowledge about
to speak authoritatively on.  On the aspect of numa id and socket id, I had
to look this up, but it appears that one socket can have more than one numa
id (AMD Threadripper) associated with it.  I don't have easy access to an
AMD system I can run a `lscpu` on to provide a sample/confirm.  I am also
not sure what if any implications there are for how it is used within this
code base.  From a practical purpose, I believe memory is still associated
with a socket  so numa and socket may be able to be used interchangeably
for this purpose in which case I agree, pick one and standardize on that
term/language throughout the code base, possibly adding a note for future
developers/users.

When talking about core id I believe we need to utilize at least 16 bits of
space as we can have systems with dual AMD 64C/128T which I believe should
show as cores 0-255 today.  I have not looked at that aspect of the code
but see it as closely related to the socket discussion.  If making changes
to one, it is probably worth reviewing the other at the same time.  Very
quickly looking at rte_lcore [3], it seems like we either have a model that
should be followed for sockets (as suggested by Morten) or another case
where a struct may also make more sense to wrap a value and provide flags
versus magic values.

Going back to the ABI/API breakage question...  When quickly looking at the
API today, we have a number of functions that return negative values to
indicate errors.  Using references and structs may simplify that to the
point of return == 0 on success and < 0 on error, possibly with no need to
utilize rte_errno for these functions so that would at least allow for
following the existing model/pattern.  I am probably oversimplifying this
aspect.

I will say, in the case of TLDK, I've had to increase the return size of
some functions to int64_t to allow the return of the maximum value on
success and support returning a negative value on error.  Without looking,
I don't remember if that was in one of our wrappers, internal code, or
public APIs.  Regardless of where it actually is, I did not like this as
there are functions that expect a uint32_t so casts or warning suppression
may still be required in the code base.
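
To make the "return == 0 on success and < 0 on error" idea concrete, here is
a small sketch of the out-parameter style, which keeps the full uint32_t
range for the value without widening the return type to int64_t.  The struct
and function are hypothetical and are not taken from TLDK or DPDK.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object with a 32-bit counter; not a TLDK or DPDK type. */
struct demo_stream {
	uint32_t pending;
};

/* Return 0 on success or a negative errno value on error.  The value
 * itself travels through the out-parameter, so the whole uint32_t range
 * stays usable and no cast to a wider signed type is needed. */
static int
demo_stream_pending(const struct demo_stream *s, uint32_t *out)
{
	if (s == NULL || out == NULL)
		return -EINVAL;
	*out = s->pending;
	return 0;
}

/* Typical call site: check the status, then use the value as-is. */
static void
demo_stream_usage(void)
{
	struct demo_stream s = { .pending = UINT32_MAX };
	uint32_t n = 0;

	assert(demo_stream_pending(&s, &n) == 0);
	assert(n == UINT32_MAX);
}
```

The trade-off is one extra pointer argument per call, in exchange for
keeping signed status codes and unsigned values apart.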

1) http://mails.dpdk.org/archives/dev/2023-January/259811.html
2) http://mails.dpdk.org/archives/dev/2023-February/261097.html
3)
https://doc.dpdk.org/api/rte__lcore_8h.html#acbf23499dc0b2d223e4d311ad5f1b04e

On Sat, Feb 4, 2023 at 3:09 AM Morten Brørup <mb@smartsharesystems.com>
wrote:

> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Friday, 3 February 2023 23.13
> >
> > On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> > > On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > Sent: Thursday, 2 February 2023 21.45
> > > > >
> > > > > 02/02/2023 21:26, Tyler Retzlaff:
> > > > > > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > While making some updates to our code base for 22.11.1 that
> > were
> > > > > missed in
> > > > > > > our first pass through, we hit the numa node change[1].  In
> > the
> > > > > process of
> > > > > > > updating our code, we noticed that a couple functions
> > > > > (rx/tx_queue_setup,
> > > > > > > maybe more that we aren't using) state they accept
> > `SOCKET_ID_ANY`
> > > > > but the
> > > > > > > function signature then asks for an unsigned integer while
> > > > > `SOCKET_ID_ANY`
> > > > > > > is `-1`.  Following it through the redirect to the "real"
> > function
> > > > > it also
> > > > > > > asks for an unsigned integer which is then passed on to one
> > or more
> > > > > > > > functions asking for an integer.  As an example using the
> > i40e
> > > > > driver
> > > > > > > -- we would call `rte_eth_tx_queue_setup` [2] which
> > ultimately
> > > > > calls
> > > > > > > `i40e_dev_tx_queue_setup`[3] which finally calls
> > > > > `rte_zmalloc_socket`[4]
> > > > > > > and `rte_eth_dma_zone_reserve`[5].
> > > > > > >
> > > > > > > I guess what I am looking for is clarification on if this is
> > > > > intentional or
> > > > > > > if this is additional cleanup that may need to be
> > completed/be
> > > > > desirable so
> > > > > > > that signs are maintained through the call paths and avoid
> > > > > potentially
> > > > > > > producing sign-conversion warnings.  From the very quick
> > glance I
> > > > > took at
> > > > > > > the i40e driver, it seems these are just passed through to
> > other
> > > > > functions
> > > > > > > and no direct use/manipulation occurs (at least in the
> > mentioned
> > > > > functions).
> > > > > >
> > > > > > > i believe this is just sloppiness with sign in our api surface.
> > i too
> > > > > > find it frustrating that use of these api force either explicit
> > > > > > casts or suffer having to suppress warnings.
> > > > > >
> > > > > > in the past examples of this have been cleaned up without full
> > > > > deprecation
> > > > > > notices but there are a lot of instances. i also feel
> > (unpopular
> > > > > opinion)
> > > > > > that for some integer types like this that have constrained
> > range /
> > > > > number
> > > > > > spaces it would be of value to introduce a typedef that can be
> > used
> > > > > > consistently.
> > > > > >
> > > > > > for now you'll just have to add the casts and hopefully in the
> > future
> > > > > we
> > > > > > will fix the api making them unnecessary. of course feel free
> > to
> > > > > submit
> > > > > > patches too, it would be great to have these cleaned up.
> > > > >
> > > > > I agree it should be cleaned up.
> > > > > Those IDs should accept negative values.
> > > > > Not sure which type we should choose (int, int32_t, or a
> > typedef).
> > > >
> > > > Why would we use a signed socket ID? We don't use signed port IDs.
> > To me, unsigned seems the way to go. (A minor detail: With unsigned we
> > can use the entire range of values minus one (for the magic "any"
> > value), whereas with signed we can only use the positive range of
> > values. This detail is completely irrelevant when using 32 bit for
> > socket ID, but could be relevant if using fewer bits.)
> > > >
> > > > Also, we don't need 32 bit for socket ID. 8 or 16 bit should
> > suffice, like port ID. But reducing from 32 bit would probably cause
> > major ABI breakage.
> > > >
> > > > >
> > > > > Another thing to check is the name of the variable.
> > > > > It should be a socket ID when talking about CPU,
> > > > > and a NUMA node ID when talking about memory.
> > > > >
> > > > > And last but not the least,
> > > > > how can we keep ABI compatibility?
> > > > > I hope we can use function versioning to avoid deprecation and
> > > > > breaking.
> > > > >
> > > > > Trials and suggestions are welcome.
> > > >
> > > > Signedness is not the only problem with the socket ID. The meaning
> > of SOCKET_ID_ANY is excessively overloaded. If we want to clean this
> > up, we should consider the need for another magic value SOCKET_ID_NONE
> > for devices connected to the chipset, as discussed in this other email
> > thread [1]. And as discussed there, there are also size problems,
> > because some device structures use 8 bit to hold the socket ID.
> > > >
> > > > And functions should always return -1, never SOCKET_ID_ANY, to
> > indicate error.
> > > >
> > > > [1]:
> > http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smarts
> > erver.smartshare.dk/
> > > >
> > > > I only bring warnings and complications to the discussion here, no
> > solutions. Sorry! :-(
> > > >
> > >
> > > Personally, I think if we are going to change things, we should do
> > things
> > > properly, especially/even if we are going to have to break ABI or use
> > ABI
> > > compatibility.
> > >
> > > I would suggest rather than a typedef, we should actually wrap the
> > int
> > > value in a struct - for two reasons:
> >
> > >
> > > * it means the compiler will actually error out for us if an int or
> > >   unsigned int is used instead. This allow easier fixing at compile-
> > time
> > >   rather than hoping things are correctly specified in existing code.
> > >
> > > * it allows us to do things like explicitly calling out flags, rather
> > than
> > >   just using magic values. While still keeping the size 32 bits, we
> > can
> > >   have the actual socket value as 16-bits and have flags to indicate:
> > >   - ANY socket, NO socket, INVALID value socket. This could end up
> > being
> > >   useful in many cases, for example, when allocating memory we could
> > >   specify a socket number with the ANY flag, indicating that any
> > socket is
> > >   ok, but we'd ideally prefer the number specified.
> >
> > i'm a fan of this where it makes sense. i did this with rte_thread_t
> > for
> > exactly your first reason. but i did receive resistance from other
> > members of the community. personally i like compilation to fail when i
> > make a mistake.
> >
> > it's definitely way easier to make the argument to do this when the
> > actual valued is opaque. if it isn't i think then we need to provide
> > macro/inline accessors to allow applications do whatever it is they do
> > with the value they carry.
> >
> > i'll also note that this allows you a cheap way to sprinkle extra
> > integrity checking when running functional tests. if you have low
> > performance inline accessors you can do things like enforce the range
> > of
> > values or that enumerations are part of a set for debug builds.
> >
> > as a side i would also caution while i suggested a typedef i don't mean
> > that everything should be typedef'd especially actual structs that are
> > used like structs. typedefs for things like socket id would
> > unquestionably convey more information and implied semantics to the
> > user
> > of an api than just a standard `int' or whatever. consequently i have
> > found
> > that this lowers mistakes with the use of the api.
>
> Hiding the socket_id in a typedef'd structure seems like shooting sparrows
> with cannons.
>
> DPDK is using a C coding style, where there is a convention for not using
> typedefs:
> https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs
>
> In the thread case, a typedef made sense, because the underlying type can
> differ across O/S'es, and thus should be opaque. Which is in line with the
> coding style.
>
> But I don't think this is the case for socket_id. The socket_id is an
> enumeration type, and all we need is a magic number for the "chipset"
> pseudo-socket. And with that, perhaps some iterator macros to include/omit
> this pseudo-socket, like the lcore_id iterators with and without the main
> lcore.
>
> The mix of signed and unsigned in function signatures (and in the
> definition of SOCKET_ID_ANY) is pure sloppiness. This problem may also be
> present in other function signatures; we just happened to run into it for
> the socket_id.
>
> The compiler has flags to warn about mixing signed and unsigned types, so
> we could use that flag to reveal and fix those bugs.
>
> >
> > >
> > > As for socket id, and numa id, I'm not sure we should have different
> > > names/types for the two. For example, for PCI devices, do they need a
> > third
> > > type or are they associated with cores or with memory? The socket id
> > for
> > > the core only matters in terms of data locality, i.e. what memory or
> > cache
> > > location it is in. Therefore, for me, I'd pick one name and stick
> > with it.
> >
> > i think the choice for more than one type vs one type is whether or not
> > they are "the same" number space as opposed to just coincidentally
> > overlapping number spaces.
> >
> > >
> > > /Bruce
>
>

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue API
  @ 2023-02-06 15:29  0%     ` Jiawei(Jonny) Wang
  2023-02-07  9:40  0%     ` Ori Kam
  2023-02-09 19:44  0%     ` Ferruh Yigit
  2 siblings, 0 replies; 200+ results
From: Jiawei(Jonny) Wang @ 2023-02-06 15:29 UTC (permalink / raw)
  To: Jiawei(Jonny) Wang, Slava Ovsiienko, Ori Kam,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko, Aman Singh, Yuying Zhang, Ferruh Yigit
  Cc: dev, Raslan Darawsheh

Hi,

@Andrew, @Thomas, @Ori, 

Could you please help to review the patch?

Thanks.

> -----Original Message-----
> From: Jiawei Wang <jiaweiw@nvidia.com>
> Sent: Friday, February 3, 2023 9:34 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ori Kam <orika@nvidia.com>;
> NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>;
> andrew.rybchenko@oktetlabs.ru; Aman Singh <aman.deep.singh@intel.com>;
> Yuying Zhang <yuying.zhang@intel.com>; Ferruh Yigit <ferruh.yigit@amd.com>
> Cc: dev@dpdk.org; Raslan Darawsheh <rasland@nvidia.com>
> Subject: [PATCH v4 1/2] ethdev: introduce the PHY affinity field in Tx queue
> API
> 
> When multiple physical ports are connected to a single DPDK port,
> (example: kernel bonding, DPDK bonding, failsafe, etc.), we want to know
> which physical port is used for Rx and Tx.
> 
> This patch maps a DPDK Tx queue with a physical port, by adding
> tx_phy_affinity setting in Tx queue.
> The affinity number is the physical port ID where packets will be sent.
> Value 0 means no affinity and traffic could be routed to any connected
> physical ports, this is the default current behavior.
> 
> The number of physical ports is reported with rte_eth_dev_info_get().
> 
> The new tx_phy_affinity field is added into the padding hole of rte_eth_txconf
> structure, the size of rte_eth_txconf keeps the same.
> An ABI check rule needs to be added to avoid false warning.
> 
> Add the testpmd command line:
> testpmd> port config (port_id) txq (queue_id) phy_affinity (value)
> 
> For example, suppose two physical ports are connected to a single DPDK port
> (port id 0), where phy_affinity 1 stands for the first physical port and
> phy_affinity 2 stands for the second physical port.
> Use the commands below to configure the Tx phy affinity per Tx queue:
>         port config 0 txq 0 phy_affinity 1
>         port config 0 txq 1 phy_affinity 1
>         port config 0 txq 2 phy_affinity 2
>         port config 0 txq 3 phy_affinity 2
> 
> These commands configure Tx queues 0 and 1 with phy affinity 1, so packets
> sent on Tx queue 0 or 1 leave through the first physical port; likewise,
> packets sent on Tx queue 2 or 3 leave through the second physical port.
> 
> Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
> ---

snip

> 2.18.1
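
For reviewers, the numbering rule in the commit message (0 means no
affinity, physical ports count from 1, the count is reported as
nb_phy_ports) can be sketched as below.  This is a stand-alone mock-up: the
demo_* structs copy only the two field names from the patch, and both the
range check and the queue-spreading helper are assumptions for illustration,
not code from the patch or from ethdev.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mock-ups holding only the two fields this patch adds; the real
 * structures are in rte_ethdev.h and have many more members. */
struct demo_dev_info {
	uint8_t nb_phy_ports;    /* physical ports behind the DPDK port */
};

struct demo_txconf {
	uint8_t tx_phy_affinity; /* 0 = no affinity, 1..nb_phy_ports = port */
};

/* Assumed validation: accept 0 (any port) or a 1-based port number. */
static int
demo_check_tx_phy_affinity(const struct demo_dev_info *info,
			   const struct demo_txconf *conf)
{
	if (conf->tx_phy_affinity > info->nb_phy_ports)
		return -EINVAL;
	return 0;
}

/* Spread queues evenly over the physical ports.  With 4 queues and
 * 2 ports this reproduces the testpmd example: queues 0 and 1 get
 * affinity 1, queues 2 and 3 get affinity 2. */
static void
demo_spread_queues(struct demo_txconf *conf, uint16_t nb_queues,
		   uint8_t nb_phy_ports)
{
	uint16_t q;

	for (q = 0; q < nb_queues; q++) {
		if (nb_phy_ports == 0)
			conf[q].tx_phy_affinity = 0; /* nothing to pin to */
		else
			conf[q].tx_phy_affinity =
				(uint8_t)((uint32_t)q * nb_phy_ports / nb_queues + 1);
	}
}

static void
demo_affinity_usage(void)
{
	struct demo_dev_info info = { .nb_phy_ports = 2 };
	struct demo_txconf conf[4];

	demo_spread_queues(conf, 4, info.nb_phy_ports);
	assert(demo_check_tx_phy_affinity(&info, &conf[0]) == 0);
	assert(demo_check_tx_phy_affinity(&info, &conf[3]) == 0);
}
```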


^ permalink raw reply	[relevance 0%]

* [PATCH 1/3] net/nfp: remove usage of print statements
  @ 2023-02-06  7:05  8% ` Chaoyong He
  0 siblings, 0 replies; 200+ results
From: Chaoyong He @ 2023-02-06  7:05 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, niklas.soderlund, James Hershaw, Chaoyong He

From: James Hershaw <james.hershaw@corigine.com>

Removal of the usage of printf() statements from the nfp PMD in favour
of appropriate RTE logging functions in compliance with the standard.

Debug messages are now logged using the appropriate RTE_LOG functions, so
it is no longer necessary to print specific statements only when compiled
with the DEBUG tag; these messages are logged using the appropriate
functions regardless of whether the DEBUG tag is set.

Signed-off-by: James Hershaw <james.hershaw@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
---
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c | 71 +++++++++-------------
 drivers/net/nfp/nfpcore/nfp_cppcore.c      |  3 +-
 drivers/net/nfp/nfpcore/nfp_hwinfo.c       | 17 +++---
 drivers/net/nfp/nfpcore/nfp_mip.c          | 12 ++--
 drivers/net/nfp/nfpcore/nfp_mutex.c        | 10 ++-
 drivers/net/nfp/nfpcore/nfp_nsp.c          | 30 +++++----
 drivers/net/nfp/nfpcore/nfp_nsp_cmds.c     |  4 +-
 drivers/net/nfp/nfpcore/nfp_nsp_eth.c      | 16 ++---
 drivers/net/nfp/nfpcore/nfp_resource.c     |  5 +-
 drivers/net/nfp/nfpcore/nfp_rtsym.c        | 28 +++------
 10 files changed, 87 insertions(+), 109 deletions(-)

diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
index 22c8bc4b14..8d7eb96da1 100644
--- a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -34,6 +34,7 @@
 #include <rte_string_fns.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_target.h"
 #include "nfp6000/nfp6000.h"
 
@@ -173,23 +174,17 @@ nfp_compute_bar(const struct nfp_bar *bar, uint32_t *bar_config,
 		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
 
 		if ((offset & mask) != ((offset + size - 1) & mask)) {
-			printf("BAR%d: Won't use for Fixed mapping\n",
-				bar->index);
-			printf("\t<%#llx,%#llx>, action=%d\n",
-				(unsigned long long)offset,
-				(unsigned long long)(offset + size), act);
-			printf("\tBAR too small (0x%llx).\n",
-				(unsigned long long)mask);
+			PMD_DRV_LOG(ERR, "BAR%d: Won't use for Fixed mapping <%#llx,%#llx>, action=%d BAR too small (0x%llx)",
+				    bar->index, (unsigned long long)offset,
+				    (unsigned long long)(offset + size), act,
+				    (unsigned long long)mask);
 			return -EINVAL;
 		}
 		offset &= mask;
 
-#ifdef DEBUG
-		printf("BAR%d: Created Fixed mapping\n", bar->index);
-		printf("\t%d:%d:%d:0x%#llx-0x%#llx>\n", tgt, act, tok,
-			(unsigned long long)offset,
-			(unsigned long long)(offset + mask));
-#endif
+		PMD_DRV_LOG(DEBUG, "BAR%d: Created Fixed mapping %d:%d:%d:0x%#llx-0x%#llx>",
+			    bar->index, tgt, act, tok, (unsigned long long)offset,
+			    (unsigned long long)(offset + mask));
 
 		bitsize = 40 - 16;
 	} else {
@@ -204,33 +199,27 @@ nfp_compute_bar(const struct nfp_bar *bar, uint32_t *bar_config,
 		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
 
 		if ((offset & mask) != ((offset + size - 1) & mask)) {
-			printf("BAR%d: Won't use for bulk mapping\n",
-				bar->index);
-			printf("\t<%#llx,%#llx>\n", (unsigned long long)offset,
-				(unsigned long long)(offset + size));
-			printf("\ttarget=%d, token=%d\n", tgt, tok);
-			printf("\tBAR too small (%#llx) - (%#llx != %#llx).\n",
-				(unsigned long long)mask,
-				(unsigned long long)(offset & mask),
-				(unsigned long long)(offset + size - 1) & mask);
-
+			PMD_DRV_LOG(ERR, "BAR%d: Won't use for bulk mapping <%#llx,%#llx> target=%d, token=%d BAR too small (%#llx) - (%#llx != %#llx).",
+				    bar->index, (unsigned long long)offset,
+				    (unsigned long long)(offset + size),
+				    tgt, tok, (unsigned long long)mask,
+				    (unsigned long long)(offset & mask),
+				    (unsigned long long)(offset + size - 1) & mask);
 			return -EINVAL;
 		}
 
 		offset &= mask;
 
-#ifdef DEBUG
-		printf("BAR%d: Created bulk mapping %d:x:%d:%#llx-%#llx\n",
-			bar->index, tgt, tok, (unsigned long long)offset,
-			(unsigned long long)(offset + ~mask));
-#endif
+		PMD_DRV_LOG(DEBUG, "BAR%d: Created bulk mapping %d:x:%d:%#llx-%#llx",
+			    bar->index, tgt, tok, (unsigned long long)offset,
+			    (unsigned long long)(offset + ~mask));
 
 		bitsize = 40 - 21;
 	}
 
 	if (bar->bitsize < bitsize) {
-		printf("BAR%d: Too small for %d:%d:%d\n", bar->index, tgt, tok,
-			act);
+		PMD_DRV_LOG(ERR, "BAR%d: Too small for %d:%d:%d", bar->index,
+			    tgt, tok, act);
 		return -EINVAL;
 	}
 
@@ -263,9 +252,7 @@ nfp_bar_write(struct nfp_pcie_user *nfp, struct nfp_bar *bar,
 	*(uint32_t *)(bar->csr) = newcfg;
 
 	bar->barcfg = newcfg;
-#ifdef DEBUG
-	printf("BAR%d: updated to 0x%08x\n", bar->index, newcfg);
-#endif
+	PMD_DRV_LOG(DEBUG, "BAR%d: updated to 0x%08x", bar->index, newcfg);
 
 	return 0;
 }
@@ -535,7 +522,7 @@ nfp6000_area_read(struct nfp_cpp_area *area, void *kernel_vaddr,
 
 	/* Unaligned? Translate to an explicit access */
 	if ((priv->offset + offset) & (width - 1)) {
-		printf("aread_read unaligned!!!\n");
+		PMD_DRV_LOG(ERR, "aread_read unaligned!!!");
 		return -EINVAL;
 	}
 
@@ -702,7 +689,7 @@ nfp_acquire_secondary_process_lock(struct nfp_pcie_user *desc)
 	desc->secondary_lock = open(lockfile, O_RDWR | O_CREAT | O_NONBLOCK,
 				    0666);
 	if (desc->secondary_lock < 0) {
-		RTE_LOG(ERR, PMD, "NFP lock for secondary process failed\n");
+		PMD_DRV_LOG(ERR, "NFP lock for secondary process failed");
 		free(lockfile);
 		return desc->secondary_lock;
 	}
@@ -711,7 +698,7 @@ nfp_acquire_secondary_process_lock(struct nfp_pcie_user *desc)
 	lock.l_whence = SEEK_SET;
 	rc = fcntl(desc->secondary_lock, F_SETLK, &lock);
 	if (rc < 0) {
-		RTE_LOG(ERR, PMD, "NFP lock for secondary process failed\n");
+		PMD_DRV_LOG(ERR, "NFP lock for secondary process failed");
 		close(desc->secondary_lock);
 	}
 
@@ -725,7 +712,7 @@ nfp6000_set_model(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 	uint32_t model;
 
 	if (rte_pci_read_config(dev, &model, 4, 0x2e) < 0) {
-		printf("nfp set model failed\n");
+		PMD_DRV_LOG(ERR, "nfp set model failed");
 		return -1;
 	}
 
@@ -741,7 +728,7 @@ nfp6000_set_interface(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 	uint16_t interface;
 
 	if (rte_pci_read_config(dev, &interface, 2, 0x154) < 0) {
-		printf("nfp set interface failed\n");
+		PMD_DRV_LOG(ERR, "nfp set interface failed");
 		return -1;
 	}
 
@@ -760,14 +747,14 @@ nfp6000_set_serial(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 
 	pos = rte_pci_find_ext_capability(dev, RTE_PCI_EXT_CAP_ID_DSN);
 	if (pos <= 0) {
-		printf("PCI_EXT_CAP_ID_DSN not found. nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "PCI_EXT_CAP_ID_DSN not found. nfp set serial failed");
 		return -1;
 	} else {
 		pos += 6;
 	}
 
 	if (rte_pci_read_config(dev, &tmp, 2, pos) < 0) {
-		printf("nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "nfp set serial failed");
 		return -1;
 	}
 
@@ -776,7 +763,7 @@ nfp6000_set_serial(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 
 	pos += 2;
 	if (rte_pci_read_config(dev, &tmp, 2, pos) < 0) {
-		printf("nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "nfp set serial failed");
 		return -1;
 	}
 
@@ -785,7 +772,7 @@ nfp6000_set_serial(struct rte_pci_device *dev, struct nfp_cpp *cpp)
 
 	pos += 2;
 	if (rte_pci_read_config(dev, &tmp, 2, pos) < 0) {
-		printf("nfp set serial failed\n");
+		PMD_DRV_LOG(ERR, "nfp set serial failed");
 		return -1;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_cppcore.c b/drivers/net/nfp/nfpcore/nfp_cppcore.c
index 37799af558..e1e0a143f9 100644
--- a/drivers/net/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/nfp/nfpcore/nfp_cppcore.c
@@ -15,6 +15,7 @@
 #include <ethdev_pci.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_target.h"
 #include "nfp6000/nfp6000.h"
 #include "nfp6000/nfp_xpb.h"
@@ -701,7 +702,7 @@ nfp_cpp_read(struct nfp_cpp *cpp, uint32_t destination,
 
 	area = nfp_cpp_area_alloc_acquire(cpp, destination, address, length);
 	if (!area) {
-		printf("Area allocation/acquire failed\n");
+		PMD_DRV_LOG(ERR, "Area allocation/acquire failed");
 		return -1;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_hwinfo.c b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
index 9f848bde79..9b66569953 100644
--- a/drivers/net/nfp/nfpcore/nfp_hwinfo.c
+++ b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
@@ -20,6 +20,7 @@
 #include <time.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp6000/nfp6000.h"
 #include "nfp_resource.h"
 #include "nfp_hwinfo.h"
@@ -40,12 +41,12 @@ nfp_hwinfo_db_walk(struct nfp_hwinfo *hwinfo, uint32_t size)
 	     key = val + strlen(val) + 1) {
 		val = key + strlen(key) + 1;
 		if (val >= end) {
-			printf("Bad HWINFO - overflowing key\n");
+			PMD_DRV_LOG(ERR, "Bad HWINFO - overflowing key");
 			return -EINVAL;
 		}
 
 		if (val + strlen(val) + 1 > end) {
-			printf("Bad HWINFO - overflowing value\n");
+			PMD_DRV_LOG(ERR, "Bad HWINFO - overflowing value");
 			return -EINVAL;
 		}
 	}
@@ -59,7 +60,7 @@ nfp_hwinfo_db_validate(struct nfp_hwinfo *db, uint32_t len)
 
 	size = db->size;
 	if (size > len) {
-		printf("Unsupported hwinfo size %u > %u\n", size, len);
+		PMD_DRV_LOG(ERR, "Unsupported hwinfo size %u > %u", size, len);
 		return -EINVAL;
 	}
 
@@ -67,8 +68,8 @@ nfp_hwinfo_db_validate(struct nfp_hwinfo *db, uint32_t len)
 	new_crc = nfp_crc32_posix((char *)db, size);
 	crc = (uint32_t *)(db->start + size);
 	if (new_crc != *crc) {
-		printf("Corrupt hwinfo table (CRC mismatch)\n");
-		printf("\tcalculated 0x%x, expected 0x%x\n", new_crc, *crc);
+		PMD_DRV_LOG(ERR, "Corrupt hwinfo table (CRC mismatch) calculated 0x%x, expected 0x%x",
+			    new_crc, *crc);
 		return -EINVAL;
 	}
 
@@ -108,12 +109,12 @@ nfp_hwinfo_try_fetch(struct nfp_cpp *cpp, size_t *cpp_size)
 		goto exit_free;
 
 	header = (void *)db;
-	printf("NFP HWINFO header: %#08x\n", *(uint32_t *)header);
+	PMD_DRV_LOG(DEBUG, "NFP HWINFO header: %#08x", *(uint32_t *)header);
 	if (nfp_hwinfo_is_updating(header))
 		goto exit_free;
 
 	if (header->version != NFP_HWINFO_VERSION_2) {
-		printf("Unknown HWInfo version: 0x%08x\n",
+		PMD_DRV_LOG(DEBUG, "Unknown HWInfo version: 0x%08x",
 			header->version);
 		goto exit_free;
 	}
@@ -145,7 +146,7 @@ nfp_hwinfo_fetch(struct nfp_cpp *cpp, size_t *hwdb_size)
 
 		nanosleep(&wait, NULL);
 		if (count++ > 200) {
-			printf("NFP access error\n");
+			PMD_DRV_LOG(ERR, "NFP access error");
 			return NULL;
 		}
 	}
diff --git a/drivers/net/nfp/nfpcore/nfp_mip.c b/drivers/net/nfp/nfpcore/nfp_mip.c
index c86966df8b..d342bc4141 100644
--- a/drivers/net/nfp/nfpcore/nfp_mip.c
+++ b/drivers/net/nfp/nfpcore/nfp_mip.c
@@ -7,6 +7,7 @@
 #include <rte_byteorder.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_mip.h"
 #include "nfp_nffw.h"
 
@@ -43,18 +44,17 @@ nfp_mip_try_read(struct nfp_cpp *cpp, uint32_t cpp_id, uint64_t addr,
 
 	ret = nfp_cpp_read(cpp, cpp_id, addr, mip, sizeof(*mip));
 	if (ret != sizeof(*mip)) {
-		printf("Failed to read MIP data (%d, %zu)\n",
-			ret, sizeof(*mip));
+		PMD_DRV_LOG(ERR, "Failed to read MIP data (%d, %zu)", ret, sizeof(*mip));
 		return -EIO;
 	}
 	if (mip->signature != NFP_MIP_SIGNATURE) {
-		printf("Incorrect MIP signature (0x%08x)\n",
-			 rte_le_to_cpu_32(mip->signature));
+		PMD_DRV_LOG(ERR, "Incorrect MIP signature (0x%08x)",
+			    rte_le_to_cpu_32(mip->signature));
 		return -EINVAL;
 	}
 	if (mip->mip_version != NFP_MIP_VERSION) {
-		printf("Unsupported MIP version (%d)\n",
-			 rte_le_to_cpu_32(mip->mip_version));
+		PMD_DRV_LOG(ERR, "Unsupported MIP version (%d)",
+			    rte_le_to_cpu_32(mip->mip_version));
 		return -EINVAL;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_mutex.c b/drivers/net/nfp/nfpcore/nfp_mutex.c
index 318c5800d7..de9049c6a0 100644
--- a/drivers/net/nfp/nfpcore/nfp_mutex.c
+++ b/drivers/net/nfp/nfpcore/nfp_mutex.c
@@ -10,6 +10,7 @@
 #include <sched.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp6000/nfp6000.h"
 
 #define MUTEX_LOCKED(interface)  ((((uint32_t)(interface)) << 16) | 0x000f)
@@ -265,12 +266,9 @@ nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
 		if (err < 0 && errno != EBUSY)
 			return err;
 		if (time(NULL) >= warn_at) {
-			printf("Warning: waiting for NFP mutex\n");
-			printf("\tusage:%u\n", mutex->usage);
-			printf("\tdepth:%hd]\n", mutex->depth);
-			printf("\ttarget:%d\n", mutex->target);
-			printf("\taddr:%llx\n", mutex->address);
-			printf("\tkey:%08x]\n", mutex->key);
+			PMD_DRV_LOG(ERR, "Warning: waiting for NFP mutex usage:%u depth:%hd] target:%d addr:%llx key:%08x]",
+				    mutex->usage, mutex->depth, mutex->target,
+				    mutex->address, mutex->key);
 			warn_at = time(NULL) + 60;
 		}
 		sched_yield();
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp.c b/drivers/net/nfp/nfpcore/nfp_nsp.c
index 876a4017c9..22fb3407c6 100644
--- a/drivers/net/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/nfp/nfpcore/nfp_nsp.c
@@ -11,6 +11,7 @@
 #include <rte_common.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_nsp.h"
 #include "nfp_resource.h"
 
@@ -62,7 +63,7 @@ nfp_nsp_print_extended_error(uint32_t ret_val)
 
 	for (i = 0; i < (int)ARRAY_SIZE(nsp_errors); i++)
 		if (ret_val == (uint32_t)nsp_errors[i].code)
-			printf("err msg: %s\n", nsp_errors[i].msg);
+			PMD_DRV_LOG(ERR, "err msg: %s", nsp_errors[i].msg);
 }
 
 static int
@@ -81,7 +82,7 @@ nfp_nsp_check(struct nfp_nsp *state)
 		return err;
 
 	if (FIELD_GET(NSP_STATUS_MAGIC, reg) != NSP_MAGIC) {
-		printf("Cannot detect NFP Service Processor\n");
+		PMD_DRV_LOG(ERR, "Cannot detect NFP Service Processor");
 		return -ENODEV;
 	}
 
@@ -89,13 +90,13 @@ nfp_nsp_check(struct nfp_nsp *state)
 	state->ver.minor = FIELD_GET(NSP_STATUS_MINOR, reg);
 
 	if (state->ver.major != NSP_MAJOR || state->ver.minor < NSP_MINOR) {
-		printf("Unsupported ABI %hu.%hu\n", state->ver.major,
+		PMD_DRV_LOG(ERR, "Unsupported ABI %hu.%hu", state->ver.major,
 						    state->ver.minor);
 		return -EINVAL;
 	}
 
 	if (reg & NSP_STATUS_BUSY) {
-		printf("Service processor busy!\n");
+		PMD_DRV_LOG(ERR, "Service processor busy!");
 		return -EBUSY;
 	}
 
@@ -223,7 +224,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 
 	if (!FIELD_FIT(NSP_BUFFER_CPP, buff_cpp >> 8) ||
 	    !FIELD_FIT(NSP_BUFFER_ADDRESS, buff_addr)) {
-		printf("Host buffer out of reach %08x %" PRIx64 "\n",
+		PMD_DRV_LOG(ERR, "Host buffer out of reach %08x %" PRIx64,
 			buff_cpp, buff_addr);
 		return -EINVAL;
 	}
@@ -245,7 +246,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_command,
 			       NSP_COMMAND_START, 0);
 	if (err) {
-		printf("Error %d waiting for code 0x%04x to start\n",
+		PMD_DRV_LOG(ERR, "Error %d waiting for code 0x%04x to start",
 			err, code);
 		return err;
 	}
@@ -254,7 +255,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_status, NSP_STATUS_BUSY,
 			       0);
 	if (err) {
-		printf("Error %d waiting for code 0x%04x to complete\n",
+		PMD_DRV_LOG(ERR, "Error %d waiting for code 0x%04x to complete",
 			err, code);
 		return err;
 	}
@@ -266,7 +267,7 @@ nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
 
 	err = FIELD_GET(NSP_STATUS_RESULT, reg);
 	if (err) {
-		printf("Result (error) code set: %d (%d) command: %d\n",
+		PMD_DRV_LOG(ERR, "Result (error) code set: %d (%d) command: %d",
 			 -err, (int)ret_val, code);
 		nfp_nsp_print_extended_error(ret_val);
 		return -err;
@@ -289,8 +290,8 @@ nfp_nsp_command_buf(struct nfp_nsp *nsp, uint16_t code, uint32_t option,
 	uint32_t cpp_id;
 
 	if (nsp->ver.minor < 13) {
-		printf("NSP: Code 0x%04x with buffer not supported\n", code);
-		printf("\t(ABI %hu.%hu)\n", nsp->ver.major, nsp->ver.minor);
+		PMD_DRV_LOG(ERR, "NSP: Code 0x%04x with buffer not supported (ABI %hu.%hu)",
+			    code, nsp->ver.major, nsp->ver.minor);
 		return -EOPNOTSUPP;
 	}
 
@@ -303,11 +304,8 @@ nfp_nsp_command_buf(struct nfp_nsp *nsp, uint16_t code, uint32_t option,
 
 	max_size = RTE_MAX(in_size, out_size);
 	if (FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M < max_size) {
-		printf("NSP: default buffer too small for command 0x%04x\n",
-		       code);
-		printf("\t(%llu < %u)\n",
-		       FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M,
-		       max_size);
+		PMD_DRV_LOG(ERR, "NSP: default buffer too small for command 0x%04x (%llu < %u)",
+			    code, FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M, max_size);
 		return -EINVAL;
 	}
 
@@ -372,7 +370,7 @@ nfp_nsp_wait(struct nfp_nsp *state)
 		}
 	}
 	if (err)
-		printf("NSP failed to respond %d\n", err);
+		PMD_DRV_LOG(ERR, "NSP failed to respond %d", err);
 
 	return err;
 }
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
index bfd1eddb3e..1de3d1b00f 100644
--- a/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
@@ -6,6 +6,7 @@
 #include <stdio.h>
 #include <rte_byteorder.h>
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_nsp.h"
 #include "nfp_nffw.h"
 
@@ -39,8 +40,7 @@ __nfp_nsp_identify(struct nfp_nsp *nsp)
 	memset(ni, 0, sizeof(*ni));
 	ret = nfp_nsp_read_identify(nsp, ni, sizeof(*ni));
 	if (ret < 0) {
-		printf("reading bsp version failed %d\n",
-			ret);
+		PMD_DRV_LOG(ERR, "reading bsp version failed %d", ret);
 		goto exit_free;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
index f8f3c372ac..eb532e5f3a 100644
--- a/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
@@ -7,6 +7,7 @@
 #include <rte_common.h>
 #include <rte_byteorder.h>
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_nsp.h"
 #include "nfp6000/nfp6000.h"
 
@@ -236,7 +237,7 @@ nfp_eth_calc_port_geometry(struct nfp_eth_table *table)
 				continue;
 			if (table->ports[i].label_subport ==
 			    table->ports[j].label_subport)
-				printf("Port %d subport %d is a duplicate\n",
+				PMD_DRV_LOG(DEBUG, "Port %d subport %d is a duplicate",
 					 table->ports[i].label_port,
 					 table->ports[i].label_subport);
 
@@ -275,7 +276,7 @@ __nfp_eth_read_ports(struct nfp_nsp *nsp)
 	memset(entries, 0, NSP_ETH_TABLE_SIZE);
 	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
 	if (ret < 0) {
-		printf("reading port table failed %d\n", ret);
+		PMD_DRV_LOG(ERR, "reading port table failed %d", ret);
 		goto err;
 	}
 
@@ -294,7 +295,7 @@ __nfp_eth_read_ports(struct nfp_nsp *nsp)
 	 * above.
 	 */
 	if (ret && ret != cnt) {
-		printf("table entry count (%d) unmatch entries present (%d)\n",
+		PMD_DRV_LOG(ERR, "table entry count (%d) unmatch entries present (%d)",
 		       ret, cnt);
 		goto err;
 	}
@@ -372,12 +373,12 @@ nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx)
 
 	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
 	if (ret < 0) {
-		printf("reading port table failed %d\n", ret);
+		PMD_DRV_LOG(ERR, "reading port table failed %d", ret);
 		goto err;
 	}
 
 	if (!(entries[idx].port & NSP_ETH_PORT_LANES_MASK)) {
-		printf("trying to set port state on disabled port %d\n", idx);
+		PMD_DRV_LOG(ERR, "trying to set port state on disabled port %d", idx);
 		goto err;
 	}
 
@@ -535,7 +536,7 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
 	 *	 codes were initially not populated correctly.
 	 */
 	if (nfp_nsp_get_abi_ver_minor(nsp) < 17) {
-		printf("set operations not supported, please update flash\n");
+		PMD_DRV_LOG(ERR, "set operations not supported, please update flash");
 		return -EOPNOTSUPP;
 	}
 
@@ -647,8 +648,7 @@ __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
 
 	rate = nfp_eth_speed2rate(speed);
 	if (rate == RATE_INVALID) {
-		printf("could not find matching lane rate for speed %u\n",
-			 speed);
+		PMD_DRV_LOG(ERR, "could not find matching lane rate for speed %u", speed);
 		return -EINVAL;
 	}
 
diff --git a/drivers/net/nfp/nfpcore/nfp_resource.c b/drivers/net/nfp/nfpcore/nfp_resource.c
index 7b5630fd86..6a10c9b0a7 100644
--- a/drivers/net/nfp/nfpcore/nfp_resource.c
+++ b/drivers/net/nfp/nfpcore/nfp_resource.c
@@ -10,6 +10,7 @@
 #include <rte_string_fns.h>
 
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp6000/nfp6000.h"
 #include "nfp_resource.h"
 #include "nfp_crc.h"
@@ -79,7 +80,7 @@ nfp_cpp_resource_find(struct nfp_cpp *cpp, struct nfp_resource *res)
 
 	/* Search for a matching entry */
 	if (!memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8)) {
-		printf("Grabbing device lock not supported\n");
+		PMD_DRV_LOG(ERR, "Grabbing device lock not supported");
 		return -EOPNOTSUPP;
 	}
 	key = nfp_crc32_posix(name_pad, NFP_RESOURCE_ENTRY_NAME_SZ);
@@ -185,7 +186,7 @@ nfp_resource_acquire(struct nfp_cpp *cpp, const char *name)
 			goto err_free;
 
 		if (count++ > 1000) {
-			printf("Error: resource %s timed out\n", name);
+			PMD_DRV_LOG(ERR, "Error: resource %s timed out", name);
 			err = -EBUSY;
 			goto err_free;
 		}
diff --git a/drivers/net/nfp/nfpcore/nfp_rtsym.c b/drivers/net/nfp/nfpcore/nfp_rtsym.c
index 56bbf05cd8..288a37da60 100644
--- a/drivers/net/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/nfp/nfpcore/nfp_rtsym.c
@@ -11,6 +11,7 @@
 #include <stdio.h>
 #include <rte_byteorder.h>
 #include "nfp_cpp.h"
+#include "nfp_logs.h"
 #include "nfp_mip.h"
 #include "nfp_rtsym.h"
 #include "nfp6000/nfp6000.h"
@@ -56,11 +57,8 @@ nfp_rtsym_sw_entry_init(struct nfp_rtsym_table *cache, uint32_t strtab_size,
 	sw->size = ((uint64_t)fw->size_hi << 32) |
 		   rte_le_to_cpu_32(fw->size_lo);
 
-#ifdef DEBUG
-	printf("rtsym_entry_init\n");
-	printf("\tname=%s, addr=%" PRIx64 ", size=%" PRIu64 ",target=%d\n",
-		sw->name, sw->addr, sw->size, sw->target);
-#endif
+	PMD_INIT_LOG(DEBUG, "rtsym_entry_init name=%s, addr=%" PRIx64 ", size=%" PRIu64 ", target=%d",
+		     sw->name, sw->addr, sw->size, sw->target);
 	switch (fw->target) {
 	case SYM_TGT_LMEM:
 		sw->target = NFP_RTSYM_TARGET_LMEM;
@@ -241,10 +239,8 @@ nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name, int *error)
 
 	id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0, sym->domain);
 
-#ifdef DEBUG
-	printf("Reading symbol %s with size %" PRIu64 " at %" PRIx64 "\n",
+	PMD_DRV_LOG(DEBUG, "Reading symbol %s with size %" PRIu64 " at %" PRIx64,
 		name, sym->size, sym->addr);
-#endif
 	switch (sym->size) {
 	case 4:
 		err = nfp_cpp_readl(rtbl->cpp, id, sym->addr, &val32);
@@ -254,7 +250,7 @@ nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name, int *error)
 		err = nfp_cpp_readq(rtbl->cpp, id, sym->addr, &val);
 		break;
 	default:
-		printf("rtsym '%s' unsupported size: %" PRId64 "\n",
+		PMD_DRV_LOG(ERR, "rtsym '%s' unsupported size: %" PRId64,
 			name, sym->size);
 		err = -EINVAL;
 		break;
@@ -279,17 +275,15 @@ nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
 	const struct nfp_rtsym *sym;
 	uint8_t *mem;
 
-#ifdef DEBUG
-	printf("mapping symbol %s\n", name);
-#endif
+	PMD_DRV_LOG(DEBUG, "mapping symbol %s", name);
 	sym = nfp_rtsym_lookup(rtbl, name);
 	if (!sym) {
-		printf("symbol lookup fails for %s\n", name);
+		PMD_DRV_LOG(ERR, "symbol lookup fails for %s", name);
 		return NULL;
 	}
 
 	if (sym->size < min_size) {
-		printf("Symbol %s too small (%" PRIu64 " < %u)\n", name,
+		PMD_DRV_LOG(ERR, "Symbol %s too small (%" PRIu64 " < %u)", name,
 			sym->size, min_size);
 		return NULL;
 	}
@@ -297,12 +291,10 @@ nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
 	mem = nfp_cpp_map_area(rtbl->cpp, sym->domain, sym->target, sym->addr,
 			       sym->size, area);
 	if (!mem) {
-		printf("Failed to map symbol %s\n", name);
+		PMD_DRV_LOG(ERR, "Failed to map symbol %s", name);
 		return NULL;
 	}
-#ifdef DEBUG
-	printf("symbol %s with address %p\n", name, mem);
-#endif
+	PMD_DRV_LOG(DEBUG, "symbol %s with address %p", name, mem);
 
 	return mem;
 }
-- 
2.29.3
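
Every conversion in this patch drops the trailing "\n" from the format
string, which only produces well-formed output if the logging macro
supplies the newline itself. A minimal sketch of that pattern follows.
The names MY_DRV_LOG, my_log_format and my_log_threshold are made up for
illustration and are not the definitions from nfp_logs.h; the sketch
formats into a caller-supplied buffer instead of calling rte_log() so
that it stays self-contained.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log levels for this sketch (lower value = more severe). */
enum my_log_level { MY_LOG_ERR = 4, MY_LOG_DEBUG = 8 };

/* Messages less severe than this threshold are dropped at run time. */
static enum my_log_level my_log_threshold = MY_LOG_ERR;

/*
 * Format "<func>(): <message>\n" into buf.
 * Returns the number of characters written, or 0 if the message was
 * filtered by level or did not fit in the buffer.
 */
static int
my_log_format(char *buf, size_t len, enum my_log_level level,
	      const char *func, const char *fmt, ...)
{
	va_list ap;
	int n, m;

	if (level > my_log_threshold || len == 0)
		return 0;

	n = snprintf(buf, len, "%s(): ", func);
	if (n < 0 || (size_t)n >= len)
		return 0;

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);

	/* Need room for the message, the appended newline and the NUL. */
	if (m < 0 || (size_t)(n + m) + 1 >= len)
		return 0;

	buf[n + m] = '\n';
	buf[n + m + 1] = '\0';
	return n + m + 1;
}

/* Callers pass the format without "\n"; the helper appends it. */
#define MY_DRV_LOG(buf, len, level, ...) \
	my_log_format(buf, len, MY_LOG_##level, __func__, __VA_ARGS__)
```

Because DEBUG messages go through the same macro and are filtered by
level at run time, a scheme like this has no need for the #ifdef DEBUG
guards that the patch removes.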


^ permalink raw reply	[relevance 8%]

* RE: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
  @ 2023-02-06  6:21  0%                         ` Naga Harish K, S V
  2023-02-06 16:38  0%                           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Naga Harish K, S V @ 2023-02-06  6:21 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: jerinj, Carrillo, Erik G, Gujjar, Abhinandan S, dev, Jayatheerthan,  Jay

Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, February 3, 2023 3:15 PM
> To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> Cc: jerinj@marvell.com; Carrillo, Erik G <erik.g.carrillo@intel.com>; Gujjar,
> Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> Jayatheerthan, Jay <jay.jayatheerthan@intel.com>
> Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> 
> On Thu, Feb 2, 2023 at 9:42 PM Naga Harish K, S V
> <s.v.naga.harish.k@intel.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, January 30, 2023 8:13 PM
> > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > <jay.jayatheerthan@intel.com>
> > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get APIs
> > >
> > > On Mon, Jan 30, 2023 at 3:26 PM Naga Harish K, S V
> > > <s.v.naga.harish.k@intel.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Saturday, January 28, 2023 4:24 PM
> > > > > To: Naga Harish K, S V <s.v.naga.harish.k@intel.com>
> > > > > Cc: jerinj@marvell.com; Carrillo, Erik G
> > > > > <erik.g.carrillo@intel.com>; Gujjar, Abhinandan S
> > > > > <abhinandan.gujjar@intel.com>; dev@dpdk.org; Jayatheerthan, Jay
> > > > > <jay.jayatheerthan@intel.com>
> > > > > Subject: Re: [PATCH v2 1/3] eventdev/eth_rx: add params set/get
> > > > > APIs
> > > > >
> > > > > On Wed, Jan 25, 2023 at 10:02 PM Naga Harish K, S V
> > > > > <s.v.naga.harish.k@intel.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > >
> > > > > > > > > >
> > > > > > > > > > > > +        */
> > > > > > > > > > > > +       uint32_t rsvd[15];
> > > > > > > > > > > > +       /**< Reserved fields for future use */
> > > > > > > > > > >
> > > > > > > > > > > Introduce
> > > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > > > > > > > > to make sure rsvd is zero.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > The reserved fields are not used by the adapter or
> > > > > > > > > > > application. Not sure it is necessary to introduce a
> > > > > > > > > > > new API to clear reserved fields.
> > > > > > > > >
> > > > > > > > > > When adapter starts using new fields (when we add new
> > > > > > > > > > fields in future), the old application which is not using
> > > > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() may have junk
> > > > > > > > > > value and then the adapter implementation will behave badly.
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > > Does it mean the application doesn't re-compile for the new
> > > > > > > > > DPDK?
> > > > > > >
> > > > > > > > Yes. No need to recompile if the ABI is not breaking.
> > > > > > >
> > > > > > > > > When some of the reserved fields are used in the future,
> > > > > > > > > the application also may need to be recompiled along with
> > > > > > > > > DPDK, right? As the application also may need to use the
> > > > > > > > > newly consumed reserved fields?
> > > > > > >
> > > > > > > The problematic case is:
> > > > > > >
> > > > > > > > Adapter implementation of 23.07 (assuming there is a change
> > > > > > > > in the params field) needs to work with application of 23.03.
> > > > > > > > rte_event_eth_rx_adapter_runtime_params_init() will solve that.
> > > > > > >
> > > > > >
> > > > > > > As rte_event_eth_rx_adapter_runtime_params_init() initializes
> > > > > > > only reserved fields to zero, it may not solve the issue in
> > > > > > > this case.
> > > > >
> > > > > rte_event_eth_rx_adapter_runtime_params_init() needs to zero all
> > > > > fields, not just reserved field.
> > > > > The application calling sequence  is
> > > > >
> > > > > struct my_config c;
> > > > > > rte_event_eth_rx_adapter_runtime_params_init(&c);
> > > > > > c.interested_field_to_be_updated = val;
> > > > >
> > > > > Can it be done like
> > > > >         struct my_config c = {0};
> > > > >         c.interested_field_to_be_updated = val;
> > > > > and update Doxygen comments to recommend above usage to reset all
> > > > > fields? This way, rte_event_eth_rx_adapter_runtime_params_init()
> > > > > can be avoided.
> > >
> > > Better to have a function for documentation clarity. Similar scheme
> > > already there in DPDK. See rte_eth_cman_config_init()
> > >
> > >
> >
> >
> > The reference function rte_eth_cman_config_init() is resetting the params
> > struct and initializing the required params with default values in the pmd cb.
> 
> No need for PMD cb.
> 
> > The proposed rte_event_eth_rx_adapter_runtime_params_init() API just
> > needs to reset the params struct. There are no pmd CBs involved.
> > Having an API just to reset the struct seems overkill. What do you think?
> 
> It is a slow path API. Keeping it as a function is better. Also, it helps
> the documentation of the config param in
> rte_event_eth_rx_adapter_runtime_params_config(),
> like "This structure must be initialized with
> rte_event_eth_rx_adapter_runtime_params_init()" or so.
> 
> 

Are there any other reasons to have this API (*params_init()) other than documentation?

> 
> >
> > > >
> > > > > Let me share an example and you can tell where is the issue
> > > > >
> > > > > 1)Assume parameter structure is 64B and for 22.03 8B are used.
> > > > > 2)rte_event_eth_rx_adapter_runtime_params_init() will clear all 64B.
> > > > > > 3)There is an application written based on 22.03 which is using
> > > > > only 8B after calling
> > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > 4)Assume, in 22.07 another 8B added to structure.
> > > > > 5)Now, the application (3) needs to run on 22.07. Since the
> > > > > application is calling
> > > > > rte_event_eth_rx_adapter_runtime_params_init()
> > > > > and 9 to 15B are zero, the implementation will not go bad.
> > > > >
> > > > > > > The old application only tries to set/get previous valid
> > > > > > > fields and the newly used fields may still contain junk value.
> > > > > > > If the application wants to make use of any of the newly used
> > > > > > > params, the application changes are required anyway.
> > > > >
> > > > > Yes. If application wants to make use of newly added features.
> > > > > No need to change if new features are not needed for old application.
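
To make the scenario in steps 1) to 5) of the quoted example concrete,
here is a minimal sketch. The structure layout and the names
my_runtime_params, my_runtime_params_init and my_new_lib_effective_knob
are hypothetical; only the rsvd[15] array comes from the patch under
discussion, and the real API may well differ.

```c
#include <stdint.h>
#include <string.h>

/* Value a newer library would fall back to when the new field is unset. */
#define MY_LEGACY_KNOB 1u

/* Hypothetical parameter block as seen by the "old" application. */
struct my_runtime_params {
	uint32_t max_nb_rx;	/* the only field the old application knows */
	uint32_t rsvd[15];	/* reserved for future use */
};

/* Library-side init helper: zero every field, reserved words included. */
static int
my_runtime_params_init(struct my_runtime_params *p)
{
	if (p == NULL)
		return -1;
	memset(p, 0, sizeof(*p));
	return 0;
}

/*
 * A later library release that starts consuming rsvd[0] as a new knob,
 * treating 0 as "keep the old behaviour". An application that never
 * cleared the structure would hand it whatever was on its stack.
 */
static uint32_t
my_new_lib_effective_knob(const struct my_runtime_params *p)
{
	return p->rsvd[0] != 0 ? p->rsvd[0] : MY_LEGACY_KNOB;
}
```

An old binary that calls the init helper before setting max_nb_rx gets
the legacy behaviour from the newer library without being rebuilt. In
this sketch "struct my_runtime_params p = {0};" would zero the fields
just as well, so the remaining question is the one raised in the thread:
whether a named function documents the requirement better.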

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  2023-02-03 20:49  0%         ` Thomas Monjalon
@ 2023-02-05 23:41  0%           ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2023-02-05 23:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

On Fri, 03 Feb 2023 21:49:02 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> > > > > Good catch.
> > > > > By the way, we should remove unused RTE_LOGTYPE_*.    
> > > > 
> > > > Yes, for 23.11 would like to work down the list.    
> > > 
> > > Do we need to wait 23.11?
> > > It is not an ABI breakage.
> > > And most of these defines are already unused.  
> > 
> > Turning them into deprecated would be API breakage though  
> 
> API breakage is not forbidden.
> 

For the internal ones it would be ok, but what about the RTE_LOGTYPE_USER1 etc.
These need to go through the regular deprecation process.

The problem is that if the types are not registered (see eal_common_log.c)
they might get reused.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 9/9] telemetry: change public API to use 64-bit signed values
  @ 2023-02-05 22:55  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2023-02-05 22:55 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, Morten Brørup, Tyler Retzlaff, Ciara Power, david.marchand

12/01/2023 18:41, Bruce Richardson:
> While the unsigned values added to telemetry dicts/arrays were up to
> 64-bits in size, the sized values were only up to 32-bits. We can

sized -> signed

> standardize the API by having both int and uint functions take 64-bit
> values. For ABI compatibility, we use function versioning to ensure
> older binaries can still use the older functions taking a 32-bit
> parameter.
> 
> Suggested-by: Morten Brørup <mb@smartsharesystems.com>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
[...]
> --- a/lib/telemetry/version.map
> +++ b/lib/telemetry/version.map
> +DPDK_24 {
> +	global:
> +
> +	rte_tel_data_add_array_int;
> +	rte_tel_data_add_dict_int;
> +} DPDK_23;

For the record, these are the versioned symbols.
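
As a rough illustration of what the commit message describes, the sketch
below keeps an entry point with the old 32-bit parameter next to the new
64-bit one. The names my_tel_array, my_add_array_int_v23 and
my_add_array_int_v24 are hypothetical, and the binding of each
implementation to a symbol version (driven by the version.map hunk
quoted above) is deliberately left out.

```c
#include <stdint.h>

#define MY_TEL_MAX 8

/* Hypothetical container standing in for the telemetry data object. */
struct my_tel_array {
	int64_t vals[MY_TEL_MAX];
	unsigned int len;
};

/* New behaviour: accepts a full 64-bit signed value. */
static int
my_add_array_int_v24(struct my_tel_array *d, int64_t x)
{
	if (d->len >= MY_TEL_MAX)
		return -1;
	d->vals[d->len++] = x;
	return 0;
}

/*
 * Old behaviour, kept for binaries built against the 32-bit prototype:
 * the int argument is sign-extended and forwarded to the new code.
 */
static int
my_add_array_int_v23(struct my_tel_array *d, int x)
{
	return my_add_array_int_v24(d, x);
}
```

Older callers keep passing an int and see no change, while newly built
code can store values outside the 32-bit range through the 64-bit entry
point.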



^ permalink raw reply	[relevance 0%]

* RE: Sign changes through function signatures
  2023-02-03 22:12  0%         ` Tyler Retzlaff
@ 2023-02-04  8:09  0%           ` Morten Brørup
  2023-02-06 15:57  3%             ` Ben Magistro
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2023-02-04  8:09 UTC (permalink / raw)
  To: Tyler Retzlaff, Bruce Richardson
  Cc: Thomas Monjalon, Ben Magistro, Olivier Matz, ferruh.yigit,
	andrew.rybchenko, ben.magistro, dev, Stefan Baranoff,
	david.marchand, anatoly.burakov

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Friday, 3 February 2023 23.13
> 
> On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> > On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Thursday, 2 February 2023 21.45
> > > >
> > > > 02/02/2023 21:26, Tyler Retzlaff:
> > > > > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > > > > Hello,
> > > > > >
> > > > > > > While making some updates to our code base for 22.11.1 that
> > > > > > > were missed in our first pass through, we hit the numa node
> > > > > > > change[1].  In the process of updating our code, we noticed
> > > > > > > that a couple functions (rx/tx_queue_setup, maybe more that
> > > > > > > we aren't using) state they accept `SOCKET_ID_ANY` but the
> > > > > > > function signature then asks for an unsigned integer while
> > > > > > > `SOCKET_ID_ANY` is `-1`.  Following it through the redirect
> > > > > > > to the "real" function it also asks for an unsigned integer
> > > > > > > which is then passed on to one or more functions asking for
> > > > > > > an integer.  As an example using the i40e driver -- we would
> > > > > > > call `rte_eth_tx_queue_setup` [2] which ultimately calls
> > > > > > > `i40e_dev_tx_queue_setup`[3] which finally calls
> > > > > > > `rte_zmalloc_socket`[4] and `rte_eth_dma_zone_reserve`[5].
> > > > > > >
> > > > > > > I guess what I am looking for is clarification on if this is
> > > > > > > intentional or if this is additional cleanup that may need to
> > > > > > > be completed/be desirable so that signs are maintained through
> > > > > > > the call paths and avoid potentially producing sign-conversion
> > > > > > > warnings.  From the very quick glance I took at the i40e
> > > > > > > driver, it seems these are just passed through to other
> > > > > > > functions and no direct use/manipulation occurs (at least in
> > > > > > > the mentioned functions).
> > > > >
> > > > > > i believe this is just sloppiness with sign in our api surface.
> > > > > > i too find it frustrating that use of these api force either
> > > > > > explicit casts or suffer having to suppress warnings.
> > > > > >
> > > > > > in the past examples of this have been cleaned up without full
> > > > > > deprecation notices but there are a lot of instances. i also
> > > > > > feel (unpopular opinion) that for some integer types like this
> > > > > > that have constrained range / number spaces it would be of
> > > > > > value to introduce a typedef that can be used consistently.
> > > > > >
> > > > > > for now you'll just have to add the casts and hopefully in the
> > > > > > future we will fix the api making them unnecessary. of course
> > > > > > feel free to submit patches too, it would be great to have
> > > > > > these cleaned up.
> > > >
> > > > > I agree it should be cleaned up.
> > > > > Those IDs should accept negative values.
> > > > > Not sure which type we should choose (int, int32_t, or a typedef).
> > > >
> > > > Why would we use a signed socket ID? We don't use signed port IDs. To me, unsigned seems the way to go. (A minor detail: With unsigned we can use the entire range of values minus one (for the magic "any" value), whereas with signed we can only use the positive range of values. This detail is completely irrelevant when using 32 bit for socket ID, but could be relevant if using fewer bits.)
> > > >
> > > > Also, we don't need 32 bit for socket ID. 8 or 16 bit should suffice, like port ID. But reducing from 32 bit would probably cause major ABI breakage.
> > >
> > > >
> > > > Another thing to check is the name of the variable.
> > > > It should be a socket ID when talking about CPU,
> > > > and a NUMA node ID when talking about memory.
> > > >
> > > > And last but not the least,
> > > > how can we keep ABI compatibility?
> > > > I hope we can use function versioning to avoid deprecation and
> > > > breaking.
> > > >
> > > > Trials and suggestions are welcome.
> > >
> > > > Signedness is not the only problem with the socket ID. The meaning of SOCKET_ID_ANY is excessively overloaded. If we want to clean this up, we should consider the need for another magic value SOCKET_ID_NONE for devices connected to the chipset, as discussed in this other email thread [1]. And as discussed there, there are also size problems, because some device structures use 8 bit to hold the socket ID.
> > > >
> > > > And functions should always return -1, never SOCKET_ID_ANY, to indicate error.
> > > >
> > > > [1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smartserver.smartshare.dk/
> > > >
> > > > I only bring warnings and complications to the discussion here, no solutions. Sorry! :-(
> > >
> >
> > Personally, I think if we are going to change things, we should do things
> > properly, especially/even if we are going to have to break ABI or use ABI
> > compatibility.
> >
> > I would suggest rather than a typedef, we should actually wrap the int
> > value in a struct - for two reasons:
> 
> >
> > * it means the compiler will actually error out for us if an int or
> >   unsigned int is used instead. This allows easier fixing at compile-time
> >   rather than hoping things are correctly specified in existing code.
> >
> > * it allows us to do things like explicitly calling out flags, rather than
> >   just using magic values. While still keeping the size 32 bits, we can
> >   have the actual socket value as 16-bits and have flags to indicate:
> >   - ANY socket, NO socket, INVALID value socket. This could end up being
> >   useful in many cases, for example, when allocating memory we could
> >   specify a socket number with the ANY flag, indicating that any socket is
> >   ok, but we'd ideally prefer the number specified.
> 
> i'm a fan of this where it makes sense. i did this with rte_thread_t for
> exactly your first reason. but i did receive resistance from other
> members of the community. personally i like compilation to fail when i
> make a mistake.
> 
> it's definitely way easier to make the argument to do this when the
> actual value is opaque. if it isn't i think then we need to provide
> macro/inline accessors to allow applications to do whatever it is they do
> with the value they carry.
> 
> i'll also note that this allows you a cheap way to sprinkle extra
> integrity checking when running functional tests. if you have low
> performance inline accessors you can do things like enforce the range of
> values or that enumerations are part of a set for debug builds.
> 
> as an aside i would also caution while i suggested a typedef i don't mean
> that everything should be typedef'd especially actual structs that are
> used like structs. typedefs for things like socket id would
> unquestionably convey more information and implied semantics to the user
> of an api than just a standard `int' or whatever. consequently i have found
> that this lowers mistakes with the use of the api.

Hiding the socket_id in a typedef'd structure seems like shooting sparrows with cannons.

DPDK is using a C coding style, where there is a convention for not using typedefs:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs

In the thread case, a typedef made sense, because the underlying type can differ across O/S'es, and thus should be opaque. Which is in line with the coding style.

But I don't think this is the case for socket_id. The socket_id is an enumeration type, and all we need is a magic number for the "chipset" pseudo-socket. And with that, perhaps some iterator macros to include/omit this pseudo-socket, like the lcore_id iterators with and without the main lcore.

The mix of signed and unsigned in function signatures (and in the definition of SOCKET_ID_ANY) is pure sloppiness. This problem may also be present in other function signatures; we just happened to run into it for the socket_id.

The compiler has flags to warn about mixing signed and unsigned types, so we could use that flag to reveal and fix those bugs.
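
As a rough illustration of the mismatch discussed in this thread, the sketch below mimics the call path in plain C: an outer function taking `unsigned int` hands the ID to an inner function taking `int`. The names `sketch_outer_setup`, `sketch_inner_alloc` and `SKETCH_SOCKET_ID_ANY` are made up for this example and only stand in for functions like `rte_eth_tx_queue_setup` and `rte_zmalloc_socket`. The flag assumed here is `-Wsign-conversion` (GCC and Clang), which would report each of these conversions if the explicit casts were removed.

```c
#include <assert.h>
#include <limits.h>

/* Same value as SOCKET_ID_ANY in the thread; redefined here so the
 * sketch builds without any DPDK headers. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical inner function taking a signed ID, standing in for
 * something like rte_zmalloc_socket(). It just echoes the ID back. */
static int sketch_inner_alloc(int socket_id)
{
	/* -1 means "any"; anything below that is never a valid ID. */
	assert(socket_id >= -1);
	return socket_id;
}

/* Hypothetical outer function taking an unsigned ID, standing in for
 * something like rte_eth_tx_queue_setup(). */
static int sketch_outer_setup(unsigned int socket_id)
{
	/* unsigned -> int. Without the cast, -Wsign-conversion warns here.
	 * For values above INT_MAX the result is implementation-defined. */
	return sketch_inner_alloc((int)socket_id);
}
```

A caller passing `SKETCH_SOCKET_ID_ANY` has to write `sketch_outer_setup((unsigned int)SKETCH_SOCKET_ID_ANY)`, which turns -1 into `UINT_MAX` before it is converted back. The value survives the round trip on GCC and Clang only because they wrap out-of-range values on conversion to `int`; the C standard leaves that conversion implementation-defined.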

> 
> >
> > As for socket id, and numa id, I'm not sure we should have different
> > names/types for the two. For example, for PCI devices, do they need a third
> > type or are they associated with cores or with memory? The socket id for
> > the core only matters in terms of data locality, i.e. what memory or cache
> > location it is in. Therefore, for me, I'd pick one name and stick with it.
> 
> i think the choice for more than one type vs one type is whether or not
> they are "the same" number space as opposed to just coincidentally
> overlapping number spaces.
> 
> >
> > /Bruce



* Re: Sign changes through function signatures
  @ 2023-02-03 22:12  0%         ` Tyler Retzlaff
  2023-02-04  8:09  0%           ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-03 22:12 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Morten Brørup, Thomas Monjalon, Ben Magistro, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Thursday, 2 February 2023 21.45
> > > 
> > > 02/02/2023 21:26, Tyler Retzlaff:
> > > > On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> > > > > Hello,
> > > > >
> > > > > > While making some updates to our code base for 22.11.1 that were missed in
> > > > > > our first pass through, we hit the numa node change[1].  In the process of
> > > > > > updating our code, we noticed that a couple functions (rx/tx_queue_setup,
> > > > > > maybe more that we aren't using) state they accept `SOCKET_ID_ANY` but the
> > > > > > function signature then asks for an unsigned integer while `SOCKET_ID_ANY`
> > > > > > is `-1`.  Following it through the redirect to the "real" function it also
> > > > > > asks for an unsigned integer which is then passed on to one or more
> > > > > > functions asking for an integer.  As an example using the i40e driver
> > > > > > -- we would call `rte_eth_tx_queue_setup` [2] which ultimately calls
> > > > > > `i40e_dev_tx_queue_setup`[3] which finally calls `rte_zmalloc_socket`[4]
> > > > > > and `rte_eth_dma_zone_reserve`[5].
> > > > > >
> > > > > > I guess what I am looking for is clarification on if this is intentional or
> > > > > > if this is additional cleanup that may need to be completed/be desirable so
> > > > > > that signs are maintained through the call paths and avoid potentially
> > > > > > producing sign-conversion warnings.  From the very quick glance I took at
> > > > > > the i40e driver, it seems these are just passed through to other functions
> > > > > > and no direct use/manipulation occurs (at least in the mentioned functions).
> > > > >
> > > > > i believe this is just sloppiness with sign in our api surface. i too
> > > > > find it frustrating that use of these api force either explicit
> > > > > casts or suffer having to suppress warnings.
> > > > >
> > > > > in the past examples of this have been cleaned up without full deprecation
> > > > > notices but there are a lot of instances. i also feel (unpopular opinion)
> > > > > that for some integer types like this that have constrained range / number
> > > > > spaces it would be of value to introduce a typedef that can be used
> > > > > consistently.
> > > > >
> > > > > for now you'll just have to add the casts and hopefully in the future we
> > > > > will fix the api making them unnecessary. of course feel free to submit
> > > > > patches too, it would be great to have these cleaned up.
> > > 
> > > I agree it should be cleaned up.
> > > Those IDs should accept negative values.
> > > Not sure which type we should choose (int, int32_t, or a typedef).
> > 
> > Why would we use a signed socket ID? We don't use signed port IDs. To me, unsigned seems the way to go. (A minor detail: With unsigned we can use the entire range of values minus one (for the magic "any" value), whereas with signed we can only use the positive range of values. This detail is completely irrelevant when using 32 bit for socket ID, but could be relevant if using fewer bits.)
> > 
> > Also, we don't need 32 bit for socket ID. 8 or 16 bit should suffice, like port ID. But reducing from 32 bit would probably cause major ABI breakage.
> > 
> > > 
> > > Another thing to check is the name of the variable.
> > > It should be a socket ID when talking about CPU,
> > > and a NUMA node ID when talking about memory.
> > > 
> > > And last but not the least,
> > > how can we keep ABI compatibility?
> > > I hope we can use function versioning to avoid deprecation and
> > > breaking.
> > > 
> > > Trials and suggestions are welcome.
> > 
> > Signedness is not the only problem with the socket ID. The meaning of SOCKET_ID_ANY is excessively overloaded. If we want to clean this up, we should consider the need for another magic value SOCKET_ID_NONE for devices connected to the chipset, as discussed in this other email thread [1]. And as discussed there, there are also size problems, because some device structures use 8 bit to hold the socket ID.
> > 
> > And functions should always return -1, never SOCKET_ID_ANY, to indicate error.
> > 
> > [1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smartserver.smartshare.dk/
> > 
> > I only bring warnings and complications to the discussion here, no solutions. Sorry! :-(
> >
> 
> Personally, I think if we are going to change things, we should do things
> properly, especially/even if we are going to have to break ABI or use ABI
> compatibility.
> 
> I would suggest rather than a typedef, we should actually wrap the int
> value in a struct - for two reasons:

> 
> * it means the compiler will actually error out for us if an int or
>   unsigned int is used instead. This allows easier fixing at compile-time
>   rather than hoping things are correctly specified in existing code.
> 
> * it allows us to do things like explicitly calling out flags, rather than
>   just using magic values. While still keeping the size 32 bits, we can
>   have the actual socket value as 16-bits and have flags to indicate:
>   - ANY socket, NO socket, INVALID value socket. This could end up being
>   useful in many cases, for example, when allocating memory we could
>   specify a socket number with the ANY flag, indicating that any socket is
>   ok, but we'd ideally prefer the number specified.

i'm a fan of this where it makes sense. i did this with rte_thread_t for
exactly your first reason. but i did receive resistance from other
members of the community. personally i like compilation to fail when i
make a mistake.

it's definitely way easier to make the argument to do this when the
actual value is opaque. if it isn't i think then we need to provide
macro/inline accessors to allow applications to do whatever it is they do
with the value they carry.

i'll also note that this allows you a cheap way to sprinkle extra
integrity checking when running functional tests. if you have low
performance inline accessors you can do things like enforce the range of
values or that enumerations are part of a set for debug builds.

as an aside i would also caution while i suggested a typedef i don't mean
that everything should be typedef'd especially actual structs that are
used like structs. typedefs for things like socket id would
unquestionably convey more information and implied semantics to the user
of an api than just a standard `int' or whatever. consequently i have found
that this lowers mistakes with the use of the api.
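
to make the struct-wrapper idea a bit more concrete, here is a minimal sketch under stated assumptions. nothing in it is an existing DPDK type or function: `sketch_socket_id_t`, the `sketch_socket_id_*` accessors, `sketch_alloc_on` and the bound `SKETCH_MAX_SOCKETS` are all hypothetical names chosen for illustration. it shows the two properties mentioned above: a plain `int` no longer converts silently, and the inline accessors give a place for a range check that debug builds can enforce.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_SOCKETS 8 /* hypothetical upper bound, for the check only */

/* Hypothetical wrapper type. Passing a bare int or unsigned int where
 * this type is expected is a compile error, not a silent conversion. */
typedef struct {
	int32_t id;
} sketch_socket_id_t;

/* The "any socket" value, built without exposing the magic number. */
static inline sketch_socket_id_t sketch_socket_id_any(void)
{
	return (sketch_socket_id_t){ .id = -1 };
}

/* Wrap a concrete socket number. The range check is active in debug
 * builds and compiled out when NDEBUG is defined. */
static inline sketch_socket_id_t sketch_socket_id_make(int32_t id)
{
	assert(id >= 0 && id < SKETCH_MAX_SOCKETS);
	return (sketch_socket_id_t){ .id = id };
}

static inline int sketch_socket_id_is_any(sketch_socket_id_t s)
{
	return s.id == -1;
}

/* Accessor for code that needs the raw number. */
static inline int32_t sketch_socket_id_value(sketch_socket_id_t s)
{
	return s.id;
}

/* Hypothetical consumer: returns the socket it would allocate on,
 * falling back to socket 0 when "any" is requested. */
static int32_t sketch_alloc_on(sketch_socket_id_t s)
{
	return sketch_socket_id_is_any(s) ? 0 : sketch_socket_id_value(s);
}
```

with this shape, `sketch_alloc_on(3)` does not compile, while `sketch_alloc_on(sketch_socket_id_make(3))` does. the cost is that every caller has to go through the constructors and accessors.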

> 
> As for socket id, and numa id, I'm not sure we should have different
> names/types for the two. For example, for PCI devices, do they need a third
> type or are they associated with cores or with memory? The socket id for
> the core only matters in terms of data locality, i.e. what memory or cache
> location it is in. Therefore, for me, I'd pick one name and stick with it.

i think the choice for more than one type vs one type is whether or not
they are "the same" number space as opposed to just coincidentally
overlapping number spaces.

> 
> /Bruce


* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  2023-02-03 20:26  0%       ` Stephen Hemminger
@ 2023-02-03 20:49  0%         ` Thomas Monjalon
  2023-02-05 23:41  0%           ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-02-03 20:49 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

03/02/2023 21:26, Stephen Hemminger:
> On Fri, 03 Feb 2023 21:18:40 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 03/02/2023 18:33, Stephen Hemminger:
> > > On Fri, 03 Feb 2023 09:42:45 +0100
> > > Thomas Monjalon <thomas@monjalon.net> wrote:
> > >   
> > > > 03/02/2023 01:25, Stephen Hemminger:  
> > > > > On Wed, 1 Feb 2023 13:34:41 +0000
> > > > > Shivah Shankar Shankar Narayan Rao <sshankarnara@marvell.com> wrote:
> > > > >     
> > > > > > --- a/lib/eal/include/rte_log.h
> > > > > > +++ b/lib/eal/include/rte_log.h
> > > > > > @@ -48,6 +48,7 @@ extern "C" {
> > > > > >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> > > > > >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > > > > >  #define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> > > > > > +#define RTE_LOGTYPE_MLDEV     21 /**< Log related to mldev. */    
> > > > > 
> > > > > NAK to this part.
> > > > > No new static logtypes please.    
> > > > 
> > > > Good catch.
> > > > By the way, we should remove unused RTE_LOGTYPE_*.  
> > > 
> > > Yes, for 23.11 would like to work down the list.  
> > 
> > Do we need to wait 23.11?
> > It is not an ABI breakage.
> > And most of these defines are already unused.
> 
> Turning them into deprecated would be API breakage though

API breakage is not forbidden.




* Re: [PATCH] eal: introduce atomics abstraction
  @ 2023-02-03 20:49  0%               ` Tyler Retzlaff
  2023-02-07 15:16  0%                 ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2023-02-03 20:49 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Morten Brørup, Honnappa Nagarahalli, thomas, dev,
	david.marchand, jerinj, konstantin.ananyev, ferruh.yigit, nd

On Fri, Feb 03, 2023 at 12:19:13PM +0000, Bruce Richardson wrote:
> On Thu, Feb 02, 2023 at 11:00:23AM -0800, Tyler Retzlaff wrote:
> > On Thu, Feb 02, 2023 at 09:43:58AM +0100, Morten Brørup wrote:
> > > > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > > > Sent: Wednesday, 1 February 2023 22.41
> > > > 
> > > > On Wed, Feb 01, 2023 at 01:07:59AM +0000, Honnappa Nagarahalli wrote:
> > > > >
> > > > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > > Sent: Tuesday, January 31, 2023 4:42 PM
> > > > > >
> > > > > > Honnappa, please could you give your view on the future of atomics
> > > > in DPDK?
> > > > > Thanks Thomas, apologies it has taken me a while to get to this
> > > > discussion.
> > > > >
> > > > > IMO, we do not need DPDK's own abstractions. APIs from stdatomic.h
> > > > (stdatomics as is called here) already serve the purpose. These APIs
> > > > are well understood and documented.
> > > > 
> > > > i agree that whatever atomics APIs we advocate for should align with
> > > > the
> > > > standard C atomics for the reasons you state including implied
> > > > semantics.
> > > > 
> > > > >
> > > > > For environments where stdatomics are not supported, we could have a
> > > > stdatomic.h in DPDK implementing the same APIs (we have to support only
> > > > _explicit APIs). This allows the code to use stdatomics APIs and when
> > > > we move to minimum supported standard C11, we just need to get rid of
> > > > the file in DPDK repo.
> > > 
> > > Perhaps we can use something already existing, such as this:
> > > https://android.googlesource.com/platform/bionic/+/lollipop-release/libc/include/stdatomic.h
> > > 
> > > > 
> > > > my concern with this is that if we provide a stdatomic.h or introduce
> > > > names
> > > > from stdatomic.h it's a violation of the C standard.
> > > > 
> > > > references:
> > > >  * ISO/IEC 9899:2011 sections 7.1.2, 7.1.3.
> > > >  * GNU libc manual
> > > >    https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html
> > > > 
> > > > in effect the header, the names and in some instances namespaces introduced
> > > > are reserved by the implementation. there are several reasons in the GNU libc
> > > > manual that explain the justification for these reservations and if
> > > > we think about ODR and ABI compatibility we can conceive of others.
> > > 
> > > If we are going to move to C11 soon, I consider the shim interim, and am inclined to ignore these warning factors.
> > > 
> > > If we are not moving to C11 soon, I would consider these disadvantages more seriously.
> > 
> > I think it's reasonable to assume that we are talking years here.
> > 
> > We've had a few discussions about minimum C standard. I think my first
> > mailing list exchanges about C99 was almost 2 years ago. Given that we
> > still aren't on C99 now (though i know Bruce has a series up) indicates
> > that progression to C11 isn't going to happen any time soon and even if
> > it was the baseline we still can't just use it (reasons described
> > later).
> > 
> > Also, i'll point out that we seem to have accepted moving to C99 with
> > one of the holdback compilers technically being non-conformant but it
> > isn't blocking us because it provides the subset of C99 features without
> > being conforming that we happen to be using.
> > 
> What compiler is this? As far as I know, all our currently supported
> compilers claim to support C99 fully. All should support C11 also,
> except for GCC 4.8 on RHEL/CentOS 7. Once we drop support for Centos 7, I
> think we can require at minimum a c11 compiler for building DPDK itself.
> I'm still a little uncertain about requiring that users build their own
> code with -std=c11, though.

perhaps i'm mistaken but it was my understanding that the gcc version on
RHEL 7 did not fully conform to C99? maybe i read C99 when it was actually
C11.

regardless, even if every supported compiler for dpdk was C11 conformant
including stdatomics which are optional we can't just move from
intrinsic/builtins to standard C atomics (because of the compatibility
and performance issues mentioned previously).

so just re-orienting this discussion, the purpose of this abstraction is
to allow the optional use of standard C atomics when a conformant compiler
is available and satisfactory code is generated for the desired target.
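
to illustrate the shape of such an abstraction (not the actual patch), here is a minimal sketch. every `sketch_*` name and the `SKETCH_USE_STDATOMIC` switch are hypothetical, made up for this example. it assumes a GCC or Clang toolchain for the default branch, which maps onto the `__atomic_*` builtins, and switches to `<stdatomic.h>` only when the build opts in and the compiler does not define `__STDC_NO_ATOMICS__`.

```c
#include <assert.h>
#include <stdint.h>

#if defined(SKETCH_USE_STDATOMIC) && !defined(__STDC_NO_ATOMICS__)
/* Opt-in branch: standard C11 atomics. */
#include <stdatomic.h>
#define sketch_atomic(type) _Atomic(type)
#define sketch_memory_order_relaxed memory_order_relaxed
#define sketch_atomic_fetch_add(ptr, val, mo) \
	atomic_fetch_add_explicit(ptr, val, mo)
#define sketch_atomic_load(ptr, mo) atomic_load_explicit(ptr, mo)
#else
/* Default branch: GCC/Clang builtins on plain (non-_Atomic) objects. */
#define sketch_atomic(type) type
#define sketch_memory_order_relaxed __ATOMIC_RELAXED
#define sketch_atomic_fetch_add(ptr, val, mo) \
	__atomic_fetch_add(ptr, val, mo)
#define sketch_atomic_load(ptr, mo) __atomic_load_n(ptr, mo)
#endif

/* The same declaration and call sites compile in both branches. */
static sketch_atomic(uint32_t) sketch_counter;

/* Atomically bump the counter and return the new value. */
static uint32_t sketch_count_event(void)
{
	uint32_t prev = sketch_atomic_fetch_add(&sketch_counter, 1,
						sketch_memory_order_relaxed);
	assert(prev != UINT32_MAX); /* debug-only: counter must not wrap */
	return prev + 1;
}

/* Atomically read the current value. */
static uint32_t sketch_read_count(void)
{
	return sketch_atomic_load(&sketch_counter, sketch_memory_order_relaxed);
}
```

the point of the macro layer is that call sites only ever see the `sketch_*` names, so the choice between builtins and standard atomics is a build-time decision and not something scattered through the code.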

> 
> /Bruce


* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  2023-02-03 20:18  2%     ` Thomas Monjalon
@ 2023-02-03 20:26  0%       ` Stephen Hemminger
  2023-02-03 20:49  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2023-02-03 20:26 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

On Fri, 03 Feb 2023 21:18:40 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> 03/02/2023 18:33, Stephen Hemminger:
> > On Fri, 03 Feb 2023 09:42:45 +0100
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >   
> > > 03/02/2023 01:25, Stephen Hemminger:  
> > > > On Wed, 1 Feb 2023 13:34:41 +0000
> > > > Shivah Shankar Shankar Narayan Rao <sshankarnara@marvell.com> wrote:
> > > >     
> > > > > --- a/lib/eal/include/rte_log.h
> > > > > +++ b/lib/eal/include/rte_log.h
> > > > > @@ -48,6 +48,7 @@ extern "C" {
> > > > >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> > > > >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > > > >  #define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> > > > > +#define RTE_LOGTYPE_MLDEV     21 /**< Log related to mldev. */    
> > > > 
> > > > NAK to this part.
> > > > No new static logtypes please.    
> > > 
> > > Good catch.
> > > By the way, we should remove unused RTE_LOGTYPE_*.  
> > 
> > Yes, for 23.11 would like to work down the list.  
> 
> Do we need to wait 23.11?
> It is not an ABI breakage.
> And most of these defines are already unused.

Turning them into deprecated would be API breakage though



* Re: [dpdk-dev] [PATCH v1 01/12] mldev: introduce machine learning device library
  @ 2023-02-03 20:18  2%     ` Thomas Monjalon
  2023-02-03 20:26  0%       ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2023-02-03 20:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Shivah Shankar Shankar Narayan Rao, Jerin Jacob Kollanukkaran,
	dev, Bruce Richardson, Srikanth Yalavarthi, ferruh.yigit,
	ajit.khaparde, aboyer, andrew.rybchenko, beilei.xing, chas3,
	chenbo.xia, ciara.loftus, Devendra Singh Rawat, ed.czeck,
	evgenys, grive, g.singh, zhouguoyang, haiyue.wang, Harman Kalra,
	heinrich.kuhn, hemant.agrawal, hyonkim, igorch, Igor Russkikh,
	jgrajcia, jasvinder.singh, jianwang, jiawenwu, jingjing.wu,
	johndale, john.miller, linville, keith.wiles,
	Kiran Kumar Kokkilagadda, oulijun, Liron Himi, longli, mw,
	spinler, matan, matt.peters, maxime.coquelin, mk, humin29,
	Pradeep Kumar Nalla, Nithin Kumar Dabilpuram, qiming.yang,
	qi.z.zhang, Radha Chintakuntla, rahul.lakkireddy, Rasesh Mody,
	rosen.xu, sachin.saxena, Satha Koteswara Rao Kottidi,
	Shahed Shaikh, shaibran, shepard.siegel, asomalap, somnath.kotur,
	sthemmin, steven.webster, Sunil Kumar Kori, mtetsuyah,
	Veerasenareddy Burru, viacheslavo, xiao.w.wang,
	cloud.wangxiaoyun, yisen.zhuang, yongwang, xuanziyang2,
	Prasun Kapoor, Nadav Haklai, Satananda Burla,
	Narayana Prasad Raju Athreya, Akhil Goyal, mdr, dmitry.kozliuk,
	anatoly.burakov, cristian.dumitrescu, honnappa.nagarahalli,
	mattias.ronnblom, ruifeng.wang, drc, konstantin.ananyev,
	olivier.matz, jay.jayatheerthan, Ashwin Sekhar T K,
	Pavan Nikhilesh Bhagavatula, eagostini, Derek Chickles,
	Parijat Shukla, Anup Prabhu, Prince Takkar, david.marchand

03/02/2023 18:33, Stephen Hemminger:
> On Fri, 03 Feb 2023 09:42:45 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 03/02/2023 01:25, Stephen Hemminger:
> > > On Wed, 1 Feb 2023 13:34:41 +0000
> > > Shivah Shankar Shankar Narayan Rao <sshankarnara@marvell.com> wrote:
> > >   
> > > > --- a/lib/eal/include/rte_log.h
> > > > +++ b/lib/eal/include/rte_log.h
> > > > @@ -48,6 +48,7 @@ extern "C" {
> > > >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
> > > >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
> > > >  #define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
> > > > +#define RTE_LOGTYPE_MLDEV     21 /**< Log related to mldev. */  
> > > 
> > > NAK to this part.
> > > No new static logtypes please.  
> > 
> > Good catch.
> > By the way, we should remove unused RTE_LOGTYPE_*.
> 
> Yes, for 23.11 would like to work down the list.

Do we need to wait 23.11?
It is not an ABI breakage.
And most of these defines are already unused.




-- links below jump to the message on this page --
2021-09-13  8:45     [dpdk-dev] Questions about rte_eth_link_speed_to_str API Min Hu (Connor)
2021-09-16  2:56     ` [dpdk-dev] [RFC] ethdev: improve link speed to string Min Hu (Connor)
2021-09-16  6:22       ` Andrew Rybchenko
2021-09-16  8:16         ` Min Hu (Connor)
2021-09-16  8:21           ` Andrew Rybchenko
2021-09-17  0:43             ` Min Hu (Connor)
2023-01-19 11:41               ` Ferruh Yigit
2023-01-19 16:45                 ` Stephen Hemminger
2023-02-10 14:41  3%               ` Ferruh Yigit
2023-03-23 14:40  3%                 ` Ferruh Yigit
2022-04-20  8:16     [PATCH v1 0/5] Direct re-arming of buffers on receive side Feifei Wang
2023-01-04  7:30     ` [PATCH v3 0/3] " Feifei Wang
2023-01-04  7:30       ` [PATCH v3 1/3] ethdev: enable direct rearm with separate API Feifei Wang
2023-01-04  8:21         ` Morten Brørup
2023-01-04  8:51           ` 回复: " Feifei Wang
2023-01-04 10:11             ` Morten Brørup
2023-02-24  8:55  0%           ` 回复: " Feifei Wang
2022-08-03 13:28     [dpdk-dev] [RFC PATCH 1/1] mldev: introduce machine learning device library jerinj
2023-02-03  8:42     ` [dpdk-dev] [PATCH v1 01/12] " Thomas Monjalon
2023-02-03 17:33       ` Stephen Hemminger
2023-02-03 20:18  2%     ` Thomas Monjalon
2023-02-03 20:26  0%       ` Stephen Hemminger
2023-02-03 20:49  0%         ` Thomas Monjalon
2023-02-05 23:41  0%           ` Stephen Hemminger
2022-11-03 15:47     [PATCH 0/2] ABI check updates David Marchand
2023-03-23 17:15  9% ` [PATCH v2 " David Marchand
2023-03-23 17:15 21%   ` [PATCH v2 1/2] devtools: unify configuration for ABI check David Marchand
2023-03-23 17:15 41%   ` [PATCH v2 2/2] devtools: stop depending on libabigail xml format David Marchand
2023-03-28 18:38  4%   ` [PATCH v2 0/2] ABI check updates Thomas Monjalon
2022-11-17  5:09     [PATCH v1 00/13] graph enhancement for multi-core dispatch Zhirun Yan
2022-11-17  5:09     ` [PATCH v1 04/13] graph: add get/set graph worker model APIs Zhirun Yan
2023-02-20 13:50  3%   ` Jerin Jacob
2023-02-24  6:31  0%     ` Yan, Zhirun
2023-02-26 22:23  0%       ` Jerin Jacob
2023-03-02  8:38  0%         ` Yan, Zhirun
2023-03-02 13:58  0%           ` Jerin Jacob
2023-03-07  8:26  0%             ` Yan, Zhirun
2022-12-13 18:27     [RFC PATCH 0/7] Standardize telemetry int types Bruce Richardson
2023-01-12 17:41     ` [PATCH v3 0/9] " Bruce Richardson
2023-01-12 17:41       ` [PATCH v3 9/9] telemetry: change public API to use 64-bit signed values Bruce Richardson
2023-02-05 22:55  0%     ` Thomas Monjalon
2023-01-07 16:18     [PATCH 1/3] eventdev/eth_rx: add params set/get APIs Naga Harish K S V
2023-01-23 18:04     ` [PATCH v2 " Naga Harish K S V
2023-01-24  4:29       ` Jerin Jacob
2023-01-24 13:07         ` Naga Harish K, S V
2023-01-25  4:12           ` Jerin Jacob
2023-01-25  9:52             ` Naga Harish K, S V
2023-01-25 10:38               ` Jerin Jacob
2023-01-25 16:32                 ` Naga Harish K, S V
2023-01-28 10:53                   ` Jerin Jacob
2023-01-30  9:56                     ` Naga Harish K, S V
2023-01-30 14:43                       ` Jerin Jacob
2023-02-02 16:12                         ` Naga Harish K, S V
2023-02-03  9:44                           ` Jerin Jacob
2023-02-06  6:21  0%                         ` Naga Harish K, S V
2023-02-06 16:38  0%                           ` Jerin Jacob
2023-02-09 17:00  0%                             ` Naga Harish K, S V
2023-01-12 11:35     [RFC PATCH 0/1] Specify C-standard requirement for DPDK builds Bruce Richardson
2023-02-03 14:09     ` Ben Magistro
2023-02-03 15:09       ` Bruce Richardson
2023-02-03 16:45         ` Ben Magistro
2023-02-03 18:00           ` Bruce Richardson
2023-02-10 14:52             ` Ben Magistro
2023-02-10 23:39  4%           ` Tyler Retzlaff
2023-01-12 21:26     [PATCH] eal: abstract compiler atomics Tyler Retzlaff
2023-01-12 21:26     ` [PATCH] eal: introduce atomics abstraction Tyler Retzlaff
2023-01-31 22:42       ` Thomas Monjalon
2023-02-01  1:07         ` Honnappa Nagarahalli
2023-02-01 21:41           ` Tyler Retzlaff
2023-02-02  8:43             ` Morten Brørup
2023-02-02 19:00               ` Tyler Retzlaff
2023-02-03 12:19                 ` Bruce Richardson
2023-02-03 20:49  0%               ` Tyler Retzlaff
2023-02-07 15:16  0%                 ` Morten Brørup
2023-02-07 21:58  0%                   ` Tyler Retzlaff
2023-02-07 23:34  0%         ` Honnappa Nagarahalli
2023-02-08  1:20  0%           ` Tyler Retzlaff
2023-02-08  8:31  3%             ` Morten Brørup
2023-02-08 16:35  4%               ` Tyler Retzlaff
2023-02-09  0:16  3%                 ` Honnappa Nagarahalli
2023-02-09  8:34  4%                   ` Morten Brørup
2023-02-09 17:30  4%                   ` Tyler Retzlaff
2023-02-10  5:30  0%                     ` Honnappa Nagarahalli
2023-02-10 20:30  3%                       ` Tyler Retzlaff
2023-02-13  5:04  0%                         ` Honnappa Nagarahalli
2023-02-13 15:28  0%                           ` Ben Magistro
2023-02-13 23:18  0%                           ` Tyler Retzlaff
2023-02-08 21:43     ` [PATCH v2] eal: abstract compiler atomics Tyler Retzlaff
2023-02-08 21:43       ` [PATCH v2] eal: introduce atomics abstraction Tyler Retzlaff
2023-02-09  8:05         ` Morten Brørup
2023-02-09 18:15  4%       ` Tyler Retzlaff
2023-02-09 19:19  0%         ` Morten Brørup
2023-02-09 22:04  0%           ` Tyler Retzlaff
2023-01-16 15:37     [PATCH 0/5] dma/ioat: fix issues with stopping and restarting device Bruce Richardson
2023-01-16 17:37     ` [PATCH v2 0/6] " Bruce Richardson
2023-01-16 17:37       ` [PATCH v2 6/6] test/dmadev: add tests for stopping and restarting dev Bruce Richardson
2023-02-14 16:04  0%     ` Kevin Laatz
2023-02-15  1:59  3%     ` fengchengwen
2023-02-15 11:57  3%       ` Bruce Richardson
2023-02-16  1:24  0%         ` fengchengwen
2023-02-16  9:24  0%           ` Bruce Richardson
2023-02-16 11:09     ` [PATCH v3 0/6] dma/ioat: fix issues with stopping and restarting device Bruce Richardson
2023-02-16 11:09  3%   ` [PATCH v3 6/6] test/dmadev: add tests for stopping and restarting dev Bruce Richardson
2023-02-16 11:42  0%     ` fengchengwen
2023-02-02 19:23     Sign changes through function signatures Ben Magistro
2023-02-02 20:26     ` Tyler Retzlaff
2023-02-02 20:45       ` Thomas Monjalon
2023-02-02 21:26         ` Morten Brørup
2023-02-03 12:05           ` Bruce Richardson
2023-02-03 22:12  0%         ` Tyler Retzlaff
2023-02-04  8:09  0%           ` Morten Brørup
2023-02-06 15:57  3%             ` Ben Magistro
2023-02-03  5:07     [PATCH v3 0/2] add new PHY affinity in the flow item and Tx queue API Jiawei Wang
2023-02-03 13:33     ` [PATCH v4 " Jiawei Wang
2023-02-03 13:33       ` [PATCH v4 1/2] ethdev: introduce the PHY affinity field in " Jiawei Wang
2023-02-06 15:29  0%     ` Jiawei(Jonny) Wang
2023-02-07  9:40  0%     ` Ori Kam
2023-02-09 19:44  0%     ` Ferruh Yigit
2023-02-10 14:06  0%       ` Jiawei(Jonny) Wang
2023-02-14  9:38  0%       ` Jiawei(Jonny) Wang
2023-02-14 10:01  0%         ` Ferruh Yigit
2023-02-03  8:08     [PATCH] doc: update NFP documentation with Corigine information Chaoyong He
2023-02-15 13:37  0% ` Ferruh Yigit
2023-02-15 17:58  0%   ` Niklas Söderlund
2023-02-20  8:41     ` [PATCH v2 0/3] update NFP documentation Chaoyong He
2023-02-20  8:41  8%   ` [PATCH v2 3/3] doc: add Corigine information to nfp documentation Chaoyong He
2023-02-03  8:19     [PATCH v5 1/3] pcapng: comment option support for epb Amit Prakash Shukla
2023-02-09  9:56  4% ` [PATCH v6 1/4] " Amit Prakash Shukla
2023-02-09  9:56  2%   ` [PATCH v6 2/4] graph: pcap capture for graph nodes Amit Prakash Shukla
2023-02-09 10:03  0%   ` [PATCH v6 1/4] pcapng: comment option support for epb Amit Prakash Shukla
2023-02-09 10:24  4%   ` [PATCH v7 1/3] " Amit Prakash Shukla
2023-02-09 10:24  2%     ` [PATCH v7 2/3] graph: pcap capture for graph nodes Amit Prakash Shukla
2023-02-06  7:05     [PATCH 0/3] cleanup the PMD Chaoyong He
2023-02-06  7:05  8% ` [PATCH 1/3] net/nfp: remove usage of print statements Chaoyong He
2023-02-07 20:41     [RFC 00/13] Replace static logtypes with static Stephen Hemminger
2023-02-13 19:55  3% ` [PATCH v4 00/19] Replace use of static logtypes Stephen Hemminger
2023-02-13 19:55  3%   ` [PATCH v4 18/19] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-14  2:18  3% ` [PATCH v5 00/22] Replace us of static logtypes Stephen Hemminger
2023-02-14  2:19  3%   ` [PATCH v5 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-14 22:47  3% ` [PATCH v6 00/22] Replace use of static logtypes in libraries Stephen Hemminger
2023-02-14 22:47       ` [PATCH v6 01/22] gso: don't log message on non TCP/UDP Stephen Hemminger
2023-02-15  7:26  3%     ` Hu, Jiayu
2023-02-15 17:12  0%       ` Stephen Hemminger
2023-02-14 22:47  3%   ` [PATCH v6 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-15 17:23  3% ` [PATCH v7 00/22] Replace use of static logtypes in libraries Stephen Hemminger
2023-02-15 17:23  3%   ` [PATCH v7 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-20 23:35  3% ` [PATCH v8 00/22] Convert static logtypes in libraries Stephen Hemminger
2023-02-20 23:35  3%   ` [PATCH v8 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-21 15:02  0%     ` David Marchand
2023-02-21 19:01  2% ` [PATCH v9 00/22] Convert static logtypes in libraries Stephen Hemminger
2023-02-21 19:02  2%   ` [PATCH v9 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-22 16:07  2% ` [PATCH v10 00/22] Convert static log type values in libraries Stephen Hemminger
2023-02-22 16:08  2%   ` [PATCH v10 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-22 21:55  2% ` [PATCH v11 00/22] Convert static log type values in libraries Stephen Hemminger
2023-02-22 21:55  2%   ` [PATCH v11 21/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-02-23  7:11  0%     ` Ruifeng Wang
2023-02-23  7:27  0%       ` Ruifeng Wang
2023-02-24  9:45  0%     ` Ruifeng Wang
2023-02-09  3:03     [PATCH] mem: fix displaying heap ID failed for heap info command Huisong Li
2023-02-22  7:49  4% ` [PATCH v2] " Huisong Li
2023-02-10  2:48     [PATCH v4 0/3] add telemetry cmds for ring Jie Hai
2023-05-09  1:29  3% ` [PATCH v5 " Jie Hai
2023-05-09  1:29  3%   ` [PATCH v5 1/3] ring: fix unmatched type definition and usage Jie Hai
2023-05-09  6:23  0%     ` Ruifeng Wang
2023-05-09  8:15  0%       ` Jie Hai
2023-05-09  9:24  3%   ` [PATCH v6 0/3] add telemetry cmds for ring Jie Hai
2023-05-09  9:24  3%     ` [PATCH v6 1/3] ring: fix unmatched type definition and usage Jie Hai
2023-02-10  8:14     [PATCH v5 1/3] ethdev: skip congestion management configuration Rakesh Kudurumalla
2023-02-10  8:26     ` [PATCH v6 " Rakesh Kudurumalla
2023-02-11  0:35  3%   ` Ferruh Yigit
2023-02-11  5:16  0%     ` Jerin Jacob
2023-02-13  2:19     [PATCH v6 00/21] add support for cpfl PMD in DPDK Mingxia Liu
2023-02-16  0:29     ` [PATCH v7 01/21] net/cpfl: support device initialization Mingxia Liu
2023-02-27 13:46       ` Ferruh Yigit
2023-02-27 15:45         ` Thomas Monjalon
2023-02-27 23:38  3%       ` Ferruh Yigit
2023-02-13 11:31     [PATCH v10 0/4] add support for self monitoring Tomasz Duszynski
2023-02-16 17:54     ` [PATCH v11 " Tomasz Duszynski
2023-02-16 17:54       ` [PATCH v11 1/4] lib: add generic support for reading PMU events Tomasz Duszynski
2023-02-16 23:50         ` Konstantin Ananyev
2023-02-17  8:49           ` [EXT] " Tomasz Duszynski
2023-02-17 10:14             ` Konstantin Ananyev
2023-02-19 14:23               ` Tomasz Duszynski
2023-02-20 14:31                 ` Konstantin Ananyev
2023-02-20 16:59                   ` Tomasz Duszynski
2023-02-20 17:21                     ` Konstantin Ananyev
2023-02-20 20:42                       ` Tomasz Duszynski
2023-02-21  0:48  3%                     ` Konstantin Ananyev
2023-02-27  8:12  0%                       ` Tomasz Duszynski
2023-02-19 11:55     [PATCH] drivers: skip build of sub-libs not supporting IOVA mode Thomas Monjalon
2023-03-06 16:13     ` [PATCH v2 0/2] refactor diasbling IOVA as PA Thomas Monjalon
2023-03-06 16:13  2%   ` [PATCH v2 1/2] build: clarify configuration without IOVA field in mbuf Thomas Monjalon
2023-03-09  1:43  0%     ` fengchengwen
2023-03-09  7:29  0%       ` Thomas Monjalon
2023-03-09 11:23  0%         ` fengchengwen
2023-03-09 12:12  0%           ` Thomas Monjalon
2023-03-09 13:10  0%             ` Bruce Richardson
2023-03-13 15:51  0%               ` Thomas Monjalon
2023-02-21  3:10     [PATCH 0/2] configure RSS and handle metadata correctly Chaoyong He
2023-02-21  3:10  3% ` [PATCH 2/2] net/nfp: modify RSS's processing logic Chaoyong He
2023-02-21  3:29     ` [PATCH v2 0/2] configure RSS and handle metadata correctly Chaoyong He
2023-02-21  3:29  3%   ` [PATCH v2 2/2] net/nfp: modify RSS's processing logic Chaoyong He
2023-02-21  3:55       ` [PATCH v2 0/2] configure RSS and handle metadata correctly Chaoyong He
2023-02-21  3:55  3%     ` [PATCH v2 2/2] net/nfp: modify RSS's processing logic Chaoyong He
2023-02-22 21:43     [PATCH] vhost: fix madvise arguments alignment Mike Pattrick
2023-02-23  4:35     ` [PATCH v2] " Mike Pattrick
2023-02-23 16:12  3%   ` Maxime Coquelin
2023-02-23 16:57  0%     ` Mike Pattrick
2023-02-24 15:05  4%       ` Patrick Robb
2023-02-23 16:04     [RFC PATCH] drivers/net: fix RSS multi-queue mode check Ferruh Yigit
2023-02-27  1:34     ` lihuisong (C)
2023-02-27  9:57       ` Ferruh Yigit
2023-02-28  1:24         ` lihuisong (C)
2023-02-28  8:23  3%       ` Ferruh Yigit
2023-02-28  9:39  3% [RFC 0/2] Add high-performance timer facility Mattias Rönnblom
2023-02-28 16:01  0% ` Morten Brørup
2023-03-01 11:18  0%   ` Mattias Rönnblom
2023-03-01 13:31  3%     ` Morten Brørup
2023-03-01 15:50  3%       ` Mattias Rönnblom
2023-03-01 17:06  0%         ` Morten Brørup
2023-03-15 17:03  3% ` [RFC v2 " Mattias Rönnblom
2023-03-09  8:56  4% [RFC 1/2] security: introduce out of place support for inline ingress Nithin Dabilpuram
2023-04-11 10:04  4% ` [PATCH 1/3] " Nithin Dabilpuram
2023-04-11 18:05  3%   ` Stephen Hemminger
2023-04-18  8:33  4%     ` Jerin Jacob
2023-04-24 22:41  3%       ` Thomas Monjalon
2023-03-13  7:26     [PATCH] lib/hash: new feature adding existing key Abdullah Ömer Yamaç
2023-03-13  7:35     ` Abdullah Ömer Yamaç
2023-03-13 15:48  3%   ` Stephen Hemminger
2023-03-13  9:34     [PATCH] reorder: fix registration of dynamic field in mbuf Volodymyr Fialko
2023-03-13 10:19  3% ` David Marchand
2023-03-14 12:48     [PATCH 0/5] fix segment fault when parse args Chengwen Feng
2023-03-16 18:18     ` Ferruh Yigit
2023-03-17  2:43  3%   ` fengchengwen
2023-03-21 13:50  0%     ` Ferruh Yigit
2023-03-22  1:15  0%       ` fengchengwen
2023-03-22  8:53  0%         ` Ferruh Yigit
2023-03-22 13:49  0%           ` Thomas Monjalon
2023-03-23 11:58  3%             ` fengchengwen
2023-03-23 12:51  3%               ` Thomas Monjalon
2023-03-15 11:00     [PATCH 0/5] support setting and querying RSS algorithms Dongdong Liu
2023-03-15 11:00 10% ` [PATCH 1/5] ethdev: support setting and querying rss algorithm Dongdong Liu
2023-03-15 11:28  0%   ` Ivan Malov
2023-03-16 13:10  3%     ` Dongdong Liu
2023-03-16 14:31  0%       ` Ivan Malov
2023-03-15 13:43  3%   ` Thomas Monjalon
2023-03-16 13:16  3%     ` Dongdong Liu
2023-03-20 10:26     [PATCH 1/2] app/mldev: fix build with debug David Marchand
2023-03-20 10:26  5% ` [PATCH 2/2] ci: test compilation " David Marchand
2023-03-20 12:18     ` [PATCH v2 1/2] app/mldev: fix build " David Marchand
2023-03-20 12:18 19%   ` [PATCH v2 2/2] ci: test compilation with debug in GHA David Marchand
2023-03-24  2:16     [PATCH v2 00/15] graph enhancement for multi-core dispatch Zhirun Yan
2023-03-29  6:43     ` [PATCH v3 " Zhirun Yan
2023-03-29  6:43       ` [PATCH v3 03/15] graph: move node process into inline function Zhirun Yan
2023-03-29 15:34  3%     ` Stephen Hemminger
2023-03-29 15:41  0%       ` Jerin Jacob
2023-03-29 23:40  2% [PATCH v12 00/22] Covert static log types in libraries to dynamic Stephen Hemminger
2023-03-29 23:40  2% ` [PATCH v12 18/22] hash: move rte_hash_set_alg out header Stephen Hemminger
2023-03-31 17:17  3% DPDK 23.03 released Thomas Monjalon
2023-03-31 20:08     [PATCH] devtools: add script to check for non inclusive naming Stephen Hemminger
2023-04-03 14:47 14% ` [PATCH v2] " Stephen Hemminger
2023-04-03  6:59  9% [PATCH] version: 23.07-rc0 David Marchand
2023-04-03  9:37 10% ` [PATCH v2] " David Marchand
2023-04-06  7:44  0%   ` David Marchand
2023-04-03 21:52     [PATCH 0/9] msvc integration changes Tyler Retzlaff
2023-04-03 21:52  6% ` [PATCH 6/9] eal: expand most macros to empty when using msvc Tyler Retzlaff
2023-04-03 21:52  3% ` [PATCH 9/9] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
2023-04-04 20:07     ` [PATCH v2 0/9] msvc integration changes Tyler Retzlaff
2023-04-04 20:07  6%   ` [PATCH v2 6/9] eal: expand most macros to empty when using msvc Tyler Retzlaff
2023-04-04 20:07  3%   ` [PATCH v2 9/9] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
2023-04-05 10:56  0%     ` Bruce Richardson
2023-04-05 16:02  0%       ` Tyler Retzlaff
2023-04-05 16:17  0%         ` Bruce Richardson
2023-04-06  0:45     ` [PATCH v3 00/11] msvc integration changes Tyler Retzlaff
2023-04-06  0:45  6%   ` [PATCH v3 08/11] eal: expand most macros to empty when using msvc Tyler Retzlaff
2023-04-06  0:45  3%   ` [PATCH v3 11/11] telemetry: avoid expanding versioned symbol macros on msvc Tyler Retzlaff
2023-04-11 10:24  0%     ` Bruce Richardson
2023-04-11 20:34  0%       ` Tyler Retzlaff
2023-04-12  8:50  0%         ` Bruce Richardson
2023-04-11 21:12     ` [PATCH v4 00/14] msvc integration changes Tyler Retzlaff
2023-04-11 21:12  6%   ` [PATCH v4 11/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
2023-04-11 21:12  3%   ` [PATCH v4 13/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
2023-04-13 21:25     ` [PATCH v5 00/14] msvc integration changes Tyler Retzlaff
2023-04-13 21:26  6%   ` [PATCH v5 11/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
2023-04-14  6:45         ` Morten Brørup
2023-04-14 17:02  4%       ` Tyler Retzlaff
2023-04-15  7:16  3%         ` Morten Brørup
2023-04-15 20:52  4%           ` Tyler Retzlaff
2023-04-15 22:41  4%             ` Morten Brørup
2023-04-13 21:26  3%   ` [PATCH v5 13/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
2023-04-15  1:15     ` [PATCH v6 00/15] msvc integration changes Tyler Retzlaff
2023-04-15  1:15  5%   ` [PATCH v6 11/15] eal: expand most macros to empty when using MSVC Tyler Retzlaff
2023-04-15  1:15  3%   ` [PATCH v6 13/15] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
2023-04-17 16:10     ` [PATCH v7 00/14] msvc integration changes Tyler Retzlaff
2023-04-17 16:10  5%   ` [PATCH v7 10/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
2023-04-17 16:10  3%   ` [PATCH v7 12/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
2023-05-02  3:15     ` [PATCH v8 00/14] msvc integration changes Tyler Retzlaff
2023-05-02  3:15  5%   ` [PATCH v8 10/14] eal: expand most macros to empty when using MSVC Tyler Retzlaff
2023-05-02  3:15  3%   ` [PATCH v8 12/14] telemetry: avoid expanding versioned symbol macros on MSVC Tyler Retzlaff
2023-04-05 12:40  3% [PATCH v2 0/3] vhost: add device op to offload the interrupt kick Eelco Chaudron
2023-04-05 12:41     ` [PATCH v2 3/3] " Eelco Chaudron
2023-05-10 11:44       ` David Marchand
2023-05-16  8:53         ` Eelco Chaudron
2023-05-16 10:12  3%       ` David Marchand
2023-05-08 13:58  0% ` [PATCH v2 0/3] " Eelco Chaudron
2023-04-05 23:12 17% [PATCH] MAINTAINERS: sort file entries Stephen Hemminger
2023-04-13 11:53     [PATCH v2 1/3] eal: add x86 cpuid support for monitorx Sivaprasad Tummala
2023-04-13 11:53  3% ` [PATCH v2 2/3] doc: announce new cpu flag added to rte_cpu_flag_t Sivaprasad Tummala
2023-04-17  4:31  3%   ` [PATCH v3 1/4] " Sivaprasad Tummala
2023-04-18  8:25         ` [PATCH v4 0/4] power: monitor support for AMD EPYC processors Sivaprasad Tummala
2023-04-18  8:25  3%       ` [PATCH v4 1/4] doc: announce new cpu flag added to rte_cpu_flag_t Sivaprasad Tummala
2023-04-18  8:52  3%         ` Ferruh Yigit
2023-04-18  9:22  3%           ` Bruce Richardson
2023-04-14  8:43     [PATCH] reorder: improve buffer structure layout Volodymyr Fialko
2023-04-14 14:52  3% ` Stephen Hemminger
2023-04-14 14:54  3%   ` Bruce Richardson
2023-04-14 15:30  0%     ` Stephen Hemminger
2023-04-18  5:30     [RFC 0/4] Support VFIO sparse mmap in PCI bus Chenbo Xia
2023-04-18  7:46  3% ` David Marchand
2023-04-18  9:27  0%   ` Xia, Chenbo
2023-04-18  9:33  0%   ` Xia, Chenbo
2023-04-18 10:45     [PATCH] eventdev: fix alignment padding Sivaprasad Tummala
2023-04-18 11:06  4% ` Morten Brørup
2023-04-18 12:40  3%   ` Mattias Rönnblom
2023-04-19  8:36     [RFC] lib: set/get max memzone segments Ophir Munk
2023-04-20  7:43     ` Thomas Monjalon
2023-04-20 18:20       ` Tyler Retzlaff
2023-04-21  8:34  4%     ` Thomas Monjalon
2023-04-28 10:31     [dpdk-dev] [PATCH] net/liquidio: removed LiquidIO ethdev driver jerinj
2023-05-02 14:18  5% ` Ferruh Yigit
2023-05-08 13:44  1% ` [dpdk-dev] [PATCH v2] net/liquidio: remove " jerinj
     [not found]     <20230125075636.363cafaf@hermes.local>
     [not found]     ` <3688057.uBEoKPz9u1@thomas>
     [not found]       ` <DS0PR11MB73090EC350B82E0730D0D9A197CE9@DS0PR11MB7309.namprd11.prod.outlook.com>
2023-05-05 15:05  3%     ` Minutes of Technical Board Meeting, 2023-01-11 Stephen Hemminger
2023-05-11  8:16     [PATCH v2] eventdev: avoid non-burst shortcut for variable-size bursts Mattias Rönnblom
2023-05-11  8:24     ` [PATCH v3] " Mattias Rönnblom
2023-05-12 11:59       ` Jerin Jacob
2023-05-12 13:15         ` Mattias Rönnblom
2023-05-15 12:38           ` Jerin Jacob
2023-05-15 20:52  3%         ` Mattias Rönnblom
2023-05-16  6:37     [PATCH v1 0/7] ethdev: modify field API for multiple headers Michael Baum
2023-05-16  6:37  3% ` [PATCH v1 5/7] ethdev: add GENEVE TLV option modification support Michael Baum
