From: Ankur Dwivedi <adwivedi@marvell.com>
To: Vladimir Medvedkin <medvedkinv@gmail.com>
Cc: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>, Jerin Jacob <jerinj@marvell.com>,
Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>,
Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>,
Sunil Kumar Kori <skori@marvell.com>,
Rakesh Kudurumalla <rkudurumalla@marvell.com>
Subject: RE: [EXTERNAL] Re: [PATCH v1 04/12] node: add process callback for IP4 FIB
Date: Wed, 23 Apr 2025 09:46:48 +0000
Message-ID: <BY1PR18MB6109AF4D383E22F53030A679DDBA2@BY1PR18MB6109.namprd18.prod.outlook.com>
In-Reply-To: <CANDrEHnPsoiNwwiuCursBKoqGo+e0EXSg20sg5oS4KwwNrdB5A@mail.gmail.com>
Hi Vladimir,
>Hi Ankur,
>
>On Fri, 18 Apr 2025 at 15:45, Ankur Dwivedi <adwivedi@marvell.com> wrote:
>
>
>
> Hi Vladimir,
> >> diff --git a/lib/node/ip4_lookup_fib.c b/lib/node/ip4_lookup_fib.c
> >> index e87864e672..c535b191f8 100644
> >> --- a/lib/node/ip4_lookup_fib.c
> >> +++ b/lib/node/ip4_lookup_fib.c
> >> @@ -40,6 +40,169 @@ static struct ip4_lookup_fib_node_main ip4_lookup_fib_nm;
> >> #define IP4_LOOKUP_NODE_PRIV1_OFF(ctx) \
> >> (((struct ip4_lookup_fib_node_ctx *)ctx)->mbuf_priv1_off)
> >>
> >> +static uint16_t
> >> +ip4_lookup_fib_node_process(struct rte_graph *graph, struct rte_node *node, void **objs,
> >> + uint16_t nb_objs)
> >> +{
> >> + struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3, **pkts;
> >> + struct rte_fib *fib = IP4_LOOKUP_NODE_FIB(node->ctx);
> >> + const int dyn = IP4_LOOKUP_NODE_PRIV1_OFF(node->ctx);
> >> + struct rte_ipv4_hdr *ipv4_hdr;
> >> + uint64_t next_hop[nb_objs];
> >> + uint16_t lookup_err = 0;
> >> + void **to_next, **from;
> >> + uint16_t last_spec = 0;
> >> + rte_edge_t next_index;
> >> + uint16_t n_left_from;
> >> + uint32_t ip[nb_objs];
> >> + uint16_t held = 0;
> >> + uint32_t drop_nh;
> >> + uint16_t next;
> >> + int i, rc;
> >> +
> >> + /* Speculative next */
> >> + next_index = RTE_NODE_IP4_LOOKUP_NEXT_REWRITE;
> >> + /* Drop node */
> >> + drop_nh = ((uint32_t)RTE_NODE_IP4_LOOKUP_NEXT_PKT_DROP) << 16;
> >> +
> >> + pkts = (struct rte_mbuf **)objs;
> >> + from = objs;
> >> + n_left_from = nb_objs;
> >> +
> >> + /* Get stream for the speculated next node */
> >> + to_next = rte_node_next_stream_get(graph, node, next_index, nb_objs);
> >> +
> >> + for (i = OBJS_PER_CLINE; i < RTE_GRAPH_BURST_SIZE; i += OBJS_PER_CLINE)
> >> + rte_prefetch0(&objs[i]);
> >
> >Does this prefetching loop make any sense? Unless objs are passed across
> >threads, this array is likely to be in the cache already.
> >
> >And if objs are passed across threads, then why do you start
>prefetching from
> >the next cache line instead of the first, and why don't you stop at
>nb_objs?
> For example, if the cache line size is 64 bytes, there will be 8 pointers to
>mbufs per cache line (with 8-byte pointers). If nb_objs is 256, the
>remaining pointers need to be prefetched into cache. This loop helps to
>prefetch all nb_objs pointers into cache.
>
>
>
>My comment was about necessity. I'll rephrase and enumerate my questions
>to be more clear:
>1. Are mbuf pointers contained in the objs array not in cache? My assumptions
>here are:
> - If you run the graph in RTC mode, the objs array is very likely already in your L1
>cache (since some previous node/nodes just put packets there)
Yes, the objs are most likely already in the L1 cache in RTC mode. But if they are not, the above loop prefetches them.
> - In dispatch mode it doesn't make much sense to run this lookup node in
>another thread separately from the remaining nodes processing IPv4
If it is run on another thread/lcore, the above prefetching will help.
>
>2. if you still think this prefetching loop is required here, then:
>
>
> The loop can be changed to stop at nb_objs.
> for (i = OBJS_PER_CLINE; i < nb_objs; i += OBJS_PER_CLINE) { }
>
>
>
> why you start from the OBJS_PER_CLINE (second cache line of the objs array)
Because the first cache line of objs[] is prefetched in __rte_node_process(), which is called from rte_graph_walk().
>3. "i < nb_objs". Should be "i < RTE_ALIGN_CEIL(nb_objs, OBJS_PER_CLINE) /
>OBJS_PER_CLINE"
In the loop you suggest, i is a cache-line index incremented by 1. The for loop I mentioned above covers the same cache lines, except that i is an object index incremented by OBJS_PER_CLINE.
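Putting points 2 and 3 together, the revised loop would look roughly like this (a sketch only, using the same objs/OBJS_PER_CLINE names as in the patch):

    /* The first cache line of objs[] is already prefetched by
     * __rte_node_process() before this callback runs, so start from the
     * second cache line and stop once nb_objs pointers are covered.
     */
    for (i = OBJS_PER_CLINE; i < nb_objs; i += OBJS_PER_CLINE)
            rte_prefetch0(&objs[i]);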
>
>
> >
> >> +
> >> +#if RTE_GRAPH_BURST_SIZE > 64
> >> + for (i = 0; i < 4 && i < n_left_from; i++) {
> >> + rte_prefetch0(pkts[i]);
> >> + rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[i], void *,
> >> + sizeof(struct rte_ether_hdr)));
> >
> >This construction does not make sense to me. Same as similar
>constructions
> >below
> >
> >The second prefetch has a memory dependency on the data that will be
> >prefetched by the first one. Does removing this prefetch affect
> >performance?
> The first prefetches the data at mbuf address. The second prefetch is
>for the address containing ipv4 header. Both can be in separate cache lines, as
>ipv4 starts at mbuf->buf_addr + mbuf->data_off + sizeof(ethernet header).
>data_off is generally 64 bytes or 128 bytes depending on cache line size.
> >
>
>
>
>Indeed, both of them are in separate cache lines. But my point here is about
>data dependency for the second prefetch. In order to issue the prefetch
>instruction for the cache line containing the ipv4 header you must know the
>address. You calculate this address by:
>rte_pktmbuf_mtod_offset(pkts[i], void *, sizeof(struct rte_ether_hdr))
>rte_pktmbuf_mtod_offset is accessing mbuf->buf_addr and mbuf->data_off ,
>as you mentioned. But, these fields are not in your L1 cache at the moment,
>because you just asked to prefetch them with the previous instruction.
Ack.
>So my suggestion here would be to look at how it is done in
>ipv4_lookup_sse.c:ip4_lookup_node_process_vec()
>Simplifying, in a run loop it prefetches mbuf + 2, then prefetches CL with the v4
>header of the mbuf + 1 (assuming in a previous invocation mbuf CL was
>already fetched into L1), and finally does processing of the mbuf + 0 (again,
>assuming the previous iteration ipv4 CL was fetched).
I have looked at that implementation and the approach looks fine to me. I will try to add similar code in the FIB lookup node.
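As a rough sketch of what I plan to try (modelled on the ip4_lookup_node_process_vec() pattern, not the final code), the 4-packet loop would prefetch the mbuf structures two strides ahead and the IPv4 header cache lines one stride ahead, so neither prefetch depends on data fetched in the same iteration:

    while (n_left_from >= 4) {
            /* mbuf structs for the iteration after next */
            if (likely(n_left_from > 11)) {
                    rte_prefetch0(pkts[8]);
                    rte_prefetch0(pkts[9]);
                    rte_prefetch0(pkts[10]);
                    rte_prefetch0(pkts[11]);
            }
            /* IPv4 header cache lines for the next iteration; the mbuf
             * structs of pkts[4..7] were prefetched one iteration earlier,
             * so mtod() does not stall on buf_addr/data_off here.
             */
            if (likely(n_left_from > 7)) {
                    rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[4], void *,
                                    sizeof(struct rte_ether_hdr)));
                    rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[5], void *,
                                    sizeof(struct rte_ether_hdr)));
                    rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[6], void *,
                                    sizeof(struct rte_ether_hdr)));
                    rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[7], void *,
                                    sizeof(struct rte_ether_hdr)));
            }

            /* ... extract DIP/cksum/ttl from pkts[0..3] as in the current
             * patch ... */
            pkts += 4;
            n_left_from -= 4;
    }

The first two groups of mbufs and headers would be primed before the loop, similar to the existing #if RTE_GRAPH_BURST_SIZE > 64 block.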
>
>
> >> + }
> >> +#endif
> >> +
> >> + i = 0;
> >> + while (n_left_from >= 4) {
> >> +#if RTE_GRAPH_BURST_SIZE > 64
> >> + if (likely(n_left_from > 7)) {
> >> + rte_prefetch0(pkts[4]);
> >> + rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[4], void *,
> >> + sizeof(struct rte_ether_hdr)));
> >> + rte_prefetch0(pkts[5]);
> >> + rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[5], void *,
> >> + sizeof(struct rte_ether_hdr)));
> >> + rte_prefetch0(pkts[6]);
> >> + rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[6], void *,
> >> + sizeof(struct rte_ether_hdr)));
> >> + rte_prefetch0(pkts[7]);
> >> + rte_prefetch0(rte_pktmbuf_mtod_offset(pkts[7], void *,
> >> + sizeof(struct rte_ether_hdr)));
> >> + }
> >> +#endif
> >> +
> >> + mbuf0 = pkts[0];
> >> + mbuf1 = pkts[1];
> >> + mbuf2 = pkts[2];
> >> + mbuf3 = pkts[3];
> >> + pkts += 4;
> >> + n_left_from -= 4;
> >> + /* Extract DIP of mbuf0 */
> >> + ipv4_hdr = rte_pktmbuf_mtod_offset(mbuf0, struct rte_ipv4_hdr *,
> >> + sizeof(struct rte_ether_hdr));
> >> + /* Extract cksum, ttl as ipv4 hdr is in cache */
> >> + node_mbuf_priv1(mbuf0, dyn)->cksum = ipv4_hdr->hdr_checksum;
> >> + node_mbuf_priv1(mbuf0, dyn)->ttl = ipv4_hdr->time_to_live;
> >> +
> >> + ip[i++] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> >> +
> >> + /* Extract DIP of mbuf1 */
> >> + ipv4_hdr = rte_pktmbuf_mtod_offset(mbuf1, struct rte_ipv4_hdr *,
> >> + sizeof(struct rte_ether_hdr));
> >> + /* Extract cksum, ttl as ipv4 hdr is in cache */
> >> + node_mbuf_priv1(mbuf1, dyn)->cksum = ipv4_hdr->hdr_checksum;
> >> + node_mbuf_priv1(mbuf1, dyn)->ttl = ipv4_hdr->time_to_live;
> >> +
> >> + ip[i++] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> >> +
> >> + /* Extract DIP of mbuf2 */
> >> + ipv4_hdr = rte_pktmbuf_mtod_offset(mbuf2, struct rte_ipv4_hdr *,
> >> + sizeof(struct rte_ether_hdr));
> >> + /* Extract cksum, ttl as ipv4 hdr is in cache */
> >> + node_mbuf_priv1(mbuf2, dyn)->cksum = ipv4_hdr->hdr_checksum;
> >> + node_mbuf_priv1(mbuf2, dyn)->ttl = ipv4_hdr->time_to_live;
> >> +
> >> + ip[i++] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> >> +
> >> + /* Extract DIP of mbuf3 */
> >> + ipv4_hdr = rte_pktmbuf_mtod_offset(mbuf3, struct rte_ipv4_hdr *,
> >> + sizeof(struct rte_ether_hdr));
> >> +
> >> + /* Extract cksum, ttl as ipv4 hdr is in cache */
> >> + node_mbuf_priv1(mbuf3, dyn)->cksum = ipv4_hdr->hdr_checksum;
> >> + node_mbuf_priv1(mbuf3, dyn)->ttl = ipv4_hdr->time_to_live;
> >> +
> >> + ip[i++] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> >> + }
> >> + while (n_left_from > 0) {
> >> + mbuf0 = pkts[0];
> >> + pkts += 1;
> >> + n_left_from -= 1;
> >> +
> >> + /* Extract DIP of mbuf0 */
> >> + ipv4_hdr = rte_pktmbuf_mtod_offset(mbuf0, struct rte_ipv4_hdr *,
> >> + sizeof(struct rte_ether_hdr));
> >> + /* Extract cksum, ttl as ipv4 hdr is in cache */
> >> + node_mbuf_priv1(mbuf0, dyn)->cksum = ipv4_hdr->hdr_checksum;
> >> + node_mbuf_priv1(mbuf0, dyn)->ttl = ipv4_hdr->time_to_live;
> >> +
> >> + ip[i++] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> >> + }
> >> +
> >> + rc = rte_fib_lookup_bulk(fib, ip, next_hop, nb_objs);
> >> + if (unlikely(rc != 0))
> >> + return 0;
> >> +
> >> + for (i = 0; i < nb_objs; i++) {
> >> + if (unlikely(next_hop[i] == FIB_DEFAULT_NH)) {
> >> + next_hop[i] = drop_nh;
> >maybe it is worth just to define FIB_DEFAULT_NH as a drop_nh value
> >to omit these next_hop reassignments?
> Ack.
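For the next revision I am thinking of something along these lines (a sketch only; whether this stays a macro or simply becomes the default_nh passed when the FIB is created is still open):

    /* Make a lookup miss resolve directly to the pkt_drop edge by using
     * the drop encoding as the FIB default next hop (edge in the upper
     * 16 bits, matching the node's next-hop encoding).
     */
    #define FIB_DEFAULT_NH \
            (((uint32_t)RTE_NODE_IP4_LOOKUP_NEXT_PKT_DROP) << 16)

With that, the per-packet next_hop[i] = drop_nh reassignment goes away and the comparison is only needed for the lookup_err xstat.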
> >> + lookup_err += 1;
> >> + }
> >> +
> >> + mbuf0 = (struct rte_mbuf *)objs[i];
> >> + node_mbuf_priv1(mbuf0, dyn)->nh = (uint16_t)next_hop[i];
> >> + next = (uint16_t)(next_hop[i] >> 16);
> >> +
> >> + if (unlikely(next_index ^ next)) {
> >> + /* Copy things successfully speculated till now */
> >> + rte_memcpy(to_next, from, last_spec * sizeof(from[0]));
> >> + from += last_spec;
> >> + to_next += last_spec;
> >> + held += last_spec;
> >> + last_spec = 0;
> >> +
> >> + rte_node_enqueue_x1(graph, node, next, from[0]);
> >> + from += 1;
> >> + } else {
> >> + last_spec += 1;
> >> + }
> >> + }
> >> +
> >> + /* !!! Home run !!! */
> >> + if (likely(last_spec == nb_objs)) {
> >> + rte_node_next_stream_move(graph, node, next_index);
> >> + return nb_objs;
> >> + }
> >> +
> >> + NODE_INCREMENT_XSTAT_ID(node, 0, lookup_err != 0, lookup_err);
> >> + held += last_spec;
> >> + rte_memcpy(to_next, from, last_spec * sizeof(from[0]));
> >> + rte_node_next_stream_put(graph, node, next_index, held);
> >> +
> >> + return nb_objs;
> >> +}
> >> +
> >> RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_node_ip4_fib_route_add, 25.07)
> >> int
> >> rte_node_ip4_fib_route_add(uint32_t ip, uint8_t depth, uint16_t next_hop,
> >> @@ -147,6 +310,7 @@ static struct rte_node_xstats ip4_lookup_fib_xstats = {
> >> };
> >>
> >> static struct rte_node_register ip4_lookup_fib_node = {
> >> + .process = ip4_lookup_fib_node_process,
> >> .name = "ip4_lookup_fib",
> >>
> >> .init = ip4_lookup_fib_node_init,
> >
> >--
> >Regards,
> >Vladimir
>
>
>
>
>
>--
>
>Regards,
>
>Vladimir
Regards,
Ankur