From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

rte_pktmbuf_free() may cause cache misses or memory stalls.
With small packets, this can hurt performance.

According to memnic-tester results, performance improves for frame
sizes below 1024 bytes.

Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 5.55Mpps | 5.83Mpps
  128 | 5.44Mpps | 5.71Mpps
  256 | 5.22Mpps | 5.40Mpps
  512 | 4.52Mpps | 4.64Mpps
 1024 | 3.73Mpps | 3.68Mpps
 1280 | 3.22Mpps | 3.17Mpps
 1518 | 2.93Mpps | 2.90Mpps

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: Hayato Momma <h-momma@ce.jp.nec.com>
---
 pmd/pmd_memnic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index cc0ae25..1db065f 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -344,7 +344,7 @@ static uint16_t memnic_xmit_pkts(void *tx_queue,
 	struct memnic_adapter *adapter = q->adapter;
 	struct memnic_data *data = &adapter->nic->down;
 	struct memnic_packet *p;
-	uint16_t nr;
+	uint16_t i, nr;
 	int idx;
 	struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];
 	uint64_t pkts, bytes, errs;
@@ -408,9 +408,9 @@ retry:
 
 		rte_compiler_barrier();
 		p->status = MEMNIC_PKT_ST_FILLED;
-
-		rte_pktmbuf_free(tx_pkts[nr]);
 	}
+	for (i = 0; i < nr; i++)
+		rte_pktmbuf_free(tx_pkts[i]);
 
 	/* stats */
 	st->opackets += pkts;
-- 
1.8.3.1

2014-09-11 07:52, Hiroshi Shimamoto:
> @@ -408,9 +408,9 @@ retry:
>
>  		rte_compiler_barrier();
>  		p->status = MEMNIC_PKT_ST_FILLED;
> -
> -		rte_pktmbuf_free(tx_pkts[nr]);
>  	}
> +	for (i = 0; i < nr; i++)
> +		rte_pktmbuf_free(tx_pkts[i]);
>
>  	/* stats */
>  	st->opackets += pkts;
>
You are bursting the mbuf freeing. Why is the title about "split"?
--
Thomas

On Sep 24, 2014, at 10:20 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2014-09-11 07:52, Hiroshi Shimamoto:
>> @@ -408,9 +408,9 @@ retry:
>>
>>  		rte_compiler_barrier();
>>  		p->status = MEMNIC_PKT_ST_FILLED;
>> -
>> -		rte_pktmbuf_free(tx_pkts[nr]);
>>  	}
>> +	for (i = 0; i < nr; i++)
>> +		rte_pktmbuf_free(tx_pkts[i]);
>>
>>  	/* stats */
>>  	st->opackets += pkts;
>>
>
> You are bursting the mbuf freeing. Why is the title about "split"?

Maybe this should be a new API as in rte_pktmbuf_bulk_free(tx_pkts, nr); ??
This would remove the loop in the application and I know I have done the same thing for Pktgen too.

>
> --
> Thomas

Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533

Hi Thomas, Keith,

> Subject: Re: [dpdk-dev] [memnic PATCH 7/7] pmd: split calling mbuf free
>
>
> On Sep 24, 2014, at 10:20 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>
> > 2014-09-11 07:52, Hiroshi Shimamoto:
> >> @@ -408,9 +408,9 @@ retry:
> >>
> >>  		rte_compiler_barrier();
> >>  		p->status = MEMNIC_PKT_ST_FILLED;
> >> -
> >> -		rte_pktmbuf_free(tx_pkts[nr]);
> >>  	}
> >> +	for (i = 0; i < nr; i++)
> >> +		rte_pktmbuf_free(tx_pkts[i]);
> >>
> >>  	/* stats */
> >>  	st->opackets += pkts;
> >>
> >
> > You are bursting the mbuf freeing. Why is the title about "split"?

I thought of this patch as splitting the main loop operations into putting
content and freeing mbufs, so I took the word "split", but I see that
"burst mbuf freeing" is preferable.

>
> Maybe this should be a new API as in rte_pktmbuf_bulk_free(tx_pkts, nr); ??
> This would remove the loop in the application and I know I have done the same thing for Pktgen too.

Good point. Yes, I think a new API like rte_pktmbuf_(alloc|free)_bulk()
would be good for reducing TLS access and gaining performance.
I put that on my stack, but haven't had time yet.

Do you have any plans to do such a thing?

thanks,
Hiroshi

> >
> > --
> > Thomas
>
> Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533

On Sep 24, 2014, at 8:12 PM, Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> wrote:

> Hi Thomas, Keith,
>
>> Subject: Re: [dpdk-dev] [memnic PATCH 7/7] pmd: split calling mbuf free
>>
>>
>> On Sep 24, 2014, at 10:20 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>>
>>> 2014-09-11 07:52, Hiroshi Shimamoto:
>>>> @@ -408,9 +408,9 @@ retry:
>>>>
>>>>  		rte_compiler_barrier();
>>>>  		p->status = MEMNIC_PKT_ST_FILLED;
>>>> -
>>>> -		rte_pktmbuf_free(tx_pkts[nr]);
>>>>  	}
>>>> +	for (i = 0; i < nr; i++)
>>>> +		rte_pktmbuf_free(tx_pkts[i]);
>>>>
>>>>  	/* stats */
>>>>  	st->opackets += pkts;
>>>>
>>>
>>> You are bursting the mbuf freeing. Why is the title about "split"?
>
> I thought of this patch as splitting the main loop operations into putting
> content and freeing mbufs, so I took the word "split", but I see that
> "burst mbuf freeing" is preferable.
>
>>
>> Maybe this should be a new API as in rte_pktmbuf_bulk_free(tx_pkts, nr); ??
>> This would remove the loop in the application and I know I have done the same thing for Pktgen too.
>
> Good point. Yes, I think a new API like rte_pktmbuf_(alloc|free)_bulk()
> would be good for reducing TLS access and gaining performance.
> I put that on my stack, but haven't had time yet.
>
> Do you have any plans to do such a thing?

I do not have any plans, but the alloc would be good too.

>
> thanks,
> Hiroshi
>
>>>
>>> --
>>> Thomas
>>
>> Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533

Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533
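
For illustration only, here is a minimal sketch of the bulk-free helper discussed above. It is not DPDK code: struct mock_mbuf, mock_pktmbuf_free() and mock_pktmbuf_free_bulk() are hypothetical stand-ins for struct rte_mbuf, rte_pktmbuf_free() and the proposed rte_pktmbuf_bulk_free(), so that the example builds without DPDK. It only moves the per-packet loop from the caller into one helper; a real implementation would presumably also batch the return to the mempool to cut per-packet overhead, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_mbuf; NOT the DPDK type. */
struct mock_mbuf {
	int freed;	/* how many times this buffer has been freed */
};

/* Counts calls to the single-buffer free, for demonstration only. */
static unsigned int mock_free_calls;

/* Hypothetical stand-in for rte_pktmbuf_free(): frees one buffer. */
static void mock_pktmbuf_free(struct mock_mbuf *m)
{
	if (m == NULL)
		return;
	m->freed++;
	mock_free_calls++;
}

/*
 * Hypothetical bulk helper in the spirit of the proposed
 * rte_pktmbuf_bulk_free(tx_pkts, nr): frees the first nr buffers
 * of pkts[] in one call, so callers need no loop of their own.
 */
static void mock_pktmbuf_free_bulk(struct mock_mbuf **pkts, uint16_t nr)
{
	uint16_t i;

	assert(pkts != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		mock_pktmbuf_free(pkts[i]);
}
```

With such a helper, the loop added at the end of memnic_xmit_pkts() in the patch would reduce to a single call on (tx_pkts, nr), placed after the fill loop exactly where the patch puts the deferred frees.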