DPDK usage discussions
* [dpdk-users] Dpdk poor performance on virtual machine
       [not found]                                         ` <CABc_bMCqu-4V4gn=JPqO491BF6Cnjj=4SaAey9qyTQcha134yw@mail.gmail.com>
@ 2016-12-15  7:20                                           ` edgar helmut
  2016-12-15 12:54                                             ` Wiles, Keith
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-15  7:20 UTC (permalink / raw)
  To: users

Hi.
Some help is needed to understand a performance issue on a virtual machine.

Running testpmd on the host works well (testpmd forwards 10G between
two 82599 ports).
However, the same application running on a virtual machine on the same host
shows a huge degradation in performance.
testpmd is then not even able to read 100 Mbps from the NIC without drops,
and from a profile I made it looks like the DPDK application runs more than
10 times slower than on the host...

The setup is Ubuntu 16.04 for the host and Ubuntu 14.04 for the guest.
QEMU is 2.3.0 (though I tried a newer version as well).
The NICs are connected to the guest using PCI passthrough, and the guest's CPU is
set to passthrough (same as the host).
On guest start the host allocates transparent hugepages (AnonHugePages), so
I assume the guest memory is backed with real hugepages on the host.
I tried binding with igb_uio and with uio_pci_generic, but both result in the
same performance.
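For reference, the binding was done roughly along these lines (the PCI addresses are placeholders and the bind script name varies between DPDK releases, so treat this as a sketch):

sudo modprobe uio
sudo insmod build/kmod/igb_uio.ko        # or: sudo modprobe uio_pci_generic
sudo ./tools/dpdk-devbind.py --status    # named dpdk_nic_bind.py in older releases
sudo ./tools/dpdk-devbind.py --bind=igb_uio 0000:03:00.0 0000:03:00.1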

Given the performance difference, I guess I am missing something.

Please advise: what might I be missing here?
Is this an inherent penalty of QEMU?

Thanks
Edgar

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15  7:20                                           ` [dpdk-users] Dpdk poor performance on virtual machine edgar helmut
@ 2016-12-15 12:54                                             ` Wiles, Keith
  2016-12-15 13:32                                               ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: Wiles, Keith @ 2016-12-15 12:54 UTC (permalink / raw)
  To: edgar helmut; +Cc: users


> On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com> wrote:
> 
> Hi.
> Some help is needed to understand performance issue on virtual machine.
> 
> Running testpmd over the host functions well (testpmd forwards 10g between
> two 82599 ports).
> However same application running on a virtual machine over same host
> results with huge degradation in performance.
> The testpmd then is not even able to read 100mbps from nic without drops,
> and from a profile i made it looks like a dpdk application runs more than
> 10 times slower than over host…

Not sure I understand the overall setup, but did you make sure the NIC/PCI bus is on the same socket as the VM, if you have multiple sockets on your platform? If you have to access the NIC across the QPI link, that could explain some of the performance drop, though I am not sure this alone accounts for that much of a drop.
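A quick way to check, with a placeholder PCI address:

cat /sys/bus/pci/devices/0000:03:00.0/numa_node   # NUMA node of the NIC (-1 if unknown)
lscpu | grep NUMA                                 # which cores belong to which node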

> 
> Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> Qemu is 2.3.0 (though I tried with a newer as well).
> NICs are connected to guest using pci passthrough, and guest's cpu is set
> as passthrough (same as host).
> On guest start the host allocates transparent hugepages (AnonHugePages) so
> i assume the guest memory is backed with real hugepages on the host.
> I tried binding with igb_uio and with uio_pci_generic but both results with
> same performance.
> 
> Due to the performance difference i guess i miss something.
> 
> Please advise what may i miss here?
> Is this a native penalty of qemu??
> 
> Thanks
> Edgar

Regards,
Keith


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 12:54                                             ` Wiles, Keith
@ 2016-12-15 13:32                                               ` edgar helmut
  2016-12-15 14:33                                                 ` Hu, Xuekun
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-15 13:32 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: users

I have a single socket, an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just took two more steps:
1. setting iommu=pt for better use of igb_uio
2. using taskset and isolcpus, so the relevant dpdk cores now appear to
use dedicated cores (roughly as sketched below).
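As a rough sketch of what that looks like (the core numbers and thread IDs below are placeholders for my actual layout):

# host kernel command line additions:
#   intel_iommu=on iommu=pt isolcpus=2-4
# find the qemu vcpu thread IDs, e.g.: ps -eLo tid,comm | grep -i qemu
# then pin each vcpu thread onto one isolated core:
taskset -cp 2 <vcpu0_tid>
taskset -cp 3 <vcpu1_tid>
taskset -cp 4 <vcpu2_tid>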

This improved the performance, though I still see a significant difference
between the VM and the host which I can't fully explain.

Any further ideas?

Regards,
Edgar


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com> wrote:

>
> > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> wrote:
> >
> > Hi.
> > Some help is needed to understand performance issue on virtual machine.
> >
> > Running testpmd over the host functions well (testpmd forwards 10g
> between
> > two 82599 ports).
> > However same application running on a virtual machine over same host
> > results with huge degradation in performance.
> > The testpmd then is not even able to read 100mbps from nic without drops,
> > and from a profile i made it looks like a dpdk application runs more than
> > 10 times slower than over host…
>
> Not sure I understand the overall setup, but did you make sure the NIC/PCI
> bus is on the same socket as the VM. If you have multiple sockets on your
> platform. If you have to access the NIC across the QPI it could explain
> some of the performance drop. Not sure that much drop is this problem.
>
> >
> > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > Qemu is 2.3.0 (though I tried with a newer as well).
> > NICs are connected to guest using pci passthrough, and guest's cpu is set
> > as passthrough (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages)
> so
> > i assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic but both results
> with
> > same performance.
> >
> > Due to the performance difference i guess i miss something.
> >
> > Please advise what may i miss here?
> > Is this a native penalty of qemu??
> >
> > Thanks
> > Edgar
>
> Regards,
> Keith
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 13:32                                               ` edgar helmut
@ 2016-12-15 14:33                                                 ` Hu, Xuekun
  2016-12-15 17:17                                                   ` Stephen Hemminger
  2016-12-15 17:24                                                   ` edgar helmut
  0 siblings, 2 replies; 21+ messages in thread
From: Hu, Xuekun @ 2016-12-15 14:33 UTC (permalink / raw)
  To: edgar helmut, Wiles, Keith; +Cc: users

Are you sure the AnonHugePages size was equal to the total VM memory size?
Sometimes the transparent hugepage mechanism doesn't guarantee the app is using
real huge pages.
 

-----Original Message-----
From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
Sent: Thursday, December 15, 2016 9:32 PM
To: Wiles, Keith
Cc: users@dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just made two more steps:
1. setting iommu=pt for better usage of the igb_uio
2. using taskset and isolcpu so now it looks like the relevant dpdk cores
use dedicated cores.

It improved the performance though I still see significant difference
between the vm and the host which I can't fully explain.

any further idea?

Regards,
Edgar


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com> wrote:

>
> > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> wrote:
> >
> > Hi.
> > Some help is needed to understand performance issue on virtual machine.
> >
> > Running testpmd over the host functions well (testpmd forwards 10g
> between
> > two 82599 ports).
> > However same application running on a virtual machine over same host
> > results with huge degradation in performance.
> > The testpmd then is not even able to read 100mbps from nic without drops,
> > and from a profile i made it looks like a dpdk application runs more than
> > 10 times slower than over host…
>
> Not sure I understand the overall setup, but did you make sure the NIC/PCI
> bus is on the same socket as the VM. If you have multiple sockets on your
> platform. If you have to access the NIC across the QPI it could explain
> some of the performance drop. Not sure that much drop is this problem.
>
> >
> > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > Qemu is 2.3.0 (though I tried with a newer as well).
> > NICs are connected to guest using pci passthrough, and guest's cpu is set
> > as passthrough (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages)
> so
> > i assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic but both results
> with
> > same performance.
> >
> > Due to the performance difference i guess i miss something.
> >
> > Please advise what may i miss here?
> > Is this a native penalty of qemu??
> >
> > Thanks
> > Edgar
>
> Regards,
> Keith
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 14:33                                                 ` Hu, Xuekun
@ 2016-12-15 17:17                                                   ` Stephen Hemminger
  2016-12-15 17:29                                                     ` edgar helmut
  2016-12-15 17:24                                                   ` edgar helmut
  1 sibling, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2016-12-15 17:17 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: edgar helmut, Wiles, Keith, users

On Thu, 15 Dec 2016 14:33:25 +0000
"Hu, Xuekun" <xuekun.hu@intel.com> wrote:

> Are you sure the anonhugepages size was equal to the total VM's memory size? 
> Sometimes, transparent huge page mechanism doesn't grantee the app is using
> the real huge pages. 
>  
> 
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> Sent: Thursday, December 15, 2016 9:32 PM
> To: Wiles, Keith
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
> 
> I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.
> 
> I just made two more steps:
> 1. setting iommu=pt for better usage of the igb_uio
> 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> use dedicated cores.
> 
> It improved the performance though I still see significant difference
> between the vm and the host which I can't fully explain.
> 
> any further idea?
> 
> Regards,
> Edgar
> 
> 
> On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
> 
> >  
> > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>  
> > wrote:  
> > >
> > > Hi.
> > > Some help is needed to understand performance issue on virtual machine.
> > >
> > > Running testpmd over the host functions well (testpmd forwards 10g  
> > between  
> > > two 82599 ports).
> > > However same application running on a virtual machine over same host
> > > results with huge degradation in performance.
> > > The testpmd then is not even able to read 100mbps from nic without drops,
> > > and from a profile i made it looks like a dpdk application runs more than
> > > 10 times slower than over host…  
> >
> > Not sure I understand the overall setup, but did you make sure the NIC/PCI
> > bus is on the same socket as the VM. If you have multiple sockets on your
> > platform. If you have to access the NIC across the QPI it could explain
> > some of the performance drop. Not sure that much drop is this problem.
> >  
> > >
> > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > NICs are connected to guest using pci passthrough, and guest's cpu is set
> > > as passthrough (same as host).
> > > On guest start the host allocates transparent hugepages (AnonHugePages)  
> > so  
> > > i assume the guest memory is backed with real hugepages on the host.
> > > I tried binding with igb_uio and with uio_pci_generic but both results  
> > with  
> > > same performance.
> > >
> > > Due to the performance difference i guess i miss something.
> > >
> > > Please advise what may i miss here?
> > > Is this a native penalty of qemu??
> > >
> > > Thanks
> > > Edgar  
> >
> > Regards,
> > Keith
> >
> >  

Also make sure you run the host with 1G hugepages and back the guest with hugepage
memory. If not, the IOMMU has to do 4K operations and thrashes.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 14:33                                                 ` Hu, Xuekun
  2016-12-15 17:17                                                   ` Stephen Hemminger
@ 2016-12-15 17:24                                                   ` edgar helmut
  2016-12-16  1:14                                                     ` Hu, Xuekun
  1 sibling, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-15 17:24 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: Wiles, Keith, users

In fact the VM was created with 6G of RAM, and its kernel boot args are defined
with 4 hugepages of 1G each, though when starting the VM I noticed that
AnonHugePages increased.

The relevant qemu process id is 6074, and the following sums the amount of
allocated AnonHugePages (in kB):
sudo grep -e AnonHugePages /proc/6074/smaps | awk '{ if ($2 > 0) print $2 }' | awk '{ s += $1 } END { print s }'
which results in 4360192

So not all of the memory is backed with transparent hugepages, though it is
more than the amount of hugepages the guest is supposed to boot with.

How can I be sure that the required 4G of hugepages is really allocated, and
not, for example, that only 2G out of the 4G is allocated (with the remaining
2G mapped with the default 4K pages)?
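So far the only check I have is summing the per-mapping counters and comparing against the host-wide view (6074 is the qemu PID from above; smaps values are in kB):

sudo awk '/^AnonHugePages:/ { thp += $2 } END { print thp " kB backed by THP" }' /proc/6074/smaps
grep -E 'AnonHugePages|HugePages_(Total|Free)' /proc/meminfo   # THP in use vs. the reserved pool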

thanks

On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:

> Are you sure the anonhugepages size was equal to the total VM's memory
> size?
> Sometimes, transparent huge page mechanism doesn't grantee the app is using
> the real huge pages.
>
>
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> Sent: Thursday, December 15, 2016 9:32 PM
> To: Wiles, Keith
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
>
> I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> 2.40GHz.
>
> I just made two more steps:
> 1. setting iommu=pt for better usage of the igb_uio
> 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> use dedicated cores.
>
> It improved the performance though I still see significant difference
> between the vm and the host which I can't fully explain.
>
> any further idea?
>
> Regards,
> Edgar
>
>
> On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
> >
> > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> > >
> > > Hi.
> > > Some help is needed to understand performance issue on virtual machine.
> > >
> > > Running testpmd over the host functions well (testpmd forwards 10g
> > between
> > > two 82599 ports).
> > > However same application running on a virtual machine over same host
> > > results with huge degradation in performance.
> > > The testpmd then is not even able to read 100mbps from nic without
> drops,
> > > and from a profile i made it looks like a dpdk application runs more
> than
> > > 10 times slower than over host…
> >
> > Not sure I understand the overall setup, but did you make sure the
> NIC/PCI
> > bus is on the same socket as the VM. If you have multiple sockets on your
> > platform. If you have to access the NIC across the QPI it could explain
> > some of the performance drop. Not sure that much drop is this problem.
> >
> > >
> > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > NICs are connected to guest using pci passthrough, and guest's cpu is
> set
> > > as passthrough (same as host).
> > > On guest start the host allocates transparent hugepages (AnonHugePages)
> > so
> > > i assume the guest memory is backed with real hugepages on the host.
> > > I tried binding with igb_uio and with uio_pci_generic but both results
> > with
> > > same performance.
> > >
> > > Due to the performance difference i guess i miss something.
> > >
> > > Please advise what may i miss here?
> > > Is this a native penalty of qemu??
> > >
> > > Thanks
> > > Edgar
> >
> > Regards,
> > Keith
> >
> >
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 17:17                                                   ` Stephen Hemminger
@ 2016-12-15 17:29                                                     ` edgar helmut
  2016-12-15 19:14                                                       ` Stephen Hemminger
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-15 17:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Hu, Xuekun, Wiles, Keith, users

Stephen, this is not the case; it relies on using transparent hugepages,
which appear to be 2M in size.
Why should it be a problem to back the guest's 1G pages with 2M pages on the
host?
Transparent hugepages make the deployment much more flexible.



On Thu, Dec 15, 2016 at 7:17 PM, Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Thu, 15 Dec 2016 14:33:25 +0000
> "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> > Are you sure the anonhugepages size was equal to the total VM's memory
> size?
> > Sometimes, transparent huge page mechanism doesn't grantee the app is
> using
> > the real huge pages.
> >
> >
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> > Sent: Thursday, December 15, 2016 9:32 PM
> > To: Wiles, Keith
> > Cc: users@dpdk.org
> > Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
> >
> > I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> 2.40GHz.
> >
> > I just made two more steps:
> > 1. setting iommu=pt for better usage of the igb_uio
> > 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> > use dedicated cores.
> >
> > It improved the performance though I still see significant difference
> > between the vm and the host which I can't fully explain.
> >
> > any further idea?
> >
> > Regards,
> > Edgar
> >
> >
> > On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
> >
> > >
> > > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com
> >
> > > wrote:
> > > >
> > > > Hi.
> > > > Some help is needed to understand performance issue on virtual
> machine.
> > > >
> > > > Running testpmd over the host functions well (testpmd forwards 10g
> > > between
> > > > two 82599 ports).
> > > > However same application running on a virtual machine over same host
> > > > results with huge degradation in performance.
> > > > The testpmd then is not even able to read 100mbps from nic without
> drops,
> > > > and from a profile i made it looks like a dpdk application runs more
> than
> > > > 10 times slower than over host…
> > >
> > > Not sure I understand the overall setup, but did you make sure the
> NIC/PCI
> > > bus is on the same socket as the VM. If you have multiple sockets on
> your
> > > platform. If you have to access the NIC across the QPI it could explain
> > > some of the performance drop. Not sure that much drop is this problem.
> > >
> > > >
> > > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > > NICs are connected to guest using pci passthrough, and guest's cpu
> is set
> > > > as passthrough (same as host).
> > > > On guest start the host allocates transparent hugepages
> (AnonHugePages)
> > > so
> > > > i assume the guest memory is backed with real hugepages on the host.
> > > > I tried binding with igb_uio and with uio_pci_generic but both
> results
> > > with
> > > > same performance.
> > > >
> > > > Due to the performance difference i guess i miss something.
> > > >
> > > > Please advise what may i miss here?
> > > > Is this a native penalty of qemu??
> > > >
> > > > Thanks
> > > > Edgar
> > >
> > > Regards,
> > > Keith
> > >
> > >
>
> Also make sure you run host with 1G hugepages and run guest in hugepage
> memory. If not, the IOMMU has to do 4K operations and thrashes.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 17:29                                                     ` edgar helmut
@ 2016-12-15 19:14                                                       ` Stephen Hemminger
  2016-12-15 19:29                                                         ` Jes Nielsen
  0 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2016-12-15 19:14 UTC (permalink / raw)
  To: edgar helmut; +Cc: Hu, Xuekun, Wiles, Keith, users

On Thu, 15 Dec 2016 19:29:54 +0200
edgar helmut <helmut.edgar100@gmail.com> wrote:

> Stephen, this is not the case, it relies on using the transparent hugepages
> which looks like 2M hugepages size.
> Why should be a problem to back 1G pages of the guest to 2M pages at the
> host?
> the transparent hugepages makes the deployment much more flexible.


The IOMMU has a cache (the IOTLB, similar to a TLB) with a limited number of slots.
If you use 2M pages, IOMMU cache misses will cause a performance drop,
just like CPU cache misses.  I think Intel had some slides back at IDF 2012
that showed the impact.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 19:14                                                       ` Stephen Hemminger
@ 2016-12-15 19:29                                                         ` Jes Nielsen
  0 siblings, 0 replies; 21+ messages in thread
From: Jes Nielsen @ 2016-12-15 19:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: edgar helmut, Hu, Xuekun, Wiles, Keith, users

Are you perhaps running on an old Sandy Bridge x86?

I heard that Intel Sandy Bridge CPUs have a VT-d IOTLB limitation that
constrains PCIe passthrough throughput. Sandy Bridge (and earlier)
CPUs are not recommended if high performance is required.

On Thu, Dec 15, 2016 at 1:14 PM, Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Thu, 15 Dec 2016 19:29:54 +0200
> edgar helmut <helmut.edgar100@gmail.com> wrote:
>
> > Stephen, this is not the case, it relies on using the transparent
> hugepages
> > which looks like 2M hugepages size.
> > Why should be a problem to back 1G pages of the guest to 2M pages at the
> > host?
> > the transparent hugepages makes the deployment much more flexible.
>
>
> The IOMMU has a cache (like TLB) which has a limited number of slots.
> If using 2M pages then the IOMMU cache misses will cause a performance drop
> just like CPU cache misses.  I think Intel had some slides back at IDF 2012
> that showed the impact.
>
>


-- 

Best Regards,

Jes Nielsen

6WIND, Solutions Engineering
Austin, TX 78730, USA
Tel: (512) 426-8222

www.6wind.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-15 17:24                                                   ` edgar helmut
@ 2016-12-16  1:14                                                     ` Hu, Xuekun
  2016-12-17 12:56                                                       ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: Hu, Xuekun @ 2016-12-16  1:14 UTC (permalink / raw)
  To: edgar helmut; +Cc: Wiles, Keith, users

You said the VM's memory was 6G, while transparent hugepages covered only ~4G (4360192 kB). So some of it was mapped to 4K pages.

BTW, the memory used by transparent hugepages is not the hugepages you reserved with the kernel boot option.

From: edgar helmut [mailto:helmut.edgar100@gmail.com]
Sent: Friday, December 16, 2016 1:24 AM
To: Hu, Xuekun
Cc: Wiles, Keith; users@dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

in fact the vm was created with 6G RAM, its kernel boot args are defined with 4 hugepages of 1G each, though when starting the vm i noted that anonhugepages increased.
The relevant qemu process id is 6074, and the following sums the amount of allocated AnonHugePages:
sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2} '|awk '{s+=$1} END {print s}'
which results with 4360192
so not all the memory is backed with transparent hugepages though it is more than the amount of hugepages the guest supposed to boot with.
How can I be sure that the required 4G hugepages are really allocated?, and not, for example, only 2G out of the 4G are allocated (and the rest 2 are mapping of the default 4K)?

thanks

On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com<mailto:xuekun.hu@intel.com>> wrote:
Are you sure the anonhugepages size was equal to the total VM's memory size?
Sometimes, transparent huge page mechanism doesn't grantee the app is using
the real huge pages.


-----Original Message-----
From: users [mailto:users-bounces@dpdk.org<mailto:users-bounces@dpdk.org>] On Behalf Of edgar helmut
Sent: Thursday, December 15, 2016 9:32 PM
To: Wiles, Keith
Cc: users@dpdk.org<mailto:users@dpdk.org>
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just made two more steps:
1. setting iommu=pt for better usage of the igb_uio
2. using taskset and isolcpu so now it looks like the relevant dpdk cores
use dedicated cores.

It improved the performance though I still see significant difference
between the vm and the host which I can't fully explain.

any further idea?

Regards,
Edgar


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com<mailto:keith.wiles@intel.com>> wrote:

>
> > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>>
> wrote:
> >
> > Hi.
> > Some help is needed to understand performance issue on virtual machine.
> >
> > Running testpmd over the host functions well (testpmd forwards 10g
> between
> > two 82599 ports).
> > However same application running on a virtual machine over same host
> > results with huge degradation in performance.
> > The testpmd then is not even able to read 100mbps from nic without drops,
> > and from a profile i made it looks like a dpdk application runs more than
> > 10 times slower than over host…
>
> Not sure I understand the overall setup, but did you make sure the NIC/PCI
> bus is on the same socket as the VM. If you have multiple sockets on your
> platform. If you have to access the NIC across the QPI it could explain
> some of the performance drop. Not sure that much drop is this problem.
>
> >
> > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > Qemu is 2.3.0 (though I tried with a newer as well).
> > NICs are connected to guest using pci passthrough, and guest's cpu is set
> > as passthrough (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages)
> so
> > i assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic but both results
> with
> > same performance.
> >
> > Due to the performance difference i guess i miss something.
> >
> > Please advise what may i miss here?
> > Is this a native penalty of qemu??
> >
> > Thanks
> > Edgar
>
> Regards,
> Keith
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-16  1:14                                                     ` Hu, Xuekun
@ 2016-12-17 12:56                                                       ` edgar helmut
  2016-12-23 19:22                                                         ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-17 12:56 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: Wiles, Keith, users

That's what I was afraid of.
In fact I need the host to back the entire guest's memory with hugepages.
I will find a way to do that and run the tests again.


On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:

> You said VM’s memory was 6G, while transparent hugepages was only used ~4G
> (4360192KB). So some were mapped to 4K pages.
>
>
>
> BTW, the memory used by transparent hugepage is not the hugepage you
> reserved in kernel boot option.
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Friday, December 16, 2016 1:24 AM
> *To:* Hu, Xuekun
> *Cc:* Wiles, Keith; users@dpdk.org
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> in fact the vm was created with 6G RAM, its kernel boot args are defined
> with 4 hugepages of 1G each, though when starting the vm i noted that
> anonhugepages increased.
>
> The relevant qemu process id is 6074, and the following sums the amount of
> allocated AnonHugePages:
> sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2}
> '|awk '{s+=$1} END {print s}'
>
> which results with 4360192
>
> so not all the memory is backed with transparent hugepages though it is
> more than the amount of hugepages the guest supposed to boot with.
>
> How can I be sure that the required 4G hugepages are really allocated?,
> and not, for example, only 2G out of the 4G are allocated (and the rest 2
> are mapping of the default 4K)?
>
>
>
> thanks
>
>
>
> On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:
>
> Are you sure the anonhugepages size was equal to the total VM's memory
> size?
> Sometimes, transparent huge page mechanism doesn't grantee the app is using
> the real huge pages.
>
>
>
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> Sent: Thursday, December 15, 2016 9:32 PM
> To: Wiles, Keith
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
>
> I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> 2.40GHz.
>
> I just made two more steps:
> 1. setting iommu=pt for better usage of the igb_uio
> 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> use dedicated cores.
>
> It improved the performance though I still see significant difference
> between the vm and the host which I can't fully explain.
>
> any further idea?
>
> Regards,
> Edgar
>
>
> On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
> >
> > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> > >
> > > Hi.
> > > Some help is needed to understand performance issue on virtual machine.
> > >
> > > Running testpmd over the host functions well (testpmd forwards 10g
> > between
> > > two 82599 ports).
> > > However same application running on a virtual machine over same host
> > > results with huge degradation in performance.
> > > The testpmd then is not even able to read 100mbps from nic without
> drops,
> > > and from a profile i made it looks like a dpdk application runs more
> than
> > > 10 times slower than over host…
> >
> > Not sure I understand the overall setup, but did you make sure the
> NIC/PCI
> > bus is on the same socket as the VM. If you have multiple sockets on your
> > platform. If you have to access the NIC across the QPI it could explain
> > some of the performance drop. Not sure that much drop is this problem.
> >
> > >
> > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > NICs are connected to guest using pci passthrough, and guest's cpu is
> set
> > > as passthrough (same as host).
> > > On guest start the host allocates transparent hugepages (AnonHugePages)
> > so
> > > i assume the guest memory is backed with real hugepages on the host.
> > > I tried binding with igb_uio and with uio_pci_generic but both results
> > with
> > > same performance.
> > >
> > > Due to the performance difference i guess i miss something.
> > >
> > > Please advise what may i miss here?
> > > Is this a native penalty of qemu??
> > >
> > > Thanks
> > > Edgar
> >
> > Regards,
> > Keith
> >
> >
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-17 12:56                                                       ` edgar helmut
@ 2016-12-23 19:22                                                         ` edgar helmut
  2016-12-24  7:06                                                           ` Hu, Xuekun
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-23 19:22 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: Wiles, Keith, users

Hello,
I changed the setup but performance is still poor :( and I need your help
to understand the root cause.
The setup is (sorry for the long description):
(test equipment is pktgen using DPDK, installed on a second physical machine
connected with 82599 NICs)
host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with a single socket, Ubuntu
16.04, with 4 hugepages of 1G each.
hypervisor (kvm): QEMU emulator version 2.5.0
guest: same cpu as the host, created with 3 vcpus, using Ubuntu 16.04
dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M
each.
Guest total memory is 2G and all of it is backed by the host with
transparent hugepages (I can see the AnonHugePages consumed at guest
creation). This memory includes the 512 hugepages for the testpmd
application.
I pinned and isolated the guest's vcpus (using the kernel option isolcpus), and
could see clearly that the isolation works well.

2 x 82599 NICs are connected to the guest as passthrough using macvtap
interfaces, so the guest receives and forwards packets from one interface to
the second and vice versa.
On the guest I bind the interfaces using igb_uio and run testpmd roughly as
sketched below.
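The forwarding test in the guest is launched roughly like this (core mask, memory channels and port mask are placeholders):

./testpmd -c 0x7 -n 4 -- --portmask=0x3 --nb-cores=2 -i   # 3 vcpus: one for the CLI, two forwarding
testpmd> start
testpmd> show port stats all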

testpmd on the guest starts dropping packets at about ~800 Mbps bi-directional
between both ports, using two vcpus for forwarding (one vcpu for
application management and two for forwarding).
At 1.2 Gbps it drops a lot of packets.
The same testpmd configuration on the host (between both 82599 NICs)
forwards about 5-6 Gbps bi-directional on both ports.

I assumed that forwarding ~5-6 Gbps between two ports should be trivial, so
it would be great if someone could share their configuration for a tested setup.

Any further ideas will be highly appreciated.

Thanks.

On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com>
wrote:

> That's what I afraid.
> In fact i need the host to back the entire guest's memory with hugepages.
> I will find the way to do that and make the testing again.
>
>
> On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> You said VM’s memory was 6G, while transparent hugepages was only used ~4G
> (4360192KB). So some were mapped to 4K pages.
>
>
>
> BTW, the memory used by transparent hugepage is not the hugepage you
> reserved in kernel boot option.
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Friday, December 16, 2016 1:24 AM
> *To:* Hu, Xuekun
> *Cc:* Wiles, Keith; users@dpdk.org
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> in fact the vm was created with 6G RAM, its kernel boot args are defined
> with 4 hugepages of 1G each, though when starting the vm i noted that
> anonhugepages increased.
>
> The relevant qemu process id is 6074, and the following sums the amount of
> allocated AnonHugePages:
> sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2}
> '|awk '{s+=$1} END {print s}'
>
> which results with 4360192
>
> so not all the memory is backed with transparent hugepages though it is
> more than the amount of hugepages the guest supposed to boot with.
>
> How can I be sure that the required 4G hugepages are really allocated?,
> and not, for example, only 2G out of the 4G are allocated (and the rest 2
> are mapping of the default 4K)?
>
>
>
> thanks
>
>
>
> On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:
>
> Are you sure the anonhugepages size was equal to the total VM's memory
> size?
> Sometimes, transparent huge page mechanism doesn't grantee the app is using
> the real huge pages.
>
>
>
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> Sent: Thursday, December 15, 2016 9:32 PM
> To: Wiles, Keith
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
>
> I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> 2.40GHz.
>
> I just made two more steps:
> 1. setting iommu=pt for better usage of the igb_uio
> 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> use dedicated cores.
>
> It improved the performance though I still see significant difference
> between the vm and the host which I can't fully explain.
>
> any further idea?
>
> Regards,
> Edgar
>
>
> On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
> >
> > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> > >
> > > Hi.
> > > Some help is needed to understand performance issue on virtual machine.
> > >
> > > Running testpmd over the host functions well (testpmd forwards 10g
> > between
> > > two 82599 ports).
> > > However same application running on a virtual machine over same host
> > > results with huge degradation in performance.
> > > The testpmd then is not even able to read 100mbps from nic without
> drops,
> > > and from a profile i made it looks like a dpdk application runs more
> than
> > > 10 times slower than over host…
> >
> > Not sure I understand the overall setup, but did you make sure the
> NIC/PCI
> > bus is on the same socket as the VM. If you have multiple sockets on your
> > platform. If you have to access the NIC across the QPI it could explain
> > some of the performance drop. Not sure that much drop is this problem.
> >
> > >
> > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > NICs are connected to guest using pci passthrough, and guest's cpu is
> set
> > > as passthrough (same as host).
> > > On guest start the host allocates transparent hugepages (AnonHugePages)
> > so
> > > i assume the guest memory is backed with real hugepages on the host.
> > > I tried binding with igb_uio and with uio_pci_generic but both results
> > with
> > > same performance.
> > >
> > > Due to the performance difference i guess i miss something.
> > >
> > > Please advise what may i miss here?
> > > Is this a native penalty of qemu??
> > >
> > > Thanks
> > > Edgar
> >
> > Regards,
> > Keith
> >
> >
>
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-23 19:22                                                         ` edgar helmut
@ 2016-12-24  7:06                                                           ` Hu, Xuekun
  2016-12-24  8:06                                                             ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: Hu, Xuekun @ 2016-12-24  7:06 UTC (permalink / raw)
  To: edgar helmut; +Cc: Wiles, Keith, users

Now your setup has a new thing, “macvtap”. I don’t know what the performance of macvtap is. I only know it has much worse performance than “real” PCI passthrough.

I also don’t know why you selected such a config for your setup, anonymous hugepages and macvtap. Any specific purpose?

I think you should get a baseline first, then measure how much performance drops when using anonymous hugepages or macvtap:

1. Baseline: real hugepages + real PCI passthrough

2. Anonymous hugepages vs. reserved hugepages

3. Real PCI passthrough vs. macvtap

From: edgar helmut [mailto:helmut.edgar100@gmail.com]
Sent: Saturday, December 24, 2016 3:23 AM
To: Hu, Xuekun <xuekun.hu@intel.com>
Cc: Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

Hello,
I changed the setup but still performance are poor :( and I need your help to understand the root cause.
the setup is (sorry for long description):
(test equipment is pktgen using dpdk installed on a second physical machine coonected with 82599 NICs)
host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket , ubuntu 16.04, with 4 hugepages of 1G each.
hypervizor (kvm): QEMU emulator version 2.5.0
guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M each.
guest total memory is 2G and all of it is backed by the host with transparent hugepages (I can see the AnonHugePages consumed at guest creation). This memory includes the 512 hugepages for the testpmd application.
I pinned and isolated the guest's vcpus (using kernel option isolcapu), and could see clearly that the isolation functions well.

2 x 82599 NICs connected as passthrough using macvtap interfaces to the guest, so the guest receives and forwards packets from one interface to the second and vice versa.
at the guest I bind its interfaces using igb_uio.
the testpmd at guest starts dropping packets at about ~800mbps between both ports bi-directional using two vcpus for forwarding (one for the application management and two for forwarding).
at 1.2 gbps it drops a lot of packets.
the same testpmd configuration on the host (between both 82599 NICs) forwards about 5-6gbps on both ports bi-directional.

I assumed that forwarding ~5-6 gbps between two ports should be trivial, so it will be great if someone can share its configuration for a tested setup.
Any further idea will be highly appreciated.

Thanks.

On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>> wrote:
That's what I afraid.
In fact i need the host to back the entire guest's memory with hugepages.
I will find the way to do that and make the testing again.


On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com<mailto:xuekun.hu@intel.com>> wrote:
You said VM’s memory was 6G, while transparent hugepages was only used ~4G (4360192KB). So some were mapped to 4K pages.

BTW, the memory used by transparent hugepage is not the hugepage you reserved in kernel boot option.

From: edgar helmut [mailto:helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>]
Sent: Friday, December 16, 2016 1:24 AM
To: Hu, Xuekun
Cc: Wiles, Keith; users@dpdk.org<mailto:users@dpdk.org>
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

in fact the vm was created with 6G RAM, its kernel boot args are defined with 4 hugepages of 1G each, though when starting the vm i noted that anonhugepages increased.
The relevant qemu process id is 6074, and the following sums the amount of allocated AnonHugePages:
sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2} '|awk '{s+=$1} END {print s}'
which results with 4360192
so not all the memory is backed with transparent hugepages though it is more than the amount of hugepages the guest supposed to boot with.
How can I be sure that the required 4G hugepages are really allocated?, and not, for example, only 2G out of the 4G are allocated (and the rest 2 are mapping of the default 4K)?

thanks

On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com<mailto:xuekun.hu@intel.com>> wrote:
Are you sure the anonhugepages size was equal to the total VM's memory size?
Sometimes, transparent huge page mechanism doesn't grantee the app is using
the real huge pages.


-----Original Message-----
From: users [mailto:users-bounces@dpdk.org<mailto:users-bounces@dpdk.org>] On Behalf Of edgar helmut
Sent: Thursday, December 15, 2016 9:32 PM
To: Wiles, Keith
Cc: users@dpdk.org<mailto:users@dpdk.org>
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just made two more steps:
1. setting iommu=pt for better usage of the igb_uio
2. using taskset and isolcpu so now it looks like the relevant dpdk cores
use dedicated cores.

It improved the performance though I still see significant difference
between the vm and the host which I can't fully explain.

any further idea?

Regards,
Edgar


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com<mailto:keith.wiles@intel.com>> wrote:

>
> > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>>
> wrote:
> >
> > Hi.
> > Some help is needed to understand performance issue on virtual machine.
> >
> > Running testpmd over the host functions well (testpmd forwards 10g
> between
> > two 82599 ports).
> > However same application running on a virtual machine over same host
> > results with huge degradation in performance.
> > The testpmd then is not even able to read 100mbps from nic without drops,
> > and from a profile i made it looks like a dpdk application runs more than
> > 10 times slower than over host…
>
> Not sure I understand the overall setup, but did you make sure the NIC/PCI
> bus is on the same socket as the VM. If you have multiple sockets on your
> platform. If you have to access the NIC across the QPI it could explain
> some of the performance drop. Not sure that much drop is this problem.
>
> >
> > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > Qemu is 2.3.0 (though I tried with a newer as well).
> > NICs are connected to guest using pci passthrough, and guest's cpu is set
> > as passthrough (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages)
> so
> > i assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic but both results
> with
> > same performance.
> >
> > Due to the performance difference i guess i miss something.
> >
> > Please advise what may i miss here?
> > Is this a native penalty of qemu??
> >
> > Thanks
> > Edgar
>
> Regards,
> Keith
>
>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-24  7:06                                                           ` Hu, Xuekun
@ 2016-12-24  8:06                                                             ` edgar helmut
  2016-12-24 15:52                                                               ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-24  8:06 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: Wiles, Keith, users

I am looking for a means to measure packets going in and out of the VM
(without asking the VM itself). While pure passthrough doesn't expose an
interface to query in/out packet counts, macvtap exposes such an interface.
As for the anonymous hugepages, I was looking for a more flexible method and
I assumed there was not much difference.
I will run the test with reserved hugepages.
However, is there any known issue with macvtap performance when
delivering 5-6 Gbps?
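The counters I mean are the standard per-netdev ones on the host, e.g. for a hypothetical macvtap0 device:

cat /sys/class/net/macvtap0/statistics/rx_packets
cat /sys/class/net/macvtap0/statistics/tx_packets
ip -s link show macvtap0   # same information in one shot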

Thanks


On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:

Now your setup has a new thing, “macvtap”. I don’t know what’s the
performance of using macvtap. I only know it has much worse perf than the
“real” pci pass-through.



I also don’t know why you select such config for your setup, anonymous huge
pages and macvtap. Any specific purpose?



I think you should get a baseline first, then to get how much perf dropped
if using anonymous hugepages or macvtap。

1.      Baseline: real hugepage + real pci pass-through

2.      Anon hugepages vs hugepages

3.      Real pci pass-through vs. macvtap



*From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
*Sent:* Saturday, December 24, 2016 3:23 AM
*To:* Hu, Xuekun <xuekun.hu@intel.com>
*Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org

*Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine



Hello,

I changed the setup but still performance are poor :( and I need your help
to understand the root cause.

the setup is (sorry for long description):

(test equipment is pktgen using dpdk installed on a second physical machine
coonected with 82599 NICs)

host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket , ubuntu
16.04, with 4 hugepages of 1G each.

hypervizor (kvm): QEMU emulator version 2.5.0

guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04

dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M
each.

guest total memory is 2G and all of it is backed by the host with
transparent hugepages (I can see the AnonHugePages consumed at guest
creation). This memory includes the 512 hugepages for the testpmd
application.

I pinned and isolated the guest's vcpus (using kernel option isolcapu), and
could see clearly that the isolation functions well.



2 x 82599 NICs connected as passthrough using macvtap interfaces to the
guest, so the guest receives and forwards packets from one interface to the
second and vice versa.

at the guest I bind its interfaces using igb_uio.

the testpmd at guest starts dropping packets at about ~800mbps between both
ports bi-directional using two vcpus for forwarding (one for the
application management and two for forwarding).

at 1.2 gbps it drops a lot of packets.

the same testpmd configuration on the host (between both 82599 NICs)
forwards about 5-6gbps on both ports bi-directional.

I assumed that forwarding ~5-6 gbps between two ports should be trivial, so
it will be great if someone can share its configuration for a tested setup.

Any further idea will be highly appreciated.



Thanks.



On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com>
wrote:

That's what I afraid.

In fact i need the host to back the entire guest's memory with hugepages.

I will find the way to do that and make the testing again.





On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:

You said VM’s memory was 6G, while transparent hugepages was only used ~4G
(4360192KB). So some were mapped to 4K pages.



BTW, the memory used by transparent hugepage is not the hugepage you
reserved in kernel boot option.



*From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
*Sent:* Friday, December 16, 2016 1:24 AM
*To:* Hu, Xuekun
*Cc:* Wiles, Keith; users@dpdk.org
*Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine



in fact the vm was created with 6G RAM, its kernel boot args are defined
with 4 hugepages of 1G each, though when starting the vm i noted that
anonhugepages increased.

The relevant qemu process id is 6074, and the following sums the amount of
allocated AnonHugePages:
sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2}
'|awk '{s+=$1} END {print s}'

which results with 4360192

so not all the memory is backed with transparent hugepages though it is
more than the amount of hugepages the guest supposed to boot with.

How can I be sure that the required 4G hugepages are really allocated?, and
not, for example, only 2G out of the 4G are allocated (and the rest 2 are
mapping of the default 4K)?



thanks



On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:

Are you sure the anonhugepages size was equal to the total VM's memory size?
Sometimes, transparent huge page mechanism doesn't grantee the app is using
the real huge pages.



-----Original Message-----
From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
Sent: Thursday, December 15, 2016 9:32 PM
To: Wiles, Keith
Cc: users@dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just made two more steps:
1. setting iommu=pt for better usage of the igb_uio
2. using taskset and isolcpu so now it looks like the relevant dpdk cores
use dedicated cores.

It improved the performance though I still see significant difference
between the vm and the host which I can't fully explain.

any further idea?

Regards,
Edgar


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com> wrote:

>
> > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> wrote:
> >
> > Hi.
> > Some help is needed to understand performance issue on virtual machine.
> >
> > Running testpmd over the host functions well (testpmd forwards 10g
> between
> > two 82599 ports).
> > However same application running on a virtual machine over same host
> > results with huge degradation in performance.
> > The testpmd then is not even able to read 100mbps from nic without
drops,
> > and from a profile i made it looks like a dpdk application runs more
than
> > 10 times slower than over host…
>
> Not sure I understand the overall setup, but did you make sure the NIC/PCI
> bus is on the same socket as the VM. If you have multiple sockets on your
> platform. If you have to access the NIC across the QPI it could explain
> some of the performance drop. Not sure that much drop is this problem.
>
> >
> > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > Qemu is 2.3.0 (though I tried with a newer as well).
> > NICs are connected to guest using pci passthrough, and guest's cpu is
set
> > as passthrough (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages)
> so
> > i assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic but both results
> with
> > same performance.
> >
> > Due to the performance difference i guess i miss something.
> >
> > Please advise what may i miss here?
> > Is this a native penalty of qemu??
> >
> > Thanks
> > Edgar
>
> Regards,
> Keith
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-24  8:06                                                             ` edgar helmut
@ 2016-12-24 15:52                                                               ` edgar helmut
  2016-12-26  0:52                                                                 ` Hu, Xuekun
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-24 15:52 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: Wiles, Keith, users

Any idea how to reserve hugepages for a guest (and not
transparent/anonymous hugepages)?
I am using libvirt, and any backing method I try results in
anonymous hugepages.
Disabling transparent hugepages resulted in no hugepages at all.
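What I have been trying looks roughly like this, so treat it as a sketch rather than a working recipe (sizes and paths are placeholders; with libvirt the equivalent is a <memoryBacking><hugepages/></memoryBacking> element in the domain XML):

# reserve 1G pages on the host at boot (kernel command line), then verify:
#   default_hugepagesz=1G hugepagesz=1G hugepages=4
grep HugePages_ /proc/meminfo
# make sure a hugetlbfs mount exists for QEMU/libvirt to use:
sudo mount -t hugetlbfs -o pagesize=1G none /dev/hugepages1G
# with plain QEMU the guest RAM can be placed on that mount explicitly:
qemu-system-x86_64 -m 2048 -mem-prealloc -mem-path /dev/hugepages1G ...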

Thanks

On Sat, Dec 24, 2016 at 10:06 AM edgar helmut <helmut.edgar100@gmail.com>
wrote:

> I am looking for a mean to measure in and out packets to and from the vm
> (without asking the vm itself). While pure passthrough doesn't expose an
> interface to query for in/out pkts the macvtap exposes such an interface.
> As for the anonymous hugepages I was looking for a more flexible method
> and I assumed there is no much difference.
> I will make the test with reserved hugepages.
> However is there any knowledge about macvtap performance issues when
> delivering 5-6 gbps?
>
> Thanks
>
>
> On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> Now your setup has a new thing, “macvtap”. I don’t know what’s the
> performance of using macvtap. I only know it has much worse perf than the
> “real” pci pass-through.
>
>
>
> I also don’t know why you select such config for your setup, anonymous
> huge pages and macvtap. Any specific purpose?
>
>
>
> I think you should get a baseline first, then to get how much perf dropped
> if using anonymous hugepages or macvtap。
>
> 1.      Baseline: real hugepage + real pci pass-through
>
> 2.      Anon hugepages vs hugepages
>
> 3.      Real pci pass-through vs. macvtap
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Saturday, December 24, 2016 3:23 AM
> *To:* Hu, Xuekun <xuekun.hu@intel.com>
> *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
>
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> Hello,
>
> I changed the setup but still performance are poor :( and I need your help
> to understand the root cause.
>
> the setup is (sorry for long description):
>
> (test equipment is pktgen using dpdk installed on a second physical
> machine coonected with 82599 NICs)
>
> host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket ,
> ubuntu 16.04, with 4 hugepages of 1G each.
>
> hypervizor (kvm): QEMU emulator version 2.5.0
>
> guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
>
> dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M
> each.
>
> guest total memory is 2G and all of it is backed by the host with
> transparent hugepages (I can see the AnonHugePages consumed at guest
> creation). This memory includes the 512 hugepages for the testpmd
> application.
>
> I pinned and isolated the guest's vcpus (using kernel option isolcapu),
> and could see clearly that the isolation functions well.
>
>
>
> 2 x 82599 NICs connected as passthrough using macvtap interfaces to the
> guest, so the guest receives and forwards packets from one interface to the
> second and vice versa.
>
> at the guest I bind its interfaces using igb_uio.
>
> the testpmd at guest starts dropping packets at about ~800mbps between
> both ports bi-directional using two vcpus for forwarding (one for the
> application management and two for forwarding).
>
> at 1.2 gbps it drops a lot of packets.
>
> the same testpmd configuration on the host (between both 82599 NICs)
> forwards about 5-6gbps on both ports bi-directional.
>
> I assumed that forwarding ~5-6 gbps between two ports should be trivial,
> so it will be great if someone can share its configuration for a tested
> setup.
>
> Any further idea will be highly appreciated.
>
>
>
> Thanks.
>
>
>
> On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com>
> wrote:
>
> That's what I afraid.
>
> In fact i need the host to back the entire guest's memory with hugepages.
>
> I will find the way to do that and make the testing again.
>
>
>
>
>
> On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> You said VM’s memory was 6G, while transparent hugepages was only used ~4G
> (4360192KB). So some were mapped to 4K pages.
>
>
>
> BTW, the memory used by transparent hugepage is not the hugepage you
> reserved in kernel boot option.
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Friday, December 16, 2016 1:24 AM
> *To:* Hu, Xuekun
> *Cc:* Wiles, Keith; users@dpdk.org
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> in fact the vm was created with 6G RAM, its kernel boot args are defined
> with 4 hugepages of 1G each, though when starting the vm i noted that
> anonhugepages increased.
>
> The relevant qemu process id is 6074, and the following sums the amount of
> allocated AnonHugePages:
> sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2}
> '|awk '{s+=$1} END {print s}'
>
> which results with 4360192
>
> so not all the memory is backed with transparent hugepages though it is
> more than the amount of hugepages the guest supposed to boot with.
>
> How can I be sure that the required 4G hugepages are really allocated?,
> and not, for example, only 2G out of the 4G are allocated (and the rest 2
> are mapping of the default 4K)?
>
>
>
> thanks
>
>
>
> On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:
>
> Are you sure the anonhugepages size was equal to the total VM's memory
> size?
> Sometimes, transparent huge page mechanism doesn't grantee the app is using
> the real huge pages.
>
>
>
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> Sent: Thursday, December 15, 2016 9:32 PM
> To: Wiles, Keith
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
>
> I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> 2.40GHz.
>
> I just made two more steps:
> 1. setting iommu=pt for better usage of the igb_uio
> 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> use dedicated cores.
>
> It improved the performance though I still see significant difference
> between the vm and the host which I can't fully explain.
>
> any further idea?
>
> Regards,
> Edgar
>
>
> On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
> >
> > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> > >
> > > Hi.
> > > Some help is needed to understand performance issue on virtual machine.
> > >
> > > Running testpmd over the host functions well (testpmd forwards 10g
> > between
> > > two 82599 ports).
> > > However same application running on a virtual machine over same host
> > > results with huge degradation in performance.
> > > The testpmd then is not even able to read 100mbps from nic without
> drops,
> > > and from a profile i made it looks like a dpdk application runs more
> than
> > > 10 times slower than over host…
> >
> > Not sure I understand the overall setup, but did you make sure the
> NIC/PCI
> > bus is on the same socket as the VM. If you have multiple sockets on your
> > platform. If you have to access the NIC across the QPI it could explain
> > some of the performance drop. Not sure that much drop is this problem.
> >
> > >
> > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > NICs are connected to guest using pci passthrough, and guest's cpu is
> set
> > > as passthrough (same as host).
> > > On guest start the host allocates transparent hugepages (AnonHugePages)
> > so
> > > i assume the guest memory is backed with real hugepages on the host.
> > > I tried binding with igb_uio and with uio_pci_generic but both results
> > with
> > > same performance.
> > >
> > > Due to the performance difference i guess i miss something.
> > >
> > > Please advise what may i miss here?
> > > Is this a native penalty of qemu??
> > >
> > > Thanks
> > > Edgar
> >
> > Regards,
> > Keith
> >
> >
>
>
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-24 15:52                                                               ` edgar helmut
@ 2016-12-26  0:52                                                                 ` Hu, Xuekun
  2016-12-27 15:52                                                                   ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: Hu, Xuekun @ 2016-12-26  0:52 UTC (permalink / raw)
  To: edgar helmut; +Cc: Wiles, Keith, users

Search for “hugepages” in https://libvirt.org/formatdomain.html
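
A minimal sketch of the element that search leads to, assuming the host has
already reserved hugepages at boot and a libvirt version new enough to accept
per-size <page> elements (the 1G page size below is illustrative):

    <!-- guest RAM is taken from the host's reserved hugepage pool -->
    <memoryBacking>
      <hugepages>
        <page size='1' unit='G'/>
      </hugepages>
    </memoryBacking>

On older libvirt a bare <hugepages/> element (default hugepage size) serves
the same purpose.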

If you are looking to measure in and out packets through the host, maybe you can look at the vhost/virtio interface also.

After your testing, if you can report the performance you get with macvtap, that also helps us. ☺
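
If the goal is per-interface in/out counters as seen by the host, virsh can
already report them for a macvtap/vhost-backed interface (a sketch; the domain
and device names below are placeholders):

    virsh domiflist my-guest            # find the target device name
    virsh domifstat my-guest macvtap0   # rx/tx packets, bytes, errors, drops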


From: edgar helmut [mailto:helmut.edgar100@gmail.com]
Sent: Saturday, December 24, 2016 11:53 PM
To: Hu, Xuekun <xuekun.hu@intel.com>
Cc: Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

any idea how to reserve hugepages for a guest (and not transparent/anonymous hugepages) ?
i am using libvirt and any backing method I am trying results with anonymous hugepage.
disabling the transparent hugepages resulted without any hugepages.
Thanks

On Sat, Dec 24, 2016 at 10:06 AM edgar helmut <helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>> wrote:
I am looking for a mean to measure in and out packets to and from the vm (without asking the vm itself). While pure passthrough doesn't expose an interface to query for in/out pkts the macvtap exposes such an interface.
As for the anonymous hugepages I was looking for a more flexible method and I assumed there is no much difference.
I will make the test with reserved hugepages.
However is there any knowledge about macvtap performance issues when delivering 5-6 gbps?

Thanks


On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com<mailto:xuekun.hu@intel.com>> wrote:
Now your setup has a new thing, “macvtap”. I don’t know what’s the performance of using macvtap. I only know it has much worse perf than the “real” pci pass-through.

I also don’t know why you select such config for your setup, anonymous huge pages and macvtap. Any specific purpose?

I think you should get a baseline first, then to get how much perf dropped if using anonymous hugepages or macvtap。

1.      Baseline: real hugepage + real pci pass-through

2.      Anon hugepages vs hugepages

3.      Real pci pass-through vs. macvtap

From: edgar helmut [mailto:helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>]
Sent: Saturday, December 24, 2016 3:23 AM
To: Hu, Xuekun <xuekun.hu@intel.com<mailto:xuekun.hu@intel.com>>
Cc: Wiles, Keith <keith.wiles@intel.com<mailto:keith.wiles@intel.com>>; users@dpdk.org<mailto:users@dpdk.org>

Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

Hello,
I changed the setup but still performance are poor :( and I need your help to understand the root cause.
the setup is (sorry for long description):
(test equipment is pktgen using dpdk installed on a second physical machine coonected with 82599 NICs)
host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket , ubuntu 16.04, with 4 hugepages of 1G each.
hypervizor (kvm): QEMU emulator version 2.5.0
guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M each.
guest total memory is 2G and all of it is backed by the host with transparent hugepages (I can see the AnonHugePages consumed at guest creation). This memory includes the 512 hugepages for the testpmd application.
I pinned and isolated the guest's vcpus (using kernel option isolcapu), and could see clearly that the isolation functions well.

2 x 82599 NICs connected as passthrough using macvtap interfaces to the guest, so the guest receives and forwards packets from one interface to the second and vice versa.
at the guest I bind its interfaces using igb_uio.
the testpmd at guest starts dropping packets at about ~800mbps between both ports bi-directional using two vcpus for forwarding (one for the application management and two for forwarding).
at 1.2 gbps it drops a lot of packets.
the same testpmd configuration on the host (between both 82599 NICs) forwards about 5-6gbps on both ports bi-directional.

I assumed that forwarding ~5-6 gbps between two ports should be trivial, so it will be great if someone can share its configuration for a tested setup.
Any further idea will be highly appreciated.

Thanks.

On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>> wrote:
That's what I afraid.
In fact i need the host to back the entire guest's memory with hugepages.
I will find the way to do that and make the testing again.


On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com<mailto:xuekun.hu@intel.com>> wrote:
You said VM’s memory was 6G, while transparent hugepages was only used ~4G (4360192KB). So some were mapped to 4K pages.

BTW, the memory used by transparent hugepage is not the hugepage you reserved in kernel boot option.

From: edgar helmut [mailto:helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>]
Sent: Friday, December 16, 2016 1:24 AM
To: Hu, Xuekun
Cc: Wiles, Keith; users@dpdk.org<mailto:users@dpdk.org>
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

in fact the vm was created with 6G RAM, its kernel boot args are defined with 4 hugepages of 1G each, though when starting the vm i noted that anonhugepages increased.
The relevant qemu process id is 6074, and the following sums the amount of allocated AnonHugePages:
sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2} '|awk '{s+=$1} END {print s}'
which results with 4360192
so not all the memory is backed with transparent hugepages though it is more than the amount of hugepages the guest supposed to boot with.
How can I be sure that the required 4G hugepages are really allocated?, and not, for example, only 2G out of the 4G are allocated (and the rest 2 are mapping of the default 4K)?

thanks

On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com<mailto:xuekun.hu@intel.com>> wrote:
Are you sure the anonhugepages size was equal to the total VM's memory size?
Sometimes, transparent huge page mechanism doesn't grantee the app is using
the real huge pages.


-----Original Message-----
From: users [mailto:users-bounces@dpdk.org<mailto:users-bounces@dpdk.org>] On Behalf Of edgar helmut
Sent: Thursday, December 15, 2016 9:32 PM
To: Wiles, Keith
Cc: users@dpdk.org<mailto:users@dpdk.org>
Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine

I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz.

I just made two more steps:
1. setting iommu=pt for better usage of the igb_uio
2. using taskset and isolcpu so now it looks like the relevant dpdk cores
use dedicated cores.

It improved the performance though I still see significant difference
between the vm and the host which I can't fully explain.

any further idea?

Regards,
Edgar


On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com<mailto:keith.wiles@intel.com>> wrote:

>
> > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com<mailto:helmut.edgar100@gmail.com>>
> wrote:
> >
> > Hi.
> > Some help is needed to understand performance issue on virtual machine.
> >
> > Running testpmd over the host functions well (testpmd forwards 10g
> between
> > two 82599 ports).
> > However same application running on a virtual machine over same host
> > results with huge degradation in performance.
> > The testpmd then is not even able to read 100mbps from nic without drops,
> > and from a profile i made it looks like a dpdk application runs more than
> > 10 times slower than over host…
>
> Not sure I understand the overall setup, but did you make sure the NIC/PCI
> bus is on the same socket as the VM. If you have multiple sockets on your
> platform. If you have to access the NIC across the QPI it could explain
> some of the performance drop. Not sure that much drop is this problem.
>
> >
> > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > Qemu is 2.3.0 (though I tried with a newer as well).
> > NICs are connected to guest using pci passthrough, and guest's cpu is set
> > as passthrough (same as host).
> > On guest start the host allocates transparent hugepages (AnonHugePages)
> so
> > i assume the guest memory is backed with real hugepages on the host.
> > I tried binding with igb_uio and with uio_pci_generic but both results
> with
> > same performance.
> >
> > Due to the performance difference i guess i miss something.
> >
> > Please advise what may i miss here?
> > Is this a native penalty of qemu??
> >
> > Thanks
> > Edgar
>
> Regards,
> Keith
>
>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-26  0:52                                                                 ` Hu, Xuekun
@ 2016-12-27 15:52                                                                   ` edgar helmut
  2016-12-27 15:59                                                                     ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-27 15:52 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: Wiles, Keith, users

Thanks. That's the document I am following.
At best I can only ask that the hugepages won't be shared with others,
but it never reserves them from the host's pre-allocated hugepages.
Did you have a chance to use hugepages for a guest?

As for the interfaces, I am using virtio/vhost, which creates the
macvtap:
    <interface type='direct' managed='yes'>
        <source dev='ens6f0' mode='passthrough'/>
        <model type='virtio'/>
        <driver name='vhost' queues='2'/>
        <address type='pci' domain='0x0000' bus='0x04' slot='0x09'
function='0x0'/>
    </interface>
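
For the "real" PCI pass-through baseline discussed earlier in the thread, the
NIC would instead be handed to the guest whole with a hostdev element (a
sketch; the PCI address is illustrative and must match the host device):

    <!-- full PCI pass-through of the NIC; address is illustrative -->
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </source>
    </hostdev>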

The following is a performance comparison, host vs. VM, using testpmd. As you
can see, VM performance is poor.
(sudo x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 3 -m 1024 --
--coremask=0x1e --portmask=3 -i)


pkt size (B)   64    128   256   500   800   1000  1500
vm (gbps)      0.23  0.42  0.75  1.3   2.3   2.7   3.9
host (gbps)    3.6   6.35  8.3   9.5   9.7   9.8   9.82

I have to improve it dramatically.
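
For reference, the per-port throughput and drop counters behind numbers like
these can be read at testpmd's interactive prompt (standard testpmd commands,
not specific to this setup):

    testpmd> clear port stats all
    testpmd> start
    (run traffic)
    testpmd> show port stats all    # RX-packets vs. RX-missed show received vs. dropped
    testpmd> stop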



On Mon, Dec 26, 2016 at 2:52 AM Hu, Xuekun <xuekun.hu@intel.com> wrote:

> Searching “hugepages” in https://libvirt.org/formatdomain.html
>
>
>
> If you are looking for to measure in and out packets through host, maybe
> you can look at vhost/virtio interface also.
>
>
>
> After your testing, if you can report the performace out with macvtap,
> that also helps us. J
>
>
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Saturday, December 24, 2016 11:53 PM
>
>
> *To:* Hu, Xuekun <xuekun.hu@intel.com>
> *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> any idea how to reserve hugepages for a guest (and not
> transparent/anonymous hugepages) ?
>
> i am using libvirt and any backing method I am trying results with
> anonymous hugepage.
>
> disabling the transparent hugepages resulted without any hugepages.
>
> Thanks
>
>
>
> On Sat, Dec 24, 2016 at 10:06 AM edgar helmut <helmut.edgar100@gmail.com>
> wrote:
>
> I am looking for a mean to measure in and out packets to and from the vm
> (without asking the vm itself). While pure passthrough doesn't expose an
> interface to query for in/out pkts the macvtap exposes such an interface.
>
> As for the anonymous hugepages I was looking for a more flexible method
> and I assumed there is no much difference.
>
> I will make the test with reserved hugepages.
>
> However is there any knowledge about macvtap performance issues when
> delivering 5-6 gbps?
>
>
>
> Thanks
>
>
>
>
>
> On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> Now your setup has a new thing, “macvtap”. I don’t know what’s the
> performance of using macvtap. I only know it has much worse perf than the
> “real” pci pass-through.
>
>
>
> I also don’t know why you select such config for your setup, anonymous
> huge pages and macvtap. Any specific purpose?
>
>
>
> I think you should get a baseline first, then to get how much perf dropped
> if using anonymous hugepages or macvtap。
>
> 1.      Baseline: real hugepage + real pci pass-through
>
> 2.      Anon hugepages vs hugepages
>
> 3.      Real pci pass-through vs. macvtap
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Saturday, December 24, 2016 3:23 AM
> *To:* Hu, Xuekun <xuekun.hu@intel.com>
> *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
>
>
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> Hello,
>
> I changed the setup but still performance are poor :( and I need your help
> to understand the root cause.
>
> the setup is (sorry for long description):
>
> (test equipment is pktgen using dpdk installed on a second physical
> machine coonected with 82599 NICs)
>
> host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket ,
> ubuntu 16.04, with 4 hugepages of 1G each.
>
> hypervizor (kvm): QEMU emulator version 2.5.0
>
> guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
>
> dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M
> each.
>
> guest total memory is 2G and all of it is backed by the host with
> transparent hugepages (I can see the AnonHugePages consumed at guest
> creation). This memory includes the 512 hugepages for the testpmd
> application.
>
> I pinned and isolated the guest's vcpus (using kernel option isolcapu),
> and could see clearly that the isolation functions well.
>
>
>
> 2 x 82599 NICs connected as passthrough using macvtap interfaces to the
> guest, so the guest receives and forwards packets from one interface to the
> second and vice versa.
>
> at the guest I bind its interfaces using igb_uio.
>
> the testpmd at guest starts dropping packets at about ~800mbps between
> both ports bi-directional using two vcpus for forwarding (one for the
> application management and two for forwarding).
>
> at 1.2 gbps it drops a lot of packets.
>
> the same testpmd configuration on the host (between both 82599 NICs)
> forwards about 5-6gbps on both ports bi-directional.
>
> I assumed that forwarding ~5-6 gbps between two ports should be trivial,
> so it will be great if someone can share its configuration for a tested
> setup.
>
> Any further idea will be highly appreciated.
>
>
>
> Thanks.
>
>
>
> On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com>
> wrote:
>
> That's what I afraid.
>
> In fact i need the host to back the entire guest's memory with hugepages.
>
> I will find the way to do that and make the testing again.
>
>
>
>
>
> On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> You said VM’s memory was 6G, while transparent hugepages was only used ~4G
> (4360192KB). So some were mapped to 4K pages.
>
>
>
> BTW, the memory used by transparent hugepage is not the hugepage you
> reserved in kernel boot option.
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Friday, December 16, 2016 1:24 AM
> *To:* Hu, Xuekun
> *Cc:* Wiles, Keith; users@dpdk.org
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> in fact the vm was created with 6G RAM, its kernel boot args are defined
> with 4 hugepages of 1G each, though when starting the vm i noted that
> anonhugepages increased.
>
> The relevant qemu process id is 6074, and the following sums the amount of
> allocated AnonHugePages:
> sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2}
> '|awk '{s+=$1} END {print s}'
>
> which results with 4360192
>
> so not all the memory is backed with transparent hugepages though it is
> more than the amount of hugepages the guest supposed to boot with.
>
> How can I be sure that the required 4G hugepages are really allocated?,
> and not, for example, only 2G out of the 4G are allocated (and the rest 2
> are mapping of the default 4K)?
>
>
>
> thanks
>
>
>
> On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:
>
> Are you sure the anonhugepages size was equal to the total VM's memory
> size?
> Sometimes, transparent huge page mechanism doesn't grantee the app is using
> the real huge pages.
>
>
>
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> Sent: Thursday, December 15, 2016 9:32 PM
> To: Wiles, Keith
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
>
> I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> 2.40GHz.
>
> I just made two more steps:
> 1. setting iommu=pt for better usage of the igb_uio
> 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> use dedicated cores.
>
> It improved the performance though I still see significant difference
> between the vm and the host which I can't fully explain.
>
> any further idea?
>
> Regards,
> Edgar
>
>
> On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
> >
> > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> > >
> > > Hi.
> > > Some help is needed to understand performance issue on virtual machine.
> > >
> > > Running testpmd over the host functions well (testpmd forwards 10g
> > between
> > > two 82599 ports).
> > > However same application running on a virtual machine over same host
> > > results with huge degradation in performance.
> > > The testpmd then is not even able to read 100mbps from nic without
> drops,
> > > and from a profile i made it looks like a dpdk application runs more
> than
> > > 10 times slower than over host…
> >
> > Not sure I understand the overall setup, but did you make sure the
> NIC/PCI
> > bus is on the same socket as the VM. If you have multiple sockets on your
> > platform. If you have to access the NIC across the QPI it could explain
> > some of the performance drop. Not sure that much drop is this problem.
> >
> > >
> > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > NICs are connected to guest using pci passthrough, and guest's cpu is
> set
> > > as passthrough (same as host).
> > > On guest start the host allocates transparent hugepages (AnonHugePages)
> > so
> > > i assume the guest memory is backed with real hugepages on the host.
> > > I tried binding with igb_uio and with uio_pci_generic but both results
> > with
> > > same performance.
> > >
> > > Due to the performance difference i guess i miss something.
> > >
> > > Please advise what may i miss here?
> > > Is this a native penalty of qemu??
> > >
> > > Thanks
> > > Edgar
> >
> > Regards,
> > Keith
> >
> >
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-27 15:52                                                                   ` edgar helmut
@ 2016-12-27 15:59                                                                     ` edgar helmut
  2016-12-27 18:52                                                                       ` Stephen Hemminger
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-27 15:59 UTC (permalink / raw)
  To: Hu, Xuekun; +Cc: Wiles, Keith, users

A short explanation of how to read the comparison:
the first row is the packet length.
Throughput is reported per direction (half duplex), meaning:
the second row is the VM throughput from port 1 to 2 (port 2 to 1 has
approximately the same throughput), in gbps;
the third row is the host throughput from port 1 to 2 (port 2 to 1 has
approximately the same throughput), in gbps.

i.e. at a 1500-byte packet size, testpmd on the host delivers ~9.82 gbps from
port 1 to 2 and another ~9.82 gbps from port 2 to 1, while on the VM it only
delivers ~3.9 gbps in each direction.


On Tue, Dec 27, 2016 at 5:52 PM edgar helmut <helmut.edgar100@gmail.com>
wrote:

> Thanks. That's the document i am following.
> For the best i can only ask that the hugepages won't be shared with
> others, but it never reserve it from the pre allocated hugepages of the
> host.
> Did you have a chance to use hugepages for a guest
>
> as for the interfaces, i am using the virtio/vhost which creates the
> macvtap:
>     <interface type='direct' managed='yes'>
>         <source dev='ens6f0' mode='passthrough'/>
>         <model type='virtio'/>
>         <driver name='vhost' queues='2'/>
>         </driver>
>         <address type='pci' domain='0x0000' bus='0x04' slot='0x09'
> function='0x0'/>
>     </interface>
>
> The following is a performance comparison host vs. vm using testpmd. as
> you can see vm performance is poor.
>
> (sudo x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 3 -m 1024 --
> --coremask=0x1e --portmask=3 -i)
>
>
> 64 128 256 500 800 1000 1500
> vm 0.23 0.42 0.75 1.3 2.3 2.7 3.9
> host 3.6 6.35 8.3 9.5 9.7 9.8 9.82
>
> I have to improve it dramatically.
>
>
>
> On Mon, Dec 26, 2016 at 2:52 AM Hu, Xuekun <xuekun.hu@intel.com> wrote:
>
> Searching “hugepages” in https://libvirt.org/formatdomain.html
>
>
>
> If you are looking for to measure in and out packets through host, maybe
> you can look at vhost/virtio interface also.
>
>
>
> After your testing, if you can report the performace out with macvtap,
> that also helps us. J
>
>
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Saturday, December 24, 2016 11:53 PM
>
>
> *To:* Hu, Xuekun <xuekun.hu@intel.com>
> *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> any idea how to reserve hugepages for a guest (and not
> transparent/anonymous hugepages) ?
>
> i am using libvirt and any backing method I am trying results with
> anonymous hugepage.
>
> disabling the transparent hugepages resulted without any hugepages.
>
> Thanks
>
>
>
> On Sat, Dec 24, 2016 at 10:06 AM edgar helmut <helmut.edgar100@gmail.com>
> wrote:
>
> I am looking for a mean to measure in and out packets to and from the vm
> (without asking the vm itself). While pure passthrough doesn't expose an
> interface to query for in/out pkts the macvtap exposes such an interface.
>
> As for the anonymous hugepages I was looking for a more flexible method
> and I assumed there is no much difference.
>
> I will make the test with reserved hugepages.
>
> However is there any knowledge about macvtap performance issues when
> delivering 5-6 gbps?
>
>
>
> Thanks
>
>
>
>
>
> On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> Now your setup has a new thing, “macvtap”. I don’t know what’s the
> performance of using macvtap. I only know it has much worse perf than the
> “real” pci pass-through.
>
>
>
> I also don’t know why you select such config for your setup, anonymous
> huge pages and macvtap. Any specific purpose?
>
>
>
> I think you should get a baseline first, then to get how much perf dropped
> if using anonymous hugepages or macvtap。
>
> 1.      Baseline: real hugepage + real pci pass-through
>
> 2.      Anon hugepages vs hugepages
>
> 3.      Real pci pass-through vs. macvtap
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Saturday, December 24, 2016 3:23 AM
> *To:* Hu, Xuekun <xuekun.hu@intel.com>
> *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
>
>
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> Hello,
>
> I changed the setup but still performance are poor :( and I need your help
> to understand the root cause.
>
> the setup is (sorry for long description):
>
> (test equipment is pktgen using dpdk installed on a second physical
> machine coonected with 82599 NICs)
>
> host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket ,
> ubuntu 16.04, with 4 hugepages of 1G each.
>
> hypervizor (kvm): QEMU emulator version 2.5.0
>
> guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
>
> dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M
> each.
>
> guest total memory is 2G and all of it is backed by the host with
> transparent hugepages (I can see the AnonHugePages consumed at guest
> creation). This memory includes the 512 hugepages for the testpmd
> application.
>
> I pinned and isolated the guest's vcpus (using kernel option isolcapu),
> and could see clearly that the isolation functions well.
>
>
>
> 2 x 82599 NICs connected as passthrough using macvtap interfaces to the
> guest, so the guest receives and forwards packets from one interface to the
> second and vice versa.
>
> at the guest I bind its interfaces using igb_uio.
>
> the testpmd at guest starts dropping packets at about ~800mbps between
> both ports bi-directional using two vcpus for forwarding (one for the
> application management and two for forwarding).
>
> at 1.2 gbps it drops a lot of packets.
>
> the same testpmd configuration on the host (between both 82599 NICs)
> forwards about 5-6gbps on both ports bi-directional.
>
> I assumed that forwarding ~5-6 gbps between two ports should be trivial,
> so it will be great if someone can share its configuration for a tested
> setup.
>
> Any further idea will be highly appreciated.
>
>
>
> Thanks.
>
>
>
> On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com>
> wrote:
>
> That's what I afraid.
>
> In fact i need the host to back the entire guest's memory with hugepages.
>
> I will find the way to do that and make the testing again.
>
>
>
>
>
> On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
>
> You said VM’s memory was 6G, while transparent hugepages was only used ~4G
> (4360192KB). So some were mapped to 4K pages.
>
>
>
> BTW, the memory used by transparent hugepage is not the hugepage you
> reserved in kernel boot option.
>
>
>
> *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> *Sent:* Friday, December 16, 2016 1:24 AM
> *To:* Hu, Xuekun
> *Cc:* Wiles, Keith; users@dpdk.org
> *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
>
>
>
> in fact the vm was created with 6G RAM, its kernel boot args are defined
> with 4 hugepages of 1G each, though when starting the vm i noted that
> anonhugepages increased.
>
> The relevant qemu process id is 6074, and the following sums the amount of
> allocated AnonHugePages:
> sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2}
> '|awk '{s+=$1} END {print s}'
>
> which results with 4360192
>
> so not all the memory is backed with transparent hugepages though it is
> more than the amount of hugepages the guest supposed to boot with.
>
> How can I be sure that the required 4G hugepages are really allocated?,
> and not, for example, only 2G out of the 4G are allocated (and the rest 2
> are mapping of the default 4K)?
>
>
>
> thanks
>
>
>
> On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:
>
> Are you sure the anonhugepages size was equal to the total VM's memory
> size?
> Sometimes, transparent huge page mechanism doesn't grantee the app is using
> the real huge pages.
>
>
>
> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> Sent: Thursday, December 15, 2016 9:32 PM
> To: Wiles, Keith
> Cc: users@dpdk.org
> Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
>
> I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> 2.40GHz.
>
> I just made two more steps:
> 1. setting iommu=pt for better usage of the igb_uio
> 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> use dedicated cores.
>
> It improved the performance though I still see significant difference
> between the vm and the host which I can't fully explain.
>
> any further idea?
>
> Regards,
> Edgar
>
>
> On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> wrote:
>
> >
> > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> > >
> > > Hi.
> > > Some help is needed to understand performance issue on virtual machine.
> > >
> > > Running testpmd over the host functions well (testpmd forwards 10g
> > between
> > > two 82599 ports).
> > > However same application running on a virtual machine over same host
> > > results with huge degradation in performance.
> > > The testpmd then is not even able to read 100mbps from nic without
> drops,
> > > and from a profile i made it looks like a dpdk application runs more
> than
> > > 10 times slower than over host…
> >
> > Not sure I understand the overall setup, but did you make sure the
> NIC/PCI
> > bus is on the same socket as the VM. If you have multiple sockets on your
> > platform. If you have to access the NIC across the QPI it could explain
> > some of the performance drop. Not sure that much drop is this problem.
> >
> > >
> > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > NICs are connected to guest using pci passthrough, and guest's cpu is
> set
> > > as passthrough (same as host).
> > > On guest start the host allocates transparent hugepages (AnonHugePages)
> > so
> > > i assume the guest memory is backed with real hugepages on the host.
> > > I tried binding with igb_uio and with uio_pci_generic but both results
> > with
> > > same performance.
> > >
> > > Due to the performance difference i guess i miss something.
> > >
> > > Please advise what may i miss here?
> > > Is this a native penalty of qemu??
> > >
> > > Thanks
> > > Edgar
> >
> > Regards,
> > Keith
> >
> >
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-27 15:59                                                                     ` edgar helmut
@ 2016-12-27 18:52                                                                       ` Stephen Hemminger
  2016-12-28  8:09                                                                         ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2016-12-27 18:52 UTC (permalink / raw)
  To: edgar helmut; +Cc: Hu, Xuekun, Wiles, Keith, users

On Tue, 27 Dec 2016 15:59:08 +0000
edgar helmut <helmut.edgar100@gmail.com> wrote:

> short explanation for how to read the comparison:
> first row is packet length
> throughput is half duplex, means:
> second row is vm throughput of port 1 to 2 (port 2 to 1 has approximately
> same throughput) in gbps.
> third row is host throughput of port 1 to 2 (port 2 to 1 has approximately
> same throughput) in gbps.
> 
> i.e. on 1500 bytes packet size testpmd delivers ~9.82 gbps from port 1 to 2
> and another ~9.82 gbps from port 2 to 1, while at the vm it only delivers
> ~3.9 gbps for each direction.
> 
> 
> On Tue, Dec 27, 2016 at 5:52 PM edgar helmut <helmut.edgar100@gmail.com>
> wrote:
> 
> > Thanks. That's the document i am following.
> > For the best i can only ask that the hugepages won't be shared with
> > others, but it never reserve it from the pre allocated hugepages of the
> > host.
> > Did you have a chance to use hugepages for a guest
> >
> > as for the interfaces, i am using the virtio/vhost which creates the
> > macvtap:
> >     <interface type='direct' managed='yes'>
> >         <source dev='ens6f0' mode='passthrough'/>
> >         <model type='virtio'/>
> >         <driver name='vhost' queues='2'/>
> >         </driver>
> >         <address type='pci' domain='0x0000' bus='0x04' slot='0x09'  
> > function='0x0'/>  
> >     </interface>
> >
> > The following is a performance comparison host vs. vm using testpmd. as
> > you can see vm performance is poor.
> >
> > (sudo x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 3 -m 1024 --
> > --coremask=0x1e --portmask=3 -i)
> >
> >
> > 64 128 256 500 800 1000 1500
> > vm 0.23 0.42 0.75 1.3 2.3 2.7 3.9
> > host 3.6 6.35 8.3 9.5 9.7 9.8 9.82
> >
> > I have to improve it dramatically.
> >
> >
> >
> > On Mon, Dec 26, 2016 at 2:52 AM Hu, Xuekun <xuekun.hu@intel.com> wrote:
> >
> > Searching “hugepages” in https://libvirt.org/formatdomain.html
> >
> >
> >
> > If you are looking for to measure in and out packets through host, maybe
> > you can look at vhost/virtio interface also.
> >
> >
> >
> > After your testing, if you can report the performace out with macvtap,
> > that also helps us. J
> >
> >
> >
> >
> >
> > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > *Sent:* Saturday, December 24, 2016 11:53 PM
> >
> >
> > *To:* Hu, Xuekun <xuekun.hu@intel.com>
> > *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> >
> >
> >
> > any idea how to reserve hugepages for a guest (and not
> > transparent/anonymous hugepages) ?
> >
> > i am using libvirt and any backing method I am trying results with
> > anonymous hugepage.
> >
> > disabling the transparent hugepages resulted without any hugepages.
> >
> > Thanks
> >
> >
> >
> > On Sat, Dec 24, 2016 at 10:06 AM edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> >
> > I am looking for a mean to measure in and out packets to and from the vm
> > (without asking the vm itself). While pure passthrough doesn't expose an
> > interface to query for in/out pkts the macvtap exposes such an interface.
> >
> > As for the anonymous hugepages I was looking for a more flexible method
> > and I assumed there is no much difference.
> >
> > I will make the test with reserved hugepages.
> >
> > However is there any knowledge about macvtap performance issues when
> > delivering 5-6 gbps?
> >
> >
> >
> > Thanks
> >
> >
> >
> >
> >
> > On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
> >
> > Now your setup has a new thing, “macvtap”. I don’t know what’s the
> > performance of using macvtap. I only know it has much worse perf than the
> > “real” pci pass-through.
> >
> >
> >
> > I also don’t know why you select such config for your setup, anonymous
> > huge pages and macvtap. Any specific purpose?
> >
> >
> >
> > I think you should get a baseline first, then to get how much perf dropped
> > if using anonymous hugepages or macvtap。
> >
> > 1.      Baseline: real hugepage + real pci pass-through
> >
> > 2.      Anon hugepages vs hugepages
> >
> > 3.      Real pci pass-through vs. macvtap
> >
> >
> >
> > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > *Sent:* Saturday, December 24, 2016 3:23 AM
> > *To:* Hu, Xuekun <xuekun.hu@intel.com>
> > *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> >
> >
> > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> >
> >
> >
> > Hello,
> >
> > I changed the setup but still performance are poor :( and I need your help
> > to understand the root cause.
> >
> > the setup is (sorry for long description):
> >
> > (test equipment is pktgen using dpdk installed on a second physical
> > machine coonected with 82599 NICs)
> >
> > host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket ,
> > ubuntu 16.04, with 4 hugepages of 1G each.
> >
> > hypervizor (kvm): QEMU emulator version 2.5.0
> >
> > guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
> >
> > dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of 2M
> > each.
> >
> > guest total memory is 2G and all of it is backed by the host with
> > transparent hugepages (I can see the AnonHugePages consumed at guest
> > creation). This memory includes the 512 hugepages for the testpmd
> > application.
> >
> > I pinned and isolated the guest's vcpus (using kernel option isolcapu),
> > and could see clearly that the isolation functions well.
> >
> >
> >
> > 2 x 82599 NICs connected as passthrough using macvtap interfaces to the
> > guest, so the guest receives and forwards packets from one interface to the
> > second and vice versa.
> >
> > at the guest I bind its interfaces using igb_uio.
> >
> > the testpmd at guest starts dropping packets at about ~800mbps between
> > both ports bi-directional using two vcpus for forwarding (one for the
> > application management and two for forwarding).
> >
> > at 1.2 gbps it drops a lot of packets.
> >
> > the same testpmd configuration on the host (between both 82599 NICs)
> > forwards about 5-6gbps on both ports bi-directional.
> >
> > I assumed that forwarding ~5-6 gbps between two ports should be trivial,
> > so it will be great if someone can share its configuration for a tested
> > setup.
> >
> > Any further idea will be highly appreciated.
> >
> >
> >
> > Thanks.
> >
> >
> >
> > On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> >
> > That's what I afraid.
> >
> > In fact i need the host to back the entire guest's memory with hugepages.
> >
> > I will find the way to do that and make the testing again.
> >
> >
> >
> >
> >
> > On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
> >
> > You said VM’s memory was 6G, while transparent hugepages was only used ~4G
> > (4360192KB). So some were mapped to 4K pages.
> >
> >
> >
> > BTW, the memory used by transparent hugepage is not the hugepage you
> > reserved in kernel boot option.
> >
> >
> >
> > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > *Sent:* Friday, December 16, 2016 1:24 AM
> > *To:* Hu, Xuekun
> > *Cc:* Wiles, Keith; users@dpdk.org
> > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> >
> >
> >
> > in fact the vm was created with 6G RAM, its kernel boot args are defined
> > with 4 hugepages of 1G each, though when starting the vm i noted that
> > anonhugepages increased.
> >
> > The relevant qemu process id is 6074, and the following sums the amount of
> > allocated AnonHugePages:
> > sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print $2}
> > '|awk '{s+=$1} END {print s}'
> >
> > which results with 4360192
> >
> > so not all the memory is backed with transparent hugepages though it is
> > more than the amount of hugepages the guest supposed to boot with.
> >
> > How can I be sure that the required 4G hugepages are really allocated?,
> > and not, for example, only 2G out of the 4G are allocated (and the rest 2
> > are mapping of the default 4K)?
> >
> >
> >
> > thanks
> >
> >
> >
> > On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com> wrote:
> >
> > Are you sure the anonhugepages size was equal to the total VM's memory
> > size?
> > Sometimes, transparent huge page mechanism doesn't grantee the app is using
> > the real huge pages.
> >
> >
> >
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> > Sent: Thursday, December 15, 2016 9:32 PM
> > To: Wiles, Keith
> > Cc: users@dpdk.org
> > Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
> >
> > I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> > 2.40GHz.
> >
> > I just made two more steps:
> > 1. setting iommu=pt for better usage of the igb_uio
> > 2. using taskset and isolcpu so now it looks like the relevant dpdk cores
> > use dedicated cores.
> >
> > It improved the performance though I still see significant difference
> > between the vm and the host which I can't fully explain.
> >
> > any further idea?
> >
> > Regards,
> > Edgar
> >
> >
> > On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> > wrote:
> >  
> > >  
> > > > On Dec 15, 2016, at 1:20 AM, edgar helmut <helmut.edgar100@gmail.com>  
> > > wrote:  
> > > >
> > > > Hi.
> > > > Some help is needed to understand performance issue on virtual machine.
> > > >
> > > > Running testpmd over the host functions well (testpmd forwards 10g  
> > > between  
> > > > two 82599 ports).
> > > > However same application running on a virtual machine over same host
> > > > results with huge degradation in performance.
> > > > The testpmd then is not even able to read 100mbps from nic without  
> > drops,  
> > > > and from a profile i made it looks like a dpdk application runs more  
> > than  
> > > > 10 times slower than over host…  
> > >
> > > Not sure I understand the overall setup, but did you make sure the  
> > NIC/PCI  
> > > bus is on the same socket as the VM. If you have multiple sockets on your
> > > platform. If you have to access the NIC across the QPI it could explain
> > > some of the performance drop. Not sure that much drop is this problem.
> > >  
> > > >
> > > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > > NICs are connected to guest using pci passthrough, and guest's cpu is  
> > set  
> > > > as passthrough (same as host).
> > > > On guest start the host allocates transparent hugepages (AnonHugePages)  
> > > so  
> > > > i assume the guest memory is backed with real hugepages on the host.
> > > > I tried binding with igb_uio and with uio_pci_generic but both results  
> > > with  
> > > > same performance.
> > > >
> > > > Due to the performance difference i guess i miss something.
> > > >
> > > > Please advise what may i miss here?
> > > > Is this a native penalty of qemu??
> > > >
> > > > Thanks
> > > > Edgar  

Did you set up the KVM host to run the guest in huge pages?

https://access.redhat.com/solutions/36741
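
A minimal sketch of the usual flow (not taken from the linked article; the
page count is illustrative):

    # host kernel cmdline: reserve 1G pages at boot, enough to cover guest RAM
    #   default_hugepagesz=1G hugepagesz=1G hugepages=4
    # back the guest explicitly in its domain XML:
    #   <memoryBacking><hugepages/></memoryBacking>
    # then check that the pool is actually consumed when the guest starts:
    grep -i huge /proc/meminfo    # HugePages_Free should drop by the guest size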

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-27 18:52                                                                       ` Stephen Hemminger
@ 2016-12-28  8:09                                                                         ` edgar helmut
  2017-01-04  6:44                                                                           ` edgar helmut
  0 siblings, 1 reply; 21+ messages in thread
From: edgar helmut @ 2016-12-28  8:09 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Hu, Xuekun, Wiles, Keith, users

I tried this procedure as well as a few others.
With all of them, HugePages_Free and HugePages_Rsvd don't change when the
guest is created; only AnonHugePages increases.
I have been struggling with this for a while without real success.
I do suspect that these anonymous hugepages don't really do the job.

any idea?
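
Two quick checks that can narrow this down (a sketch; the domain name is a
placeholder):

    virsh dumpxml my-guest | grep -A3 memoryBacking   # is <hugepages> in the live XML?
    grep hugetlbfs /proc/mounts                       # is a hugetlbfs mount available?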


On Tue, Dec 27, 2016 at 8:52 PM Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Tue, 27 Dec 2016 15:59:08 +0000
> edgar helmut <helmut.edgar100@gmail.com> wrote:
>
> > short explanation for how to read the comparison:
> > first row is packet length
> > throughput is half duplex, means:
> > second row is vm throughput of port 1 to 2 (port 2 to 1 has approximately
> > same throughput) in gbps.
> > third row is host throughput of port 1 to 2 (port 2 to 1 has
> approximately
> > same throughput) in gbps.
> >
> > i.e. on 1500 bytes packet size testpmd delivers ~9.82 gbps from port 1
> to 2
> > and another ~9.82 gbps from port 2 to 1, while at the vm it only delivers
> > ~3.9 gbps for each direction.
> >
> >
> > On Tue, Dec 27, 2016 at 5:52 PM edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> >
> > > Thanks. That's the document i am following.
> > > For the best i can only ask that the hugepages won't be shared with
> > > others, but it never reserve it from the pre allocated hugepages of the
> > > host.
> > > Did you have a chance to use hugepages for a guest
> > >
> > > as for the interfaces, i am using the virtio/vhost which creates the
> > > macvtap:
> > >     <interface type='direct' managed='yes'>
> > >         <source dev='ens6f0' mode='passthrough'/>
> > >         <model type='virtio'/>
> > >         <driver name='vhost' queues='2'/>
> > >         </driver>
> > >         <address type='pci' domain='0x0000' bus='0x04' slot='0x09'
> > > function='0x0'/>
> > >     </interface>
> > >
> > > The following is a performance comparison host vs. vm using testpmd. as
> > > you can see vm performance is poor.
> > >
> > > (sudo x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 3 -m 1024 --
> > > --coremask=0x1e --portmask=3 -i)
> > >
> > >
> > > 64 128 256 500 800 1000 1500
> > > vm 0.23 0.42 0.75 1.3 2.3 2.7 3.9
> > > host 3.6 6.35 8.3 9.5 9.7 9.8 9.82
> > >
> > > I have to improve it dramatically.
> > >
> > >
> > >
> > > On Mon, Dec 26, 2016 at 2:52 AM Hu, Xuekun <xuekun.hu@intel.com>
> wrote:
> > >
> > > Searching “hugepages” in https://libvirt.org/formatdomain.html
> > >
> > >
> > >
> > > If you are looking for to measure in and out packets through host,
> maybe
> > > you can look at vhost/virtio interface also.
> > >
> > >
> > >
> > > After your testing, if you can report the performace out with macvtap,
> > > that also helps us. J
> > >
> > >
> > >
> > >
> > >
> > > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > > *Sent:* Saturday, December 24, 2016 11:53 PM
> > >
> > >
> > > *To:* Hu, Xuekun <xuekun.hu@intel.com>
> > > *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> > > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > >
> > >
> > > any idea how to reserve hugepages for a guest (and not
> > > transparent/anonymous hugepages) ?
> > >
> > > i am using libvirt and any backing method I am trying results with
> > > anonymous hugepage.
> > >
> > > disabling the transparent hugepages resulted without any hugepages.
> > >
> > > Thanks
> > >
> > >
> > >
> > > On Sat, Dec 24, 2016 at 10:06 AM edgar helmut <
> helmut.edgar100@gmail.com>
> > > wrote:
> > >
> > > I am looking for a mean to measure in and out packets to and from the
> vm
> > > (without asking the vm itself). While pure passthrough doesn't expose
> an
> > > interface to query for in/out pkts the macvtap exposes such an
> interface.
> > >
> > > As for the anonymous hugepages I was looking for a more flexible method
> > > and I assumed there is no much difference.
> > >
> > > I will make the test with reserved hugepages.
> > >
> > > However is there any knowledge about macvtap performance issues when
> > > delivering 5-6 gbps?
> > >
> > >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > >
> > > On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
> > >
> > > Now your setup has a new thing, “macvtap”. I don’t know what’s the
> > > performance of using macvtap. I only know it has much worse perf than
> the
> > > “real” pci pass-through.
> > >
> > >
> > >
> > > I also don’t know why you select such config for your setup, anonymous
> > > huge pages and macvtap. Any specific purpose?
> > >
> > >
> > >
> > > I think you should get a baseline first, then to get how much perf
> dropped
> > > if using anonymous hugepages or macvtap。
> > >
> > > 1.      Baseline: real hugepage + real pci pass-through
> > >
> > > 2.      Anon hugepages vs hugepages
> > >
> > > 3.      Real pci pass-through vs. macvtap
> > >
> > >
> > >
> > > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > > *Sent:* Saturday, December 24, 2016 3:23 AM
> > > *To:* Hu, Xuekun <xuekun.hu@intel.com>
> > > *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> > >
> > >
> > > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > >
> > >
> > > Hello,
> > >
> > > I changed the setup but still performance are poor :( and I need your
> help
> > > to understand the root cause.
> > >
> > > the setup is (sorry for long description):
> > >
> > > (test equipment is pktgen using dpdk installed on a second physical
> > > machine coonected with 82599 NICs)
> > >
> > > host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with single socket ,
> > > ubuntu 16.04, with 4 hugepages of 1G each.
> > >
> > > hypervizor (kvm): QEMU emulator version 2.5.0
> > >
> > > guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
> > >
> > > dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of
> 2M
> > > each.
> > >
> > > guest total memory is 2G and all of it is backed by the host with
> > > transparent hugepages (I can see the AnonHugePages consumed at guest
> > > creation). This memory includes the 512 hugepages for the testpmd
> > > application.
> > >
> > > I pinned and isolated the guest's vcpus (using kernel option isolcapu),
> > > and could see clearly that the isolation functions well.
> > >
> > >
> > >
> > > 2 x 82599 NICs connected as passthrough using macvtap interfaces to the
> > > guest, so the guest receives and forwards packets from one interface
> to the
> > > second and vice versa.
> > >
> > > at the guest I bind its interfaces using igb_uio.
> > >
> > > the testpmd at guest starts dropping packets at about ~800mbps between
> > > both ports bi-directional using two vcpus for forwarding (one for the
> > > application management and two for forwarding).
> > >
> > > at 1.2 gbps it drops a lot of packets.
> > >
> > > the same testpmd configuration on the host (between both 82599 NICs)
> > > forwards about 5-6gbps on both ports bi-directional.
> > >
> > > I assumed that forwarding ~5-6 gbps between two ports should be
> trivial,
> > > so it will be great if someone can share its configuration for a tested
> > > setup.
> > >
> > > Any further idea will be highly appreciated.
> > >
> > >
> > >
> > > Thanks.
> > >
> > >
> > >
> > > On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <
> helmut.edgar100@gmail.com>
> > > wrote:
> > >
> > > That's what I afraid.
> > >
> > > In fact i need the host to back the entire guest's memory with
> hugepages.
> > >
> > > I will find the way to do that and make the testing again.
> > >
> > >
> > >
> > >
> > >
> > > On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
> > >
> > > You said VM’s memory was 6G, while transparent hugepages was only used
> ~4G
> > > (4360192KB). So some were mapped to 4K pages.
> > >
> > >
> > >
> > > BTW, the memory used by transparent hugepage is not the hugepage you
> > > reserved in kernel boot option.
> > >
> > >
> > >
> > > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > > *Sent:* Friday, December 16, 2016 1:24 AM
> > > *To:* Hu, Xuekun
> > > *Cc:* Wiles, Keith; users@dpdk.org
> > > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > >
> > >
> > > in fact the vm was created with 6G RAM, its kernel boot args are
> defined
> > > with 4 hugepages of 1G each, though when starting the vm i noted that
> > > anonhugepages increased.
> > >
> > > The relevant qemu process id is 6074, and the following sums the
> amount of
> > > allocated AnonHugePages:
> > > sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print
> $2}
> > > '|awk '{s+=$1} END {print s}'
> > >
> > > which results with 4360192
> > >
> > > so not all the memory is backed with transparent hugepages though it is
> > > more than the amount of hugepages the guest was supposed to boot with.
> > >
> > > How can I be sure that the required 4G of hugepages are really allocated,
> > > and not, for example, that only 2G out of the 4G are allocated (and the
> > > rest 2G are mapped to the default 4K pages)?
> > >
> > >
> > >
> > > thanks
> > >
> > >
> > >
> > > On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com>
> wrote:
> > >
> > > Are you sure the anonhugepages size was equal to the total VM's memory
> > > size?
> > > Sometimes the transparent huge page mechanism doesn't guarantee the app is
> using
> > > the real huge pages.
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> > > Sent: Thursday, December 15, 2016 9:32 PM
> > > To: Wiles, Keith
> > > Cc: users@dpdk.org
> > > Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > > I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> > > 2.40GHz.
> > >
> > > I just made two more steps:
> > > 1. setting iommu=pt for better usage of the igb_uio
> > > 2. using taskset and isolcpus so now it looks like the relevant dpdk
> cores
> > > use dedicated cores.
> > >
> > > It improved the performance, though I still see a significant difference
> > > between the vm and the host which I can't fully explain.
> > >
> > > any further idea?
> > >
> > > Regards,
> > > Edgar
> > >
> > >
> > > On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> > > wrote:
> > >
> > > >
> > > > > On Dec 15, 2016, at 1:20 AM, edgar helmut <
> helmut.edgar100@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi.
> > > > > Some help is needed to understand performance issue on virtual
> machine.
> > > > >
> > > > > Running testpmd over the host functions well (testpmd forwards 10g
> > > > between
> > > > > two 82599 ports).
> > > > > However same application running on a virtual machine over same
> host
> > > > > results with huge degradation in performance.
> > > > > The testpmd then is not even able to read 100mbps from nic without
> > > drops,
> > > > > and from a profile i made it looks like a dpdk application runs
> more
> > > than
> > > > > 10 times slower than over host…
> > > >
> > > > Not sure I understand the overall setup, but did you make sure the
> > > NIC/PCI
> > > > bus is on the same socket as the VM. If you have multiple sockets on
> your
> > > > platform. If you have to access the NIC across the QPI it could
> explain
> > > > some of the performance drop. Not sure that much drop is this
> problem.
> > > >
> > > > >
> > > > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > > > NICs are connected to guest using pci passthrough, and guest's cpu
> is
> > > set
> > > > > as passthrough (same as host).
> > > > > On guest start the host allocates transparent hugepages
> (AnonHugePages)
> > > > so
> > > > > i assume the guest memory is backed with real hugepages on the
> host.
> > > > > I tried binding with igb_uio and with uio_pci_generic but both
> results
> > > > with
> > > > > same performance.
> > > > >
> > > > > Due to the performance difference i guess i miss something.
> > > > >
> > > > > Please advise what may i miss here?
> > > > > Is this a native penalty of qemu??
> > > > >
> > > > > Thanks
> > > > > Edgar
>
> Did you set up the KVM host to run the guest in huge pages?
>
> https://access.redhat.com/solutions/36741
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [dpdk-users] Dpdk poor performance on virtual machine
  2016-12-28  8:09                                                                         ` edgar helmut
@ 2017-01-04  6:44                                                                           ` edgar helmut
  0 siblings, 0 replies; 21+ messages in thread
From: edgar helmut @ 2017-01-04  6:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Hu, Xuekun, Wiles, Keith, users

Finally the hugepages issue was solved. It looks like there was some confusion
between /dev/hugepages and /run/hugepages.
It was solved by changing the "Where" entry in /lib/systemd/system/dev-hugepages.mount
from "/dev/hugepage" to "/run/hugepages/kvm" ... hope this will help others.

The bad news is that it didn't improve the performance dramatically.
Digging into that and comparing with real pci passthrough, it looks like
there is a real problem with macvtap performance, as suggested by Hu.
Using macvtap, the dpdk application in the vm could poll up to ~5-6 gbps and
beyond that it is full of drops (missed pkts??). Furthermore, testpmd also
lost packets in its pipeline (e.g. rx X pkts and tx Y pkts while X > Y), but I
don't know if that is because of the testpmd pipeline or the tx to the second
macvtap based interface.
However, with pure pci passthrough I could transfer 10gbps without any
problem...
All of this was tested with 2, 3 and 4 cores for testpmd.
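
In case it helps anyone reproducing this, the rx-vs-tx accounting above comes
from testpmd's own counters; a minimal sketch of the invocation I mean is
below (coremask/portmask are placeholders for my 3-vcpu guest, and the exact
counter names differ a bit between dpdk versions):

  # inside the guest, after binding both ports to igb_uio
  sudo x86_64-native-linuxapp-gcc/app/testpmd -c 0x7 -n 4 -m 1024 -- \
       -i --coremask=0x6 --portmask=0x3
  testpmd> start
  testpmd> show port stats all   # compare RX-packets/RX-missed vs TX-packets
  testpmd> stop

If RX-missed grows, packets are dropped before testpmd polls them; if
RX-packets keeps growing while TX-packets lags behind, the loss is in the
forwarding/tx path to the second interface.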

So to summarize: don't use macvtap (virtio, vhost) if you want to scale to
more than a few gbps; use pci passthrough instead.
This is bad news for me, as I can't have 10gbps throughput while still being
able to count pkts in/out to and from the vm.
If someone has an idea how to do that I will be more than happy to hear it.

edgar


On Wed, Dec 28, 2016 at 10:09 AM edgar helmut <helmut.edgar100@gmail.com>
wrote:

> I tried this procedure as well as a few others.
> With all of the procedures, the HugePages_Free or HugePages_Rsvd counters don't
> change when creating the guest; only AnonHugePages increases.
> I have been struggling with this for a while without real success.
> I suspect that these anonymous hugepages don't really do the job.
>
> any idea?
>
>
> On Tue, Dec 27, 2016 at 8:52 PM Stephen Hemminger <
> stephen@networkplumber.org> wrote:
>
> On Tue, 27 Dec 2016 15:59:08 +0000
> edgar helmut <helmut.edgar100@gmail.com> wrote:
>
> > short explanation for how to read the comparison:
> > the first row is the packet length in bytes
> > throughput is half duplex, meaning:
> > second row is vm throughput of port 1 to 2 (port 2 to 1 has approximately
> > same throughput) in gbps.
> > third row is host throughput of port 1 to 2 (port 2 to 1 has
> approximately
> > same throughput) in gbps.
> >
> > i.e. on 1500 bytes packet size testpmd delivers ~9.82 gbps from port 1
> to 2
> > and another ~9.82 gbps from port 2 to 1, while at the vm it only delivers
> > ~3.9 gbps for each direction.
> >
> >
> > On Tue, Dec 27, 2016 at 5:52 PM edgar helmut <helmut.edgar100@gmail.com>
> > wrote:
> >
> > > Thanks. That's the document I am following.
> > > At best I can only ask that the hugepages won't be shared with
> > > others, but it never reserves them from the pre-allocated hugepages of the
> > > host.
> > > Did you have a chance to use hugepages for a guest?
> > >
> > > as for the interfaces, i am using the virtio/vhost which creates the
> > > macvtap:
> > >     <interface type='direct' managed='yes'>
> > >         <source dev='ens6f0' mode='passthrough'/>
> > >         <model type='virtio'/>
> > >         <driver name='vhost' queues='2'/>
> > >         <address type='pci' domain='0x0000' bus='0x04' slot='0x09'
> > > function='0x0'/>
> > >     </interface>
> > >
> > > The following is a performance comparison of host vs. vm using testpmd. As
> > > you can see, vm performance is poor.
> > >
> > > (sudo x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 3 -m 1024 --
> > > --coremask=0x1e --portmask=3 -i)
> > >
> > >
> > > pkt size (B):  64    128   256   500   800   1000  1500
> > > vm (gbps):     0.23  0.42  0.75  1.3   2.3   2.7   3.9
> > > host (gbps):   3.6   6.35  8.3   9.5   9.7   9.8   9.82
> > >
> > > I have to improve it dramatically.
> > >
> > >
> > >
> > > On Mon, Dec 26, 2016 at 2:52 AM Hu, Xuekun <xuekun.hu@intel.com>
> wrote:
> > >
> > > Searching “hugepages” in https://libvirt.org/formatdomain.html
> > >
> > >
> > >
> > > If you are looking to measure in and out packets through the host,
> > > maybe you can look at the vhost/virtio interface also.
> > >
> > >
> > >
> > > After your testing, if you can report the performance out with macvtap,
> > > that would also help us. :)
> > >
> > >
> > >
> > >
> > >
> > > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > > *Sent:* Saturday, December 24, 2016 11:53 PM
> > >
> > >
> > > *To:* Hu, Xuekun <xuekun.hu@intel.com>
> > > *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> > > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > >
> > >
> > > Any idea how to reserve hugepages for a guest (and not
> > > transparent/anonymous hugepages)?
> > >
> > > I am using libvirt, and any backing method I try results in
> > > anonymous hugepages.
> > >
> > > Disabling transparent hugepages resulted in no hugepages at all.
> > >
> > > Thanks
> > >
> > >
> > >
> > > On Sat, Dec 24, 2016 at 10:06 AM edgar helmut <
> helmut.edgar100@gmail.com>
> > > wrote:
> > >
> > > I am looking for a means to measure in and out packets to and from the
> > > vm (without asking the vm itself). While pure passthrough doesn't expose
> > > an interface to query for in/out pkts, macvtap exposes such an interface.
> > >
> > > As for the anonymous hugepages, I was looking for a more flexible method
> > > and I assumed there is not much difference.
> > >
> > > I will make the test with reserved hugepages.
> > >
> > > However, is there any known issue with macvtap performance when
> > > delivering 5-6 gbps?
> > >
> > >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > >
> > > On 24 Dec 2016 9:06 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
> > >
> > > Now your setup has a new thing, "macvtap". I don't know what the
> > > performance of macvtap is. I only know it has much worse perf than
> > > the "real" pci pass-through.
> > >
> > >
> > >
> > > I also don't know why you selected such a config for your setup, anonymous
> > > huge pages and macvtap. Any specific purpose?
> > >
> > >
> > >
> > > I think you should get a baseline first, then see how much perf
> > > drops when using anonymous hugepages or macvtap.
> > >
> > > 1. Baseline: real hugepages + real pci pass-through
> > >
> > > 2. Anon hugepages vs. hugepages
> > >
> > > 3. Real pci pass-through vs. macvtap
> > >
> > >
> > >
> > > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > > *Sent:* Saturday, December 24, 2016 3:23 AM
> > > *To:* Hu, Xuekun <xuekun.hu@intel.com>
> > > *Cc:* Wiles, Keith <keith.wiles@intel.com>; users@dpdk.org
> > >
> > >
> > > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > >
> > >
> > > Hello,
> > >
> > > I changed the setup but performance is still poor :( and I need your
> help
> > > to understand the root cause.
> > >
> > > the setup is (sorry for long description):
> > >
> > > (test equipment is pktgen using dpdk installed on a second physical
> > > machine connected with 82599 NICs)
> > >
> > > host: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz with a single socket,
> > > ubuntu 16.04, with 4 hugepages of 1G each.
> > >
> > > hypervisor (kvm): QEMU emulator version 2.5.0
> > >
> > > guest: same cpu as host, created with 3 vcpus, using ubuntu 16.04
> > >
> > > dpdk: tried 2.2, 16.04, 16.07, 16.11 - using testpmd and 512 pages of
> 2M
> > > each.
> > >
> > > guest total memory is 2G and all of it is backed by the host with
> > > transparent hugepages (I can see the AnonHugePages consumed at guest
> > > creation). This memory includes the 512 hugepages for the testpmd
> > > application.
> > >
> > > I pinned and isolated the guest's vcpus (using the kernel option isolcpus),
> > > and could see clearly that the isolation functions well.
> > >
> > >
> > >
> > > 2 x 82599 NICs connected as passthrough using macvtap interfaces to the
> > > guest, so the guest receives and forwards packets from one interface
> to the
> > > second and vice versa.
> > >
> > > at the guest I bind its interfaces using igb_uio.
> > >
> > > testpmd at the guest starts dropping packets at about ~800mbps between
> > > both ports bi-directional, using three vcpus (one for
> > > application management and two for forwarding).
> > >
> > > at 1.2 gbps it drops a lot of packets.
> > >
> > > the same testpmd configuration on the host (between both 82599 NICs)
> > > forwards about 5-6gbps on both ports bi-directional.
> > >
> > > I assumed that forwarding ~5-6 gbps between two ports should be
> trivial,
> > > so it would be great if someone could share their configuration for a tested
> > > setup.
> > >
> > > Any further idea will be highly appreciated.
> > >
> > >
> > >
> > > Thanks.
> > >
> > >
> > >
> > > On Sat, Dec 17, 2016 at 2:56 PM edgar helmut <
> helmut.edgar100@gmail.com>
> > > wrote:
> > >
> > > That's what I was afraid of.
> > >
> > > In fact i need the host to back the entire guest's memory with
> hugepages.
> > >
> > > I will find a way to do that and run the tests again.
> > >
> > >
> > >
> > >
> > >
> > > On 16 Dec 2016 3:14 AM, "Hu, Xuekun" <xuekun.hu@intel.com> wrote:
> > >
> > > You said the VM's memory was 6G, while transparent hugepages only covered
> > > ~4G (4360192KB). So some memory was mapped to 4K pages.
> > >
> > >
> > >
> > > BTW, the memory used by transparent hugepages is not the hugepages you
> > > reserved in the kernel boot option.
> > >
> > >
> > >
> > > *From:* edgar helmut [mailto:helmut.edgar100@gmail.com]
> > > *Sent:* Friday, December 16, 2016 1:24 AM
> > > *To:* Hu, Xuekun
> > > *Cc:* Wiles, Keith; users@dpdk.org
> > > *Subject:* Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > >
> > >
> > > in fact the vm was created with 6G RAM, its kernel boot args are
> defined
> > > with 4 hugepages of 1G each, though when starting the vm i noted that
> > > anonhugepages increased.
> > >
> > > The relevant qemu process id is 6074, and the following sums the
> amount of
> > > allocated AnonHugePages:
> > > sudo grep -e AnonHugePages  /proc/6074/smaps | awk  '{ if($2>0) print
> $2}
> > > '|awk '{s+=$1} END {print s}'
> > >
> > > which results with 4360192
> > >
> > > so not all the memory is backed with transparent hugepages though it is
> > > more than the amount of hugepages the guest was supposed to boot with.
> > >
> > > How can I be sure that the required 4G of hugepages are really allocated,
> > > and not, for example, that only 2G out of the 4G are allocated (and the
> > > rest 2G are mapped to the default 4K pages)?
> > >
> > >
> > >
> > > thanks
> > >
> > >
> > >
> > > On Thu, Dec 15, 2016 at 4:33 PM, Hu, Xuekun <xuekun.hu@intel.com>
> wrote:
> > >
> > > Are you sure the anonhugepages size was equal to the total VM's memory
> > > size?
> > > Sometimes the transparent huge page mechanism doesn't guarantee the app is
> using
> > > the real huge pages.
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: users [mailto:users-bounces@dpdk.org] On Behalf Of edgar helmut
> > > Sent: Thursday, December 15, 2016 9:32 PM
> > > To: Wiles, Keith
> > > Cc: users@dpdk.org
> > > Subject: Re: [dpdk-users] Dpdk poor performance on virtual machine
> > >
> > > I have one single socket which is Intel(R) Xeon(R) CPU E5-2640 v4 @
> > > 2.40GHz.
> > >
> > > I just made two more steps:
> > > 1. setting iommu=pt for better usage of the igb_uio
> > > 2. using taskset and isolcpus so now it looks like the relevant dpdk
> cores
> > > use dedicated cores.
> > >
> > > It improved the performance, though I still see a significant difference
> > > between the vm and the host which I can't fully explain.
> > >
> > > any further idea?
> > >
> > > Regards,
> > > Edgar
> > >
> > >
> > > On Thu, Dec 15, 2016 at 2:54 PM, Wiles, Keith <keith.wiles@intel.com>
> > > wrote:
> > >
> > > >
> > > > > On Dec 15, 2016, at 1:20 AM, edgar helmut <
> helmut.edgar100@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi.
> > > > > Some help is needed to understand performance issue on virtual
> machine.
> > > > >
> > > > > Running testpmd over the host functions well (testpmd forwards 10g
> > > > between
> > > > > two 82599 ports).
> > > > > However same application running on a virtual machine over same
> host
> > > > > results with huge degradation in performance.
> > > > > The testpmd then is not even able to read 100mbps from nic without
> > > drops,
> > > > > and from a profile i made it looks like a dpdk application runs
> more
> > > than
> > > > > 10 times slower than over host…
> > > >
> > > > Not sure I understand the overall setup, but did you make sure the
> > > NIC/PCI
> > > > bus is on the same socket as the VM. If you have multiple sockets on
> your
> > > > platform. If you have to access the NIC across the QPI it could
> explain
> > > > some of the performance drop. Not sure that much drop is this
> problem.
> > > >
> > > > >
> > > > > Setup is ubuntu 16.04 for host and ubuntu 14.04 for guest.
> > > > > Qemu is 2.3.0 (though I tried with a newer as well).
> > > > > NICs are connected to guest using pci passthrough, and guest's cpu
> is
> > > set
> > > > > as passthrough (same as host).
> > > > > On guest start the host allocates transparent hugepages
> (AnonHugePages)
> > > > so
> > > > > i assume the guest memory is backed with real hugepages on the
> host.
> > > > > I tried binding with igb_uio and with uio_pci_generic but both
> results
> > > > with
> > > > > same performance.
> > > > >
> > > > > Due to the performance difference i guess i miss something.
> > > > >
> > > > > Please advise what may i miss here?
> > > > > Is this a native penalty of qemu??
> > > > >
> > > > > Thanks
> > > > > Edgar
>
> Did you set up the KVM host to run the guest in huge pages?
>
> https://access.redhat.com/solutions/36741
>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-01-04  6:44 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CABc_bMBYNa7jdhqu2SNZkwtoJskCwQWGOZrB71RYzBZx5-OTfw@mail.gmail.com>
     [not found] ` <CABc_bMBMxhbkGSg82tw7CKLw7AiccEESLAJaHGA6PAaYAPCmTg@mail.gmail.com>
     [not found]   ` <CABc_bMB2vnXEpacfeiWiC5X-x8iuB7jO6fR67n5jj67Pspcf2g@mail.gmail.com>
     [not found]     ` <CABc_bMB7C2zRjEkFbtAyk7buaP=QWwxCSwz=60AVJT_rjoPZTQ@mail.gmail.com>
     [not found]       ` <CABc_bMDgU62=i5uE52d12VGd8wUzqHq9siOjYKSh8v811TYpug@mail.gmail.com>
     [not found]         ` <CABc_bMCn9LivEn_PJCh+dQy7pPypLdmE+eZFQZtzphsHEgr08g@mail.gmail.com>
     [not found]           ` <CABc_bMADQ3PaZz5ovLmGu=3_w2pTO3c6BvNPM6-BUzJ5BXjxLQ@mail.gmail.com>
     [not found]             ` <CABc_bMBJ+NZLuo___qw_cBB=z=5nzgVx6rMd15nMiSYf_q3WoQ@mail.gmail.com>
     [not found]               ` <CABc_bMC6qP3K-kqVQORc9XKfbcXX63UfN=AdZ+sksHUN+Bx5kw@mail.gmail.com>
     [not found]                 ` <CABc_bMDowcZKMrKc8omf-JqpUW=uPn-fq8sfLYk=AktbH9-aNw@mail.gmail.com>
     [not found]                   ` <CABc_bMACisgaHBZedG05ZzJ3wzmudgBeTHdRr93M3-QOGKDKNw@mail.gmail.com>
     [not found]                     ` <CABc_bMCTRDYy+9ZO6py+KupRStR=Rc4md+J0NhPcyTqaZhKxTA@mail.gmail.com>
     [not found]                       ` <CABc_bMAu8BkZXzBSZLWNs=R6AJgMAw9WrTki=cEzMzjC7Z8LAQ@mail.gmail.com>
     [not found]                         ` <CABc_bMDtzYE4NvhAT8nqi3qZbzV4WauzJLW-tcY_Wi5i88F7yQ@mail.gmail.com>
     [not found]                           ` <CABc_bMAr0EjgH4f7UT-oi353FBqYwxt-UgfP6candcP=jkuyLg@mail.gmail.com>
     [not found]                             ` <CABc_bMCyJ4o94T=VRWPjfyhXP1T3uDmeYVsu+0OrXi1AqUkLrg@mail.gmail.com>
     [not found]                               ` <CABc_bMBK=AQ31=mQX0f7HMXs6REqj_bkqQ9KPD96vbtoKKHWCw@mail.gmail.com>
     [not found]                                 ` <CABc_bMA8z6XBEtnxwPE_LDpngx_DYRathn17KqFxNx2V5DFbng@mail.gmail.com>
     [not found]                                   ` <CABc_bMBwFAL5rc5goMP9pt+2z=TOM=VNfp76AznZ3jend9aS_Q@mail.gmail.com>
     [not found]                                     ` <CABc_bMBr1c_Evd5zxB89SinNFVMUGHFTBDoLusVHdMySBCyaBA@mail.gmail.com>
     [not found]                                       ` <CABc_bMChD5hpWmj1zhvZ5tocsxDCFH0QdE34TKWXsdo_danH-Q@mail.gmail.com>
     [not found]                                         ` <CABc_bMCqu-4V4gn=JPqO491BF6Cnjj=4SaAey9qyTQcha134yw@mail.gmail.com>
2016-12-15  7:20                                           ` [dpdk-users] Dpdk poor performance on virtual machine edgar helmut
2016-12-15 12:54                                             ` Wiles, Keith
2016-12-15 13:32                                               ` edgar helmut
2016-12-15 14:33                                                 ` Hu, Xuekun
2016-12-15 17:17                                                   ` Stephen Hemminger
2016-12-15 17:29                                                     ` edgar helmut
2016-12-15 19:14                                                       ` Stephen Hemminger
2016-12-15 19:29                                                         ` Jes Nielsen
2016-12-15 17:24                                                   ` edgar helmut
2016-12-16  1:14                                                     ` Hu, Xuekun
2016-12-17 12:56                                                       ` edgar helmut
2016-12-23 19:22                                                         ` edgar helmut
2016-12-24  7:06                                                           ` Hu, Xuekun
2016-12-24  8:06                                                             ` edgar helmut
2016-12-24 15:52                                                               ` edgar helmut
2016-12-26  0:52                                                                 ` Hu, Xuekun
2016-12-27 15:52                                                                   ` edgar helmut
2016-12-27 15:59                                                                     ` edgar helmut
2016-12-27 18:52                                                                       ` Stephen Hemminger
2016-12-28  8:09                                                                         ` edgar helmut
2017-01-04  6:44                                                                           ` edgar helmut

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).