* [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest
From: David Hunt @ 2017-08-25 16:02 UTC
To: dev

This patchset adds the facility for a guest VM to send a policy down to the
host that will allow the host to scale up/down cpu frequencies depending on
the policy criteria, independently of the DPDK app running in the guest.
This differs from the previous vm_power implementation, where individual
scale up/down requests were sent from the guest to the host via
virtio-serial.

It's a modification of the vm_power_manager app that runs in the host, and
the guest_vm_power_app example app that runs in the guest. This allows the
guest to send down a policy to the host via virtio-serial, which then allows
the host to scale up/down based on the criteria in the policy, resulting in
quicker scale up/down than individual requests coming from the guest. It
also means that the DPDK application running in the guest does not need to
be modified in any way; it is unaware that its cores are being scaled
up/down, reducing the effort in implementing a power-aware infrastructure.

The usage model is as follows:
1. Set up the VFs and assign them to the guest in the usual way.
2. Run vm_power_manager on the host, creating a channel to the guest.
3. Start the guest_vm_power_mgr app on the guest, which establishes a
   virtio-serial channel to the host.
4. Send down the profile for the guest using the "send_profile now" command.
   There is an example profile hard-coded into guest_vm_power_mgr.
5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
6. Send traffic into the VFs at varying traffic rates.
   Observe the frequency change on the host (turbostat -i 1)

The sequence of code changes is as follows:

Firstly, two new API calls are added to the ethdev layer:
1. One to convert a VF id to a PF id. In the patchset this id is a MAC
   address. This is needed so that the host can map the VFs in the profile
   to the PF, so it can monitor the traffic on the relevant PF at the host
   level.
2. The other function is to read the low-level traffic throughput on the
   NIC. Currently this API reads a NIC register for speed, but we are
   looking at using a more generic way to get these stats; suggestions
   welcome.

Next we make an addition to librte_power that adds an extra command to allow
the passing of a policy structure from the guest to the host. This struct
contains information like busy/quiet hour, packet throughput thresholds, etc.

The next addition adds functionality to convert the virtual CPU (vcpu) IDs
to physical CPU (pcpu) IDs so that the host can scale up/down the cores used
in the guest.

The remaining patches are functionality to process the policy, and take
action when the relevant trigger occurs to cause a frequency change.

[01/10] net/i40e: add API to convert VF Id to PF Id
[02/10] net/i40e: add API to get received packet count
[03/10] lib/librte_power: add extra msg type for policies
[04/10] examples/vm_power_mgr: add vcpu to pcpu mapping
[05/10] examples/vm_power_mgr: add scale to medium freq fn
[06/10] examples/vm_power_mgr: add policy to channels
[07/10] examples/vm_power_mgr: add port initialisation
[08/10] examples/guest_cli: add send policy to host
[09/10] examples/vm_power_mgr: set MAC address of VF
[10/10] net/i40e: set register for no drop

* [dpdk-dev] [PATCH v1 01/10] net/i40e: add API to convert VF Id to PF Id
From: David Hunt @ 2017-08-25 16:02 UTC
To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton

Need a way to convert a vf id to a pf id on the host so as to query the pf
for relevant statistics which are used for the frequency changes in the
vm_power_manager app. Used when profiles are passed down from the guest to
the host, allowing the host to map the vfs to pfs.

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |  1 +
 drivers/net/i40e/i40e_rxtx.c   | 27 +++++++++++++++++++++++++++
 drivers/net/i40e/i40e_rxtx.h   |  1 +
 lib/librte_ether/rte_ethdev.h  | 11 +++++++++++
 4 files changed, 40 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5f26e24..8fb67d8 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -445,6 +445,7 @@ static const struct rte_pci_id pci_id_i40e_map[] = {
 };
 
 static const struct eth_dev_ops i40e_eth_dev_ops = {
+	.vfid_to_pfid = i40e_vf_mac_to_vsi,
 	.dev_configure = i40e_dev_configure,
 	.dev_start = i40e_dev_start,
 	.dev_stop = i40e_dev_stop,
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index d42c23c..1379d5e 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -806,6 +806,33 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	return nb_rx;
 }
 
+uint64_t
+i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid) {
+	struct ether_addr *vf_mac_addr = (struct ether_addr *)&vfid;
+	struct ether_addr *mac;
+	struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	int vsi_id = 0, i, x;
+	struct i40e_pf_vf *vf;
+	uint16_t vf_num = pf->vf_num;
+
+	for (x = 0; x < vf_num; x++) {
+		int mac_addr_matches = 1;
+		vf = &pf->vfs[x];
+		mac = &vf->mac_addr;
+
+		for (i = 0; i < ETHER_ADDR_LEN; i++) {
+			if (mac->addr_bytes[i] != vf_mac_addr->addr_bytes[i])
+				mac_addr_matches = 0;
+		}
+		if (mac_addr_matches) {
+			vsi_id = vf->vsi->vsi_id;
+			return vsi_id;
+		}
+	}
+
+	return -1;
+}
+
 uint16_t
 i40e_recv_scattered_pkts(void *rx_queue,
 			 struct rte_mbuf **rx_pkts,
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index 20084d6..bc6d355 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -192,6 +192,7 @@ union i40e_tx_offload {
 	};
 };
 
+uint64_t i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid);
 int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0adf327..fec7e92 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1411,6 +1411,8 @@ typedef int (*eth_l2_tunnel_offload_set_t)
 	 uint8_t en);
 /**< @internal enable/disable the l2 tunnel offload functions */
 
+typedef uint64_t (*vfid_to_pfid)(struct rte_eth_dev *dev,
+		uint64_t vfid);
 
 typedef int (*eth_filter_ctrl_t)(struct rte_eth_dev *dev,
 				 enum rte_filter_type filter_type,
@@ -1429,6 +1431,7 @@ typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
  * @internal A structure containing the functions exported by an Ethernet driver.
  */
 struct eth_dev_ops {
+	vfid_to_pfid vfid_to_pfid; /**< Convert vfid to pfid */
 	eth_dev_configure_t dev_configure; /**< Configure device. */
 	eth_dev_start_t dev_start; /**< Start device. */
 	eth_dev_stop_t dev_stop; /**< Stop device. */
@@ -2928,6 +2931,14 @@ static inline int rte_eth_tx_descriptor_status(uint8_t port_id,
 	return (*dev->dev_ops->tx_descriptor_status)(txq, offset);
 }
 
+static inline uint64_t
+vfid_to_pfid_direct(uint8_t port_id, uint64_t vfid)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	uint64_t pfid = (*dev->dev_ops->vfid_to_pfid)(dev, vfid);
+	return pfid;
+}
+
 /**
  * Send a burst of output packets on a transmit queue of an Ethernet device.
  *
--
2.7.4
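
Note on the vfid argument in the patch above: i40e_vf_mac_to_vsi() casts
&vfid to a struct ether_addr pointer, so the caller is expected to have
placed the six MAC bytes in the first six bytes of the uint64_t in memory.
A minimal sketch of that packing follows. The struct mac_addr type and the
helper names mac_to_vfid() and vfid_to_mac() are hypothetical stand-ins for
illustration and are not part of the patch.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* same value as ETHER_ADDR_LEN */

/* Hypothetical stand-in for struct ether_addr, for illustration only. */
struct mac_addr {
	uint8_t addr_bytes[MAC_LEN];
};

/* Copy the six MAC bytes into the first six bytes of a zeroed uint64_t. */
static uint64_t
mac_to_vfid(const struct mac_addr *mac)
{
	uint64_t vfid = 0;

	memcpy(&vfid, mac->addr_bytes, MAC_LEN);
	return vfid;
}

/* Recover the MAC from the first six bytes of the uint64_t. */
static struct mac_addr
vfid_to_mac(uint64_t vfid)
{
	struct mac_addr mac;

	memcpy(mac.addr_bytes, &vfid, MAC_LEN);
	return mac;
}
```

Using memcpy() in both directions keeps the byte layout identical to what
the pointer cast in the patch would read, without relying on the cast
itself.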

* Re: [dpdk-dev] [PATCH v1 01/10] net/i40e: add API to convert VF Id to PF Id
From: Thomas Monjalon @ 2017-09-22 9:56 UTC
To: David Hunt; +Cc: dev, Nemanja Marjanovic, Rory Sexton

25/08/2017 18:02, David Hunt:
>
> +static inline uint64_t
> +vfid_to_pfid_direct(uint8_t port_id, uint64_t vfid)
> +{
> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> +	uint64_t pfid = (*dev->dev_ops->vfid_to_pfid)(dev, vfid);
> +	return pfid;
> +}

I would like to comment this API but there is no associated doxygen.

If the application is aware of the VFs, it probably already knows
how PF and VF are associated.

Until now, the functions to control VF from PF are driver-specifics.

* Re: [dpdk-dev] [PATCH v1 01/10] net/i40e: add API to convert VF Id to PF Id
From: Hunt, David @ 2017-09-22 12:39 UTC
To: Thomas Monjalon; +Cc: dev, Nemanja Marjanovic, Rory Sexton, Macnamara, Chris

Hi Thomas,

On 22/9/2017 10:56 AM, Thomas Monjalon wrote:
> 25/08/2017 18:02, David Hunt:
>> +static inline uint64_t
>> +vfid_to_pfid_direct(uint8_t port_id, uint64_t vfid)
>> +{
>> +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>> +	uint64_t pfid = (*dev->dev_ops->vfid_to_pfid)(dev, vfid);
>> +	return pfid;
>> +}
> I would like to comment this API but there is no associated doxygen.

Sure, we'll add Doxygen comments.

>
> If the application is aware of the VFs, it probably already knows
> how PF and VF are associated.
>
> Until now, the functions to control VF from PF are driver-specifics.

Working out the relationship between the PF and the VF has turned out to be
quite a challenge. :)

The application on the guest is aware of the VFs. The application on the
host is aware of the PF and can access the VFs through the PF. However, the
application on the host is not aware of which VF on the PF each VF in the VM
corresponds to, i.e. the PF needs to know the index, in its array of VFs, at
which the VF in use by the guest application is stored. This is what this
additional function is used for: it gives the PF the index of the VF in
question in its array of VFs.

We have researched alternative ways to determine this association, but this
is the only method that provides this functionality. Without it, the PF does
not know how each VF is associated with the PF.

We also realize that the MAC addresses need to be in sync between the host
and the guest for correct operation of this scheme.

As mentioned in my previous mail, we are working on an updated patch set,
targeting early next week.

Regards,
Dave.

* Re: [dpdk-dev] [PATCH v1 01/10] net/i40e: add API to convert VF Id to PF Id
From: Wu, Jingjing @ 2017-09-25 2:43 UTC
To: Hunt, David, dev; +Cc: Hunt, David, Marjanovic, Nemanja, Sexton, Rory

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
> Sent: Saturday, August 26, 2017 12:02 AM
> To: dev@dpdk.org
> Cc: Hunt, David <david.hunt@intel.com>; Marjanovic, Nemanja
> <nemanja.marjanovic@intel.com>; Sexton, Rory <rory.sexton@intel.com>
> Subject: [dpdk-dev] [PATCH v1 01/10] net/i40e: add API to convert VF Id to PF Id
>
> Need a way to convert a vf id to a pf id on the host so as to query the pf for
> relevant statistics which are used for the frequency changes in the
> vm_power_manager app. Used when profiles are passed down from the guest
> to the host, allowing the host to map the vfs to pfs.
>
> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
> Signed-off-by: Rory Sexton <rory.sexton@intel.com>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  drivers/net/i40e/i40e_ethdev.c |  1 +
>  drivers/net/i40e/i40e_rxtx.c   | 27 +++++++++++++++++++++++++++
>  drivers/net/i40e/i40e_rxtx.h   |  1 +
>  lib/librte_ether/rte_ethdev.h  | 11 +++++++++++
>  4 files changed, 40 insertions(+)
>
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 5f26e24..8fb67d8 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -445,6 +445,7 @@ static const struct rte_pci_id pci_id_i40e_map[] = {
>  };
>
>  static const struct eth_dev_ops i40e_eth_dev_ops = {
> +	.vfid_to_pfid = i40e_vf_mac_to_vsi,
>  	.dev_configure = i40e_dev_configure,
>  	.dev_start = i40e_dev_start,
>  	.dev_stop = i40e_dev_stop,
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index d42c23c..1379d5e 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -806,6 +806,33 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>  	return nb_rx;
>  }
>
> +uint64_t
> +i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid) {
> +	struct ether_addr *vf_mac_addr = (struct ether_addr *)&vfid;
> +	struct ether_addr *mac;
> +	struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	int vsi_id = 0, i, x;
> +	struct i40e_pf_vf *vf;
> +	uint16_t vf_num = pf->vf_num;
> +
> +	for (x = 0; x < vf_num; x++) {
> +		int mac_addr_matches = 1;
> +		vf = &pf->vfs[x];
> +		mac = &vf->mac_addr;
> +
> +		for (i = 0; i < ETHER_ADDR_LEN; i++) {
> +			if (mac->addr_bytes[i] != vf_mac_addr->addr_bytes[i])
> +				mac_addr_matches = 0;
> +		}
> +		if (mac_addr_matches) {
> +			vsi_id = vf->vsi->vsi_id;
> +			return vsi_id;
> +		}

vsi and vsi_id is not a common concept in API level.
How about just return vf_id and rename the function like i40e_query_vf_id_by_mac?
In i40e driver, we can get the vsi_id by vf_id.

> +	}
> +
> +	return -1;

It's an ops to API, you need to use error code but not -1.

Thanks
Jingjing

* Re: [dpdk-dev] [PATCH v1 01/10] net/i40e: add API to convert VF Id to PF Id
From: Hunt, David @ 2017-09-25 9:57 UTC
To: Wu, Jingjing, dev; +Cc: Marjanovic, Nemanja, Sexton, Rory

On 25/9/2017 3:43 AM, Wu, Jingjing wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
>> Sent: Saturday, August 26, 2017 12:02 AM
>> To: dev@dpdk.org
>> Cc: Hunt, David <david.hunt@intel.com>; Marjanovic, Nemanja
>> <nemanja.marjanovic@intel.com>; Sexton, Rory <rory.sexton@intel.com>
>> Subject: [dpdk-dev] [PATCH v1 01/10] net/i40e: add API to convert VF Id to PF Id
>>
>> Need a way to convert a vf id to a pf id on the host so as to query the pf for
>> relevant statistics which are used for the frequency changes in the
>> vm_power_manager app. Used when profiles are passed down from the guest
>> to the host, allowing the host to map the vfs to pfs.
>>
>> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
>> Signed-off-by: Rory Sexton <rory.sexton@intel.com>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>  drivers/net/i40e/i40e_ethdev.c |  1 +
>>  drivers/net/i40e/i40e_rxtx.c   | 27 +++++++++++++++++++++++++++
>>  drivers/net/i40e/i40e_rxtx.h   |  1 +
>>  lib/librte_ether/rte_ethdev.h  | 11 +++++++++++
>>  4 files changed, 40 insertions(+)
>>
>> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
>> index 5f26e24..8fb67d8 100644
>> --- a/drivers/net/i40e/i40e_ethdev.c
>> +++ b/drivers/net/i40e/i40e_ethdev.c
>> @@ -445,6 +445,7 @@ static const struct rte_pci_id pci_id_i40e_map[] = {
>>  };
>>
>>  static const struct eth_dev_ops i40e_eth_dev_ops = {
>> +	.vfid_to_pfid = i40e_vf_mac_to_vsi,
>>  	.dev_configure = i40e_dev_configure,
>>  	.dev_start = i40e_dev_start,
>>  	.dev_stop = i40e_dev_stop,
>> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
>> index d42c23c..1379d5e 100644
>> --- a/drivers/net/i40e/i40e_rxtx.c
>> +++ b/drivers/net/i40e/i40e_rxtx.c
>> @@ -806,6 +806,33 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>>  	return nb_rx;
>>  }
>>
>> +uint64_t
>> +i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid) {
>> +	struct ether_addr *vf_mac_addr = (struct ether_addr *)&vfid;
>> +	struct ether_addr *mac;
>> +	struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
>> +	int vsi_id = 0, i, x;
>> +	struct i40e_pf_vf *vf;
>> +	uint16_t vf_num = pf->vf_num;
>> +
>> +	for (x = 0; x < vf_num; x++) {
>> +		int mac_addr_matches = 1;
>> +		vf = &pf->vfs[x];
>> +		mac = &vf->mac_addr;
>> +
>> +		for (i = 0; i < ETHER_ADDR_LEN; i++) {
>> +			if (mac->addr_bytes[i] != vf_mac_addr->addr_bytes[i])
>> +				mac_addr_matches = 0;
>> +		}
>> +		if (mac_addr_matches) {
>> +			vsi_id = vf->vsi->vsi_id;
>> +			return vsi_id;
>> +		}
> vsi and vsi_id is not a common concept in API level.

Agreed. We're removing it from the API level in the next version of the
patch set.

> How about just return vf_id and rename the function like i40e_query_vf_id_by_mac?
> In i40e driver, we can get the vsi_id by vf_id.

The next revision will just return the vf_id, and we'll rename the function.

>> +	}
>> +
>> +	return -1;
> It's an ops to API, you need to use error code but not -1.

Will fix.

Thanks,
Dave
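
The two review points above (return the VF index rather than the VSI id, and
return an error code rather than -1) can be sketched as follows. This is a
standalone illustration under stated assumptions, not the code from the next
revision: struct vf_entry and find_vf_id_by_mac() are hypothetical names,
and -ENOENT is only one plausible choice of error code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* same value as ETHER_ADDR_LEN */

/* Hypothetical, simplified stand-in for the per-VF state kept by the PF. */
struct vf_entry {
	uint8_t mac[MAC_LEN];
};

/*
 * Return the index of the VF whose MAC matches, or -ENOENT if none does.
 * The return type is a signed int so that a negative error code cannot be
 * mistaken for a valid id.
 */
static int
find_vf_id_by_mac(const struct vf_entry *vfs, uint16_t vf_num,
		  const uint8_t *mac)
{
	uint16_t i;

	for (i = 0; i < vf_num; i++) {
		if (memcmp(vfs[i].mac, mac, MAC_LEN) == 0)
			return i;
	}
	return -ENOENT;
}
```

The signed return type matters here: a function declared to return uint64_t,
as in v1, turns "return -1" into the largest 64-bit value, which a caller
cannot distinguish from an id without an extra convention.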

* [dpdk-dev] [PATCH v1 02/10] net/i40e: add API to get received packet count
From: David Hunt @ 2017-08-25 16:02 UTC
To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 drivers/net/i40e/i40e_ethdev.c |  1 +
 drivers/net/i40e/i40e_rxtx.c   | 10 ++++++++++
 drivers/net/i40e/i40e_rxtx.h   |  1 +
 lib/librte_ether/rte_ethdev.h  | 19 +++++++++++++++++++
 4 files changed, 31 insertions(+)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 8fb67d8..d9806fc 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -446,6 +446,7 @@ static const struct rte_pci_id pci_id_i40e_map[] = {
 
 static const struct eth_dev_ops i40e_eth_dev_ops = {
 	.vfid_to_pfid = i40e_vf_mac_to_vsi,
+	.read_pf_stats = i40e_vsi_stats_read,
 	.dev_configure = i40e_dev_configure,
 	.dev_start = i40e_dev_start,
 	.dev_stop = i40e_dev_stop,
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 1379d5e..b7b64d2 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -833,6 +833,16 @@ i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid) {
 	return -1;
 }
 
+uint64_t
+i40e_vsi_stats_read(struct rte_eth_dev *dev, uint8_t vsi_id) {
+	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	uint64_t glv_uprch = I40E_READ_REG(hw,
+			I40E_GLV_UPRCH(vsi_id)) & 0x0000FFFF;
+	uint64_t glv_uprcl = I40E_READ_REG(hw, I40E_GLV_UPRCL(vsi_id));
+	return glv_uprcl + (glv_uprch << 32);
+}
+
 uint16_t
 i40e_recv_scattered_pkts(void *rx_queue,
 			 struct rte_mbuf **rx_pkts,
diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h
index bc6d355..db19153 100644
--- a/drivers/net/i40e/i40e_rxtx.h
+++ b/drivers/net/i40e/i40e_rxtx.h
@@ -193,6 +193,7 @@ union i40e_tx_offload {
 };
 
 uint64_t i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid);
+uint64_t i40e_vsi_stats_read(struct rte_eth_dev *dev, uint8_t vsi_id);
 int i40e_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
 int i40e_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fec7e92..4917233 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1413,6 +1413,9 @@ typedef int (*eth_l2_tunnel_offload_set_t)
 
 typedef uint64_t (*vfid_to_pfid)(struct rte_eth_dev *dev,
 		uint64_t vfid);
+/**< @internal Ethernet device configuration. */
+typedef uint64_t (*read_pf_stats)(struct rte_eth_dev *dev, uint8_t pfid);
+
 
 typedef int (*eth_filter_ctrl_t)(struct rte_eth_dev *dev,
 				 enum rte_filter_type filter_type,
@@ -1432,6 +1435,7 @@ typedef int (*eth_get_dcb_info)(struct rte_eth_dev *dev,
  */
 struct eth_dev_ops {
 	vfid_to_pfid vfid_to_pfid; /**< Convert vfid to pfid */
+	read_pf_stats read_pf_stats;/**<Read low-level pf stats .*/
 	eth_dev_configure_t dev_configure; /**< Configure device. */
 	eth_dev_start_t dev_start; /**< Start device. */
 	eth_dev_stop_t dev_stop; /**< Stop device. */
@@ -2939,6 +2943,21 @@ vfid_to_pfid_direct(uint8_t port_id, uint64_t vfid)
 	return pfid;
 }
 
+/*
+ * Reads the NIC occupancy if possible with device in use.
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   Nic occupany in bytes.
+ */
+static inline uint64_t
+read_pf_stats_direct(uint8_t port_id, uint8_t pfid)
+{
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	uint64_t pkt_count = (*dev->dev_ops->read_pf_stats)(dev, pfid);
+	return pkt_count;
+}
+
 /**
  * Send a burst of output packets on a transmit queue of an Ethernet device.
  *
--
2.7.4
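
Note on the counter arithmetic in the patch above: i40e_vsi_stats_read()
masks the high register to 16 bits and shifts it by 32 before adding the low
register, which yields a 48-bit packet count. The sketch below shows that
combination in isolation, plus a wrap-tolerant difference between two
samples. The helper names combine_48() and delta_48() are hypothetical, and
the delta helper assumes a free-running counter, which may not match how the
hardware registers actually behave.

```c
#include <stdint.h>

#define CNT48_MASK ((1ULL << 48) - 1)

/* Build a 48-bit count from a 16-bit high part and a 32-bit low part. */
static uint64_t
combine_48(uint32_t hi_reg, uint32_t lo_reg)
{
	return ((uint64_t)(hi_reg & 0xFFFF) << 32) | lo_reg;
}

/*
 * Packets seen between two samples of a 48-bit counter.
 * Masking the difference tolerates a single wrap between samples.
 */
static uint64_t
delta_48(uint64_t prev, uint64_t now)
{
	return (now - prev) & CNT48_MASK;
}
```

A host-side monitor would typically work from such deltas over a fixed
interval rather than from the absolute count.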

* Re: [dpdk-dev] [PATCH v1 02/10] net/i40e: add API to get received packet count
From: Wu, Jingjing @ 2017-09-25 2:47 UTC
To: Hunt, David, dev; +Cc: Hunt, David, Marjanovic, Nemanja, Sexton, Rory

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
> Sent: Saturday, August 26, 2017 12:02 AM
> To: dev@dpdk.org
> Cc: Hunt, David <david.hunt@intel.com>; Marjanovic, Nemanja
> <nemanja.marjanovic@intel.com>; Sexton, Rory <rory.sexton@intel.com>
> Subject: [dpdk-dev] [PATCH v1 02/10] net/i40e: add API to get received packet
> count
>
> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
> Signed-off-by: Rory Sexton <rory.sexton@intel.com>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  drivers/net/i40e/i40e_ethdev.c |  1 +
>  drivers/net/i40e/i40e_rxtx.c   | 10 ++++++++++
>  drivers/net/i40e/i40e_rxtx.h   |  1 +
>  lib/librte_ether/rte_ethdev.h  | 19 +++++++++++++++++++
>  4 files changed, 31 insertions(+)
>
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 8fb67d8..d9806fc 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -446,6 +446,7 @@ static const struct rte_pci_id pci_id_i40e_map[] = {
>
>  static const struct eth_dev_ops i40e_eth_dev_ops = {
>  	.vfid_to_pfid = i40e_vf_mac_to_vsi,
> +	.read_pf_stats = i40e_vsi_stats_read,
>  	.dev_configure = i40e_dev_configure,
>  	.dev_start = i40e_dev_start,
>  	.dev_stop = i40e_dev_stop,
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 1379d5e..b7b64d2 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -833,6 +833,16 @@ i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid) {
>  	return -1;
>  }
>
> +uint64_t
> +i40e_vsi_stats_read(struct rte_eth_dev *dev, uint8_t vsi_id) {
> +	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +
> +	uint64_t glv_uprch = I40E_READ_REG(hw,
> +			I40E_GLV_UPRCH(vsi_id)) & 0x0000FFFF;
> +	uint64_t glv_uprcl = I40E_READ_REG(hw, I40E_GLV_UPRCL(vsi_id));
> +	return glv_uprcl + (glv_uprch << 32);
> +}

You can change the input to vf_id, and then get the vsi_id internally.
Anyway, the counter registers are cleared when read. It will impact the
Ops like stats_get/ stats_reset.

We have func called i40e_update_vsi_stats which record the packets count. I think you can use it.

Thanks
Jingjing

* Re: [dpdk-dev] [PATCH v1 02/10] net/i40e: add API to get received packet count
From: Hunt, David @ 2017-09-25 9:59 UTC
To: Wu, Jingjing, dev; +Cc: Marjanovic, Nemanja, Sexton, Rory

On 25/9/2017 3:47 AM, Wu, Jingjing wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
>> Sent: Saturday, August 26, 2017 12:02 AM
>> To: dev@dpdk.org
>> Cc: Hunt, David <david.hunt@intel.com>; Marjanovic, Nemanja
>> <nemanja.marjanovic@intel.com>; Sexton, Rory <rory.sexton@intel.com>
>> Subject: [dpdk-dev] [PATCH v1 02/10] net/i40e: add API to get received packet
>> count
>>
>> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
>> Signed-off-by: Rory Sexton <rory.sexton@intel.com>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>  drivers/net/i40e/i40e_ethdev.c |  1 +
>>  drivers/net/i40e/i40e_rxtx.c   | 10 ++++++++++
>>  drivers/net/i40e/i40e_rxtx.h   |  1 +
>>  lib/librte_ether/rte_ethdev.h  | 19 +++++++++++++++++++
>>  4 files changed, 31 insertions(+)
>>
>> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
>> index 8fb67d8..d9806fc 100644
>> --- a/drivers/net/i40e/i40e_ethdev.c
>> +++ b/drivers/net/i40e/i40e_ethdev.c
>> @@ -446,6 +446,7 @@ static const struct rte_pci_id pci_id_i40e_map[] = {
>>
>>  static const struct eth_dev_ops i40e_eth_dev_ops = {
>>  	.vfid_to_pfid = i40e_vf_mac_to_vsi,
>> +	.read_pf_stats = i40e_vsi_stats_read,
>>  	.dev_configure = i40e_dev_configure,
>>  	.dev_start = i40e_dev_start,
>>  	.dev_stop = i40e_dev_stop,
>> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
>> index 1379d5e..b7b64d2 100644
>> --- a/drivers/net/i40e/i40e_rxtx.c
>> +++ b/drivers/net/i40e/i40e_rxtx.c
>> @@ -833,6 +833,16 @@ i40e_vf_mac_to_vsi(struct rte_eth_dev *dev, uint64_t vfid) {
>>  	return -1;
>>  }
>>
>> +uint64_t
>> +i40e_vsi_stats_read(struct rte_eth_dev *dev, uint8_t vsi_id) {
>> +	struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>> +
>> +	uint64_t glv_uprch = I40E_READ_REG(hw,
>> +			I40E_GLV_UPRCH(vsi_id)) & 0x0000FFFF;
>> +	uint64_t glv_uprcl = I40E_READ_REG(hw, I40E_GLV_UPRCL(vsi_id));
>> +	return glv_uprcl + (glv_uprch << 32);
>> +}
> You can change the input to vf_id, and then get the vsi_id internally.
> Anyway, the counter registers are cleared when read. It will impact the
> Ops like stats_get/ stats_reset.
>
> We have func called i40e_update_vsi_stats which record the packets count. I think you can use it.
>
>
> Thanks
> Jingjing

We've changed to using the existing stats functions in the next revision of
the patch. Simplifies things a bit.

Thanks,
Dave

* [dpdk-dev] [PATCH v1 03/10] lib/librte_power: add extra msg type for policies
From: David Hunt @ 2017-08-25 16:02 UTC
To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_power/channel_commands.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h
index 383897b..79799b7 100644
--- a/lib/librte_power/channel_commands.h
+++ b/lib/librte_power/channel_commands.h
@@ -46,17 +46,50 @@ extern "C" {
 /* Valid Commands */
 #define CPU_POWER               1
 #define CPU_POWER_CONNECT       2
+#define PKT_POLICY              3
 
 /* CPU Power Command Scaling */
 #define CPU_POWER_SCALE_UP      1
 #define CPU_POWER_SCALE_DOWN    2
 #define CPU_POWER_SCALE_MAX     3
 #define CPU_POWER_SCALE_MIN     4
+#define HOURS 24
+#define MAX_VFS 10
+
+typedef enum {false, true} bool;
+
+struct t_boost_status {
+	bool tbEnabled;
+};
+
+struct timer_profile {
+	int busy_hours[HOURS];
+	int quiet_hours[HOURS];
+	int hours_to_use_traffic_profile[HOURS];
+};
+
+enum workload {HIGH, MEDIUM, LOW};
+enum policy_to_use {TRAFFIC, TIME, WORKLOAD};
+
+struct traffic {
+	uint32_t min_packet_thresh;
+	uint32_t avg_max_packet_thresh;
+	uint32_t max_max_packet_thresh;
+};
 
 struct channel_packet {
 	uint64_t resource_id; /**< core_num, device */
 	uint32_t unit;        /**< scale down/up/min/max */
 	uint32_t command;     /**< Power, IO, etc */
+	char vm_name[32];
+	uint64_t vfid[MAX_VFS];
+	int nb_mac_to_monitor;
+	uint8_t vcpu_to_control[5];
+	struct traffic traffic_policy;
+	struct timer_profile timer_policy;
+	enum workload workload;
+	enum policy_to_use policy_to_use;
+	struct t_boost_status t_boost_status;
 };
 
 
--
2.7.4
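
Note on struct traffic in the patch above: this patch only defines the
thresholds; how the host acts on them is left to later patches in the
series. As a rough illustration only, one possible reading is a three-level
decision on a measured packet count. The enum freq_action, the function
pick_freq() and the threshold semantics below are assumptions for the sake
of the example, not the behaviour of vm_power_manager, and min_packet_thresh
is ignored in this sketch.

```c
#include <stdint.h>

/* Same fields as struct traffic in channel_commands.h. */
struct traffic {
	uint32_t min_packet_thresh;
	uint32_t avg_max_packet_thresh;
	uint32_t max_max_packet_thresh;
};

/* Hypothetical action levels, for illustration only. */
enum freq_action { FREQ_MIN, FREQ_MED, FREQ_MAX };

/* Map a packet count for one sampling interval to a frequency level. */
static enum freq_action
pick_freq(const struct traffic *t, uint32_t pkts_per_interval)
{
	if (pkts_per_interval >= t->max_max_packet_thresh)
		return FREQ_MAX;
	if (pkts_per_interval >= t->avg_max_packet_thresh)
		return FREQ_MED;
	return FREQ_MIN;
}
```

Keeping the decision in one pure function of the policy and the measurement
makes it easy to exercise without a NIC or a guest attached.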
* [dpdk-dev] [PATCH v1 04/10] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-08-25 16:02 [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest David Hunt ` (2 preceding siblings ...) 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 03/10] lib/librte_power: add extra msg type for policies David Hunt @ 2017-08-25 16:02 ` David Hunt 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 05/10] examples/vm_power_mgr: add scale to medium freq fn David Hunt ` (7 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-08-25 16:02 UTC (permalink / raw) To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/channel_manager.c | 55 +++++++++++++++++++++++++++++ examples/vm_power_manager/channel_manager.h | 16 ++++++++- 2 files changed, 70 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index e068ae2..2abba9c 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -574,6 +574,61 @@ set_channel_status(const char *vm_name, unsigned *channel_list, return num_channels_changed; } +void +get_all_vm(int *noVms, int *noVcpus) { + + virNodeInfo info; + virDomainPtr *domptr; + uint64_t mask; + int ret, i, numVcpus[MAX_VCPUS], cpu; + unsigned int ii, jj, n_vcpus; + const char *vm_name; + unsigned int flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; + unsigned int flag = VIR_DOMAIN_VCPU_CONFIG; + + + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); + if (virNodeGetInfo(global_vir_conn_ptr, &info)) + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n"); + + /*Returns number of pcpus*/ + global_n_host_cpus = (unsigned int)info.cpus; + + /*Returns number of active domains */ 
+ ret = virConnectListAllDomains(global_vir_conn_ptr, &domptr, flags); + *noVms = ret; + if (ret < 0) + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); + + for (i = 0; i < ret; i++) { + + /*Get Domain Names*/ + vm_name = virDomainGetName(domptr[i]); + lvm_info[i].vm_name = vm_name; + /*Get Number of Vcpus*/ + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], flag); + /*Get Number of VCpus & VcpuPinInfo*/ + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], + numVcpus[i], global_cpumaps, + global_maplen, flag); + if ((int)n_vcpus > *noVcpus) + *noVcpus = n_vcpus; + for (ii = 0; ii < n_vcpus; ii++) { + mask = 0; + for (jj = 0; jj < global_n_host_cpus; jj++) { + if (VIR_CPU_USABLE(global_cpumaps, + global_maplen, ii, jj) > 0) { + mask |= 1ULL << jj; + } + } + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { + lvm_info[i].pcpus[ii] = cpu; + } + } + } +} + int get_info_vm(const char *vm_name, struct vm_info *info) { diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h index 47c3b9c..8dff76c 100644 --- a/examples/vm_power_manager/channel_manager.h +++ b/examples/vm_power_manager/channel_manager.h @@ -66,6 +66,16 @@ struct sockaddr_un _sockaddr_un; #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) #endif +#define MAX_VMS 4 +#define MAX_VCPUS 20 + + +struct libvirt_vm_info { + const char *vm_name; + unsigned int pcpus[22]; +}; + +struct libvirt_vm_info lvm_info[MAX_VMS]; /* Communication Channel Status */ enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, CHANNEL_MGR_CHANNEL_CONNECTED, @@ -318,7 +328,11 @@ int set_channel_status(const char *vm_name, unsigned *channel_list, * - Negative on error. */ int get_info_vm(const char *vm_name, struct vm_info *info); - +/** + * Populates a table with all domains running and their physical cpu. + * All information is gathered through libvirt api. 
+ */ +void get_all_vm(int *noVms, int *noVcpus); #ifdef __cplusplus } #endif -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
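For readers following the vcpu to pcpu mapping in get_all_vm(): the patch appears to turn each vcpu's libvirt pin map into a 64-bit mask and then keep the last (highest) pcpu found in that mask. Below is a minimal sketch of that idea without libvirt. cpumap_bit_set() is a local stand-in for the VIR_CPU_USABLE test, and vcpu_pin_mask() and highest_pcpu() are hypothetical names that are not part of the patch. The layout (maplen bytes per vcpu, least significant bit first) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in for libvirt's VIR_CPU_USABLE check. Assumption: the pin
 * maps are stored back to back, maplen bytes per vcpu, one bit per host
 * cpu, least significant bit first.
 */
static int
cpumap_bit_set(const unsigned char *cpumaps, size_t maplen,
		unsigned int vcpu, unsigned int cpu)
{
	return (cpumaps[vcpu * maplen + cpu / 8] >> (cpu % 8)) & 1;
}

/* Build a 64-bit affinity mask for one vcpu from its pin map. */
static uint64_t
vcpu_pin_mask(const unsigned char *cpumaps, size_t maplen,
		unsigned int vcpu, unsigned int n_host_cpus)
{
	uint64_t mask = 0;
	unsigned int cpu;

	/* A 64-bit mask can only describe the first 64 host cpus. */
	for (cpu = 0; cpu < n_host_cpus && cpu < 64; cpu++) {
		if (cpumap_bit_set(cpumaps, maplen, vcpu, cpu))
			mask |= 1ULL << cpu;
	}
	return mask;
}

/*
 * Return the highest pcpu set in the mask, or -1 if the mask is empty.
 * Walking the set bits in ascending order and keeping the last one
 * gives the same result.
 */
static int
highest_pcpu(uint64_t mask)
{
	int cpu, found = -1;

	for (cpu = 0; cpu < 64; cpu++) {
		if (mask & (1ULL << cpu))
			found = cpu;
	}
	return found;
}
```

If a vcpu is pinned to more than one pcpu, a single result like this only identifies one of them, so the sketch is most meaningful with 1:1 pinning.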
* [dpdk-dev] [PATCH v1 05/10] examples/vm_power_mgr: add scale to medium freq fn 2017-08-25 16:02 [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest David Hunt ` (3 preceding siblings ...) 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 04/10] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt @ 2017-08-25 16:02 ` David Hunt 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 06/10] examples/vm_power_mgr: add policy to channels David Hunt ` (6 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-08-25 16:02 UTC (permalink / raw) To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/power_manager.c | 15 +++++++++++++++ examples/vm_power_manager/power_manager.h | 13 +++++++++++++ 2 files changed, 28 insertions(+) diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c index 2644fce..7b0afda 100644 --- a/examples/vm_power_manager/power_manager.c +++ b/examples/vm_power_manager/power_manager.c @@ -250,3 +250,18 @@ power_manager_scale_core_max(unsigned core_num) POWER_SCALE_CORE(max, core_num, ret); return ret; } + +int +power_manager_scale_core_med(unsigned int core_num) +{ + int ret = 0; + + if (core_num >= POWER_MGR_MAX_CPUS) + return -1; + if (!(global_enabled_cpus & (1ULL << core_num))) + return -1; + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); + ret = rte_power_set_freq(core_num, 5); + rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl); + return ret; +} diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h index 1b45bab..6cdec7a 100644 --- a/examples/vm_power_manager/power_manager.h +++ b/examples/vm_power_manager/power_manager.h @@ -179,6 +179,19 @@ int power_manager_scale_core_max(unsigned core_num); */ 
uint32_t power_manager_get_current_frequency(unsigned core_num); +/** + * Scale to medium frequency for the core specified by core_num. + * It is thread-safe. + * + * @param core_num + * The core number to change frequency + * + * @return + * - 1 on success. + * - 0 if frequency not changed. + * - Negative on error. + */ +int power_manager_scale_core_med(unsigned int core_num); #ifdef __cplusplus } -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
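A note on the fixed index passed to rte_power_set_freq() in power_manager_scale_core_med(): the value 5 is only "medium" if the core exposes roughly ten frequency steps. A minimal sketch of one alternative is to derive the index from the number of available steps. medium_freq_index() is a hypothetical helper, not part of the patch, and it assumes index 0 is the highest frequency and that the step count has already been obtained (for example from rte_power_freqs()).

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the number of frequency steps a core
 * exposes, return the index of the middle step. Assumes index 0 is the
 * highest frequency and index n_freqs - 1 the lowest.
 */
static uint32_t
medium_freq_index(uint32_t n_freqs)
{
	if (n_freqs == 0)
		return 0;
	return (n_freqs - 1) / 2;
}
```

With 11 steps this yields index 5, matching the hard-coded value, while a core with only 4 steps would get index 1 instead of an out-of-range index.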
* [dpdk-dev] [PATCH v1 06/10] examples/vm_power_mgr: add policy to channels 2017-08-25 16:02 [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest David Hunt ` (4 preceding siblings ...) 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 05/10] examples/vm_power_mgr: add scale to medium freq fn David Hunt @ 2017-08-25 16:02 ` David Hunt 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 07/10] examples/vm_power_mgr: add port initialisation David Hunt ` (5 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-08-25 16:02 UTC (permalink / raw) To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/channel_monitor.c | 302 +++++++++++++++++++++++++++- examples/vm_power_manager/channel_monitor.h | 18 ++ 2 files changed, 312 insertions(+), 8 deletions(-) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index e7f5cc4..94fa03c 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ b/examples/vm_power_manager/channel_monitor.c @@ -41,13 +41,16 @@ #include <sys/types.h> #include <sys/epoll.h> #include <sys/queue.h> +#include <sys/time.h> #include <rte_log.h> #include <rte_memory.h> #include <rte_malloc.h> #include <rte_atomic.h> +#include <rte_cycles.h> +#include <rte_ethdev.h> - +#include <libvirt/libvirt.h> #include "channel_monitor.h" #include "channel_commands.h" #include "channel_manager.h" @@ -57,10 +60,15 @@ #define MAX_EVENTS 256 - +uint64_t vsi_pkt_count_prev[384]; +uint64_t rdtsc_prev[384]; +double time_period_s = 1; +double cpu_tsc_hz = 2200000000; static volatile unsigned run_loop = 1; static int global_event_fd; +static unsigned int policy_is_set; static struct epoll_event *global_events_list; +static struct policy policies[MAX_VMS]; void channel_monitor_exit(void) { @@ -68,6 
+76,266 @@ void channel_monitor_exit(void) rte_free(global_events_list); } +static void +core_share(int pNo, int z, int x, int t) +{ + if (policies[pNo].core_share[z].pcpu == lvm_info[x].pcpus[t]) { + if (strcmp(policies[pNo].pkt.vm_name, + lvm_info[x].vm_name) != 0) { + policies[pNo].core_share[z].status = 1; + power_manager_scale_core_max( + policies[pNo].core_share[z].pcpu); + } + } +} + +static void +core_share_status(int pNo) { + + int noVms, noVcpus, z, x, t; + + get_all_vm(&noVms, &noVcpus); + + /* Reset Core Share Status. */ + for (z = 0; z < noVcpus; z++) + policies[pNo].core_share[z].status = 0; + + /* Foreach vcpu in a policy. */ + for (z = 0; z < noVcpus; z++) { + /* Foreach VM on the platform. */ + for (x = 0; x < noVms; x++) { + /* Foreach vcpu of VMs on platform. */ + for (t = 0; t < noVcpus; t++) + core_share(pNo, z, x, t); + } + } +} + +static void +get_pcpu_to_control(struct policy *pol) { + + /* Convert vcpu to pcpu. */ + struct vm_info info; + int pcpu, count; + uint64_t mask_u64b; + + printf("Looking for pcpu for %s\n", pol->pkt.vm_name); + get_info_vm(pol->pkt.vm_name, &info); + + for (count = 0; count < 2; count++) { + mask_u64b = info.pcpu_mask[pol->pkt.vcpu_to_control[count]]; + for (pcpu = 0; mask_u64b; mask_u64b &= ~(1ULL << pcpu++)) { + if ((mask_u64b >> pcpu) & 1) + pol->core_share[count].pcpu = pcpu; + } + } +} + +static int +get_pfid(struct policy *pol) { + + int i, x, ret = 0, nb_ports; + + nb_ports = rte_eth_dev_count(); + for (i = 0; i < pol->pkt.nb_mac_to_monitor; i++) { + + for (x = 0; x < nb_ports; x++) { + ret = vfid_to_pfid_direct(x, pol->pkt.vfid[i]); + if (ret != -1) { + pol->port[i] = x; + break; + } + } + if (ret == -1) { + RTE_LOG(ERR, CHANNEL_MONITOR, + "Error with Policy. 
MAC not found on " + "attached ports "); + pol->enabled = 0; + return ret; + } + pol->pfid[i] = ret; + } + return 1; +} + +static int +update_policy(struct channel_packet *pkt) { + + unsigned int updated = 0; + + for (int i = 0; i < MAX_VMS; i++) { + if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) { + updated = 1; + break; + } + core_share_status(i); + policies[i].enabled = 1; + updated = 1; + } + } + if (!updated) { + for (int i = 0; i < MAX_VMS; i++) { + if (policies[i].enabled == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) + break; + core_share_status(i); + policies[i].enabled = 1; + break; + } + } + } + return 0; +} + +static uint64_t +get_pkt_diff(struct policy *pol) { + + uint64_t vsi_pkt_count, + vsi_pkt_total = 0, + vsi_pkt_count_prev_total = 0; + double rdtsc_curr, rdtsc_diff, diff; + int x; + + for (x = 0; x < pol->pkt.nb_mac_to_monitor; x++) { + + /*Read vsi stats*/ + vsi_pkt_count = read_pf_stats_direct(x, pol->pfid[x]); + vsi_pkt_total += vsi_pkt_count; + + vsi_pkt_count_prev_total += vsi_pkt_count_prev[pol->pfid[x]]; + vsi_pkt_count_prev[pol->pfid[x]] = vsi_pkt_count; + } + + rdtsc_curr = rte_rdtsc_precise(); + rdtsc_diff = rdtsc_curr - rdtsc_prev[pol->pfid[x-1]]; + rdtsc_prev[pol->pfid[x-1]] = rdtsc_curr; + + diff = (vsi_pkt_total - vsi_pkt_count_prev_total) * + (cpu_tsc_hz / rdtsc_diff); + + return diff; +} + +static void +apply_traffic_profile(struct policy *pol) { + + int count; + uint64_t diff = 0; + + diff = get_pkt_diff(pol); + + printf("Applying traffic profile\n"); + + if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) { + for (count = 0; count < 2; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (diff >= (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < 2; 
count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (diff < (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < 2; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_time_profile(struct policy *pol) { + + int count, x; + struct timeval tv; + struct tm *ptm; + char time_string[40]; + + /* Obtain the time of day, and convert it to a tm struct. */ + gettimeofday(&tv, NULL); + ptm = localtime(&tv.tv_sec); + /* Format the date and time, down to a single second. */ + strftime(time_string, sizeof(time_string), "%Y-%m-%d %H:%M:%S", ptm); + + for (x = 0; x < HOURS; x++) { + + if (ptm->tm_hour == pol->pkt.timer_policy.busy_hours[x]) { + for (count = 0; count < 2; count++) { + if (pol->core_share[count].status != 1) + printf("Scaling up core %d to max\n", + pol->core_share[count].pcpu); + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.quiet_hours[x]) { + for (count = 0; count < 2; count++) { + if (pol->core_share[count].status != 1) + printf("Scaling down core %d to min\n", + pol->core_share[count].pcpu); + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.hours_to_use_traffic_profile[x]) { + apply_traffic_profile(pol); + break; + } + } +} + +static void +apply_workload_profile(struct policy *pol) { + + int count; + + if (pol->pkt.workload == HIGH) { + for (count = 0; count < 2; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == MEDIUM) { + for (count = 0; count < 2; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload 
== LOW) { + for (count = 0; count < 2; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_policy(struct policy *pol) { + + struct channel_packet *pkt = &pol->pkt; + + /*Check policy to use*/ + if (pkt->policy_to_use == TRAFFIC) + apply_traffic_profile(pol); + else if (pkt->policy_to_use == TIME) + apply_time_profile(pol); + else if (pkt->policy_to_use == WORKLOAD) + apply_workload_profile(pol); +} + + static int process_request(struct channel_packet *pkt, struct channel_info *chan_info) { @@ -128,6 +396,13 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) } } + + if (pkt->command == PKT_POLICY) { + printf("\nProcessing Policy request from Guest\n"); + update_policy(pkt); + policy_is_set = 1; + } + /* Return is not checked as channel status may have been set to DISABLED * from management thread */ @@ -197,9 +472,10 @@ run_channel_monitor(void) struct channel_info *chan_info = (struct channel_info *) global_events_list[i].data.ptr; if ((global_events_list[i].events & EPOLLERR) || - (global_events_list[i].events & EPOLLHUP)) { + (global_events_list[i].events & EPOLLHUP)) { RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for " - "channel '%s'\n", chan_info->channel_path); + "channel '%s'\n", + chan_info->channel_path); remove_channel(&chan_info); continue; } @@ -211,14 +487,17 @@ run_channel_monitor(void) int buffer_len = sizeof(pkt); while (buffer_len > 0) { - n_bytes = read(chan_info->fd, buffer, buffer_len); + n_bytes = read(chan_info->fd, + buffer, buffer_len); if (n_bytes == buffer_len) break; if (n_bytes == -1) { err = errno; - RTE_LOG(DEBUG, CHANNEL_MONITOR, "Received error on " - "channel '%s' read: %s\n", - chan_info->channel_path, strerror(err)); + RTE_LOG(DEBUG, CHANNEL_MONITOR, + "Received error on " + "channel '%s' read: %s\n", + chan_info->channel_path, + strerror(err)); remove_channel(&chan_info); break; } @@ -229,5 
+508,12 @@ run_channel_monitor(void) process_request(&pkt, chan_info); } } + rte_delay_us(time_period_s*1000000); + if (policy_is_set) { + for (int j = 0; j < MAX_VMS; j++) { + if (policies[j].enabled == 1) + apply_policy(&policies[j]); + } + } } } diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h index c138607..eb1383f 100644 --- a/examples/vm_power_manager/channel_monitor.h +++ b/examples/vm_power_manager/channel_monitor.h @@ -35,6 +35,24 @@ #define CHANNEL_MONITOR_H_ #include "channel_manager.h" +#include "channel_commands.h" + +struct core_share { + unsigned int pcpu; + /* + * 1 CORE SHARE + * 0 NOT SHARED + */ + int status; +}; + +struct policy { + struct channel_packet pkt; + uint32_t pfid[MAX_VFS]; + uint32_t port[MAX_VFS]; + unsigned int enabled; + struct core_share core_share[2]; +}; #ifdef __cplusplus extern "C" { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
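The traffic policy in get_pkt_diff() and apply_traffic_profile() boils down to two steps: scale the packet-count delta by the TSC frequency over the elapsed TSC ticks to get packets per second, then compare that rate against the two thresholds in the policy. A minimal self-contained sketch of that arithmetic follows. The names pkts_per_second(), level_for_rate() and enum freq_level are illustrative assumptions and do not appear in the patch.

```c
#include <stdint.h>

/* Illustrative frequency levels; the patch calls the scale functions directly. */
enum freq_level { FREQ_MIN, FREQ_MED, FREQ_MAX };

/*
 * Convert a packet-count delta into packets per second using the number
 * of TSC ticks elapsed between the two samples.
 */
static double
pkts_per_second(uint64_t pkts_now, uint64_t pkts_prev,
		uint64_t tsc_now, uint64_t tsc_prev, double tsc_hz)
{
	uint64_t tsc_diff = tsc_now - tsc_prev;

	/* Avoid dividing by zero if both samples share a timestamp. */
	if (tsc_diff == 0)
		return 0.0;
	return (double)(pkts_now - pkts_prev) * (tsc_hz / (double)tsc_diff);
}

/* Pick a level from the rate, mirroring the threshold order in the patch. */
static enum freq_level
level_for_rate(double rate, uint64_t avg_thresh, uint64_t max_thresh)
{
	if (rate >= (double)max_thresh)
		return FREQ_MAX;
	if (rate >= (double)avg_thresh)
		return FREQ_MED;
	return FREQ_MIN;
}
```

For example, 1000 packets seen over 1.1e9 ticks of a 2.2 GHz TSC is half a second of traffic, so the rate is 2000 packets per second. With the example thresholds from the guest patch (1800000 and 2000000) that would map to the minimum level.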
* [dpdk-dev] [PATCH v1 07/10] examples/vm_power_mgr: add port initialisation 2017-08-25 16:02 [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest David Hunt ` (5 preceding siblings ...) 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 06/10] examples/vm_power_mgr: add policy to channels David Hunt @ 2017-08-25 16:02 ` David Hunt 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 08/10] examples/guest_cli: add send policy to host David Hunt ` (4 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-08-25 16:02 UTC (permalink / raw) To: dev; +Cc: David Hunt, Nemanja Marjanovic We need to initialise the ports we're monitoring to be able to see the throughput. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 220 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 220 insertions(+) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index c33fcc9..698abca 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -49,6 +49,9 @@ #include <rte_log.h> #include <rte_per_lcore.h> #include <rte_lcore.h> +#include <rte_ethdev.h> +#include <getopt.h> +#include <rte_cycles.h> #include <rte_debug.h> #include "channel_manager.h" @@ -56,6 +59,192 @@ #include "power_manager.h" #include "vm_power_cli.h" +#define RX_RING_SIZE 512 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static uint32_t enabled_port_mask; +static volatile bool force_quit; + +/****************/ +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port >= rte_eth_dev_count()) + return -1; + + /* 
Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned int)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. 
*/ + rte_eth_promiscuous_enable(port); + + + return 0; +} + +static int +parse_portmask(const char *portmask) +{ + char *end = NULL; + unsigned long pm; + + /* parse hexadecimal string */ + pm = strtoul(portmask, &end, 16); + if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0')) + return -1; + + if (pm == 0) + return -1; + + return pm; +} +/* Parse the argument given in the command line of the application */ +static int +parse_args(int argc, char **argv) +{ + int opt, ret; + char **argvopt; + int option_index; + char *prgname = argv[0]; + static struct option lgopts[] = { + { "mac-updating", no_argument, 0, 1}, + { "no-mac-updating", no_argument, 0, 0}, + {NULL, 0, 0, 0} + }; + argvopt = argv; + + while ((opt = getopt_long(argc, argvopt, "p:q:T:", + lgopts, &option_index)) != EOF) { + + switch (opt) { + /* portmask */ + case 'p': + enabled_port_mask = parse_portmask(optarg); + if (enabled_port_mask == 0) { + printf("invalid portmask\n"); + return -1; + } + break; + /* long options */ + case 0: + break; + + default: + return -1; + } + } + + if (optind >= 0) + argv[optind-1] = prgname; + + ret = optind-1; + optind = 0; /* reset getopt lib */ + return ret; +} + +static void +check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) +{ +#define CHECK_INTERVAL 100 /* 100ms */ +#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */ + uint8_t portid, count, all_ports_up, print_flag = 0; + struct rte_eth_link link; + + printf("\nChecking link status"); + fflush(stdout); + for (count = 0; count <= MAX_CHECK_TIME; count++) { + if (force_quit) + return; + all_ports_up = 1; + for (portid = 0; portid < port_num; portid++) { + if (force_quit) + return; + if ((port_mask & (1 << portid)) == 0) + continue; + memset(&link, 0, sizeof(link)); + rte_eth_link_get_nowait(portid, &link); + /* print link status if flag set */ + if (print_flag == 1) { + if (link.link_status) + printf("Port %d Link Up - speed %u " + "Mbps - %s\n", (uint8_t)portid, + (unsigned 
int)link.link_speed, + (link.link_duplex == ETH_LINK_FULL_DUPLEX) ? + ("full-duplex") : ("half-duplex\n")); + else + printf("Port %d Link Down\n", + (uint8_t)portid); + continue; + } + /* clear all_ports_up flag if any link down */ + if (link.link_status == ETH_LINK_DOWN) { + all_ports_up = 0; + break; + } + } + /* after finally printing all link status, get out */ + if (print_flag == 1) + break; + + if (all_ports_up == 0) { + printf("."); + fflush(stdout); + rte_delay_ms(CHECK_INTERVAL); + } + + /* set the print_flag if all ports up or timeout */ + if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) { + print_flag = 1; + printf("done\n"); + } + } +} static int run_monitor(__attribute__((unused)) void *arg) { @@ -82,6 +271,10 @@ main(int argc, char **argv) { int ret; unsigned lcore_id; + unsigned int nb_ports; + struct rte_mempool *mbuf_pool; + uint8_t portid; + ret = rte_eal_init(argc, argv); if (ret < 0) @@ -90,12 +283,39 @@ main(int argc, char **argv) signal(SIGINT, sig_handler); signal(SIGTERM, sig_handler); + argc -= ret; + argv += ret; + + /* parse application arguments (after the EAL ones) */ + ret = parse_args(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid arguments\n"); + + nb_ports = rte_eth_dev_count(); + + mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports, + MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id()); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize ports. 
*/ + for (portid = 0; portid < nb_ports; portid++) { + if ((enabled_port_mask & (1 << portid)) == 0) + continue; + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + } + lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { RTE_LOG(ERR, EAL, "A minimum of two cores are required to run " "application\n"); return 0; } + + check_all_ports_link_status(nb_ports, enabled_port_mask); rte_eal_remote_launch(run_monitor, NULL, lcore_id); if (power_manager_init() < 0) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
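One observation on the portmask handling in this patch: parse_portmask() returns -1 on bad input, but parse_args() stores the result in the unsigned enabled_port_mask and only checks for 0, so an invalid string appears to end up enabling every port. A minimal sketch of a variant that keeps the status and the mask separate is shown below. parse_portmask_checked() is a hypothetical name and is not part of the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal portmask string into *mask.
 * Returns 0 on success and -1 on empty, malformed, zero or
 * out-of-range input; *mask is only written on success.
 */
static int
parse_portmask_checked(const char *arg, uint32_t *mask)
{
	char *end = NULL;
	unsigned long pm;

	if (arg == NULL || arg[0] == '\0')
		return -1;

	errno = 0;
	pm = strtoul(arg, &end, 16);
	if (errno != 0 || *end != '\0' || pm == 0 || pm > UINT32_MAX)
		return -1;

	*mask = (uint32_t)pm;
	return 0;
}
```

The caller would then test the return value rather than the mask itself, for example `if (parse_portmask_checked(optarg, &enabled_port_mask) < 0)`.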
* [dpdk-dev] [PATCH v1 08/10] examples/guest_cli: add send policy to host 2017-08-25 16:02 [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest David Hunt ` (6 preceding siblings ...) 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 07/10] examples/vm_power_mgr: add port initialisation David Hunt @ 2017-08-25 16:02 ` David Hunt 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 09/10] examples/vm_power_mgr: set MAC address of VF David Hunt ` (3 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-08-25 16:02 UTC (permalink / raw) To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton Here we're adding an example of setting up a policy, and allowing the vm_cli_guest app to send it to the host using the cli command "send_policy now" Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- .../guest_cli/vm_power_cli_guest.c | 94 ++++++++++++++++++++++ 1 file changed, 94 insertions(+) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 7931135..bff2afc 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -45,6 +45,7 @@ #include <cmdline.h> #include <rte_log.h> #include <rte_lcore.h> +#include <rte_ethdev.h> #include <rte_power.h> @@ -135,8 +136,101 @@ cmdline_parse_inst_t cmd_set_cpu_freq_set = { }, }; +struct cmd_send_policy_result { + cmdline_fixed_string_t send_policy; + cmdline_fixed_string_t cmd; +}; + +union PFID { + struct ether_addr addr; + uint64_t pfid; +}; + +static inline int +send_policy(void) +{ + struct channel_packet pkt; + union PFID pfid; + int ret; + + /* Use port MAC address as the vfid */ + rte_eth_macaddr_get(0, &pfid.addr); + printf("Port %u MAC: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":" + "%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n", + 1, + 
pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1], + pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3], + pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]); + pkt.vfid[0] = pfid.pfid; + + pkt.nb_mac_to_monitor = 1; + pkt.t_boost_status.tbEnabled = false; + + pkt.vcpu_to_control[0] = 0; + pkt.vcpu_to_control[1] = 1; + /* Dummy Population. */ + pkt.traffic_policy.min_packet_thresh = 96000; + pkt.traffic_policy.avg_max_packet_thresh = 1800000; + pkt.traffic_policy.max_max_packet_thresh = 2000000; + + pkt.timer_policy.busy_hours[0] = 3; + pkt.timer_policy.busy_hours[1] = 4; + pkt.timer_policy.busy_hours[2] = 5; + pkt.timer_policy.quiet_hours[0] = 11; + pkt.timer_policy.quiet_hours[1] = 12; + pkt.timer_policy.quiet_hours[2] = 13; + pkt.timer_policy.hours_to_use_traffic_profile[0] = 8; + pkt.timer_policy.hours_to_use_traffic_profile[1] = 10; + + pkt.workload = LOW; + pkt.policy_to_use = TRAFFIC; + pkt.command = PKT_POLICY; + strcpy(pkt.vm_name, "ubintu2"); + ret = guest_channel_send_msg(&pkt, 1); + if (ret == 0) + return 1; + RTE_LOG(DEBUG, POWER, "Error sending message: %s\n", + ret > 0 ? 
strerror(ret) : "channel not connected"); + return -1; +} + +static void +cmd_send_policy_parsed(void *parsed_result, struct cmdline *cl, + __attribute__((unused)) void *data) +{ + int ret = -1; + struct cmd_send_policy_result *res = parsed_result; + + if (!strcmp(res->cmd, "now")) { + printf("Sending Policy down now!\n"); + ret = send_policy(); + } + if (ret != 1) + cmdline_printf(cl, "Error sending message: %s\n", + strerror(ret)); +} + +cmdline_parse_token_string_t cmd_send_policy = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + send_policy, "send_policy"); +cmdline_parse_token_string_t cmd_send_policy_cmd_cmd = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + cmd, "now"); + +cmdline_parse_inst_t cmd_send_policy_set = { + .f = cmd_send_policy_parsed, + .data = NULL, + .help_str = "send_policy now", + .tokens = { + (void *)&cmd_send_policy, + (void *)&cmd_send_policy_cmd_cmd, + NULL, + }, +}; + cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_quit, + (cmdline_parse_inst_t *)&cmd_send_policy_set, (cmdline_parse_inst_t *)&cmd_set_cpu_freq_set, NULL, }; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
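On the union PFID used in send_policy(): a MAC address is 6 bytes and the uint64_t member is 8, so reading pfid.pfid after filling only pfid.addr leaves two bytes whose value depends on whatever was on the stack. If the host compares the full 64-bit value, that may cause spurious mismatches. A minimal sketch of packing the MAC into a zeroed 64-bit id is shown below. mac_to_id() and id_to_mac() are hypothetical helpers, not part of the patch, and the byte layout simply follows host memory order as the union does.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Pack a 6-byte MAC address into a 64-bit id. The id is zeroed first so
 * the two bytes not covered by the MAC have a defined value.
 */
static uint64_t
mac_to_id(const uint8_t mac[MAC_LEN])
{
	uint64_t id = 0;

	memcpy(&id, mac, MAC_LEN);
	return id;
}

/* Recover the 6 MAC bytes from an id produced by mac_to_id(). */
static void
id_to_mac(uint64_t id, uint8_t mac[MAC_LEN])
{
	memcpy(mac, &id, MAC_LEN);
}
```

Two ids built this way from the same MAC always compare equal, which is the property the host-side VF lookup would need.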
* [dpdk-dev] [PATCH v1 09/10] examples/vm_power_mgr: set MAC address of VF 2017-08-25 16:02 [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest David Hunt ` (7 preceding siblings ...) 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 08/10] examples/guest_cli: add send policy to host David Hunt @ 2017-08-25 16:02 ` David Hunt 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 10/10] net/i40e: set register for no drop David Hunt ` (2 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-08-25 16:02 UTC (permalink / raw) To: dev; +Cc: David Hunt We need to set vf mac from the host, so that they will be in sync on the guest and the host. Otherwise, we'll have a random mac on the guest, and a 00:00:00:00:00:00 mac on the host. Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 58 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index 698abca..f307ec7 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -58,6 +58,15 @@ #include "channel_monitor.h" #include "power_manager.h" #include "vm_power_cli.h" +#ifdef RTE_LIBRTE_IXGBE_PMD +#include <rte_pmd_ixgbe.h> +#endif +#ifdef RTE_LIBRTE_I40E_PMD +#include <rte_pmd_i40e.h> +#endif +#ifdef RTE_LIBRTE_BNXT_PMD +#include <rte_pmd_bnxt.h> +#endif #define RX_RING_SIZE 512 #define TX_RING_SIZE 512 @@ -222,7 +231,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) (uint8_t)portid); continue; } - /* clear all_ports_up flag if any link down */ + /* clear all_ports_up flag if any link down */ if (link.link_status == ETH_LINK_DOWN) { all_ports_up = 0; break; @@ -301,11 +310,58 @@ main(int argc, char **argv) /* Initialize ports. 
*/ for (portid = 0; portid < nb_ports; portid++) { + struct ether_addr eth; + int w, j; + int ret = -ENOTSUP; + if ((enabled_port_mask & (1 << portid)) == 0) continue; + + eth.addr_bytes[0] = 0xe0; + eth.addr_bytes[1] = 0xe0; + eth.addr_bytes[2] = 0xe0; + eth.addr_bytes[3] = 0xe0; + eth.addr_bytes[4] = portid + 0xf0; + if (port_init(portid, mbuf_pool) != 0) rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid); + + for (w = 0; w < MAX_VFS; w++) { + eth.addr_bytes[5] = w + 0xf0; + +#ifdef RTE_LIBRTE_IXGBE_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_ixgbe_set_vf_mac_addr(portid, + w, &eth); +#endif +#ifdef RTE_LIBRTE_I40E_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_i40e_set_vf_mac_addr(portid, + w, &eth); +#endif +#ifdef RTE_LIBRTE_BNXT_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_bnxt_set_vf_mac_addr(portid, + w, &eth); +#endif + + + ret = rte_pmd_i40e_set_vf_mac_addr(portid, w, + &eth); + switch (ret) { + case 0: + printf("Port %d VF %d MAC: ", + portid, w); + for (j = 0; j < 6; j++) { + printf("%02x", eth.addr_bytes[j]); + if (j < 5) + printf(":"); + } + printf("\n"); + break; + } + } } lcore_id = rte_get_next_lcore(-1, 1, 0); -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
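The MAC addresses assigned to the VFs in this patch follow a simple scheme: four fixed bytes of 0xe0, then 0xf0 plus the port id, then 0xf0 plus the VF index. A minimal sketch of that scheme as a standalone function is shown below. vf_mac_for() is a hypothetical name that is not part of the patch.

```c
#include <stdint.h>

/*
 * Build the VF MAC address used by the scheme in the patch:
 * e0:e0:e0:e0:(f0 + port):(f0 + vf).
 * The last two bytes wrap around once port or vf exceeds 15.
 */
static void
vf_mac_for(uint8_t port, uint8_t vf, uint8_t mac[6])
{
	mac[0] = 0xe0;
	mac[1] = 0xe0;
	mac[2] = 0xe0;
	mac[3] = 0xe0;
	mac[4] = (uint8_t)(0xf0 + port);
	mac[5] = (uint8_t)(0xf0 + vf);
}
```

Because each byte only has 16 values above 0xf0, port 0 VF 16 would get e0:e0:e0:e0:f0:00, so the addresses stay unique only while both indices remain well below 256, and the f0 prefix pattern holds only for indices up to 15.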
* [dpdk-dev] [PATCH v1 10/10] net/i40e: set register for no drop 2017-08-25 16:02 [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest David Hunt ` (8 preceding siblings ...) 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 09/10] examples/vm_power_mgr: set MAC address of VF David Hunt @ 2017-08-25 16:02 ` David Hunt 2017-09-25 2:50 ` Wu, Jingjing 2017-08-29 13:03 ` [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest Ananyev, Konstantin 2017-09-25 12:27 ` [dpdk-dev] [PATCH v2] " David Hunt 11 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-08-25 16:02 UTC (permalink / raw) To: dev; +Cc: David Hunt, Nemanja Marjanovic, Rory Sexton See the XL710 controller datasheet for more information on this register Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- drivers/net/i40e/i40e_ethdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index d9806fc..24b713e 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -1156,7 +1156,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) * in firmware in the future. */ i40e_configure_registers(hw); - + I40E_WRITE_REG(hw, I40E_PRTDCB_TC2PFC, 0xff); /* Get hw capabilities */ ret = i40e_get_cap(hw); if (ret != I40E_SUCCESS) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v1 10/10] net/i40e: set register for no drop 2017-08-25 16:02 ` [dpdk-dev] [PATCH v1 10/10] net/i40e: set register for no drop David Hunt @ 2017-09-25 2:50 ` Wu, Jingjing 2017-09-25 9:44 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: Wu, Jingjing @ 2017-09-25 2:50 UTC (permalink / raw) To: Hunt, David, dev; +Cc: Hunt, David, Marjanovic, Nemanja, Sexton, Rory > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt > Sent: Saturday, August 26, 2017 12:02 AM > To: dev@dpdk.org > Cc: Hunt, David <david.hunt@intel.com>; Marjanovic, Nemanja > <nemanja.marjanovic@intel.com>; Sexton, Rory <rory.sexton@intel.com> > Subject: [dpdk-dev] [PATCH v1 10/10] net/i40e: set register for no drop > > See the XL710 controller datasheet for more information on this register > > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- > drivers/net/i40e/i40e_ethdev.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c > index d9806fc..24b713e 100644 > --- a/drivers/net/i40e/i40e_ethdev.c > +++ b/drivers/net/i40e/i40e_ethdev.c > @@ -1156,7 +1156,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) > * in firmware in the future. > */ > i40e_configure_registers(hw); > - > + I40E_WRITE_REG(hw, I40E_PRTDCB_TC2PFC, 0xff); What is the relationship with VM power manager? And about no-drop setting, it is the responsibility of flow control, please check http://www.dpdk.org/dev/patchwork/patch/19449/ Thanks Jingjing ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v1 10/10] net/i40e: set register for no drop
From: Hunt, David @ 2017-09-25 9:44 UTC
To: Wu, Jingjing, dev; +Cc: Marjanovic, Nemanja, Sexton, Rory

On 25/9/2017 3:50 AM, Wu, Jingjing wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
>> Sent: Saturday, August 26, 2017 12:02 AM
>> To: dev@dpdk.org
>> Cc: Hunt, David <david.hunt@intel.com>; Marjanovic, Nemanja
>> <nemanja.marjanovic@intel.com>; Sexton, Rory <rory.sexton@intel.com>
>> Subject: [dpdk-dev] [PATCH v1 10/10] net/i40e: set register for no drop
>>
>> See the XL710 controller datasheet for more information on this register
>>
>> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
>> Signed-off-by: Rory Sexton <rory.sexton@intel.com>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>  drivers/net/i40e/i40e_ethdev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
>> index d9806fc..24b713e 100644
>> --- a/drivers/net/i40e/i40e_ethdev.c
>> +++ b/drivers/net/i40e/i40e_ethdev.c
>> @@ -1156,7 +1156,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
>>  	 * in firmware in the future.
>>  	 */
>>  	i40e_configure_registers(hw);
>> -
>> +	I40E_WRITE_REG(hw, I40E_PRTDCB_TC2PFC, 0xff);
> What is the relationship with VM power manager?
>
> And about no-drop setting, it is the responsibility of flow control, please check http://www.dpdk.org/dev/patchwork/patch/19449/
>
>
> Thanks
> Jingjing

Hi Jingjing,

Yes, we've removed this now. It's left to flow control. Will be removed from
next patch set.

Rgds,
Dave.
* Re: [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest
From: Ananyev, Konstantin @ 2017-08-29 13:03 UTC
To: Hunt, David, dev

Hi Dave,

> This patchset adds the facility for a guest VM to send a policy down to
> the host that will allow the host to scale up/down cpu frequencies
> depending on the policy criteria independently of the DPDK app running in
> the guest. This differs from the previous vm_power implementation where
> individual scale up/down requests were send from the guest to the host via
> virtio-serial.
>
> It's a modification of the vm_power_manager app that runs in the host, and
> the guest_vm_power_app example app that runs in the guest. This allows the
> guest to send down a policy to the host via virtio-serial, which then allows
> the host to scale up/down based on the criteria in the policy, resulting in
> quicker scale up/down than individual requests coming from the guest.
> It also means that the DPDK application running in the guest does not need
> to be modified in any way, it is unaware that it's cores are being scaled
> up/down, reducing the effort in implementing a power-aware infrastructure.
>
> The usage model is as follows:
> 1. Set up the VF's and assign to the guest in the usual way.
> 2. run vm_power_manager on the host, creating a channel to the guest.
> 3. Start the guest_vm_power_mgr app on the guest, which establishes
>    a virtio-serial channel to the host.
> 4. Send down the profile for the guest using the "send_profile now" command.
>    There is an example profile hard-coded into guest_vm_power_mgr.
> 5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
> 6. Send traffic into the VFs at varying traffic rates.
>    Observe the frequency change on the host (turbostat -i 1)
>
> The sequence of code changes are as follows:
>
> Firstly, two new API calls are added to the ethdev layer
> 1. One to convert a VF id to a PF id. In the patchset
>    this id is a MAC address. This is needed so that the host can map the VFs
>    in the profile to PF so in can monitor the traffic on the relevant PF at the
>    host level.
> 2. The other function is to read the low-level traffic throughput on the NIC.
>    Currently this API reads a NIC register for speed, but we are looking at
>    using a more generic way to get these stats, suggestions welcome.

Why do you need a server (host) to collect RX/TX statistics for VM?
Such method seems to have a lot of limitations:
- no clear method to identify to which VM that VF belongs.
- rely on HW ability to provide such statistics for PF
  (limited HW support).
- wouldn't work if PF is not controlled by the same DPDK app.
Why not to make it client(VM) responsibility to collect that statistics and
periodically send it to the server?
Then server just will have to process that data and make decision.
Konstantin

>
> Next we make an addition to librte_power that adds an extra command to allow
> the passing of a policy structure from the guest to the host. This struct
> contains information like busy/quiet hour, packet throughput thresholds, etc.
>
> The next addition adds functionality to convert the virtual CPU (vcpU0 IDs to
> physical CPU (pcpu) IDs so that the host can scale up/down the cores used
> in the guest.
>
> The remaining patches are functionality to process the policy, and take action
> when the relevant trigger occurs to cause a frequency change.
>
> [01/10] net/i40e: add API to convert VF Id to PF Id
> [02/10] net/i40e: add API to get received packet count
> [03/10] lib/librte_power: add extra msg type for policies
> [04/10] examples/vm_power_mgr: add vcpu to pcpu mapping
> [05/10] examples/vm_power_mgr: add scale to medium freq fn
> [06/10] examples/vm_power_mgr: add policy to channels
> [07/10] examples/vm_power_mgr: add port initialisation
> [08/10] examples/guest_cli: add send policy to host
> [09/10] examples/vm_power_mgr: set MAC address of VF
> [10/10] net/i40e: set register for no drop
* Re: [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest
From: Thomas Monjalon @ 2017-09-22 9:51 UTC
To: Hunt, David; +Cc: dev, Ananyev, Konstantin

29/08/2017 15:03, Ananyev, Konstantin:
>
> Hi Dave,
>
> > This patchset adds the facility for a guest VM to send a policy down to
> > the host that will allow the host to scale up/down cpu frequencies
> > depending on the policy criteria independently of the DPDK app running in
> > the guest. This differs from the previous vm_power implementation where
> > individual scale up/down requests were send from the guest to the host via
> > virtio-serial.
> >
> > It's a modification of the vm_power_manager app that runs in the host, and
> > the guest_vm_power_app example app that runs in the guest. This allows the
> > guest to send down a policy to the host via virtio-serial, which then allows
> > the host to scale up/down based on the criteria in the policy, resulting in
> > quicker scale up/down than individual requests coming from the guest.
> > It also means that the DPDK application running in the guest does not need
> > to be modified in any way, it is unaware that it's cores are being scaled
> > up/down, reducing the effort in implementing a power-aware infrastructure.
> >
> > The usage model is as follows:
> > 1. Set up the VF's and assign to the guest in the usual way.
> > 2. run vm_power_manager on the host, creating a channel to the guest.
> > 3. Start the guest_vm_power_mgr app on the guest, which establishes
> >    a virtio-serial channel to the host.
> > 4. Send down the profile for the guest using the "send_profile now" command.
> >    There is an example profile hard-coded into guest_vm_power_mgr.
> > 5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
> > 6. Send traffic into the VFs at varying traffic rates.
> >    Observe the frequency change on the host (turbostat -i 1)
> >
> > The sequence of code changes are as follows:
> >
> > Firstly, two new API calls are added to the ethdev layer
> > 1. One to convert a VF id to a PF id. In the patchset
> >    this id is a MAC address. This is needed so that the host can map the VFs
> >    in the profile to PF so in can monitor the traffic on the relevant PF at the
> >    host level.
> > 2. The other function is to read the low-level traffic throughput on the NIC.
> >    Currently this API reads a NIC register for speed, but we are looking at
> >    using a more generic way to get these stats, suggestions welcome.
>
> Why do you need a server (host) to collect RX/TX statistics for VM?
> Such method seems to have a lot of limitations:
> - no clear method to identify to which VM that VF belongs.
> - rely on HW ability to provide such statistics for PF
>   (limited HW support).
> - wouldn't work if PF is not controlled by the same DPDK app.
> Why not to make it client(VM) responsibility to collect that statistics and
> periodically send it to the server?
> Then server just will have to process that data and make decision.

Any progress Dave?

You have another series "turbo boost API". Does it depends on this one?
* Re: [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest
From: Hunt, David @ 2017-09-22 10:28 UTC
To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin

On 22/9/2017 10:51 AM, Thomas Monjalon wrote:
> 29/08/2017 15:03, Ananyev, Konstantin:
>> Hi Dave,
>>
>>> This patchset adds the facility for a guest VM to send a policy down to
>>> the host that will allow the host to scale up/down cpu frequencies
>>> depending on the policy criteria independently of the DPDK app running in
>>> the guest. This differs from the previous vm_power implementation where
>>> individual scale up/down requests were send from the guest to the host via
>>> virtio-serial.
>>>
>>> It's a modification of the vm_power_manager app that runs in the host, and
>>> the guest_vm_power_app example app that runs in the guest. This allows the
>>> guest to send down a policy to the host via virtio-serial, which then allows
>>> the host to scale up/down based on the criteria in the policy, resulting in
>>> quicker scale up/down than individual requests coming from the guest.
>>> It also means that the DPDK application running in the guest does not need
>>> to be modified in any way, it is unaware that it's cores are being scaled
>>> up/down, reducing the effort in implementing a power-aware infrastructure.
>>>
>>> The usage model is as follows:
>>> 1. Set up the VF's and assign to the guest in the usual way.
>>> 2. run vm_power_manager on the host, creating a channel to the guest.
>>> 3. Start the guest_vm_power_mgr app on the guest, which establishes
>>>    a virtio-serial channel to the host.
>>> 4. Send down the profile for the guest using the "send_profile now" command.
>>>    There is an example profile hard-coded into guest_vm_power_mgr.
>>> 5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
>>> 6. Send traffic into the VFs at varying traffic rates.
>>>    Observe the frequency change on the host (turbostat -i 1)
>>>
>>> The sequence of code changes are as follows:
>>>
>>> Firstly, two new API calls are added to the ethdev layer
>>> 1. One to convert a VF id to a PF id. In the patchset
>>>    this id is a MAC address. This is needed so that the host can map the VFs
>>>    in the profile to PF so in can monitor the traffic on the relevant PF at the
>>>    host level.
>>> 2. The other function is to read the low-level traffic throughput on the NIC.
>>>    Currently this API reads a NIC register for speed, but we are looking at
>>>    using a more generic way to get these stats, suggestions welcome.
>> Why do you need a server (host) to collect RX/TX statistics for VM?
>> Such method seems to have a lot of limitations:
>> - no clear method to identify to which VM that VF belongs.
>> - rely on HW ability to provide such statistics for PF
>>   (limited HW support).
>> - wouldn't work if PF is not controlled by the same DPDK app.
>> Why not to make it client(VM) responsibility to collect that statistics and
>> periodically send it to the server?
>> Then server just will have to process that data and make decision.
> Any progress Dave?
>
> You have another series "turbo boost API". Does it depends on this one?

Hi Thomas,

We're still working on updates based on Konstantin's feedback above, and
hope to have a new patch set submitted to the mailing list early next
week. This will remove the ethdev layer changes, and uses pre-existing
stats-api.

In relation to the Turbo patch, they are still independent, but when we
have the next revision of the Policy patch submitted, I'll do a new
version of the Turbo patch so that it can be applied on top of the
policy patch.

Regards,
Dave.
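[Editorial sketch, not part of the thread.] The reply above says the next revision drops the NIC register read in favour of the existing stats API. One way to turn a cumulative packet counter into the throughput figure a policy needs is to difference two samples over a known interval. The sketch below assumes that approach; `packets_per_second`, `classify_load`, `enum load_level` and the threshold parameters are hypothetical names invented for illustration, and the counter could come from something such as `ipackets` in `rte_eth_stats_get()`, which is an assumption about how the series ends up being implemented.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patchset: convert two samples of
 * a cumulative packet counter into a packets-per-second rate.
 */
static uint64_t
packets_per_second(uint64_t prev_count, uint64_t curr_count,
		uint32_t interval_ms)
{
	uint64_t delta;

	assert(interval_ms > 0);
	/* Unsigned subtraction stays correct across a single counter wrap. */
	delta = curr_count - prev_count;
	return delta * 1000 / interval_ms;
}

/* Hypothetical three-way classification against two thresholds. */
enum load_level { LOAD_LOW, LOAD_MEDIUM, LOAD_HIGH };

static enum load_level
classify_load(uint64_t pps, uint64_t low_thresh, uint64_t high_thresh)
{
	if (pps >= high_thresh)
		return LOAD_HIGH;
	if (pps >= low_thresh)
		return LOAD_MEDIUM;
	return LOAD_LOW;
}
```

Differencing cumulative counters keeps the sampling side stateless apart from the previous sample, which is why it works equally well whether the host or the guest does the collection.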
* Re: [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest
From: Thomas Monjalon @ 2017-09-22 13:03 UTC
To: Hunt, David; +Cc: dev, Ananyev, Konstantin

22/09/2017 12:28, Hunt, David:
>
> On 22/9/2017 10:51 AM, Thomas Monjalon wrote:
> > 29/08/2017 15:03, Ananyev, Konstantin:
> >> Hi Dave,
> >>
> >>> This patchset adds the facility for a guest VM to send a policy down to
> >>> the host that will allow the host to scale up/down cpu frequencies
> >>> depending on the policy criteria independently of the DPDK app running in
> >>> the guest. This differs from the previous vm_power implementation where
> >>> individual scale up/down requests were send from the guest to the host via
> >>> virtio-serial.
> >>>
> >>> It's a modification of the vm_power_manager app that runs in the host, and
> >>> the guest_vm_power_app example app that runs in the guest. This allows the
> >>> guest to send down a policy to the host via virtio-serial, which then allows
> >>> the host to scale up/down based on the criteria in the policy, resulting in
> >>> quicker scale up/down than individual requests coming from the guest.
> >>> It also means that the DPDK application running in the guest does not need
> >>> to be modified in any way, it is unaware that it's cores are being scaled
> >>> up/down, reducing the effort in implementing a power-aware infrastructure.
> >>>
> >>> The usage model is as follows:
> >>> 1. Set up the VF's and assign to the guest in the usual way.
> >>> 2. run vm_power_manager on the host, creating a channel to the guest.
> >>> 3. Start the guest_vm_power_mgr app on the guest, which establishes
> >>>    a virtio-serial channel to the host.
> >>> 4. Send down the profile for the guest using the "send_profile now" command.
> >>>    There is an example profile hard-coded into guest_vm_power_mgr.
> >>> 5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
> >>> 6. Send traffic into the VFs at varying traffic rates.
> >>>    Observe the frequency change on the host (turbostat -i 1)
> >>>
> >>> The sequence of code changes are as follows:
> >>>
> >>> Firstly, two new API calls are added to the ethdev layer
> >>> 1. One to convert a VF id to a PF id. In the patchset
> >>>    this id is a MAC address. This is needed so that the host can map the VFs
> >>>    in the profile to PF so in can monitor the traffic on the relevant PF at the
> >>>    host level.
> >>> 2. The other function is to read the low-level traffic throughput on the NIC.
> >>>    Currently this API reads a NIC register for speed, but we are looking at
> >>>    using a more generic way to get these stats, suggestions welcome.
> >> Why do you need a server (host) to collect RX/TX statistics for VM?
> >> Such method seems to have a lot of limitations:
> >> - no clear method to identify to which VM that VF belongs.
> >> - rely on HW ability to provide such statistics for PF
> >>   (limited HW support).
> >> - wouldn't work if PF is not controlled by the same DPDK app.
> >> Why not to make it client(VM) responsibility to collect that statistics and
> >> periodically send it to the server?
> >> Then server just will have to process that data and make decision.
> > Any progress Dave?
> >
> > You have another series "turbo boost API". Does it depends on this one?
>
> Hi Thomas,
>
> We're still working on updates based on Konstantin's feedback above, and
> hope to have a new patch set submitted to the mailing list early next
> week. This will remove the ethdev layer changes, and uses pre-existing
> stats-api.
>
> In relation to the Turbo patch, they are still independent, but when we
> have the next revision of the Policy patch submitted, I'll do a new
> version of the Turbo patch so that it can be applied on top of the
> policy patch.

OK, thanks

If the turbo patch is independent, I can push it now?
* Re: [dpdk-dev] [PATCH v1 0/10] Policy Based Power Control for Guest
From: Hunt, David @ 2017-09-22 13:12 UTC
To: Thomas Monjalon; +Cc: dev, Ananyev, Konstantin

On 22/9/2017 2:03 PM, Thomas Monjalon wrote:
> 22/09/2017 12:28, Hunt, David:
>> On 22/9/2017 10:51 AM, Thomas Monjalon wrote:
>>> 29/08/2017 15:03, Ananyev, Konstantin:
>>>> Hi Dave,
>>>>
>>>>> This patchset adds the facility for a guest VM to send a policy down to
>>>>> the host that will allow the host to scale up/down cpu frequencies
>>>>> depending on the policy criteria independently of the DPDK app running in
>>>>> the guest. This differs from the previous vm_power implementation where
>>>>> individual scale up/down requests were send from the guest to the host via
>>>>> virtio-serial.
>>>>>
>>>>> It's a modification of the vm_power_manager app that runs in the host, and
>>>>> the guest_vm_power_app example app that runs in the guest. This allows the
>>>>> guest to send down a policy to the host via virtio-serial, which then allows
>>>>> the host to scale up/down based on the criteria in the policy, resulting in
>>>>> quicker scale up/down than individual requests coming from the guest.
>>>>> It also means that the DPDK application running in the guest does not need
>>>>> to be modified in any way, it is unaware that it's cores are being scaled
>>>>> up/down, reducing the effort in implementing a power-aware infrastructure.
>>>>>
>>>>> The usage model is as follows:
>>>>> 1. Set up the VF's and assign to the guest in the usual way.
>>>>> 2. run vm_power_manager on the host, creating a channel to the guest.
>>>>> 3. Start the guest_vm_power_mgr app on the guest, which establishes
>>>>>    a virtio-serial channel to the host.
>>>>> 4. Send down the profile for the guest using the "send_profile now" command.
>>>>>    There is an example profile hard-coded into guest_vm_power_mgr.
>>>>> 5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
>>>>> 6. Send traffic into the VFs at varying traffic rates.
>>>>>    Observe the frequency change on the host (turbostat -i 1)
>>>>>
>>>>> The sequence of code changes are as follows:
>>>>>
>>>>> Firstly, two new API calls are added to the ethdev layer
>>>>> 1. One to convert a VF id to a PF id. In the patchset
>>>>>    this id is a MAC address. This is needed so that the host can map the VFs
>>>>>    in the profile to PF so in can monitor the traffic on the relevant PF at the
>>>>>    host level.
>>>>> 2. The other function is to read the low-level traffic throughput on the NIC.
>>>>>    Currently this API reads a NIC register for speed, but we are looking at
>>>>>    using a more generic way to get these stats, suggestions welcome.
>>>> Why do you need a server (host) to collect RX/TX statistics for VM?
>>>> Such method seems to have a lot of limitations:
>>>> - no clear method to identify to which VM that VF belongs.
>>>> - rely on HW ability to provide such statistics for PF
>>>>   (limited HW support).
>>>> - wouldn't work if PF is not controlled by the same DPDK app.
>>>> Why not to make it client(VM) responsibility to collect that statistics and
>>>> periodically send it to the server?
>>>> Then server just will have to process that data and make decision.
>>> Any progress Dave?
>>>
>>> You have another series "turbo boost API". Does it depends on this one?
>> Hi Thomas,
>>
>> We're still working on updates based on Konstantin's feedback above, and
>> hope to have a new patch set submitted to the mailing list early next
>> week. This will remove the ethdev layer changes, and uses pre-existing
>> stats-api.
>>
>> In relation to the Turbo patch, they are still independent, but when we
>> have the next revision of the Policy patch submitted, I'll do a new
>> version of the Turbo patch so that it can be applied on top of the
>> policy patch.
> OK, thanks
>
> If the turbo patch is independent, I can push it now?

Yes, absolutely. And I can then ensure the next version of the policy
patch-set applies on top of it.

Regards,
Dave.
* [dpdk-dev] [PATCH v2] Policy Based Power Control for Guest
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev; +Cc: konstantin.ananyev, jingjing.wu

Policy Based Power Control for Guest

This patchset adds the facility for a guest VM to send a policy down to
the host that will allow the host to scale up/down cpu frequencies
depending on the policy criteria independently of the DPDK app running in
the guest. This differs from the previous vm_power implementation where
individual scale up/down requests were sent from the guest to the host via
virtio-serial.

V2 patchset changes:
  * Removed APIs in ethdev layer.
  * Now just a single new API in the i40e driver for mapping VF MAC to
    VF index.
  * Moved new function from rte_rxtx.c to rte_pmd_i40e.c
  * Removed function for reading i40e register, moved to using the
    standard stats API.
  * Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac
  * Cleaned up policy generation code.

It's a modification of the vm_power_manager app that runs in the host, and
the guest_vm_power_app example app that runs in the guest. This allows the
guest to send down a policy to the host via virtio-serial, which then allows
the host to scale up/down based on the criteria in the policy, resulting in
quicker scale up/down than individual requests coming from the guest.
It also means that the DPDK application running in the guest does not need
to be modified in any way; it is unaware that its cores are being scaled
up/down, reducing the effort in implementing a power-aware infrastructure.

The usage model is as follows:
1. Set up the VFs and assign them to the guest in the usual way.
2. Run vm_power_manager on the host, creating a channel to the guest.
3. Start the guest_vm_power_mgr app on the guest, which establishes
   a virtio-serial channel to the host.
4. Send down the profile for the guest using the "send_profile now" command.
   There is an example profile hard-coded into guest_vm_power_mgr.
5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
6. Send traffic into the VFs at varying traffic rates.
   Observe the frequency change on the host (turbostat -i 1)

The sequence of code changes is as follows:

A new function has been added to the i40e driver to allow mapping of
a VF MAC to VF index.

Next we make an addition to librte_power that adds an extra command to allow
the passing of a policy structure from the guest to the host. This struct
contains information like busy/quiet hour, packet throughput thresholds, etc.

The next addition adds functionality to convert the virtual CPU (vcpu) IDs to
physical CPU (pcpu) IDs so that the host can scale up/down the cores used
in the guest.

The remaining patches are functionality to process the policy, and take action
when the relevant trigger occurs to cause a frequency change.

[1/8] net/i40e: add API to convert VF MAC to VSI index
[2/8] lib/librte_power: add extra msg type for policies
[3/8] examples/vm_power_mgr: add vcpu to pcpu mapping
[4/8] examples/vm_power_mgr: add scale to medium freq fn
[5/8] examples/vm_power_mgr: add policy to channels
[6/8] examples/vm_power_mgr: add port initialisation
[7/8] examples/guest_cli: add send policy to host
[8/8] examples/vm_power_mgr: set MAC address of VF
* [dpdk-dev] [PATCH v2 1/8] net/i40e: add API to convert VF MAC to VSI index
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev
Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt

From: "Sexton, Rory" <rory.sexton@intel.com>

Need a way to convert a vf id to a pf id on the host so as to query the pf
for relevant statistics which are used for the frequency changes in the
vm_power_manager app. Used when profiles are passed down from the guest
to the host, allowing the host to map the vfs to pfs.

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 drivers/net/i40e/rte_pmd_i40e.c | 35 +++++++++++++++++++++++++++++++++++
 drivers/net/i40e/rte_pmd_i40e.h | 13 +++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index f12b7f4..85b540f 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -2115,3 +2115,38 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,

 	return 0;
 }
+
+uint64_t
+rte_pmd_i40e_query_vfid_by_mac(uint8_t port, uint64_t vf_mac) {
+	struct rte_eth_dev *dev;
+	struct ether_addr *vf_mac_addr = (struct ether_addr *)&vf_mac;
+	struct ether_addr *mac;
+	struct i40e_pf *pf;
+	int i, x;
+	struct i40e_pf_vf *vf;
+	uint16_t vf_num;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+	dev = &rte_eth_devices[port];
+
+	if (!is_i40e_supported(dev))
+		return -ENOTSUP;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	vf_num = pf->vf_num;
+
+	for (x = 0; x < vf_num; x++) {
+		int mac_addr_matches = 1;
+		vf = &pf->vfs[x];
+		mac = &vf->mac_addr;
+
+		for (i = 0; i < ETHER_ADDR_LEN; i++) {
+			if (mac->addr_bytes[i] != vf_mac_addr->addr_bytes[i])
+				mac_addr_matches = 0;
+		}
+		if (mac_addr_matches)
+			return x;
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index 356fa89..6984105 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -637,4 +637,17 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
 				       uint8_t mask,
 				       uint32_t pkt_type);

+/**
+ * Translate a VF mac address into VF index in array of pf->vfs
+ *
+ * @param port
+ *    pointer to port identifier of the device
+ * @param vf_mac
+ *    the mac address of the vf to determine index of
+ * @return
+ *   -(-22 EINVAL) the vf mac does not exist on this port
+ *   -(!-22) the index of vfid in pf->vfs
+ */
+uint64_t rte_pmd_i40e_query_vfid_by_mac(uint8_t port, uint64_t vf_mac);
+
 #endif /* _PMD_I40E_H_ */
-- 
2.7.4
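[Editorial sketch, not part of the thread.] The lookup in this patch walks the VF table and compares MAC addresses byte by byte. A minimal standalone version of the same idea is sketched below, assuming a plain array of addresses; `struct mac_addr`, `same_mac` and `find_vf_by_mac` are hypothetical stand-ins for `struct ether_addr`, DPDK's `is_same_ether_addr()` and the patch's function. One deliberate difference: the patch declares a `uint64_t` return type while returning negative errno values, which a caller cannot test with `< 0`, so this sketch returns a signed `int`. That is a choice made for the sketch, not something the thread settled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6	/* stand-in for ETHER_ADDR_LEN */

/* Stand-in for struct ether_addr, so the sketch builds without DPDK. */
struct mac_addr {
	uint8_t bytes[MAC_LEN];
};

/* Whole-address comparison instead of a per-byte flag loop. */
static int
same_mac(const struct mac_addr *a, const struct mac_addr *b)
{
	return memcmp(a->bytes, b->bytes, MAC_LEN) == 0;
}

/*
 * Return the index of the VF whose MAC equals *wanted, or -EINVAL if no
 * VF matches. The signed return type lets callers check for "< 0".
 */
static int
find_vf_by_mac(const struct mac_addr *vfs, uint16_t vf_num,
		const struct mac_addr *wanted)
{
	uint16_t i;

	assert(vfs != NULL && wanted != NULL);
	for (i = 0; i < vf_num; i++) {
		if (same_mac(&vfs[i], wanted))
			return i;
	}
	return -EINVAL;
}
```

Taking the address as a pointer to a six-byte structure also avoids reinterpreting a `uint64_t` as an address, which depends on host byte order.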
* Re: [dpdk-dev] [PATCH v2 1/8] net/i40e: add API to convert VF MAC to VSI index
From: Wu, Jingjing @ 2017-09-26 14:04 UTC
To: Hunt, David, dev; +Cc: Ananyev, Konstantin, Sexton, Rory, Marjanovic, Nemanja

> -----Original Message-----
> From: Hunt, David
> Sent: Monday, September 25, 2017 8:27 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Sexton, Rory <rory.sexton@intel.com>; Marjanovic, Nemanja
> <nemanja.marjanovic@intel.com>; Hunt, David <david.hunt@intel.com>
> Subject: [PATCH v2 1/8] net/i40e: add API to convert VF MAC to VSI index
>
VSI index -> VF id.

> From: "Sexton, Rory" <rory.sexton@intel.com>
>
> Need a way to convert a vf id to a pf id on the host so as to query the pf
> for relevant statistics which are used for the frequency changes in the
> vm_power_manager app. Used when profiles are passed down from the guest
> to the host, allowing the host to map the vfs to pfs.
>
> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
> Signed-off-by: Rory Sexton <rory.sexton@intel.com>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  drivers/net/i40e/rte_pmd_i40e.c | 35 +++++++++++++++++++++++++++++++++++
>  drivers/net/i40e/rte_pmd_i40e.h | 13 +++++++++++++
>  2 files changed, 48 insertions(+)
>
> diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
> index f12b7f4..85b540f 100644
> --- a/drivers/net/i40e/rte_pmd_i40e.c
> +++ b/drivers/net/i40e/rte_pmd_i40e.c
> @@ -2115,3 +2115,38 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
>
>  	return 0;
>  }
> +
> +uint64_t
> +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, uint64_t vf_mac) {
> +	struct rte_eth_dev *dev;
> +	struct ether_addr *vf_mac_addr = (struct ether_addr *)&vf_mac;
> +	struct ether_addr *mac;
> +	struct i40e_pf *pf;
> +	int i, x;
> +	struct i40e_pf_vf *vf;
> +	uint16_t vf_num;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
> +	dev = &rte_eth_devices[port];
> +
> +	if (!is_i40e_supported(dev))
> +		return -ENOTSUP;
> +
> +	pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
> +	vf_num = pf->vf_num;
> +
> +	for (x = 0; x < vf_num; x++) {
> +		int mac_addr_matches = 1;
> +		vf = &pf->vfs[x];
> +		mac = &vf->mac_addr;
> +
> +		for (i = 0; i < ETHER_ADDR_LEN; i++) {
> +			if (mac->addr_bytes[i] != vf_mac_addr->addr_bytes[i])
> +				mac_addr_matches = 0;
> +		}

You can use is_same_ether_addr instead.

> +		if (mac_addr_matches)
> +			return x;
> +	}
> +
> +	return -EINVAL;
> +}
> diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
> index 356fa89..6984105 100644
> --- a/drivers/net/i40e/rte_pmd_i40e.h
> +++ b/drivers/net/i40e/rte_pmd_i40e.h
> @@ -637,4 +637,17 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
>  				       uint8_t mask,
>  				       uint32_t pkt_type);
>
> +/**
> + * Translate a VF mac address into VF index in array of pf->vfs

VF id is a command concept, no need to say it is the index in pf->vfs

> + *
> + * @param port
> + *    pointer to port identifier of the device
> + * @param vf_mac
> + *    the mac address of the vf to determine index of
> + * @return
> + *   -(-22 EINVAL) the vf mac does not exist on this port
> + *   -(!-22) the index of vfid in pf->vfs

Thanks
Jingjing
* [dpdk-dev] [PATCH v2 2/8] lib/librte_power: add extra msg type for policies
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev
Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_power/channel_commands.h | 52 +++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h
index 484085b..1599706 100644
--- a/lib/librte_power/channel_commands.h
+++ b/lib/librte_power/channel_commands.h
@@ -46,6 +46,7 @@ extern "C" {
 /* Valid Commands */
 #define CPU_POWER               1
 #define CPU_POWER_CONNECT       2
+#define PKT_POLICY              3
 
 /* CPU Power Command Scaling */
 #define CPU_POWER_SCALE_UP      1
@@ -54,11 +55,62 @@ extern "C" {
 #define CPU_POWER_SCALE_MIN     4
 #define CPU_POWER_ENABLE_TURBO  5
 #define CPU_POWER_DISABLE_TURBO 6
+#define HOURS 24
+
+#ifdef RTE_LIBRTE_I40E_PMD
+#define MAX_VFS 10
+#endif
+
+#define MAX_VCPU_PER_VM         8
+
+typedef enum {false, true} bool;
+
+struct t_boost_status {
+	bool tbEnabled;
+};
+
+struct timer_profile {
+	int busy_hours[HOURS];
+	int quiet_hours[HOURS];
+#ifdef RTE_LIBRTE_I40E_PMD
+	int hours_to_use_traffic_profile[HOURS];
+#endif
+};
+
+enum workload {HIGH, MEDIUM, LOW};
+enum policy_to_use {
+#ifdef RTE_LIBRTE_I40E_PMD
+	TRAFFIC,
+#endif
+	TIME,
+	WORKLOAD
+};
+
+#ifdef RTE_LIBRTE_I40E_PMD
+struct traffic {
+	uint32_t min_packet_thresh;
+	uint32_t avg_max_packet_thresh;
+	uint32_t max_max_packet_thresh;
+};
+#endif
 
 struct channel_packet {
 	uint64_t resource_id; /**< core_num, device */
 	uint32_t unit;        /**< scale down/up/min/max */
 	uint32_t command;     /**< Power, IO, etc */
+	char vm_name[32];
+
+#ifdef RTE_LIBRTE_I40E_PMD
+	uint64_t vfid[MAX_VFS];
+	int nb_mac_to_monitor;
+	struct traffic traffic_policy;
+#endif
+	uint8_t vcpu_to_control[MAX_VCPU_PER_VM];
+	uint8_t num_vcpu;
+	struct timer_profile timer_policy;
+	enum workload workload;
+	enum policy_to_use policy_to_use;
+	struct t_boost_status t_boost_status;
 };
 
--
2.7.4
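The timer policy in this patch stores hours as plain int arrays of HOURS entries, and an entry that is never written holds whatever was in memory (a zeroed entry would read as hour 0, i.e. midnight). The following is a minimal sketch, assuming unused slots should match no hour at all. The struct is a simplified stand-in for `struct timer_profile`, and `timer_profile_clear()`, `hour_listed()` and the -1 sentinel are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define HOURS 24

/* Simplified stand-in for struct timer_profile from the patch. */
struct timer_profile_sketch {
	int busy_hours[HOURS];
	int quiet_hours[HOURS];
};

/* Mark every slot unused; -1 can never equal tm_hour, which is 0..23. */
static void
timer_profile_clear(struct timer_profile_sketch *tp)
{
	size_t i;

	assert(tp != NULL);

	for (i = 0; i < HOURS; i++) {
		tp->busy_hours[i] = -1;
		tp->quiet_hours[i] = -1;
	}
}

/* Return 1 if hour appears anywhere in hours[], 0 otherwise. */
static int
hour_listed(const int hours[HOURS], int hour)
{
	size_t i;

	for (i = 0; i < HOURS; i++) {
		if (hours[i] == hour)
			return 1;
	}
	return 0;
}
```

With this convention a sender would clear the profile first and then fill in only the hours it cares about.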
* [dpdk-dev] [PATCH v2 3/8] examples/vm_power_mgr: add vcpu to pcpu mapping
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev
Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/vm_power_manager/channel_manager.c | 61 +++++++++++++++++++++++++++++
 examples/vm_power_manager/channel_manager.h | 25 ++++++++++++
 2 files changed, 86 insertions(+)

diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c
index e068ae2..d5eda89 100644
--- a/examples/vm_power_manager/channel_manager.c
+++ b/examples/vm_power_manager/channel_manager.c
@@ -574,6 +574,67 @@ set_channel_status(const char *vm_name, unsigned *channel_list,
 	return num_channels_changed;
 }
 
+void
+get_all_vm(int *num_vm, int *num_cpu) {
+
+	virNodeInfo node_info;
+	virDomainPtr *domptr;
+	uint64_t mask;
+	int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus;
+	unsigned int jj;
+	const char *vm_name;
+	unsigned int flags = VIR_CONNECT_LIST_DOMAINS_RUNNING |
+				VIR_CONNECT_LIST_DOMAINS_PERSISTENT;
+	unsigned int flag = VIR_DOMAIN_VCPU_CONFIG;
+
+
+	memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen);
+	if (virNodeGetInfo(global_vir_conn_ptr, &node_info))
+		RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n");
+
+	/* Returns number of pcpus */
+	global_n_host_cpus = (unsigned int)node_info.cpus;
+
+	/* Returns number of active domains */
+	*num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, flags);
+	if (*num_vm <= 0)
+		RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n");
+
+	for (i = 0; i < *num_vm; i++) {
+
+		/* Get Domain Names */
+		vm_name = virDomainGetName(domptr[i]);
+		lvm_info[i].vm_name = vm_name;
+
+		/* Get Number of Vcpus */
+		numVcpus[i] = virDomainGetVcpusFlags(domptr[i], flag);
+
+		/* Get Number of VCpus & VcpuPinInfo */
+		n_vcpus = virDomainGetVcpuPinInfo(domptr[i],
+				numVcpus[i], global_cpumaps,
+				global_maplen, flag);
+
+		if ((int)n_vcpus > 0) {
+			*num_cpu = n_vcpus;
+			lvm_info[i].num_cpus = n_vcpus;
+		}
+
+		/* Save pcpu in use by libvirt VMs */
+		for (ii = 0; ii < n_vcpus; ii++) {
+			mask = 0;
+			for (jj = 0; jj < global_n_host_cpus; jj++) {
+				if (VIR_CPU_USABLE(global_cpumaps,
+						global_maplen, ii, jj) > 0) {
+					mask |= 1ULL << jj;
+				}
+			}
+			ITERATIVE_BITMASK_CHECK_64(mask, cpu) {
+				lvm_info[i].pcpus[ii] = cpu;
+			}
+		}
+	}
+}
+
 int
 get_info_vm(const char *vm_name, struct vm_info *info)
 {
diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h
index 47c3b9c..788c1e6 100644
--- a/examples/vm_power_manager/channel_manager.h
+++ b/examples/vm_power_manager/channel_manager.h
@@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un;
 #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path)
 #endif
 
+#define MAX_VMS 4
+#define MAX_VCPUS 20
+
+
+struct libvirt_vm_info {
+	const char *vm_name;
+	unsigned int pcpus[MAX_VCPUS];
+	uint8_t num_cpus;
+};
+
+struct libvirt_vm_info lvm_info[MAX_VMS];
 /* Communication Channel Status */
 enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0,
 	CHANNEL_MGR_CHANNEL_CONNECTED,
@@ -319,6 +330,20 @@ int set_channel_status(const char *vm_name, unsigned *channel_list,
  */
 int get_info_vm(const char *vm_name, struct vm_info *info);
 
+/**
+ * Populates a table with all domains running and their physical cpu.
+ * All information is gathered through libvirt api.
+ *
+ * @param noVms
+ *  modified to store number of active VMs
+ *
+ * @param noVcpus
+    modified to store number of vcpus active
+ *
+ * @return
+ *   void
+ */
+void get_all_vm(int *noVms, int *noVcpus);
 #ifdef __cplusplus
 }
 #endif
--
2.7.4
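The mapping step in get_all_vm() turns a per-vcpu pin map into a 64-bit mask and then into pcpu numbers. A minimal, libvirt-free sketch of that idea follows. It assumes the pin map layout used by libvirt's cpumap macros (maplen bytes per vcpu, bit `cpu % 8` of byte `cpu / 8`); `vcpu_pin_mask()` and `mask_to_pcpus()` are hypothetical helper names, not functions from the patch. Unlike the patch, which keeps a single pcpu per vcpu, this sketch returns every pcpu set in the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build a 64-bit pcpu mask for one vcpu from a pin map.
 * Assumed layout: maplen bytes per vcpu, bit (cpu % 8) of byte (cpu / 8).
 * Host cpus numbered 64 and above are ignored by this sketch.
 */
static uint64_t
vcpu_pin_mask(const unsigned char *cpumaps, size_t maplen,
		unsigned int vcpu, unsigned int n_host_cpus)
{
	const unsigned char *map;
	uint64_t mask = 0;
	unsigned int cpu;

	assert(cpumaps != NULL);

	map = cpumaps + (size_t)vcpu * maplen;
	for (cpu = 0; cpu < n_host_cpus && cpu < 64; cpu++) {
		if (cpu / 8 >= maplen)
			break;
		if (map[cpu / 8] & (1u << (cpu % 8)))
			mask |= 1ULL << cpu;
	}
	return mask;
}

/* Expand a mask into a list of pcpu ids; returns how many were written. */
static unsigned int
mask_to_pcpus(uint64_t mask, unsigned int *pcpus, unsigned int max)
{
	unsigned int cpu, n = 0;

	for (cpu = 0; cpu < 64 && n < max; cpu++) {
		if (mask & (1ULL << cpu))
			pcpus[n++] = cpu;
	}
	return n;
}
```

Keeping the full list makes it visible when a vcpu is pinned to more than one pcpu, a case in which storing only one entry per vcpu loses information.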
* [dpdk-dev] [PATCH v2 4/8] examples/vm_power_mgr: add scale to medium freq fn
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev
Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/vm_power_manager/power_manager.c | 15 +++++++++++++++
 examples/vm_power_manager/power_manager.h | 13 +++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c
index 80705f9..c021c1d 100644
--- a/examples/vm_power_manager/power_manager.c
+++ b/examples/vm_power_manager/power_manager.c
@@ -286,3 +286,18 @@ power_manager_disable_turbo_core(unsigned int core_num)
 	POWER_SCALE_CORE(disable_turbo, core_num, ret);
 	return ret;
 }
+
+int
+power_manager_scale_core_med(unsigned int core_num)
+{
+	int ret = 0;
+
+	if (core_num >= POWER_MGR_MAX_CPUS)
+		return -1;
+	if (!(global_enabled_cpus & (1ULL << core_num)))
+		return -1;
+	rte_spinlock_lock(&global_core_freq_info[core_num].power_sl);
+	ret = rte_power_set_freq(core_num, 5);
+	rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl);
+	return ret;
+}
diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h
index b74d09b..b52fb4c 100644
--- a/examples/vm_power_manager/power_manager.h
+++ b/examples/vm_power_manager/power_manager.h
@@ -231,6 +231,19 @@ int power_manager_disable_turbo_core(unsigned int core_num);
  */
 uint32_t power_manager_get_current_frequency(unsigned core_num);
 
+/**
+ * Scale to medium frequency for the core specified by core_num.
+ * It is thread-safe.
+ *
+ * @param core_num
+ *  The core number to change frequency
+ *
+ * @return
+ *  - 1 on success.
+ *  - 0 if frequency not changed.
+ *  - Negative on error.
+ */
+int power_manager_scale_core_med(unsigned int core_num);
 
 #ifdef __cplusplus
 }
--
2.7.4
* [dpdk-dev] [PATCH v2 5/8] examples/vm_power_mgr: add policy to channels
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev
Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt

From: "Sexton, Rory" <rory.sexton@intel.com>

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/vm_power_manager/channel_monitor.c | 331 +++++++++++++++++++++++++++-
 examples/vm_power_manager/channel_monitor.h |  20 ++
 2 files changed, 345 insertions(+), 6 deletions(-)

diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c
index ac40dac..3c8f72d 100644
--- a/examples/vm_power_manager/channel_monitor.c
+++ b/examples/vm_power_manager/channel_monitor.c
@@ -41,13 +41,20 @@
 #include <sys/types.h>
 #include <sys/epoll.h>
 #include <sys/queue.h>
+#include <sys/time.h>
 
 #include <rte_log.h>
 #include <rte_memory.h>
 #include <rte_malloc.h>
 #include <rte_atomic.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#ifdef RTE_LIBRTE_I40E_PMD
+#include <rte_pmd_i40e.h>
+#endif
 
+#include <libvirt/libvirt.h>
 #include "channel_monitor.h"
 #include "channel_commands.h"
 #include "channel_manager.h"
@@ -57,10 +64,17 @@
 
 #define MAX_EVENTS 256
 
+#ifdef RTE_LIBRTE_I40E_PMD
+uint64_t vsi_pkt_count_prev[384];
+uint64_t rdtsc_prev[384];
+#endif
 
+double time_period_s = 1;
 static volatile unsigned run_loop = 1;
 static int global_event_fd;
+static unsigned int policy_is_set;
 static struct epoll_event *global_events_list;
+static struct policy policies[MAX_VMS];
 
 void channel_monitor_exit(void)
 {
@@ -68,6 +82,293 @@ void channel_monitor_exit(void)
 	rte_free(global_events_list);
 }
 
+static void
+core_share(int pNo, int z, int x, int t)
+{
+	if (policies[pNo].core_share[z].pcpu == lvm_info[x].pcpus[t]) {
+		if (strcmp(policies[pNo].pkt.vm_name,
+				lvm_info[x].vm_name) != 0) {
+			policies[pNo].core_share[z].status = 1;
+			power_manager_scale_core_max(
+					policies[pNo].core_share[z].pcpu);
+		}
+	}
+}
+
+static void
+core_share_status(int pNo) {
+
+	int noVms, noVcpus, z, x, t;
+
+	get_all_vm(&noVms, &noVcpus);
+
+	/* Reset Core Share Status. */
+	for (z = 0; z < noVcpus; z++)
+		policies[pNo].core_share[z].status = 0;
+
+	/* Foreach vcpu in a policy. */
+	for (z = 0; z < policies[pNo].pkt.num_vcpu; z++) {
+		/* Foreach VM on the platform. */
+		for (x = 0; x < noVms; x++) {
+			/* Foreach vcpu of VMs on platform. */
+			for (t = 0; t < lvm_info[x].num_cpus; t++)
+				core_share(pNo, z, x, t);
+		}
+	}
+}
+
+static void
+get_pcpu_to_control(struct policy *pol) {
+
+	/* Convert vcpu to pcpu. */
+	struct vm_info info;
+	int pcpu, count;
+	uint64_t mask_u64b;
+
+	RTE_LOG(INFO, CHANNEL_MONITOR, "Looking for pcpu for %s\n",
+			pol->pkt.vm_name);
+	get_info_vm(pol->pkt.vm_name, &info);
+
+	for (count = 0; count < pol->pkt.num_vcpu; count++) {
+		mask_u64b = info.pcpu_mask[pol->pkt.vcpu_to_control[count]];
+		for (pcpu = 0; mask_u64b; mask_u64b &= ~(1ULL << pcpu++)) {
+			if ((mask_u64b >> pcpu) & 1)
+				pol->core_share[count].pcpu = pcpu;
+		}
+	}
+}
+
+#ifdef RTE_LIBRTE_I40E_PMD
+static int
+get_pfid(struct policy *pol) {
+
+	int i, x, ret = 0, nb_ports;
+
+	nb_ports = rte_eth_dev_count();
+	for (i = 0; i < pol->pkt.nb_mac_to_monitor; i++) {
+
+		for (x = 0; x < nb_ports; x++) {
+			ret = rte_pmd_i40e_query_vfid_by_mac(x,
+					pol->pkt.vfid[i]);
+			if (ret != -EINVAL) {
+				pol->port[i] = x;
+				break;
+			}
+		}
+		if (ret == -EINVAL) {
+			RTE_LOG(INFO, CHANNEL_MONITOR,
+				"Error with Policy. MAC not found on "
+				"attached ports ");
+			pol->enabled = 0;
+			return ret;
+		}
+		pol->pfid[i] = ret;
+	}
+	return 1;
+}
+#endif
+
+static int
+update_policy(struct channel_packet *pkt) {
+
+	unsigned int updated = 0;
+
+	for (int i = 0; i < MAX_VMS; i++) {
+		if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) {
+			policies[i].pkt = *pkt;
+			get_pcpu_to_control(&policies[i]);
+#ifdef RTE_LIBRTE_I40E_PMD
+			if (get_pfid(&policies[i]) == -1) {
+				updated = 1;
+				break;
+			}
+#endif
+			core_share_status(i);
+			policies[i].enabled = 1;
+			updated = 1;
+		}
+	}
+	if (!updated) {
+		for (int i = 0; i < MAX_VMS; i++) {
+			if (policies[i].enabled == 0) {
+				policies[i].pkt = *pkt;
+				get_pcpu_to_control(&policies[i]);
+#ifdef RTE_LIBRTE_I40E_PMD
+				if (get_pfid(&policies[i]) == -1)
+					break;
+#endif
+				core_share_status(i);
+				policies[i].enabled = 1;
+				break;
+			}
+		}
+	}
+	return 0;
+}
+
+#ifdef RTE_LIBRTE_I40E_PMD
+static uint64_t
+get_pkt_diff(struct policy *pol) {
+
+	uint64_t vsi_pkt_count,
+		vsi_pkt_total = 0,
+		vsi_pkt_count_prev_total = 0;
+	double rdtsc_curr, rdtsc_diff, diff;
+	int x;
+	struct rte_eth_stats vf_stats;
+
+	for (x = 0; x < pol->pkt.nb_mac_to_monitor; x++) {
+
+		/*Read vsi stats*/
+		if (rte_pmd_i40e_get_vf_stats(x, pol->pfid[x], &vf_stats) == 0)
+			vsi_pkt_count = vf_stats.ipackets;
+		else
+			vsi_pkt_count = -1;
+
+		vsi_pkt_total += vsi_pkt_count;
+
+		vsi_pkt_count_prev_total += vsi_pkt_count_prev[pol->pfid[x]];
+		vsi_pkt_count_prev[pol->pfid[x]] = vsi_pkt_count;
+	}
+
+	rdtsc_curr = rte_rdtsc_precise();
+	rdtsc_diff = rdtsc_curr - rdtsc_prev[pol->pfid[x-1]];
+	rdtsc_prev[pol->pfid[x-1]] = rdtsc_curr;
+
+	diff = (vsi_pkt_total - vsi_pkt_count_prev_total) *
+			((double)rte_get_tsc_hz() / rdtsc_diff);
+
+	return diff;
+}
+
+static void
+apply_traffic_profile(struct policy *pol) {
+
+	int count;
+	uint64_t diff = 0;
+
+	diff = get_pkt_diff(pol);
+
+	RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n");
+
+	if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) {
+		for (count = 0; count < pol->pkt.num_vcpu; count++) {
+			if (pol->core_share[count].status != 1)
+				power_manager_scale_core_max(
+						pol->core_share[count].pcpu);
+		}
+	} else if (diff >= (pol->pkt.traffic_policy.avg_max_packet_thresh)) {
+		for (count = 0; count < pol->pkt.num_vcpu; count++) {
+			if (pol->core_share[count].status != 1)
+				power_manager_scale_core_med(
+						pol->core_share[count].pcpu);
+		}
+	} else if (diff < (pol->pkt.traffic_policy.avg_max_packet_thresh)) {
+		for (count = 0; count < pol->pkt.num_vcpu; count++) {
+			if (pol->core_share[count].status != 1)
+				power_manager_scale_core_min(
+						pol->core_share[count].pcpu);
+		}
+	}
+}
+#endif
+
+static void
+apply_time_profile(struct policy *pol) {
+
+	int count, x;
+	struct timeval tv;
+	struct tm *ptm;
+	char time_string[40];
+
+	/* Obtain the time of day, and convert it to a tm struct. */
+	gettimeofday(&tv, NULL);
+	ptm = localtime(&tv.tv_sec);
+	/* Format the date and time, down to a single second. */
+	strftime(time_string, sizeof(time_string), "%Y-%m-%d %H:%M:%S", ptm);
+
+	for (x = 0; x < HOURS; x++) {
+
+		if (ptm->tm_hour == pol->pkt.timer_policy.busy_hours[x]) {
+			for (count = 0; count < pol->pkt.num_vcpu; count++) {
+				if (pol->core_share[count].status != 1) {
+					power_manager_scale_core_max(
+						pol->core_share[count].pcpu);
+				RTE_LOG(INFO, CHANNEL_MONITOR,
+					"Scaling up core %d to max\n",
+					pol->core_share[count].pcpu);
+				}
+			}
+			break;
+		} else if (ptm->tm_hour ==
+				pol->pkt.timer_policy.quiet_hours[x]) {
+			for (count = 0; count < pol->pkt.num_vcpu; count++) {
+				if (pol->core_share[count].status != 1) {
+					power_manager_scale_core_min(
+						pol->core_share[count].pcpu);
+				RTE_LOG(INFO, CHANNEL_MONITOR,
+					"Scaling down core %d to min\n",
+					pol->core_share[count].pcpu);
+			}
+		}
+			break;
+#ifdef RTE_LIBRTE_I40E_PMD
+		} else if (ptm->tm_hour ==
+			pol->pkt.timer_policy.hours_to_use_traffic_profile[x]) {
+			apply_traffic_profile(pol);
+			break;
+		}
+#else
+		}
+#endif
+	}
+}
+
+static void
+apply_workload_profile(struct policy *pol) {
+
+	int count;
+
+	if (pol->pkt.workload == HIGH) {
+		for (count = 0; count < pol->pkt.num_vcpu; count++) {
+			if (pol->core_share[count].status != 1)
+				power_manager_scale_core_max(
+						pol->core_share[count].pcpu);
+		}
+	} else if (pol->pkt.workload == MEDIUM) {
+		for (count = 0; count < pol->pkt.num_vcpu; count++) {
+			if (pol->core_share[count].status != 1)
+				power_manager_scale_core_med(
+						pol->core_share[count].pcpu);
+		}
+	} else if (pol->pkt.workload == LOW) {
+		for (count = 0; count < pol->pkt.num_vcpu; count++) {
+			if (pol->core_share[count].status != 1)
+				power_manager_scale_core_min(
+						pol->core_share[count].pcpu);
+		}
+	}
+}
+
+static void
+apply_policy(struct policy *pol) {
+
+	struct channel_packet *pkt = &pol->pkt;
+
+	/*Check policy to use*/
+#ifdef RTE_LIBRTE_I40E_PMD
+	if (pkt->policy_to_use == TRAFFIC)
+		apply_traffic_profile(pol);
+	else if (pkt->policy_to_use == TIME)
+#else
+	if (pkt->policy_to_use == TIME)
+#endif
+		apply_time_profile(pol);
+	else if (pkt->policy_to_use == WORKLOAD)
+		apply_workload_profile(pol);
+}
+
+
 static int
 process_request(struct channel_packet *pkt, struct channel_info *chan_info)
 {
@@ -140,6 +441,13 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info)
 
 		}
 	}
+
+	if (pkt->command == PKT_POLICY) {
+		RTE_LOG(INFO, CHANNEL_MONITOR, "\nProcessing Policy request from Guest\n");
+		update_policy(pkt);
+		policy_is_set = 1;
+	}
+
 	/* Return is not checked as channel status may have been set to DISABLED
 	 * from management thread
 	 */
@@ -209,9 +517,10 @@ run_channel_monitor(void)
 			struct channel_info *chan_info = (struct channel_info *)
 					global_events_list[i].data.ptr;
 			if ((global_events_list[i].events & EPOLLERR) ||
-					(global_events_list[i].events & EPOLLHUP)) {
+				(global_events_list[i].events & EPOLLHUP)) {
 				RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for "
-						"channel '%s'\n", chan_info->channel_path);
+						"channel '%s'\n",
+						chan_info->channel_path);
 				remove_channel(&chan_info);
 				continue;
 			}
@@ -223,14 +532,17 @@ run_channel_monitor(void)
 				int buffer_len = sizeof(pkt);
 
 				while (buffer_len > 0) {
-					n_bytes = read(chan_info->fd, buffer, buffer_len);
+					n_bytes = read(chan_info->fd,
+							buffer, buffer_len);
 					if (n_bytes == buffer_len)
 						break;
 					if (n_bytes == -1) {
 						err = errno;
-						RTE_LOG(DEBUG, CHANNEL_MONITOR, "Received error on "
-								"channel '%s' read: %s\n",
-								chan_info->channel_path, strerror(err));
+						RTE_LOG(DEBUG, CHANNEL_MONITOR,
+							"Received error on "
+							"channel '%s' read: %s\n",
+							chan_info->channel_path,
+							strerror(err));
 						remove_channel(&chan_info);
 						break;
 					}
@@ -241,5 +553,12 @@ run_channel_monitor(void)
 					process_request(&pkt, chan_info);
 			}
 		}
+		rte_delay_us(time_period_s*1000000);
+		if (policy_is_set) {
+			for (int j = 0; j < MAX_VMS; j++) {
+				if (policies[j].enabled == 1)
+					apply_policy(&policies[j]);
+			}
+		}
 	}
 }
diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h
index c138607..11f5f75 100644
--- a/examples/vm_power_manager/channel_monitor.h
+++ b/examples/vm_power_manager/channel_monitor.h
@@ -35,6 +35,26 @@
 #define CHANNEL_MONITOR_H_
 
 #include "channel_manager.h"
+#include "channel_commands.h"
+
+struct core_share {
+	unsigned int pcpu;
+	/*
+	 * 1 CORE SHARE
+	 * 0 NOT SHARED
+	 */
+	int status;
+};
+
+struct policy {
+	struct channel_packet pkt;
+#ifdef RTE_LIBRTE_I40E_PMD
+	uint32_t pfid[MAX_VFS];
+	uint32_t port[MAX_VFS];
+#endif
+	unsigned int enabled;
+	struct core_share core_share[MAX_VCPU_PER_VM];
+};
 
 #ifdef __cplusplus
 extern "C" {
--
2.7.4
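The traffic profile in this patch boils down to two steps: turn a packet-counter delta and a TSC delta into a packets-per-second rate, then compare that rate against the two upper thresholds. A minimal, DPDK-free sketch of those two steps follows. `pkt_rate()`, `classify_rate()` and `enum freq_level` are hypothetical names, and the guards against a zero time delta or a counter that went backwards are an assumption of this sketch, not behaviour taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum freq_level { FREQ_MIN, FREQ_MED, FREQ_MAX };

/*
 * Packets per second from two counter samples and the TSC ticks
 * elapsed between them.
 */
static uint64_t
pkt_rate(uint64_t pkts_now, uint64_t pkts_prev,
		uint64_t tsc_now, uint64_t tsc_prev, uint64_t tsc_hz)
{
	uint64_t ticks = tsc_now - tsc_prev;

	assert(tsc_hz > 0);

	/* No time elapsed, or the counter was reset: report no traffic. */
	if (ticks == 0 || pkts_now < pkts_prev)
		return 0;

	return (uint64_t)((double)(pkts_now - pkts_prev) *
			((double)tsc_hz / (double)ticks));
}

/* Map a rate onto the three frequency levels used by the policy. */
static enum freq_level
classify_rate(uint64_t rate, uint64_t avg_max_thresh, uint64_t max_max_thresh)
{
	if (rate >= max_max_thresh)
		return FREQ_MAX;
	if (rate >= avg_max_thresh)
		return FREQ_MED;
	return FREQ_MIN;
}
```

Separating the rate calculation from the classification keeps each piece testable without a NIC, and makes the handling of the first sample (no previous counter or timestamp) an explicit decision.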
* [dpdk-dev] [PATCH v2 6/8] examples/vm_power_mgr: add port initialisation
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev
Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic

We need to initialise the ports we're monitoring to be able to see
the throughput.

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/vm_power_manager/main.c | 220 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 220 insertions(+)

diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c
index c33fcc9..698abca 100644
--- a/examples/vm_power_manager/main.c
+++ b/examples/vm_power_manager/main.c
@@ -49,6 +49,9 @@
 #include <rte_log.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
+#include <rte_ethdev.h>
+#include <getopt.h>
+#include <rte_cycles.h>
 #include <rte_debug.h>
 
 #include "channel_manager.h"
@@ -56,6 +59,192 @@
 #include "power_manager.h"
 #include "vm_power_cli.h"
 
+#define RX_RING_SIZE 512
+#define TX_RING_SIZE 512
+
+#define NUM_MBUFS 8191
+#define MBUF_CACHE_SIZE 250
+#define BURST_SIZE 32
+
+static uint32_t enabled_port_mask;
+static volatile bool force_quit;
+
+/****************/
+static const struct rte_eth_conf port_conf_default = {
+	.rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN }
+};
+
+static inline int
+port_init(uint8_t port, struct rte_mempool *mbuf_pool)
+{
+	struct rte_eth_conf port_conf = port_conf_default;
+	const uint16_t rx_rings = 1, tx_rings = 1;
+	int retval;
+	uint16_t q;
+
+	if (port >= rte_eth_dev_count())
+		return -1;
+
+	/* Configure the Ethernet device. */
+	retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
+	if (retval != 0)
+		return retval;
+
+	/* Allocate and set up 1 RX queue per Ethernet port. */
+	for (q = 0; q < rx_rings; q++) {
+		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+				rte_eth_dev_socket_id(port), NULL, mbuf_pool);
+		if (retval < 0)
+			return retval;
+	}
+
+	/* Allocate and set up 1 TX queue per Ethernet port. */
+	for (q = 0; q < tx_rings; q++) {
+		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+				rte_eth_dev_socket_id(port), NULL);
+		if (retval < 0)
+			return retval;
+	}
+
+	/* Start the Ethernet port. */
+	retval = rte_eth_dev_start(port);
+	if (retval < 0)
+		return retval;
+
+	/* Display the port MAC address. */
+	struct ether_addr addr;
+	rte_eth_macaddr_get(port, &addr);
+	printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
+			   " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
+			(unsigned int)port,
+			addr.addr_bytes[0], addr.addr_bytes[1],
+			addr.addr_bytes[2], addr.addr_bytes[3],
+			addr.addr_bytes[4], addr.addr_bytes[5]);
+
+	/* Enable RX in promiscuous mode for the Ethernet device. */
+	rte_eth_promiscuous_enable(port);
+
+
+	return 0;
+}
+
+static int
+parse_portmask(const char *portmask)
+{
+	char *end = NULL;
+	unsigned long pm;
+
+	/* parse hexadecimal string */
+	pm = strtoul(portmask, &end, 16);
+	if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0'))
+		return -1;
+
+	if (pm == 0)
+		return -1;
+
+	return pm;
+}
+/* Parse the argument given in the command line of the application */
+static int
+parse_args(int argc, char **argv)
+{
+	int opt, ret;
+	char **argvopt;
+	int option_index;
+	char *prgname = argv[0];
+	static struct option lgopts[] = {
+		{ "mac-updating", no_argument, 0, 1},
+		{ "no-mac-updating", no_argument, 0, 0},
+		{NULL, 0, 0, 0}
+	};
+	argvopt = argv;
+
+	while ((opt = getopt_long(argc, argvopt, "p:q:T:",
+				  lgopts, &option_index)) != EOF) {
+
+		switch (opt) {
+		/* portmask */
+		case 'p':
+			enabled_port_mask = parse_portmask(optarg);
+			if (enabled_port_mask == 0) {
+				printf("invalid portmask\n");
+				return -1;
+			}
+			break;
+		/* long options */
+		case 0:
+			break;
+
+		default:
+			return -1;
+		}
+	}
+
+	if (optind >= 0)
+		argv[optind-1] = prgname;
+
+	ret = optind-1;
+	optind = 0; /* reset getopt lib */
+	return ret;
+}
+
+static void
+check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
+{
+#define CHECK_INTERVAL 100 /* 100ms */
+#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */
+	uint8_t portid, count, all_ports_up, print_flag = 0;
+	struct rte_eth_link link;
+
+	printf("\nChecking link status");
+	fflush(stdout);
+	for (count = 0; count <= MAX_CHECK_TIME; count++) {
+		if (force_quit)
+			return;
+		all_ports_up = 1;
+		for (portid = 0; portid < port_num; portid++) {
+			if (force_quit)
+				return;
+			if ((port_mask & (1 << portid)) == 0)
+				continue;
+			memset(&link, 0, sizeof(link));
+			rte_eth_link_get_nowait(portid, &link);
+			/* print link status if flag set */
+			if (print_flag == 1) {
+				if (link.link_status)
+					printf("Port %d Link Up - speed %u "
+						"Mbps - %s\n", (uint8_t)portid,
+						(unsigned int)link.link_speed,
+				(link.link_duplex == ETH_LINK_FULL_DUPLEX) ?
+					("full-duplex") : ("half-duplex\n"));
+				else
+					printf("Port %d Link Down\n",
+						(uint8_t)portid);
+				continue;
+			}
+			/* clear all_ports_up flag if any link down */
+			if (link.link_status == ETH_LINK_DOWN) {
+				all_ports_up = 0;
+				break;
+			}
+		}
+		/* after finally printing all link status, get out */
+		if (print_flag == 1)
+			break;
+
+		if (all_ports_up == 0) {
+			printf(".");
+			fflush(stdout);
+			rte_delay_ms(CHECK_INTERVAL);
+		}
+
+		/* set the print_flag if all ports up or timeout */
+		if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) {
+			print_flag = 1;
+			printf("done\n");
+		}
+	}
+}
 static int
 run_monitor(__attribute__((unused)) void *arg)
 {
@@ -82,6 +271,10 @@ main(int argc, char **argv)
 {
 	int ret;
 	unsigned lcore_id;
+	unsigned int nb_ports;
+	struct rte_mempool *mbuf_pool;
+	uint8_t portid;
+
 
 	ret = rte_eal_init(argc, argv);
 	if (ret < 0)
@@ -90,12 +283,39 @@ main(int argc, char **argv)
 	signal(SIGINT, sig_handler);
 	signal(SIGTERM, sig_handler);
 
+	argc -= ret;
+	argv += ret;
+
+	/* parse application arguments (after the EAL ones) */
+	ret = parse_args(argc, argv);
+	if (ret < 0)
+		rte_exit(EXIT_FAILURE, "Invalid arguments\n");
+
+	nb_ports = rte_eth_dev_count();
+
+	mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
+		MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
+
+	if (mbuf_pool == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
+
+	/* Initialize ports. */
+	for (portid = 0; portid < nb_ports; portid++) {
+		if ((enabled_port_mask & (1 << portid)) == 0)
+			continue;
+		if (port_init(portid, mbuf_pool) != 0)
+			rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n",
+					portid);
+	}
+
 	lcore_id = rte_get_next_lcore(-1, 1, 0);
 	if (lcore_id == RTE_MAX_LCORE) {
 		RTE_LOG(ERR, EAL, "A minimum of two cores are required to run "
 				"application\n");
 		return 0;
 	}
+
+	check_all_ports_link_status(nb_ports, enabled_port_mask);
 	rte_eal_remote_launch(run_monitor, NULL, lcore_id);
 
 	if (power_manager_init() < 0) {
--
2.7.4
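In this patch parse_portmask() signals an error with -1, while the caller stores the result in the unsigned enabled_port_mask and then tests it against 0, so an error value would not compare equal to 0. A minimal sketch of one way to keep the two sides consistent follows, assuming 0 is reserved to mean "invalid or empty mask". `parse_portmask_sketch()` is a hypothetical name and is not the function in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal port mask. Returns the mask, or 0 if the string is
 * empty, has trailing characters, or does not fit in 32 bits.
 */
static uint32_t
parse_portmask_sketch(const char *portmask)
{
	char *end = NULL;
	unsigned long pm;

	assert(portmask != NULL);

	if (portmask[0] == '\0')
		return 0;

	errno = 0;
	pm = strtoul(portmask, &end, 16);
	if (errno != 0 || end == portmask || *end != '\0' || pm > UINT32_MAX)
		return 0;

	return (uint32_t)pm;
}
```

A caller can then use a single `== 0` check for every failure case, which matches the test already present in parse_args().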
* [dpdk-dev] [PATCH v2 7/8] examples/guest_cli: add send policy to host
From: David Hunt @ 2017-09-25 12:27 UTC
To: dev
Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt

From: "Sexton, Rory" <rory.sexton@intel.com>

Here we're adding an example of setting up a policy, and allowing the
vm_cli_guest app to send it to the host using the cli command
"send_policy now"

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 .../guest_cli/vm_power_cli_guest.c | 104 +++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
index 4e982bd..33663f6 100644
--- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
+++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
@@ -45,6 +45,7 @@
 #include <cmdline.h>
 #include <rte_log.h>
 #include <rte_lcore.h>
+#include <rte_ethdev.h>
 
 #include <rte_power.h>
 
@@ -139,8 +140,111 @@ cmdline_parse_inst_t cmd_set_cpu_freq_set = {
 	},
 };
 
+struct cmd_send_policy_result {
+	cmdline_fixed_string_t send_policy;
+	cmdline_fixed_string_t cmd;
+};
+
+#ifdef RTE_LIBRTE_I40E_PMD
+union PFID {
+	struct ether_addr addr;
+	uint64_t pfid;
+};
+#endif
+
+static inline int
+send_policy(void)
+{
+	struct channel_packet pkt;
+	int ret;
+
+#ifdef RTE_LIBRTE_I40E_PMD
+	union PFID pfid;
+	/* Use port MAC address as the vfid */
+	rte_eth_macaddr_get(0, &pfid.addr);
+	printf("Port %u MAC: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":"
+			"%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n",
+			1,
+			pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1],
+			pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3],
+			pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]);
+	pkt.vfid[0] = pfid.pfid;
+
+	pkt.nb_mac_to_monitor = 1;
+#endif
+	pkt.t_boost_status.tbEnabled = false;
+
+	pkt.vcpu_to_control[0] = 0;
+	pkt.vcpu_to_control[1] = 1;
+	pkt.num_vcpu = 2;
+	/* Dummy Population. */
+#ifdef RTE_LIBRTE_I40E_PMD
+	pkt.traffic_policy.min_packet_thresh = 96000;
+	pkt.traffic_policy.avg_max_packet_thresh = 1800000;
+	pkt.traffic_policy.max_max_packet_thresh = 2000000;
+#endif
+
+	pkt.timer_policy.busy_hours[0] = 3;
+	pkt.timer_policy.busy_hours[1] = 4;
+	pkt.timer_policy.busy_hours[2] = 5;
+	pkt.timer_policy.quiet_hours[0] = 11;
+	pkt.timer_policy.quiet_hours[1] = 12;
+	pkt.timer_policy.quiet_hours[2] = 13;
+
+#ifdef RTE_LIBRTE_I40E_PMD
+	pkt.timer_policy.hours_to_use_traffic_profile[0] = 8;
+	pkt.timer_policy.hours_to_use_traffic_profile[1] = 10;
+#endif
+
+	pkt.workload = LOW;
+	pkt.policy_to_use = TIME;
+	pkt.command = PKT_POLICY;
+	strcpy(pkt.vm_name, "ubintu2");
+	ret = guest_channel_send_msg(&pkt, 1);
+	if (ret == 0)
+		return 1;
+	RTE_LOG(DEBUG, POWER, "Error sending message: %s\n",
+			ret > 0 ? strerror(ret) : "channel not connected");
+	return -1;
+}
+
+static void
+cmd_send_policy_parsed(void *parsed_result, struct cmdline *cl,
+		       __attribute__((unused)) void *data)
+{
+	int ret = -1;
+	struct cmd_send_policy_result *res = parsed_result;
+
+	if (!strcmp(res->cmd, "now")) {
+		printf("Sending Policy down now!\n");
+		ret = send_policy();
+	}
+	if (ret != 1)
+		cmdline_printf(cl, "Error sending message: %s\n",
+				strerror(ret));
+}
+
+cmdline_parse_token_string_t cmd_send_policy =
+	TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result,
+			send_policy, "send_policy");
+cmdline_parse_token_string_t cmd_send_policy_cmd_cmd =
+	TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result,
+			cmd, "now");
+
+cmdline_parse_inst_t cmd_send_policy_set = {
+	.f = cmd_send_policy_parsed,
+	.data = NULL,
+	.help_str = "send_policy now",
+	.tokens = {
+		(void *)&cmd_send_policy,
+		(void *)&cmd_send_policy_cmd_cmd,
+		NULL,
+	},
+};
+
 cmdline_parse_ctx_t main_ctx[] = {
 		(cmdline_parse_inst_t *)&cmd_quit,
+		(cmdline_parse_inst_t *)&cmd_send_policy_set,
 		(cmdline_parse_inst_t *)&cmd_set_cpu_freq_set,
 		NULL,
 };
--
2.7.4
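The guest code above carries a 6-byte MAC address to the host inside a uint64_t by way of a union, which leaves the two bytes beyond the MAC at whatever value the union happened to hold. A minimal sketch, assuming both sides run on the same host and so share byte order, that zeroes the padding before copying follows. `mac_to_u64()` and `u64_to_mac()` are hypothetical helpers, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Pack a MAC into a 64-bit id; the two bytes beyond the MAC stay zero. */
static uint64_t
mac_to_u64(const uint8_t mac[MAC_LEN])
{
	uint64_t id = 0;

	assert(mac != NULL);

	memcpy(&id, mac, MAC_LEN);
	return id;
}

/* Recover the MAC bytes from an id built by mac_to_u64(). */
static void
u64_to_mac(uint64_t id, uint8_t mac[MAC_LEN])
{
	assert(mac != NULL);

	memcpy(mac, &id, MAC_LEN);
}
```

With the padding fixed at zero, two ids built from the same MAC always compare equal as integers, which is convenient if the receiver wants to compare ids directly rather than byte by byte.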
* [dpdk-dev] [PATCH v2 8/8] examples/vm_power_mgr: set MAC address of VF 2017-09-25 12:27 ` [dpdk-dev] [PATCH v2] " David Hunt ` (6 preceding siblings ...) 2017-09-25 12:27 ` [dpdk-dev] [PATCH v2 7/8] examples/guest_cli: add send policy to host David Hunt @ 2017-09-25 12:27 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt 8 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-09-25 12:27 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, David Hunt We need to set vf mac from the host, so that they will be in sync on the guest and the host. Otherwise, we'll have a random mac on the guest, and a 00:00:00:00:00:00 mac on the host. Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 60 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index 698abca..18f5e7f 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -58,6 +58,15 @@ #include "channel_monitor.h" #include "power_manager.h" #include "vm_power_cli.h" +#ifdef RTE_LIBRTE_IXGBE_PMD +#include <rte_pmd_ixgbe.h> +#endif +#ifdef RTE_LIBRTE_I40E_PMD +#include <rte_pmd_i40e.h> +#endif +#ifdef RTE_LIBRTE_BNXT_PMD +#include <rte_pmd_bnxt.h> +#endif #define RX_RING_SIZE 512 #define TX_RING_SIZE 512 @@ -222,7 +231,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) (uint8_t)portid); continue; } - /* clear all_ports_up flag if any link down */ + /* clear all_ports_up flag if any link down */ if (link.link_status == ETH_LINK_DOWN) { all_ports_up = 0; break; @@ -273,7 +282,9 @@ main(int argc, char **argv) unsigned lcore_id; unsigned int nb_ports; struct rte_mempool *mbuf_pool; +#ifdef RTE_LIBRTE_I40E_PMD uint8_t portid; +#endif ret = rte_eal_init(argc, argv); @@ -300,13 +311,60 @@ main(int argc, char **argv) rte_exit(EXIT_FAILURE, "Cannot 
create mbuf pool\n"); /* Initialize ports. */ +#ifdef RTE_LIBRTE_I40E_PMD for (portid = 0; portid < nb_ports; portid++) { + struct ether_addr eth; + int w, j; + int ret = -ENOTSUP; + if ((enabled_port_mask & (1 << portid)) == 0) continue; + + eth.addr_bytes[0] = 0xe0; + eth.addr_bytes[1] = 0xe0; + eth.addr_bytes[2] = 0xe0; + eth.addr_bytes[3] = 0xe0; + eth.addr_bytes[4] = portid + 0xf0; + if (port_init(portid, mbuf_pool) != 0) rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid); + + for (w = 0; w < MAX_VFS; w++) { + eth.addr_bytes[5] = w + 0xf0; + +#ifdef RTE_LIBRTE_IXGBE_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_ixgbe_set_vf_mac_addr(portid, + w, &eth); +#endif +#ifdef RTE_LIBRTE_I40E_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_i40e_set_vf_mac_addr(portid, + w, &eth); +#endif +#ifdef RTE_LIBRTE_BNXT_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_bnxt_set_vf_mac_addr(portid, + w, &eth); +#endif + + + switch (ret) { + case 0: + printf("Port %d VF %d MAC: ", + portid, w); + for (j = 0; j < 6; j++) { + printf("%02x", eth.addr_bytes[j]); + if (j < 5) + printf(":"); + } + printf("\n"); + break; + } + } } +#endif lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
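The MAC scheme used in the diff above (e0:e0:e0:e0 followed by 0xf0 + port and 0xf0 + VF index) can be sketched in isolation. This is only an illustration: struct mac_addr, make_vf_mac() and format_mac() are hypothetical stand-ins so that the snippet builds without DPDK, and the limit of 16 ports/VFs is an assumption that follows from adding to 0xf0 in a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAC_LEN 6

/* Local stand-in for DPDK's struct ether_addr, so the sketch builds alone. */
struct mac_addr {
	uint8_t addr_bytes[MAC_LEN];
};

/* Build e0:e0:e0:e0:(0xf0 + portid):(0xf0 + vf), as the patch does. */
static void
make_vf_mac(uint8_t portid, uint8_t vf, struct mac_addr *mac)
{
	int i;

	/* 0xf0 + n only stays within one byte for n < 16. */
	assert(portid < 16 && vf < 16);

	for (i = 0; i < 4; i++)
		mac->addr_bytes[i] = 0xe0;
	mac->addr_bytes[4] = (uint8_t)(0xf0 + portid);
	mac->addr_bytes[5] = (uint8_t)(0xf0 + vf);
}

/* Format as "xx:xx:xx:xx:xx:xx"; buf needs room for at least 18 bytes. */
static void
format_mac(const struct mac_addr *mac, char *buf, size_t len)
{
	snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
		 mac->addr_bytes[0], mac->addr_bytes[1],
		 mac->addr_bytes[2], mac->addr_bytes[3],
		 mac->addr_bytes[4], mac->addr_bytes[5]);
}
```

One observation on the diff as posted: ret is set to -ENOTSUP once per port rather than once per VF, so after the first successful call the `ret == -ENOTSUP` guards would appear to skip the remaining VFs while the `case 0` branch still prints their addresses. Resetting ret at the top of the inner loop looks like the intent.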
* [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest 2017-09-25 12:27 ` [dpdk-dev] [PATCH v2] " David Hunt ` (7 preceding siblings ...) 2017-09-25 12:27 ` [dpdk-dev] [PATCH v2 8/8] examples/vm_power_mgr: set MAC address of VF David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt ` (9 more replies) 8 siblings, 10 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu This patchset adds the facility for a guest VM to send a policy down to the host that will allow the host to scale up/down cpu frequencies depending on the policy criteria independently of the DPDK app running in the guest. This differs from the previous vm_power implementation where individual scale up/down requests were sent from the guest to the host via virtio-serial. V4 patchset changes: * None, replying into correct email thread. V3 was a reply to the turbo patch set, should have been inband policy power patchset. V3 patchset changes: * Changed to using is_same_ether_addr() instead of looping through the mac address bytes to compare them. * Tweaked some comments and wording in the i40e patch after review. * Added a patch to the set to add new i40e function to map file, so as to allow shared library builds. The power library API needs a cleanup in next release, so will add API/ABI warning for this cleanup in a separate patch. V2 patchset changes: * Removed APIs in ethdev layer. * Now just a single new API in the i40e driver for mapping VF MAC to VF index. * Moved new function from rte_rxtx.c to rte_pmd_i40e.c * Removed function for reading i40e register, moved to using the standard stats API. * Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac * Cleaned up policy generation code. It's a modification of the vm_power_manager app that runs in the host, and the guest_vm_power_app example app that runs in the guest.
This allows the guest to send down a policy to the host via virtio-serial, which then allows the host to scale up/down based on the criteria in the policy, resulting in quicker scale up/down than individual requests coming from the guest. It also means that the DPDK application running in the guest does not need to be modified in any way; it is unaware that its cores are being scaled up/down, reducing the effort in implementing a power-aware infrastructure. The usage model is as follows: 1. Set up the VFs and assign to the guest in the usual way. 2. Run vm_power_manager on the host, creating a channel to the guest. 3. Start the guest_vm_power_mgr app on the guest, which establishes a virtio-serial channel to the host. 4. Send down the profile for the guest using the "send_profile now" command. There is an example profile hard-coded into guest_vm_power_mgr. 5. Stop the guest_vm_power_mgr and run your normal power-unaware application. 6. Send traffic into the VFs at varying traffic rates. Observe the frequency change on the host (turbostat -i 1) The sequence of code changes is as follows: A new function has been added to the i40e driver to allow mapping of a VF MAC to VF index. Next we make an addition to librte_power that adds an extra command to allow the passing of a policy structure from the guest to the host. This struct contains information like busy/quiet hour, packet throughput thresholds, etc. The next addition adds functionality to convert the virtual CPU (vcpu) IDs to physical CPU (pcpu) IDs so that the host can scale up/down the cores used in the guest. The remaining patches are functionality to process the policy, and take action when the relevant trigger occurs to cause a frequency change. 
[1/9] net/i40e: add API to convert VF MAC to VF id [2/9] lib/librte_power: add extra msg type for policies [3/9] examples/vm_power_mgr: add vcpu to pcpu mapping [4/9] examples/vm_power_mgr: add scale to medium freq fn [5/9] examples/vm_power_mgr: add policy to channels [6/9] examples/vm_power_mgr: add port initialisation [7/9] power: add send channel msg function to map file [8/9] examples/guest_cli: add send policy to host [9/9] examples/vm_power_mgr: set MAC address of VF ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v4 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 15:26 ` santosh 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies David Hunt ` (8 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Need a way to convert a vf id to a pf id on the host so as to query the pf for relevant statistics which are used for the frequency changes in the vm_power_manager app. Used when profiles are passed down from the guest to the host, allowing the host to map the vfs to pfs. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- drivers/net/i40e/rte_pmd_i40e.c | 31 +++++++++++++++++++++++++++++++ drivers/net/i40e/rte_pmd_i40e.h | 13 +++++++++++++ drivers/net/i40e/rte_pmd_i40e_version.map | 7 +++++++ 3 files changed, 51 insertions(+) diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c index f12b7f4..21efb2f 100644 --- a/drivers/net/i40e/rte_pmd_i40e.c +++ b/drivers/net/i40e/rte_pmd_i40e.c @@ -2115,3 +2115,34 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, return 0; } + +uint64_t +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, uint64_t vf_mac) +{ + struct rte_eth_dev *dev; + struct ether_addr *vf_mac_addr = (struct ether_addr *)&vf_mac; + struct ether_addr *mac; + struct i40e_pf *pf; + int vf_id; + struct i40e_pf_vf *vf; + uint16_t vf_num; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); + dev = &rte_eth_devices[port]; + + if (!is_i40e_supported(dev)) + return -ENOTSUP; + + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); + vf_num = 
pf->vf_num; + + for (vf_id = 0; vf_id < vf_num; vf_id++) { + vf = &pf->vfs[vf_id]; + mac = &vf->mac_addr; + + if (is_same_ether_addr(mac, vf_mac_addr)) + return vf_id; + } + + return -EINVAL; +} diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h index 356fa89..a7ae0f0 100644 --- a/drivers/net/i40e/rte_pmd_i40e.h +++ b/drivers/net/i40e/rte_pmd_i40e.h @@ -637,4 +637,17 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, uint8_t mask, uint32_t pkt_type); +/** + * On the PF, find VF index based on VF MAC address + * + * @param port + * pointer to port identifier of the device + * @param vf_mac + * the mac address of the vf to determine index of + * @return + * -(-22 EINVAL) the vf mac does not exist on this port + * -(!-22) the index of vfid in pf->vfs + */ +uint64_t rte_pmd_i40e_query_vfid_by_mac(uint8_t port, uint64_t vf_mac); + #endif /* _PMD_I40E_H_ */ diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map b/drivers/net/i40e/rte_pmd_i40e_version.map index 20cc980..d8b74bd 100644 --- a/drivers/net/i40e/rte_pmd_i40e_version.map +++ b/drivers/net/i40e/rte_pmd_i40e_version.map @@ -45,3 +45,10 @@ DPDK_17.08 { rte_pmd_i40e_get_ddp_info; } DPDK_17.05; + +DPDK_17.11 { + global: + + rte_pmd_i40e_query_vfid_by_mac; + +} DPDK_17.08; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt @ 2017-10-04 15:26 ` santosh 0 siblings, 0 replies; 105+ messages in thread From: santosh @ 2017-10-04 15:26 UTC (permalink / raw) To: David Hunt, dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic On Wednesday 04 October 2017 02:45 PM, David Hunt wrote: > From: "Sexton, Rory" <rory.sexton@intel.com> > > Need a way to convert a vf id to a pf id on the host so as to query the pf > for relevant statistics which are used for the frequency changes in the > vm_power_manager app. Used when profiles are passed down from the guest > to the host, allowing the host to map the vfs to pfs. > > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- > drivers/net/i40e/rte_pmd_i40e.c | 31 +++++++++++++++++++++++++++++++ > drivers/net/i40e/rte_pmd_i40e.h | 13 +++++++++++++ > drivers/net/i40e/rte_pmd_i40e_version.map | 7 +++++++ > 3 files changed, 51 insertions(+) > > diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c > index f12b7f4..21efb2f 100644 > --- a/drivers/net/i40e/rte_pmd_i40e.c > +++ b/drivers/net/i40e/rte_pmd_i40e.c > @@ -2115,3 +2115,34 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, > > return 0; > } > + > +uint64_t > +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, uint64_t vf_mac) > +{ > + struct rte_eth_dev *dev; > + struct ether_addr *vf_mac_addr = (struct ether_addr *)&vf_mac; > + struct ether_addr *mac; > + struct i40e_pf *pf; > + int vf_id; > + struct i40e_pf_vf *vf; > + uint16_t vf_num; > + > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); > + dev = &rte_eth_devices[port]; > + > + if (!is_i40e_supported(dev)) > + return -ENOTSUP; > + > + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); > + vf_num = 
pf->vf_num; > + > + for (vf_id = 0; vf_id < vf_num; vf_id++) { > + vf = &pf->vfs[vf_id]; > + mac = &vf->mac_addr; > + > + if (is_same_ether_addr(mac, vf_mac_addr)) > + return vf_id; > + } > + > + return -EINVAL; > +} > diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h > index 356fa89..a7ae0f0 100644 > --- a/drivers/net/i40e/rte_pmd_i40e.h > +++ b/drivers/net/i40e/rte_pmd_i40e.h > @@ -637,4 +637,17 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, > uint8_t mask, > uint32_t pkt_type); > > +/** > + * On the PF, find VF index based on VF MAC address > + * > + * @param port > + * pointer to port identifier of the device > + * @param vf_mac > + * the mac address of the vf to determine index of > + * @return > + * -(-22 EINVAL) the vf mac does not exist on this port > + * -(!-22) the index of vfid in pf->vfs > + */ > +uint64_t rte_pmd_i40e_query_vfid_by_mac(uint8_t port, uint64_t vf_mac); > + On @return: - The index of vfid if successful. - -EINVAL: vf mac address does not exist for this port. - -ENOTSUP: i40e not supported for this port. Also, the return type should be s/uint64_t/int, since the error case returns a negative value. Thanks. ^ permalink raw reply [flat|nested] 105+ messages in thread
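The review point about the return type can be illustrated with a stand-alone sketch. Everything here is hypothetical: query_vfid_by_mac(), struct mac_addr and the flat vfs array stand in for the driver's private structures and are not the i40e PMD API. The sketch only shows why a signed return type suits a function that returns either an index or a negative errno value; with a uint64_t return, a caller's `ret < 0` check could never be true.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Local stand-in for DPDK's struct ether_addr. */
struct mac_addr {
	uint8_t addr_bytes[MAC_LEN];
};

/*
 * Return the index of the VF whose MAC equals vf_mac, or -EINVAL when no
 * VF in the table has that address. The signed return type lets callers
 * test for errors with a plain "ret < 0".
 */
static int
query_vfid_by_mac(const struct mac_addr *vfs, uint16_t vf_num,
		  const struct mac_addr *vf_mac)
{
	uint16_t vf_id;

	assert(vfs != NULL && vf_mac != NULL);

	for (vf_id = 0; vf_id < vf_num; vf_id++) {
		if (memcmp(vfs[vf_id].addr_bytes, vf_mac->addr_bytes,
			   MAC_LEN) == 0)
			return vf_id;
	}

	return -EINVAL;
}
```

Taking the MAC as a pointer to a 6-byte structure, rather than packed into a uint64_t and cast back as in the posted patch, is a further assumption of this sketch and was not part of the review comment.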
* [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 15:36 ` santosh 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt ` (7 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- lib/librte_power/channel_commands.h | 52 +++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h index 484085b..1599706 100644 --- a/lib/librte_power/channel_commands.h +++ b/lib/librte_power/channel_commands.h @@ -46,6 +46,7 @@ extern "C" { /* Valid Commands */ #define CPU_POWER 1 #define CPU_POWER_CONNECT 2 +#define PKT_POLICY 3 /* CPU Power Command Scaling */ #define CPU_POWER_SCALE_UP 1 @@ -54,11 +55,62 @@ extern "C" { #define CPU_POWER_SCALE_MIN 4 #define CPU_POWER_ENABLE_TURBO 5 #define CPU_POWER_DISABLE_TURBO 6 +#define HOURS 24 + +#ifdef RTE_LIBRTE_I40E_PMD +#define MAX_VFS 10 +#endif + +#define MAX_VCPU_PER_VM 8 + +typedef enum {false, true} bool; + +struct t_boost_status { + bool tbEnabled; +}; + +struct timer_profile { + int busy_hours[HOURS]; + int quiet_hours[HOURS]; +#ifdef RTE_LIBRTE_I40E_PMD + int hours_to_use_traffic_profile[HOURS]; +#endif +}; + +enum workload {HIGH, MEDIUM, LOW}; +enum policy_to_use { +#ifdef RTE_LIBRTE_I40E_PMD + TRAFFIC, +#endif + TIME, + WORKLOAD +}; + +#ifdef RTE_LIBRTE_I40E_PMD +struct traffic { + uint32_t min_packet_thresh; + 
uint32_t avg_max_packet_thresh; + uint32_t max_max_packet_thresh; +}; +#endif struct channel_packet { uint64_t resource_id; /**< core_num, device */ uint32_t unit; /**< scale down/up/min/max */ uint32_t command; /**< Power, IO, etc */ + char vm_name[32]; + +#ifdef RTE_LIBRTE_I40E_PMD + uint64_t vfid[MAX_VFS]; + int nb_mac_to_monitor; + struct traffic traffic_policy; +#endif + uint8_t vcpu_to_control[MAX_VCPU_PER_VM]; + uint8_t num_vcpu; + struct timer_profile timer_policy; + enum workload workload; + enum policy_to_use policy_to_use; + struct t_boost_status t_boost_status; }; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
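As a rough illustration of how the thresholds in struct traffic might drive a frequency decision, here is a minimal sketch. The mapping itself is an assumption made for illustration (at or above max_max_packet_thresh scale to maximum, at or above avg_max_packet_thresh scale to medium, otherwise scale to minimum); the authoritative logic is whatever the channel_monitor.c patch in this series implements. enum freq_action and pick_freq_action() are hypothetical names, and min_packet_thresh is left unused here.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors struct traffic from the patch above. */
struct traffic {
	uint32_t min_packet_thresh;
	uint32_t avg_max_packet_thresh;
	uint32_t max_max_packet_thresh;
};

/* Hypothetical result type; not part of the patch. */
enum freq_action { SCALE_MIN, SCALE_MED, SCALE_MAX };

/*
 * Map a packet count observed over one sampling period to a scaling
 * action, using the two upper thresholds of the policy.
 */
static enum freq_action
pick_freq_action(const struct traffic *t, uint64_t pkts_per_period)
{
	/* The thresholds are expected to be ordered. */
	assert(t->avg_max_packet_thresh <= t->max_max_packet_thresh);

	if (pkts_per_period >= t->max_max_packet_thresh)
		return SCALE_MAX;
	if (pkts_per_period >= t->avg_max_packet_thresh)
		return SCALE_MED;
	return SCALE_MIN;
}
```

On the host side, each action would then translate into one of the power_manager_scale_core_* calls for every pcpu in the policy.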
* Re: [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies David Hunt @ 2017-10-04 15:36 ` santosh 2017-10-05 8:38 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: santosh @ 2017-10-04 15:36 UTC (permalink / raw) To: David Hunt, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi David, On Wednesday 04 October 2017 02:45 PM, David Hunt wrote: > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- my 2cent: General comment on implementation approach: IMO, we should avoid PMD details in common lib area. example: file channel_commons.h has ifdef clutter referencing i40e pmds all over. Perhaps we should introduce opaque handle example void * or introduce pmd specific callback/handle which points to PMD specific metadata in power library. Example: struct channel_packet { void *pmd_specific_metadata; } Or someway via callback (I'm not sure at the moment) so that we could hide PMD details in common area. Thanks. 
> lib/librte_power/channel_commands.h | 52 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 52 insertions(+) > > diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h > index 484085b..1599706 100644 > --- a/lib/librte_power/channel_commands.h > +++ b/lib/librte_power/channel_commands.h > @@ -46,6 +46,7 @@ extern "C" { > /* Valid Commands */ > #define CPU_POWER 1 > #define CPU_POWER_CONNECT 2 > +#define PKT_POLICY 3 > > /* CPU Power Command Scaling */ > #define CPU_POWER_SCALE_UP 1 > @@ -54,11 +55,62 @@ extern "C" { > #define CPU_POWER_SCALE_MIN 4 > #define CPU_POWER_ENABLE_TURBO 5 > #define CPU_POWER_DISABLE_TURBO 6 > +#define HOURS 24 > + > +#ifdef RTE_LIBRTE_I40E_PMD > +#define MAX_VFS 10 > +#endif > + > +#define MAX_VCPU_PER_VM 8 > + > +typedef enum {false, true} bool; > + > +struct t_boost_status { > + bool tbEnabled; > +}; > + > +struct timer_profile { > + int busy_hours[HOURS]; > + int quiet_hours[HOURS]; > +#ifdef RTE_LIBRTE_I40E_PMD > + int hours_to_use_traffic_profile[HOURS]; > +#endif > +}; > + > +enum workload {HIGH, MEDIUM, LOW}; > +enum policy_to_use { > +#ifdef RTE_LIBRTE_I40E_PMD > + TRAFFIC, > +#endif > + TIME, > + WORKLOAD > +}; > + > +#ifdef RTE_LIBRTE_I40E_PMD > +struct traffic { > + uint32_t min_packet_thresh; > + uint32_t avg_max_packet_thresh; > + uint32_t max_max_packet_thresh; > +}; > +#endif > > struct channel_packet { > uint64_t resource_id; /**< core_num, device */ > uint32_t unit; /**< scale down/up/min/max */ > uint32_t command; /**< Power, IO, etc */ > + char vm_name[32]; > + > +#ifdef RTE_LIBRTE_I40E_PMD > + uint64_t vfid[MAX_VFS]; > + int nb_mac_to_monitor; > + struct traffic traffic_policy; > +#endif > + uint8_t vcpu_to_control[MAX_VCPU_PER_VM]; > + uint8_t num_vcpu; > + struct timer_profile timer_policy; > + enum workload workload; > + enum policy_to_use policy_to_use; > + struct t_boost_status t_boost_status; > }; > > ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies 2017-10-04 15:36 ` santosh @ 2017-10-05 8:38 ` Hunt, David 2017-10-05 9:21 ` santosh 0 siblings, 1 reply; 105+ messages in thread From: Hunt, David @ 2017-10-05 8:38 UTC (permalink / raw) To: santosh, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi Santosh, On 4/10/2017 4:36 PM, santosh wrote: > Hi David, > > > On Wednesday 04 October 2017 02:45 PM, David Hunt wrote: >> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >> Signed-off-by: David Hunt <david.hunt@intel.com> >> --- > my 2cent: > General comment on implementation approach: > IMO, we should avoid PMD details in common lib area. > example: file channel_commons.h has ifdef clutter referencing > i40e pmds all over. > > Perhaps we should introduce opaque handle example void * or introduce pmd > specific callback/handle which points to PMD specific metadata in power library. > > Example: > struct channel_packet { > void *pmd_specific_metadata; > } > > Or someway via callback (I'm not sure at the moment) > so that we could hide PMD details in common area. > > Thanks. I would agree that PMD specific details are good left to the PMDs, however I think that the initial example should be OK as is, and as new PMDs are added, we can find commonality between them which stays in the example, and any really specific stuff can be pushed back behind an opaque. What about the v5 I submitted (without the #ifdef's)? Are you OK with that for this release, and we can fine tune as other PMDS are added in future releases? Regards, Dave. ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies 2017-10-05 8:38 ` Hunt, David @ 2017-10-05 9:21 ` santosh 2017-10-05 9:51 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: santosh @ 2017-10-05 9:21 UTC (permalink / raw) To: Hunt, David, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi David, On Thursday 05 October 2017 02:08 PM, Hunt, David wrote: > > Hi Santosh, > > On 4/10/2017 4:36 PM, santosh wrote: >> Hi David, >> >> >> On Wednesday 04 October 2017 02:45 PM, David Hunt wrote: >>> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >>> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >>> Signed-off-by: David Hunt <david.hunt@intel.com> >>> --- >> my 2cent: >> General comment on implementation approach: >> IMO, we should avoid PMD details in common lib area. >> example: file channel_commons.h has ifdef clutter referencing >> i40e pmds all over. >> >> Perhaps we should introduce opaque handle example void * or introduce pmd >> specific callback/handle which points to PMD specific metadata in power library. >> >> Example: >> struct channel_packet { >> void *pmd_specific_metadata; >> } >> >> Or someway via callback (I'm not sure at the moment) >> so that we could hide PMD details in common area. >> >> Thanks. > > I would agree that PMD specific details are good left to the PMDs, however I think that the initial > example should be OK as is, and as new PMDs are added, we can find commonality between them > which stays in the example, and any really specific stuff can be pushed back behind an opaque. > > What about the v5 I submitted (without the #ifdef's)? Are you OK with that for this release, and we can > fine tune as other PMDS are added in future releases? > Yes. But in future releases, we should do more code clean up in power lib and example area.. meaning; current example implementation uses names like _vsi.. 
specific to intel NICs, we should remove such naming and their dependency code from example area. Thanks. > Regards, > Dave. > > ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies 2017-10-05 9:21 ` santosh @ 2017-10-05 9:51 ` Hunt, David 0 siblings, 0 replies; 105+ messages in thread From: Hunt, David @ 2017-10-05 9:51 UTC (permalink / raw) To: santosh, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton On 5/10/2017 10:21 AM, santosh wrote: > Hi David, > > > On Thursday 05 October 2017 02:08 PM, Hunt, David wrote: >> Hi Santosh, >> >> On 4/10/2017 4:36 PM, santosh wrote: >>> Hi David, >>> >>> >>> On Wednesday 04 October 2017 02:45 PM, David Hunt wrote: >>>> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >>>> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >>>> Signed-off-by: David Hunt <david.hunt@intel.com> >>>> --- >>> my 2cent: >>> General comment on implementation approach: >>> IMO, we should avoid PMD details in common lib area. >>> example: file channel_commons.h has ifdef clutter referencing >>> i40e pmds all over. >>> >>> Perhaps we should introduce opaque handle example void * or introduce pmd >>> specific callback/handle which points to PMD specific metadata in power library. >>> >>> Example: >>> struct channel_packet { >>> void *pmd_specific_metadata; >>> } >>> >>> Or someway via callback (I'm not sure at the moment) >>> so that we could hide PMD details in common area. >>> >>> Thanks. >> I would agree that PMD specific details are good left to the PMDs, however I think that the initial >> example should be OK as is, and as new PMDs are added, we can find commonality between them >> which stays in the example, and any really specific stuff can be pushed back behind an opaque. >> >> What about the v5 I submitted (without the #ifdef's)? Are you OK with that for this release, and we can >> fine tune as other PMDS are added in future releases? >> > Yes. But in future releases, we should do more code clean up in power lib and example area.. > meaning; current example implementation uses names like _vsi.. 
 specific to intel NICs, > we should remove such naming and their dependency code from example area. > > Thanks. I agree. I plan to clean up the API in the next release of DPDK. For example, there are private header files that are called rte_*.h that expose private functions to the documentation. These need to be renamed, as well as moving some structures around. I can also look at re-naming some of the vsi vars to something more generic. Thanks, Dave. ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v4 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 2/9] lib/librte_power: add extra msg type for policies David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt ` (6 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/channel_manager.c | 62 +++++++++++++++++++++++++++++ examples/vm_power_manager/channel_manager.h | 25 ++++++++++++ 2 files changed, 87 insertions(+) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index e068ae2..03fa626 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -574,6 +574,68 @@ set_channel_status(const char *vm_name, unsigned *channel_list, return num_channels_changed; } +void +get_all_vm(int *num_vm, int *num_cpu) +{ + + virNodeInfo node_info; + virDomainPtr *domptr; + uint64_t mask; + int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus; + unsigned int jj; + const char *vm_name; + unsigned int flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; + unsigned int flag = VIR_DOMAIN_VCPU_CONFIG; + + + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); + if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n"); + + /* Returns number of pcpus */ + 
global_n_host_cpus = (unsigned int)node_info.cpus; + + /* Returns number of active domains */ + *num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, flags); + if (*num_vm <= 0) + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); + + for (i = 0; i < *num_vm; i++) { + + /* Get Domain Names */ + vm_name = virDomainGetName(domptr[i]); + lvm_info[i].vm_name = vm_name; + + /* Get Number of Vcpus */ + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], flag); + + /* Get Number of VCpus & VcpuPinInfo */ + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], + numVcpus[i], global_cpumaps, + global_maplen, flag); + + if ((int)n_vcpus > 0) { + *num_cpu = n_vcpus; + lvm_info[i].num_cpus = n_vcpus; + } + + /* Save pcpu in use by libvirt VMs */ + for (ii = 0; ii < n_vcpus; ii++) { + mask = 0; + for (jj = 0; jj < global_n_host_cpus; jj++) { + if (VIR_CPU_USABLE(global_cpumaps, + global_maplen, ii, jj) > 0) { + mask |= 1ULL << jj; + } + } + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { + lvm_info[i].pcpus[ii] = cpu; + } + } + } +} + int get_info_vm(const char *vm_name, struct vm_info *info) { diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h index 47c3b9c..788c1e6 100644 --- a/examples/vm_power_manager/channel_manager.h +++ b/examples/vm_power_manager/channel_manager.h @@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un; #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) #endif +#define MAX_VMS 4 +#define MAX_VCPUS 20 + + +struct libvirt_vm_info { + const char *vm_name; + unsigned int pcpus[MAX_VCPUS]; + uint8_t num_cpus; +}; + +struct libvirt_vm_info lvm_info[MAX_VMS]; /* Communication Channel Status */ enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, CHANNEL_MGR_CHANNEL_CONNECTED, @@ -319,6 +330,20 @@ int set_channel_status(const char *vm_name, unsigned *channel_list, */ int get_info_vm(const char *vm_name, struct vm_info *info); +/** + * Populates a table with all domains running and their physical cpu. 
+ * All information is gathered through libvirt api. + * + * @param noVms + * modified to store number of active VMs + * + * @param noVcpus + modified to store number of vcpus active + * + * @return + * void + */ +void get_all_vm(int *noVms, int *noVcpus); #ifdef __cplusplus } #endif -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
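The vcpu-to-pcpu step in get_all_vm() can be sketched without libvirt. The helpers below are hypothetical; cpumap_bit() assumes the byte/bit layout that libvirt's VIR_CPU_USABLE tests (bit cpu % 8 of byte cpu / 8 within one vcpu's map). highest_pcpu() reflects a reading of the patch: the inner loop overwrites pcpus[ii] for every set bit, so if the mask is walked from low to high (the loop shape that appears inline in get_pcpu_to_control() in patch 5/9), the highest pinned pcpu is the one that remains.

```c
#include <assert.h>
#include <stdint.h>

/* Test bit `cpu` in a single vcpu's cpumap (assumed libvirt bit layout). */
static int
cpumap_bit(const unsigned char *cpumap, unsigned int cpu)
{
	return (cpumap[cpu / 8] >> (cpu % 8)) & 1;
}

/* Collapse one vcpu's cpumap into a 64-bit mask of usable pcpus. */
static uint64_t
cpumap_to_mask(const unsigned char *cpumap, unsigned int n_host_cpus)
{
	uint64_t mask = 0;
	unsigned int cpu;

	/* A 64-bit mask cannot describe more than 64 host cpus. */
	assert(n_host_cpus <= 64);

	for (cpu = 0; cpu < n_host_cpus; cpu++) {
		if (cpumap_bit(cpumap, cpu))
			mask |= 1ULL << cpu;
	}
	return mask;
}

/* Return the highest pcpu set in mask, or -1 when the mask is empty. */
static int
highest_pcpu(uint64_t mask)
{
	int cpu, last = -1;

	for (cpu = 0; cpu < 64; cpu++) {
		if (mask & (1ULL << cpu))
			last = cpu;
	}
	return last;
}
```

If a vcpu is pinned to more than one pcpu, recording only one of them means the others would not be scaled; whether that matters depends on how the guest is pinned.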
* [dpdk-dev] [PATCH v4 4/9] examples/vm_power_mgr: add scale to medium freq fn 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt ` (2 preceding siblings ...) 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 5/9] examples/vm_power_mgr: add policy to channels David Hunt ` (5 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/power_manager.c | 15 +++++++++++++++ examples/vm_power_manager/power_manager.h | 13 +++++++++++++ 2 files changed, 28 insertions(+) diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c index 80705f9..c021c1d 100644 --- a/examples/vm_power_manager/power_manager.c +++ b/examples/vm_power_manager/power_manager.c @@ -286,3 +286,18 @@ power_manager_disable_turbo_core(unsigned int core_num) POWER_SCALE_CORE(disable_turbo, core_num, ret); return ret; } + +int +power_manager_scale_core_med(unsigned int core_num) +{ + int ret = 0; + + if (core_num >= POWER_MGR_MAX_CPUS) + return -1; + if (!(global_enabled_cpus & (1ULL << core_num))) + return -1; + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); + ret = rte_power_set_freq(core_num, 5); + rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl); + return ret; +} diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h index b74d09b..b52fb4c 100644 --- a/examples/vm_power_manager/power_manager.h +++ b/examples/vm_power_manager/power_manager.h @@ -231,6 +231,19 @@ int 
power_manager_disable_turbo_core(unsigned int core_num); */ uint32_t power_manager_get_current_frequency(unsigned core_num); +/** + * Scale to medium frequency for the core specified by core_num. + * It is thread-safe. + * + * @param core_num + * The core number to change frequency + * + * @return + * - 1 on success. + * - 0 if frequency not changed. + * - Negative on error. + */ +int power_manager_scale_core_med(unsigned int core_num); #ifdef __cplusplus } -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
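power_manager_scale_core_med() above passes a fixed index of 5 to rte_power_set_freq(). One alternative, sketched here purely as an assumption and not as what the patch does, is to derive the index from the number of frequencies the core reports (for example the count returned by rte_power_freqs()). medium_freq_index() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the middle entry of a core's frequency table.
 * nb_freqs is the number of available frequencies for the core.
 */
static uint32_t
medium_freq_index(uint32_t nb_freqs)
{
	/* An empty frequency table has no valid index. */
	assert(nb_freqs > 0);

	return nb_freqs / 2;
}
```

With an 11-entry table this yields index 5, matching the hard-coded value, while staying in range on cores that expose fewer frequencies.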
* [dpdk-dev] [PATCH v4 5/9] examples/vm_power_mgr: add policy to channels 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt ` (3 preceding siblings ...) 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 6/9] examples/vm_power_mgr: add port initialisation David Hunt ` (4 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/Makefile | 16 ++ examples/vm_power_manager/channel_monitor.c | 340 +++++++++++++++++++++++++++- examples/vm_power_manager/channel_monitor.h | 20 ++ 3 files changed, 370 insertions(+), 6 deletions(-) diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile index 59a9641..9cf20a2 100644 --- a/examples/vm_power_manager/Makefile +++ b/examples/vm_power_manager/Makefile @@ -54,6 +54,22 @@ CFLAGS += $(WERROR_FLAGS) LDLIBS += -lvirt +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) + +ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y) +LDLIBS += -lrte_pmd_ixgbe +endif + +ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y) +LDLIBS += -lrte_pmd_i40e +endif + +ifeq ($(CONFIG_RTE_LIBRTE_BNXT_PMD),y) +LDLIBS += -lrte_pmd_bnxt +endif + +endif + # workaround for a gcc bug with noreturn attribute # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index ac40dac..7db98ad 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ b/examples/vm_power_manager/channel_monitor.c 
@@ -41,13 +41,20 @@ #include <sys/types.h> #include <sys/epoll.h> #include <sys/queue.h> +#include <sys/time.h> #include <rte_log.h> #include <rte_memory.h> #include <rte_malloc.h> #include <rte_atomic.h> +#include <rte_cycles.h> +#include <rte_ethdev.h> +#ifdef RTE_LIBRTE_I40E_PMD +#include <rte_pmd_i40e.h> +#endif +#include <libvirt/libvirt.h> #include "channel_monitor.h" #include "channel_commands.h" #include "channel_manager.h" @@ -57,10 +64,17 @@ #define MAX_EVENTS 256 +#ifdef RTE_LIBRTE_I40E_PMD +uint64_t vsi_pkt_count_prev[384]; +uint64_t rdtsc_prev[384]; +#endif +double time_period_s = 1; static volatile unsigned run_loop = 1; static int global_event_fd; +static unsigned int policy_is_set; static struct epoll_event *global_events_list; +static struct policy policies[MAX_VMS]; void channel_monitor_exit(void) { @@ -68,6 +82,302 @@ void channel_monitor_exit(void) rte_free(global_events_list); } +static void +core_share(int pNo, int z, int x, int t) +{ + if (policies[pNo].core_share[z].pcpu == lvm_info[x].pcpus[t]) { + if (strcmp(policies[pNo].pkt.vm_name, + lvm_info[x].vm_name) != 0) { + policies[pNo].core_share[z].status = 1; + power_manager_scale_core_max( + policies[pNo].core_share[z].pcpu); + } + } +} + +static void +core_share_status(int pNo) +{ + + int noVms, noVcpus, z, x, t; + + get_all_vm(&noVms, &noVcpus); + + /* Reset Core Share Status. */ + for (z = 0; z < noVcpus; z++) + policies[pNo].core_share[z].status = 0; + + /* Foreach vcpu in a policy. */ + for (z = 0; z < policies[pNo].pkt.num_vcpu; z++) { + /* Foreach VM on the platform. */ + for (x = 0; x < noVms; x++) { + /* Foreach vcpu of VMs on platform. */ + for (t = 0; t < lvm_info[x].num_cpus; t++) + core_share(pNo, z, x, t); + } + } +} + +static void +get_pcpu_to_control(struct policy *pol) +{ + + /* Convert vcpu to pcpu. 
*/ + struct vm_info info; + int pcpu, count; + uint64_t mask_u64b; + + RTE_LOG(INFO, CHANNEL_MONITOR, "Looking for pcpu for %s\n", + pol->pkt.vm_name); + get_info_vm(pol->pkt.vm_name, &info); + + for (count = 0; count < pol->pkt.num_vcpu; count++) { + mask_u64b = info.pcpu_mask[pol->pkt.vcpu_to_control[count]]; + for (pcpu = 0; mask_u64b; mask_u64b &= ~(1ULL << pcpu++)) { + if ((mask_u64b >> pcpu) & 1) + pol->core_share[count].pcpu = pcpu; + } + } +} + +#ifdef RTE_LIBRTE_I40E_PMD +static int +get_pfid(struct policy *pol) +{ + + int i, x, ret = 0, nb_ports; + + nb_ports = rte_eth_dev_count(); + for (i = 0; i < pol->pkt.nb_mac_to_monitor; i++) { + + for (x = 0; x < nb_ports; x++) { + ret = rte_pmd_i40e_query_vfid_by_mac(x, + pol->pkt.vfid[i]); + if (ret != -EINVAL) { + pol->port[i] = x; + break; + } + } + if (ret == -EINVAL) { + RTE_LOG(INFO, CHANNEL_MONITOR, + "Error with Policy. MAC not found on " + "attached ports "); + pol->enabled = 0; + return ret; + } + pol->pfid[i] = ret; + } + return 1; +} +#endif + +static int +update_policy(struct channel_packet *pkt) +{ + + unsigned int updated = 0; + + for (int i = 0; i < MAX_VMS; i++) { + if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); +#ifdef RTE_LIBRTE_I40E_PMD + if (get_pfid(&policies[i]) == -1) { + updated = 1; + break; + } +#endif + core_share_status(i); + policies[i].enabled = 1; + updated = 1; + } + } + if (!updated) { + for (int i = 0; i < MAX_VMS; i++) { + if (policies[i].enabled == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); +#ifdef RTE_LIBRTE_I40E_PMD + if (get_pfid(&policies[i]) == -1) + break; +#endif + core_share_status(i); + policies[i].enabled = 1; + break; + } + } + } + return 0; +} + +#ifdef RTE_LIBRTE_I40E_PMD +static uint64_t +get_pkt_diff(struct policy *pol) +{ + + uint64_t vsi_pkt_count, + vsi_pkt_total = 0, + vsi_pkt_count_prev_total = 0; + double rdtsc_curr, rdtsc_diff, diff; + int x; + struct 
rte_eth_stats vf_stats; + + for (x = 0; x < pol->pkt.nb_mac_to_monitor; x++) { + + /*Read vsi stats*/ + if (rte_pmd_i40e_get_vf_stats(x, pol->pfid[x], &vf_stats) == 0) + vsi_pkt_count = vf_stats.ipackets; + else + vsi_pkt_count = -1; + + vsi_pkt_total += vsi_pkt_count; + + vsi_pkt_count_prev_total += vsi_pkt_count_prev[pol->pfid[x]]; + vsi_pkt_count_prev[pol->pfid[x]] = vsi_pkt_count; + } + + rdtsc_curr = rte_rdtsc_precise(); + rdtsc_diff = rdtsc_curr - rdtsc_prev[pol->pfid[x-1]]; + rdtsc_prev[pol->pfid[x-1]] = rdtsc_curr; + + diff = (vsi_pkt_total - vsi_pkt_count_prev_total) * + ((double)rte_get_tsc_hz() / rdtsc_diff); + + return diff; +} + +static void +apply_traffic_profile(struct policy *pol) +{ + + int count; + uint64_t diff = 0; + + diff = get_pkt_diff(pol); + + RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n"); + + if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (diff >= (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (diff < (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} +#endif + +static void +apply_time_profile(struct policy *pol) +{ + + int count, x; + struct timeval tv; + struct tm *ptm; + char time_string[40]; + + /* Obtain the time of day, and convert it to a tm struct. */ + gettimeofday(&tv, NULL); + ptm = localtime(&tv.tv_sec); + /* Format the date and time, down to a single second. 
*/ + strftime(time_string, sizeof(time_string), "%Y-%m-%d %H:%M:%S", ptm); + + for (x = 0; x < HOURS; x++) { + + if (ptm->tm_hour == pol->pkt.timer_policy.busy_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_max( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling up core %d to max\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.quiet_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_min( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling down core %d to min\n", + pol->core_share[count].pcpu); + } + } + break; +#ifdef RTE_LIBRTE_I40E_PMD + } else if (ptm->tm_hour == + pol->pkt.timer_policy.hours_to_use_traffic_profile[x]) { + apply_traffic_profile(pol); + break; + } +#else + } +#endif + } +} + +static void +apply_workload_profile(struct policy *pol) +{ + + int count; + + if (pol->pkt.workload == HIGH) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == MEDIUM) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == LOW) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_policy(struct policy *pol) +{ + + struct channel_packet *pkt = &pol->pkt; + + /*Check policy to use*/ +#ifdef RTE_LIBRTE_I40E_PMD + if (pkt->policy_to_use == TRAFFIC) + apply_traffic_profile(pol); + else if (pkt->policy_to_use == TIME) +#else + if (pkt->policy_to_use == TIME) +#endif 
+ apply_time_profile(pol); + else if (pkt->policy_to_use == WORKLOAD) + apply_workload_profile(pol); +} + + static int process_request(struct channel_packet *pkt, struct channel_info *chan_info) { @@ -140,6 +450,13 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) } } + + if (pkt->command == PKT_POLICY) { + RTE_LOG(INFO, CHANNEL_MONITOR, "\nProcessing Policy request from Guest\n"); + update_policy(pkt); + policy_is_set = 1; + } + /* Return is not checked as channel status may have been set to DISABLED * from management thread */ @@ -209,9 +526,10 @@ run_channel_monitor(void) struct channel_info *chan_info = (struct channel_info *) global_events_list[i].data.ptr; if ((global_events_list[i].events & EPOLLERR) || - (global_events_list[i].events & EPOLLHUP)) { + (global_events_list[i].events & EPOLLHUP)) { RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for " - "channel '%s'\n", chan_info->channel_path); + "channel '%s'\n", + chan_info->channel_path); remove_channel(&chan_info); continue; } @@ -223,14 +541,17 @@ run_channel_monitor(void) int buffer_len = sizeof(pkt); while (buffer_len > 0) { - n_bytes = read(chan_info->fd, buffer, buffer_len); + n_bytes = read(chan_info->fd, + buffer, buffer_len); if (n_bytes == buffer_len) break; if (n_bytes == -1) { err = errno; - RTE_LOG(DEBUG, CHANNEL_MONITOR, "Received error on " - "channel '%s' read: %s\n", - chan_info->channel_path, strerror(err)); + RTE_LOG(DEBUG, CHANNEL_MONITOR, + "Received error on " + "channel '%s' read: %s\n", + chan_info->channel_path, + strerror(err)); remove_channel(&chan_info); break; } @@ -241,5 +562,12 @@ run_channel_monitor(void) process_request(&pkt, chan_info); } } + rte_delay_us(time_period_s*1000000); + if (policy_is_set) { + for (int j = 0; j < MAX_VMS; j++) { + if (policies[j].enabled == 1) + apply_policy(&policies[j]); + } + } } } diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h index c138607..11f5f75 
100644 --- a/examples/vm_power_manager/channel_monitor.h +++ b/examples/vm_power_manager/channel_monitor.h @@ -35,6 +35,26 @@ #define CHANNEL_MONITOR_H_ #include "channel_manager.h" +#include "channel_commands.h" + +struct core_share { + unsigned int pcpu; + /* + * 1 CORE SHARE + * 0 NOT SHARED + */ + int status; +}; + +struct policy { + struct channel_packet pkt; +#ifdef RTE_LIBRTE_I40E_PMD + uint32_t pfid[MAX_VFS]; + uint32_t port[MAX_VFS]; +#endif + unsigned int enabled; + struct core_share core_share[MAX_VCPU_PER_VM]; +}; #ifdef __cplusplus extern "C" { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
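The traffic profile in this patch reduces to two steps: turn two packet-counter samples into a packets-per-second rate using the TSC, then compare that rate against the two thresholds carried in the policy. A minimal standalone sketch of that logic follows, assuming nothing from DPDK. The names `pkt_rate`, `classify_rate`, `struct traffic_thresholds` and `enum freq_level` are hypothetical and do not appear in the patch; returning 0 when a counter runs backwards is a choice made for this sketch, not the patch's behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not part of the patch. */
enum freq_level { FREQ_MIN, FREQ_MED, FREQ_MAX };

struct traffic_thresholds {
	uint64_t avg_max_packet_thresh; /* at or above: medium frequency */
	uint64_t max_max_packet_thresh; /* at or above: maximum frequency */
};

/*
 * Packets per second from two counter samples and two TSC samples.
 * Returns 0 if either counter went backwards (sketch-only choice).
 */
static uint64_t
pkt_rate(uint64_t pkts_now, uint64_t pkts_prev,
	 uint64_t tsc_now, uint64_t tsc_prev, uint64_t tsc_hz)
{
	double seconds;

	assert(tsc_hz != 0);
	if (tsc_now <= tsc_prev || pkts_now < pkts_prev)
		return 0;

	seconds = (double)(tsc_now - tsc_prev) / (double)tsc_hz;
	return (uint64_t)((double)(pkts_now - pkts_prev) / seconds);
}

/* Same ordering of comparisons as apply_traffic_profile(). */
static enum freq_level
classify_rate(uint64_t rate, const struct traffic_thresholds *t)
{
	assert(t != NULL);
	if (rate >= t->max_max_packet_thresh)
		return FREQ_MAX;
	if (rate >= t->avg_max_packet_thresh)
		return FREQ_MED;
	return FREQ_MIN;
}
```

With the example thresholds from patch 8/9 (1800000 and 2000000), a measured rate of 1900000 packets per second would select the medium frequency.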
* [dpdk-dev] [PATCH v4 6/9] examples/vm_power_mgr: add port initialisation 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt ` (4 preceding siblings ...) 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 5/9] examples/vm_power_mgr: add policy to channels David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 7/9] power: add send channel msg function to map file David Hunt ` (3 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic We need to initialise the ports we're monitoring to be able to see the throughput. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 220 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 220 insertions(+) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index c33fcc9..698abca 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -49,6 +49,9 @@ #include <rte_log.h> #include <rte_per_lcore.h> #include <rte_lcore.h> +#include <rte_ethdev.h> +#include <getopt.h> +#include <rte_cycles.h> #include <rte_debug.h> #include "channel_manager.h" @@ -56,6 +59,192 @@ #include "power_manager.h" #include "vm_power_cli.h" +#define RX_RING_SIZE 512 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static uint32_t enabled_port_mask; +static volatile bool force_quit; + +/****************/ +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port >= rte_eth_dev_count()) 
+ return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned int)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. 
*/ + rte_eth_promiscuous_enable(port); + + + return 0; +} + +static int +parse_portmask(const char *portmask) +{ + char *end = NULL; + unsigned long pm; + + /* parse hexadecimal string */ + pm = strtoul(portmask, &end, 16); + if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0')) + return -1; + + if (pm == 0) + return -1; + + return pm; +} +/* Parse the argument given in the command line of the application */ +static int +parse_args(int argc, char **argv) +{ + int opt, ret; + char **argvopt; + int option_index; + char *prgname = argv[0]; + static struct option lgopts[] = { + { "mac-updating", no_argument, 0, 1}, + { "no-mac-updating", no_argument, 0, 0}, + {NULL, 0, 0, 0} + }; + argvopt = argv; + + while ((opt = getopt_long(argc, argvopt, "p:q:T:", + lgopts, &option_index)) != EOF) { + + switch (opt) { + /* portmask */ + case 'p': + enabled_port_mask = parse_portmask(optarg); + if (enabled_port_mask == 0) { + printf("invalid portmask\n"); + return -1; + } + break; + /* long options */ + case 0: + break; + + default: + return -1; + } + } + + if (optind >= 0) + argv[optind-1] = prgname; + + ret = optind-1; + optind = 0; /* reset getopt lib */ + return ret; +} + +static void +check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) +{ +#define CHECK_INTERVAL 100 /* 100ms */ +#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */ + uint8_t portid, count, all_ports_up, print_flag = 0; + struct rte_eth_link link; + + printf("\nChecking link status"); + fflush(stdout); + for (count = 0; count <= MAX_CHECK_TIME; count++) { + if (force_quit) + return; + all_ports_up = 1; + for (portid = 0; portid < port_num; portid++) { + if (force_quit) + return; + if ((port_mask & (1 << portid)) == 0) + continue; + memset(&link, 0, sizeof(link)); + rte_eth_link_get_nowait(portid, &link); + /* print link status if flag set */ + if (print_flag == 1) { + if (link.link_status) + printf("Port %d Link Up - speed %u " + "Mbps - %s\n", (uint8_t)portid, + (unsigned 
int)link.link_speed, + (link.link_duplex == ETH_LINK_FULL_DUPLEX) ? + ("full-duplex") : ("half-duplex\n")); + else + printf("Port %d Link Down\n", + (uint8_t)portid); + continue; + } + /* clear all_ports_up flag if any link down */ + if (link.link_status == ETH_LINK_DOWN) { + all_ports_up = 0; + break; + } + } + /* after finally printing all link status, get out */ + if (print_flag == 1) + break; + + if (all_ports_up == 0) { + printf("."); + fflush(stdout); + rte_delay_ms(CHECK_INTERVAL); + } + + /* set the print_flag if all ports up or timeout */ + if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) { + print_flag = 1; + printf("done\n"); + } + } +} static int run_monitor(__attribute__((unused)) void *arg) { @@ -82,6 +271,10 @@ main(int argc, char **argv) { int ret; unsigned lcore_id; + unsigned int nb_ports; + struct rte_mempool *mbuf_pool; + uint8_t portid; + ret = rte_eal_init(argc, argv); if (ret < 0) @@ -90,12 +283,39 @@ main(int argc, char **argv) signal(SIGINT, sig_handler); signal(SIGTERM, sig_handler); + argc -= ret; + argv += ret; + + /* parse application arguments (after the EAL ones) */ + ret = parse_args(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid arguments\n"); + + nb_ports = rte_eth_dev_count(); + + mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports, + MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id()); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize ports. 
*/ + for (portid = 0; portid < nb_ports; portid++) { + if ((enabled_port_mask & (1 << portid)) == 0) + continue; + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + } + lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { RTE_LOG(ERR, EAL, "A minimum of two cores are required to run " "application\n"); return 0; } + + check_all_ports_link_status(nb_ports, enabled_port_mask); rte_eal_remote_launch(run_monitor, NULL, lcore_id); if (power_manager_init() < 0) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
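The `-p` option above takes a hexadecimal port mask. As posted, parse_portmask() returns -1 on a malformed mask while the caller only tests for 0. One way to keep the error path separate from the mask value is sketched below as a standalone function. The name `parse_portmask_checked` and its out-parameter are assumptions made for this illustration, not something the patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical variant of parse_portmask(): the mask is returned through
 * an out-parameter and the return value only signals success (0) or
 * failure (-1), so an error can never be mistaken for a valid mask.
 */
static int
parse_portmask_checked(const char *arg, uint32_t *mask)
{
	char *end = NULL;
	unsigned long pm;

	assert(mask != NULL);
	if (arg == NULL || arg[0] == '\0')
		return -1;

	errno = 0;
	pm = strtoul(arg, &end, 16);	/* accepts an optional 0x prefix */
	if (errno != 0 || *end != '\0')
		return -1;		/* overflow or trailing garbage */
	if (pm == 0 || pm > UINT32_MAX)
		return -1;		/* empty mask or too wide */

	*mask = (uint32_t)pm;
	return 0;
}
```

A caller would then write something like `if (parse_portmask_checked(optarg, &enabled_port_mask) < 0)` and report "invalid portmask" on failure.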
* [dpdk-dev] [PATCH v4 7/9] power: add send channel msg function to map file 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt ` (5 preceding siblings ...) 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 6/9] examples/vm_power_mgr: add port initialisation David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 8/9] examples/guest_cli: add send policy to host David Hunt ` (2 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, David Hunt Adding new wrapper function to existing private (but unused 'till now) function with an rte_power_ prefix. The plan is to clean up all the header files in the next release so that only the intended public functions are in the map file and only the relevant headers have the rte_ prefix so that only they are included in the documentation. Signed-off-by: David Hunt <david.hunt@intel.com> --- lib/librte_power/guest_channel.c | 7 +++++++ lib/librte_power/guest_channel.h | 15 +++++++++++++++ lib/librte_power/rte_power_version.map | 1 + 3 files changed, 23 insertions(+) diff --git a/lib/librte_power/guest_channel.c b/lib/librte_power/guest_channel.c index 85c92fa..fa5de0f 100644 --- a/lib/librte_power/guest_channel.c +++ b/lib/librte_power/guest_channel.c @@ -148,6 +148,13 @@ guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id) return 0; } +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id) +{ + return guest_channel_send_msg(pkt, lcore_id); +} + + void guest_channel_host_disconnect(unsigned lcore_id) { diff --git a/lib/librte_power/guest_channel.h b/lib/librte_power/guest_channel.h index 9e18af5..741339c 100644 --- a/lib/librte_power/guest_channel.h +++ b/lib/librte_power/guest_channel.h @@ -81,6 +81,21 @@ void guest_channel_host_disconnect(unsigned lcore_id); */ int guest_channel_send_msg(struct channel_packet 
*pkt, unsigned lcore_id); +/** + * Send a message contained in pkt over the Virtio-Serial to the host endpoint. + * + * @param pkt + * Pointer to a populated struct channel_packet + * + * @param lcore_id + * lcore_id. + * + * @return + * - 0 on success. + * - Negative on error. + */ +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id); #ifdef __cplusplus } diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map index 9ae0627..96dc42e 100644 --- a/lib/librte_power/rte_power_version.map +++ b/lib/librte_power/rte_power_version.map @@ -20,6 +20,7 @@ DPDK_2.0 { DPDK_17.11 { global: + rte_power_guest_channel_send_msg; rte_power_freq_disable_turbo; rte_power_freq_enable_turbo; rte_power_turbo_status; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v4 8/9] examples/guest_cli: add send policy to host 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt ` (6 preceding siblings ...) 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 7/9] power: add send channel msg function to map file David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Here we're adding an example of setting up a policy, and allowing the vm_cli_guest app to send it to the host using the cli command "send_policy now" Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- .../guest_cli/vm_power_cli_guest.c | 105 +++++++++++++++++++++ .../guest_cli/vm_power_cli_guest.h | 6 -- 2 files changed, 105 insertions(+), 6 deletions(-) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 4e982bd..fe0d77a 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -45,8 +45,10 @@ #include <cmdline.h> #include <rte_log.h> #include <rte_lcore.h> +#include <rte_ethdev.h> #include <rte_power.h> +#include <guest_channel.h> #include "vm_power_cli_guest.h" @@ -139,8 +141,111 @@ cmdline_parse_inst_t cmd_set_cpu_freq_set = { }, }; +struct cmd_send_policy_result { + cmdline_fixed_string_t send_policy; + cmdline_fixed_string_t cmd; +}; + +#ifdef RTE_LIBRTE_I40E_PMD +union PFID { + struct ether_addr addr; + uint64_t pfid; +}; +#endif + +static 
inline int +send_policy(void) +{ + struct channel_packet pkt; + int ret; + +#ifdef RTE_LIBRTE_I40E_PMD + union PFID pfid; + /* Use port MAC address as the vfid */ + rte_eth_macaddr_get(0, &pfid.addr); + printf("Port %u MAC: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":" + "%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n", + 1, + pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1], + pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3], + pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]); + pkt.vfid[0] = pfid.pfid; + + pkt.nb_mac_to_monitor = 1; +#endif + pkt.t_boost_status.tbEnabled = false; + + pkt.vcpu_to_control[0] = 0; + pkt.vcpu_to_control[1] = 1; + pkt.num_vcpu = 2; + /* Dummy Population. */ +#ifdef RTE_LIBRTE_I40E_PMD + pkt.traffic_policy.min_packet_thresh = 96000; + pkt.traffic_policy.avg_max_packet_thresh = 1800000; + pkt.traffic_policy.max_max_packet_thresh = 2000000; +#endif + + pkt.timer_policy.busy_hours[0] = 3; + pkt.timer_policy.busy_hours[1] = 4; + pkt.timer_policy.busy_hours[2] = 5; + pkt.timer_policy.quiet_hours[0] = 11; + pkt.timer_policy.quiet_hours[1] = 12; + pkt.timer_policy.quiet_hours[2] = 13; + +#ifdef RTE_LIBRTE_I40E_PMD + pkt.timer_policy.hours_to_use_traffic_profile[0] = 8; + pkt.timer_policy.hours_to_use_traffic_profile[1] = 10; +#endif + + pkt.workload = LOW; + pkt.policy_to_use = TIME; + pkt.command = PKT_POLICY; + strcpy(pkt.vm_name, "ubintu2"); + ret = rte_power_guest_channel_send_msg(&pkt, 1); + if (ret == 0) + return 1; + RTE_LOG(DEBUG, POWER, "Error sending message: %s\n", + ret > 0 ? 
strerror(ret) : "channel not connected"); + return -1; +} + +static void +cmd_send_policy_parsed(void *parsed_result, struct cmdline *cl, + __attribute__((unused)) void *data) +{ + int ret = -1; + struct cmd_send_policy_result *res = parsed_result; + + if (!strcmp(res->cmd, "now")) { + printf("Sending Policy down now!\n"); + ret = send_policy(); + } + if (ret != 1) + cmdline_printf(cl, "Error sending message: %s\n", + strerror(ret)); +} + +cmdline_parse_token_string_t cmd_send_policy = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + send_policy, "send_policy"); +cmdline_parse_token_string_t cmd_send_policy_cmd_cmd = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + cmd, "now"); + +cmdline_parse_inst_t cmd_send_policy_set = { + .f = cmd_send_policy_parsed, + .data = NULL, + .help_str = "send_policy now", + .tokens = { + (void *)&cmd_send_policy, + (void *)&cmd_send_policy_cmd_cmd, + NULL, + }, +}; + cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_quit, + (cmdline_parse_inst_t *)&cmd_send_policy_set, (cmdline_parse_inst_t *)&cmd_set_cpu_freq_set, NULL, }; diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h index 0c4bdd5..277eab3 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h @@ -40,12 +40,6 @@ extern "C" { #include "channel_commands.h" -int guest_channel_host_connect(unsigned lcore_id); - -int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); - -void guest_channel_host_disconnect(unsigned lcore_id); - void run_cli(__attribute__((unused)) void *arg); #ifdef __cplusplus -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
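In send_policy() the VF MAC is carried to the host in a 64-bit field by overlaying a 6-byte `struct ether_addr` and a `uint64_t` in a union. Because the union lives on the stack and only six bytes are written, the upper two bytes of the 64-bit value are whatever happened to be there. A minimal sketch of an explicit packing that zeroes those bytes is shown below. `mac_to_u64` and `u64_to_mac` are hypothetical helper names, and the byte order chosen here (first MAC byte in the least significant byte) is an assumption that matches the union layout on a little-endian machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* Pack six MAC bytes into the low 48 bits; the top 16 bits are zero. */
static uint64_t
mac_to_u64(const uint8_t mac[MAC_LEN])
{
	uint64_t v = 0;
	int i;

	assert(mac != NULL);
	for (i = 0; i < MAC_LEN; i++)
		v |= (uint64_t)mac[i] << (8 * i);
	return v;
}

/* Inverse of mac_to_u64(); the top 16 bits of v are ignored. */
static void
u64_to_mac(uint64_t v, uint8_t mac[MAC_LEN])
{
	int i;

	assert(mac != NULL);
	for (i = 0; i < MAC_LEN; i++)
		mac[i] = (uint8_t)(v >> (8 * i));
}
```

Shifting bytes explicitly, rather than type-punning, gives the same value on big-endian and little-endian guests, which matters if guest and host ever disagree on byte order.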
* [dpdk-dev] [PATCH v4 9/9] examples/vm_power_mgr: set MAC address of VF 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt ` (7 preceding siblings ...) 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 8/9] examples/guest_cli: add send policy to host David Hunt @ 2017-10-04 9:15 ` David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 9:15 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, David Hunt We need to set vf mac from the host, so that they will be in sync on the guest and the host. Otherwise, we'll have a random mac on the guest, and a 00:00:00:00:00:00 mac on the host. Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 60 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index 698abca..18f5e7f 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -58,6 +58,15 @@ #include "channel_monitor.h" #include "power_manager.h" #include "vm_power_cli.h" +#ifdef RTE_LIBRTE_IXGBE_PMD +#include <rte_pmd_ixgbe.h> +#endif +#ifdef RTE_LIBRTE_I40E_PMD +#include <rte_pmd_i40e.h> +#endif +#ifdef RTE_LIBRTE_BNXT_PMD +#include <rte_pmd_bnxt.h> +#endif #define RX_RING_SIZE 512 #define TX_RING_SIZE 512 @@ -222,7 +231,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) (uint8_t)portid); continue; } - /* clear all_ports_up flag if any link down */ + /* clear all_ports_up flag if any link down */ if (link.link_status == ETH_LINK_DOWN) { all_ports_up = 0; break; @@ -273,7 +282,9 @@ main(int argc, char **argv) unsigned lcore_id; unsigned int nb_ports; struct rte_mempool *mbuf_pool; +#ifdef RTE_LIBRTE_I40E_PMD uint8_t portid; +#endif ret = rte_eal_init(argc, argv); @@ -300,13 +311,60 @@ main(int argc, char 
**argv) rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); /* Initialize ports. */ +#ifdef RTE_LIBRTE_I40E_PMD for (portid = 0; portid < nb_ports; portid++) { + struct ether_addr eth; + int w, j; + int ret = -ENOTSUP; + if ((enabled_port_mask & (1 << portid)) == 0) continue; + + eth.addr_bytes[0] = 0xe0; + eth.addr_bytes[1] = 0xe0; + eth.addr_bytes[2] = 0xe0; + eth.addr_bytes[3] = 0xe0; + eth.addr_bytes[4] = portid + 0xf0; + if (port_init(portid, mbuf_pool) != 0) rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid); + + for (w = 0; w < MAX_VFS; w++) { + eth.addr_bytes[5] = w + 0xf0; + +#ifdef RTE_LIBRTE_IXGBE_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_ixgbe_set_vf_mac_addr(portid, + w, &eth); +#endif +#ifdef RTE_LIBRTE_I40E_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_i40e_set_vf_mac_addr(portid, + w, &eth); +#endif +#ifdef RTE_LIBRTE_BNXT_PMD + if (ret == -ENOTSUP) + ret = rte_pmd_bnxt_set_vf_mac_addr(portid, + w, &eth); +#endif + + + switch (ret) { + case 0: + printf("Port %d VF %d MAC: ", + portid, w); + for (j = 0; j < 6; j++) { + printf("%02x", eth.addr_bytes[j]); + if (j < 5) + printf(":"); + } + printf("\n"); + break; + } + } } +#endif lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
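The MAC addresses assigned above follow a fixed pattern: four bytes of 0xe0, then 0xf0 plus the port id, then 0xf0 plus the VF index. The sketch below reproduces that pattern in a standalone function so the resulting addresses can be checked without hardware. `make_vf_mac` and `format_mac` are hypothetical names introduced for this illustration. Since each of the last two bytes is a `uint8_t`, a port or VF index of 16 or more wraps past 0xff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAC_LEN 6

/* Build e0:e0:e0:e0:(f0+port):(f0+vf), as in the patch's main(). */
static void
make_vf_mac(uint8_t port, uint8_t vf, uint8_t mac[MAC_LEN])
{
	assert(mac != NULL);
	mac[0] = 0xe0;
	mac[1] = 0xe0;
	mac[2] = 0xe0;
	mac[3] = 0xe0;
	mac[4] = (uint8_t)(port + 0xf0);	/* wraps for port >= 16 */
	mac[5] = (uint8_t)(vf + 0xf0);		/* wraps for vf >= 16 */
}

/* Write the MAC as "xx:xx:xx:xx:xx:xx"; returns snprintf()'s result. */
static int
format_mac(const uint8_t mac[MAC_LEN], char *buf, size_t len)
{
	assert(mac != NULL && buf != NULL);
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

For port 0 and VF 1 this yields e0:e0:e0:e0:f0:f1, which is the address the guest would then report when it builds its policy.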
* [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4] Policy Based Power Control for Guest David Hunt ` (8 preceding siblings ...) 2017-10-04 9:15 ` [dpdk-dev] [PATCH v4 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt ` (9 more replies) 9 siblings, 10 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu

Policy Based Power Control for Guest

This patchset adds the facility for a guest VM to send a policy down to the host that will allow the host to scale up/down cpu frequencies depending on the policy criteria independently of the DPDK app running in the guest. This differs from the previous vm_power implementation where individual scale up/down requests were sent from the guest to the host via virtio-serial.

V5 patchset changes:
* Removed most of the #ifdef I40_PMD from the example code as it will be applicable to other PMDs in the future.
* Changed the parameter of rte_pmd_i40e_query_vfid_by_mac from a uint64 to a const struct ether_addr *, rather than casting it later in the function.

V4 patchset changes:
* None, re-post to mailing list under the correct email thread.

V3 patchset changes:
* Changed to using is_same_ether_addr() instead of looping through the mac address bytes to compare them.
* Tweaked some comments and wording in the i40e patch after review.
* Added a patch to the set to add new i40e function to map file, so as to allow shared library builds. The power library API needs a cleanup in next release, so will add API/ABI warning for this cleanup in a separate patch.

V2 patchset changes:
* Removed API's in ethdev layer.
* Now just a single new API in the i40e driver for mapping VF MAC to VF index.
* Moved new function from rte_rxtx.c to rte_pmd_i40e.c
* Removed function for reading i40e register, moved to using the standard stats API.
* Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac
* Cleaned up policy generation code.

It's a modification of the vm_power_manager app that runs in the host, and the guest_vm_power_app example app that runs in the guest. This allows the guest to send down a policy to the host via virtio-serial, which then allows the host to scale up/down based on the criteria in the policy, resulting in quicker scale up/down than individual requests coming from the guest. It also means that the DPDK application running in the guest does not need to be modified in any way; it is unaware that its cores are being scaled up/down, reducing the effort in implementing a power-aware infrastructure.

The usage model is as follows:
1. Set up the VF's and assign to the guest in the usual way.
2. Run vm_power_manager on the host, creating a channel to the guest.
3. Start the guest_vm_power_mgr app on the guest, which establishes a virtio-serial channel to the host.
4. Send down the profile for the guest using the "send_policy now" command. There is an example profile hard-coded into guest_vm_power_mgr.
5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
6. Send traffic into the VFs at varying traffic rates. Observe the frequency change on the host (turbostat -i 1)

The sequence of code changes is as follows:

A new function has been added to the i40e driver to allow mapping of a VF MAC to VF index.

Next we make an addition to librte_power that adds an extra command to allow the passing of a policy structure from the guest to the host. This struct contains information like busy/quiet hour, packet throughput thresholds, etc.

The next addition adds functionality to convert the virtual CPU (vcpu) IDs to physical CPU (pcpu) IDs so that the host can scale up/down the cores used in the guest.

The remaining patches are functionality to process the policy, and take action when the relevant trigger occurs to cause a frequency change.

[1/9] net/i40e: add API to convert VF MAC to VF id
[2/9] lib/librte_power: add extra msg type for policies
[3/9] examples/vm_power_mgr: add vcpu to pcpu mapping
[4/9] examples/vm_power_mgr: add scale to medium freq fn
[5/9] examples/vm_power_mgr: add policy to channels
[6/9] examples/vm_power_mgr: add port initialisation
[7/9] power: add send channel msg function to map file
[8/9] examples/guest_cli: add send policy to host
[9/9] examples/vm_power_mgr: set MAC address of VF

^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 15:41 ` santosh 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 2/9] lib/librte_power: add extra msg type for policies David Hunt ` (8 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Need a way to convert a vf id to a pf id on the host so as to query the pf for relevant statistics which are used for the frequency changes in the vm_power_manager app. Used when profiles are passed down from the guest to the host, allowing the host to map the vfs to pfs. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- drivers/net/i40e/rte_pmd_i40e.c | 30 ++++++++++++++++++++++++++++++ drivers/net/i40e/rte_pmd_i40e.h | 14 ++++++++++++++ drivers/net/i40e/rte_pmd_i40e_version.map | 7 +++++++ 3 files changed, 51 insertions(+) diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c index f12b7f4..08e6b16 100644 --- a/drivers/net/i40e/rte_pmd_i40e.c +++ b/drivers/net/i40e/rte_pmd_i40e.c @@ -2115,3 +2115,33 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, return 0; } + +uint64_t +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, const struct ether_addr *vf_mac) +{ + struct rte_eth_dev *dev; + struct ether_addr *mac; + struct i40e_pf *pf; + int vf_id; + struct i40e_pf_vf *vf; + uint16_t vf_num; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); + dev = &rte_eth_devices[port]; + + if (!is_i40e_supported(dev)) + return -ENOTSUP; + + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); + vf_num = pf->vf_num; + + for (vf_id = 0; vf_id < vf_num; 
vf_id++) { + vf = &pf->vfs[vf_id]; + mac = &vf->mac_addr; + + if (is_same_ether_addr(mac, vf_mac)) + return vf_id; + } + + return -EINVAL; +} diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h index 356fa89..9798103 100644 --- a/drivers/net/i40e/rte_pmd_i40e.h +++ b/drivers/net/i40e/rte_pmd_i40e.h @@ -637,4 +637,18 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, uint8_t mask, uint32_t pkt_type); +/** + * On the PF, find VF index based on VF MAC address + * + * @param port + * pointer to port identifier of the device + * @param vf_mac + * the mac address of the vf to determine index of + * @return + * -(-22 EINVAL) the vf mac does not exist on this port + * -(!-22) the index of vfid in pf->vfs + */ +uint64_t rte_pmd_i40e_query_vfid_by_mac(uint8_t port, + const struct ether_addr *vf_mac); + #endif /* _PMD_I40E_H_ */ diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map b/drivers/net/i40e/rte_pmd_i40e_version.map index 20cc980..d8b74bd 100644 --- a/drivers/net/i40e/rte_pmd_i40e_version.map +++ b/drivers/net/i40e/rte_pmd_i40e_version.map @@ -45,3 +45,10 @@ DPDK_17.08 { rte_pmd_i40e_get_ddp_info; } DPDK_17.05; + +DPDK_17.11 { + global: + + rte_pmd_i40e_query_vfid_by_mac; + +} DPDK_17.08; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
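In the patch above, rte_pmd_i40e_query_vfid_by_mac is declared to return uint64_t yet returns -ENODEV, -ENOTSUP and -EINVAL on failure. Through an unsigned return type those values arrive as very large positive numbers, so a caller's "< 0" check would not fire unless the result is cast first. A minimal sketch of the same lookup with a signed return follows. It assumes a hypothetical struct mac_addr and find_vf_by_mac() standing in for DPDK's struct ether_addr and the driver's VF table; neither name comes from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for DPDK's struct ether_addr. */
struct mac_addr {
	uint8_t bytes[MAC_LEN];
};

/*
 * Return the index of the VF whose MAC matches vf_mac, or -EINVAL if
 * no VF matches. The return type is signed so that callers can test
 * the result with "< 0".
 */
int
find_vf_by_mac(const struct mac_addr *vfs, uint16_t vf_num,
	       const struct mac_addr *vf_mac)
{
	uint16_t vf_id;

	assert(vfs != NULL && vf_mac != NULL);

	for (vf_id = 0; vf_id < vf_num; vf_id++) {
		if (memcmp(vfs[vf_id].bytes, vf_mac->bytes, MAC_LEN) == 0)
			return vf_id;
	}

	return -EINVAL;
}
```

With an int return, a caller can write "ret = find_vf_by_mac(...); if (ret < 0) ..." and treat any non-negative value as the VF index.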
* Re: [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt @ 2017-10-04 15:41 ` santosh 2017-10-05 8:31 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: santosh @ 2017-10-04 15:41 UTC (permalink / raw) To: David Hunt, dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic Hi David, On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: > From: "Sexton, Rory" <rory.sexton@intel.com> > > Need a way to convert a vf id to a pf id on the host so as to query the pf > for relevant statistics which are used for the frequency changes in the > vm_power_manager app. Used when profiles are passed down from the guest > to the host, allowing the host to map the vfs to pfs. > > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- I see that you just now sent out v5;) But I guess v4 comment on this patch [1] is still applicable (imo). Thanks. [1] http://dpdk.org/dev/patchwork/patch/29577/ ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-04 15:41 ` santosh @ 2017-10-05 8:31 ` Hunt, David 2017-10-05 9:22 ` santosh 0 siblings, 1 reply; 105+ messages in thread From: Hunt, David @ 2017-10-05 8:31 UTC (permalink / raw) To: santosh, dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic Hi Santosh, On 4/10/2017 4:41 PM, santosh wrote: > Hi David, > > > On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: >> From: "Sexton, Rory" <rory.sexton@intel.com> >> >> Need a way to convert a vf id to a pf id on the host so as to query the pf >> for relevant statistics which are used for the frequency changes in the >> vm_power_manager app. Used when profiles are passed down from the guest >> to the host, allowing the host to map the vfs to pfs. >> >> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >> Signed-off-by: David Hunt <david.hunt@intel.com> >> --- > I see that you just now sent out v5;) > But I guess v4 comment on this patch [1] > is still applicable (imo). > Thanks. > > [1] http://dpdk.org/dev/patchwork/patch/29577/ The v5 went out just as you were commenting on v4. :) I agree that your comment above needs addressing, I'll do that in v6 today. Regards. Dave. ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-05 8:31 ` Hunt, David @ 2017-10-05 9:22 ` santosh 0 siblings, 0 replies; 105+ messages in thread From: santosh @ 2017-10-05 9:22 UTC (permalink / raw) To: Hunt, David, dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic On Thursday 05 October 2017 02:01 PM, Hunt, David wrote: > Hi Santosh, > > On 4/10/2017 4:41 PM, santosh wrote: >> Hi David, >> >> >> On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: >>> From: "Sexton, Rory" <rory.sexton@intel.com> >>> >>> Need a way to convert a vf id to a pf id on the host so as to query the pf >>> for relevant statistics which are used for the frequency changes in the >>> vm_power_manager app. Used when profiles are passed down from the guest >>> to the host, allowing the host to map the vfs to pfs. >>> >>> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >>> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >>> Signed-off-by: David Hunt <david.hunt@intel.com> >>> --- >> I see that you just now sent out v5;) >> But I guess v4 comment on this patch [1] >> is still applicable (imo). >> Thanks. >> >> [1] http://dpdk.org/dev/patchwork/patch/29577/ > > The v5 went out just as you were commenting on v4. :) > > I agree that your comment above needs addressing, I'll do that in v6 today. > Thanks. > Regards. > Dave. > ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v5 2/9] lib/librte_power: add extra msg type for policies 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 15:47 ` santosh 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt ` (7 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- lib/librte_power/channel_commands.h | 42 +++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h index 484085b..020d9fe 100644 --- a/lib/librte_power/channel_commands.h +++ b/lib/librte_power/channel_commands.h @@ -46,6 +46,7 @@ extern "C" { /* Valid Commands */ #define CPU_POWER 1 #define CPU_POWER_CONNECT 2 +#define PKT_POLICY 3 /* CPU Power Command Scaling */ #define CPU_POWER_SCALE_UP 1 @@ -54,11 +55,52 @@ extern "C" { #define CPU_POWER_SCALE_MIN 4 #define CPU_POWER_ENABLE_TURBO 5 #define CPU_POWER_DISABLE_TURBO 6 +#define HOURS 24 + +#define MAX_VFS 10 + +#define MAX_VCPU_PER_VM 8 + +typedef enum {false, true} bool; + +struct t_boost_status { + bool tbEnabled; +}; + +struct timer_profile { + int busy_hours[HOURS]; + int quiet_hours[HOURS]; + int hours_to_use_traffic_profile[HOURS]; +}; + +enum workload {HIGH, MEDIUM, LOW}; +enum policy_to_use { + TRAFFIC, + TIME, + WORKLOAD +}; + +struct traffic { + uint32_t min_packet_thresh; + uint32_t avg_max_packet_thresh; + uint32_t max_max_packet_thresh; +}; struct channel_packet { uint64_t resource_id; /**< 
core_num, device */ uint32_t unit; /**< scale down/up/min/max */ uint32_t command; /**< Power, IO, etc */ + char vm_name[32]; + + uint64_t vfid[MAX_VFS]; + int nb_mac_to_monitor; + struct traffic traffic_policy; + uint8_t vcpu_to_control[MAX_VCPU_PER_VM]; + uint8_t num_vcpu; + struct timer_profile timer_policy; + enum workload workload; + enum policy_to_use policy_to_use; + struct t_boost_status t_boost_status; }; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 2/9] lib/librte_power: add extra msg type for policies 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 2/9] lib/librte_power: add extra msg type for policies David Hunt @ 2017-10-04 15:47 ` santosh 2017-10-05 8:41 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: santosh @ 2017-10-04 15:47 UTC (permalink / raw) To: David Hunt, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi David, On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- Glad that ifdef clutter removed. Few nits though.. > lib/librte_power/channel_commands.h | 42 +++++++++++++++++++++++++++++++++++++ > 1 file changed, 42 insertions(+) > > diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h > index 484085b..020d9fe 100644 > --- a/lib/librte_power/channel_commands.h > +++ b/lib/librte_power/channel_commands.h > @@ -46,6 +46,7 @@ extern "C" { > /* Valid Commands */ > #define CPU_POWER 1 > #define CPU_POWER_CONNECT 2 > +#define PKT_POLICY 3 > > /* CPU Power Command Scaling */ > #define CPU_POWER_SCALE_UP 1 > @@ -54,11 +55,52 @@ extern "C" { > #define CPU_POWER_SCALE_MIN 4 > #define CPU_POWER_ENABLE_TURBO 5 > #define CPU_POWER_DISABLE_TURBO 6 > +#define HOURS 24 > + > +#define MAX_VFS 10 > + > +#define MAX_VCPU_PER_VM 8 > + > +typedef enum {false, true} bool; > + do we really need typedef for bool; can't we simply use bool data-type? 
> +struct t_boost_status { > + bool tbEnabled; > +}; > + > +struct timer_profile { > + int busy_hours[HOURS]; > + int quiet_hours[HOURS]; > + int hours_to_use_traffic_profile[HOURS]; > +}; > + > +enum workload {HIGH, MEDIUM, LOW}; > +enum policy_to_use { > + TRAFFIC, > + TIME, > + WORKLOAD > +}; > + > +struct traffic { > + uint32_t min_packet_thresh; > + uint32_t avg_max_packet_thresh; > + uint32_t max_max_packet_thresh; > +}; > > struct channel_packet { > uint64_t resource_id; /**< core_num, device */ > uint32_t unit; /**< scale down/up/min/max */ > uint32_t command; /**< Power, IO, etc */ > + char vm_name[32]; > + How about const char * Or in case not possible then #define RTE_xx 32 Or use existing RTE_ for same purpose or some micro local to power lib. > + uint64_t vfid[MAX_VFS]; > + int nb_mac_to_monitor; > + struct traffic traffic_policy; > + uint8_t vcpu_to_control[MAX_VCPU_PER_VM]; > + uint8_t num_vcpu; > + struct timer_profile timer_policy; > + enum workload workload; > + enum policy_to_use policy_to_use; > + struct t_boost_status t_boost_status; > }; > > ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 2/9] lib/librte_power: add extra msg type for policies 2017-10-04 15:47 ` santosh @ 2017-10-05 8:41 ` Hunt, David 0 siblings, 0 replies; 105+ messages in thread From: Hunt, David @ 2017-10-05 8:41 UTC (permalink / raw) To: santosh, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi Santosh, On 4/10/2017 4:47 PM, santosh wrote: > Hi David, > > > On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: >> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >> Signed-off-by: David Hunt <david.hunt@intel.com> >> --- > Glad that ifdef clutter removed. > Few nits though.. > >> lib/librte_power/channel_commands.h | 42 +++++++++++++++++++++++++++++++++++++ >> 1 file changed, 42 insertions(+) >> >> diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h >> index 484085b..020d9fe 100644 >> --- a/lib/librte_power/channel_commands.h >> +++ b/lib/librte_power/channel_commands.h >> @@ -46,6 +46,7 @@ extern "C" { >> /* Valid Commands */ >> #define CPU_POWER 1 >> #define CPU_POWER_CONNECT 2 >> +#define PKT_POLICY 3 >> >> /* CPU Power Command Scaling */ >> #define CPU_POWER_SCALE_UP 1 >> @@ -54,11 +55,52 @@ extern "C" { >> #define CPU_POWER_SCALE_MIN 4 >> #define CPU_POWER_ENABLE_TURBO 5 >> #define CPU_POWER_DISABLE_TURBO 6 >> +#define HOURS 24 >> + >> +#define MAX_VFS 10 >> + >> +#define MAX_VCPU_PER_VM 8 >> + >> +typedef enum {false, true} bool; >> + > do we really need typedef for bool; can't we simply > use bool data-type? Sure, will fix. 
>> +struct t_boost_status { >> + bool tbEnabled; >> +}; >> + >> +struct timer_profile { >> + int busy_hours[HOURS]; >> + int quiet_hours[HOURS]; >> + int hours_to_use_traffic_profile[HOURS]; >> +}; >> + >> +enum workload {HIGH, MEDIUM, LOW}; >> +enum policy_to_use { >> + TRAFFIC, >> + TIME, >> + WORKLOAD >> +}; >> + >> +struct traffic { >> + uint32_t min_packet_thresh; >> + uint32_t avg_max_packet_thresh; >> + uint32_t max_max_packet_thresh; >> +}; >> >> struct channel_packet { >> uint64_t resource_id; /**< core_num, device */ >> uint32_t unit; /**< scale down/up/min/max */ >> uint32_t command; /**< Power, IO, etc */ >> + char vm_name[32]; >> + > How about const char * Or in case not possible then #define RTE_xx 32 Or > use existing RTE_ for same purpose or some micro local to power lib. I'll change to use an existing RTE_xx. --snip-- Thanks, Dave. ^ permalink raw reply [flat|nested] 105+ messages in thread
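The two nits agreed above (use the standard bool type, and replace the bare 32 with a named constant) could look roughly like the sketch below. POLICY_VM_NAME_LEN, struct policy_packet and policy_set_vm_name() are hypothetical names for illustration only; the thread says an existing RTE_ macro will be used but does not say which one, and the struct here is a trimmed-down subset of channel_packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; the thread does not say which RTE_ macro is used. */
#define POLICY_VM_NAME_LEN 32

struct t_boost_status {
	bool tbEnabled;	/* bool from <stdbool.h>, no local typedef */
};

/* Hypothetical, trimmed-down subset of struct channel_packet. */
struct policy_packet {
	uint32_t command;
	char vm_name[POLICY_VM_NAME_LEN];
	struct t_boost_status t_boost_status;
};

/* Copy a VM name into the packet, always leaving it NUL-terminated. */
void
policy_set_vm_name(struct policy_packet *pkt, const char *name)
{
	assert(pkt != NULL && name != NULL);

	strncpy(pkt->vm_name, name, POLICY_VM_NAME_LEN - 1);
	pkt->vm_name[POLICY_VM_NAME_LEN - 1] = '\0';
}
```

The sketch keeps a fixed-size array rather than a const char * because the packet is copied from guest to host, and a pointer into guest memory would not be meaningful on the host side.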
* [dpdk-dev] [PATCH v5 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 2/9] lib/librte_power: add extra msg type for policies David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 15:58 ` santosh 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt ` (6 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/channel_manager.c | 62 +++++++++++++++++++++++++++++ examples/vm_power_manager/channel_manager.h | 25 ++++++++++++ 2 files changed, 87 insertions(+) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index e068ae2..03fa626 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -574,6 +574,68 @@ set_channel_status(const char *vm_name, unsigned *channel_list, return num_channels_changed; } +void +get_all_vm(int *num_vm, int *num_cpu) +{ + + virNodeInfo node_info; + virDomainPtr *domptr; + uint64_t mask; + int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus; + unsigned int jj; + const char *vm_name; + unsigned int flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; + unsigned int flag = VIR_DOMAIN_VCPU_CONFIG; + + + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); + if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n"); + + 
/* Returns number of pcpus */ + global_n_host_cpus = (unsigned int)node_info.cpus; + + /* Returns number of active domains */ + *num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, flags); + if (*num_vm <= 0) + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); + + for (i = 0; i < *num_vm; i++) { + + /* Get Domain Names */ + vm_name = virDomainGetName(domptr[i]); + lvm_info[i].vm_name = vm_name; + + /* Get Number of Vcpus */ + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], flag); + + /* Get Number of VCpus & VcpuPinInfo */ + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], + numVcpus[i], global_cpumaps, + global_maplen, flag); + + if ((int)n_vcpus > 0) { + *num_cpu = n_vcpus; + lvm_info[i].num_cpus = n_vcpus; + } + + /* Save pcpu in use by libvirt VMs */ + for (ii = 0; ii < n_vcpus; ii++) { + mask = 0; + for (jj = 0; jj < global_n_host_cpus; jj++) { + if (VIR_CPU_USABLE(global_cpumaps, + global_maplen, ii, jj) > 0) { + mask |= 1ULL << jj; + } + } + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { + lvm_info[i].pcpus[ii] = cpu; + } + } + } +} + int get_info_vm(const char *vm_name, struct vm_info *info) { diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h index 47c3b9c..788c1e6 100644 --- a/examples/vm_power_manager/channel_manager.h +++ b/examples/vm_power_manager/channel_manager.h @@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un; #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) #endif +#define MAX_VMS 4 +#define MAX_VCPUS 20 + + +struct libvirt_vm_info { + const char *vm_name; + unsigned int pcpus[MAX_VCPUS]; + uint8_t num_cpus; +}; + +struct libvirt_vm_info lvm_info[MAX_VMS]; /* Communication Channel Status */ enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, CHANNEL_MGR_CHANNEL_CONNECTED, @@ -319,6 +330,20 @@ int set_channel_status(const char *vm_name, unsigned *channel_list, */ int get_info_vm(const char *vm_name, struct vm_info *info); +/** + * Populates a table with all domains 
running and their physical cpu. + * All information is gathered through libvirt api. + * + * @param noVms + * modified to store number of active VMs + * + * @param noVcpus + modified to store number of vcpus active + * + * @return + * void + */ +void get_all_vm(int *noVms, int *noVcpus); #ifdef __cplusplus } #endif -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt @ 2017-10-04 15:58 ` santosh 2017-10-05 8:44 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: santosh @ 2017-10-04 15:58 UTC (permalink / raw) To: David Hunt, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi David, On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- > examples/vm_power_manager/channel_manager.c | 62 +++++++++++++++++++++++++++++ > examples/vm_power_manager/channel_manager.h | 25 ++++++++++++ > 2 files changed, 87 insertions(+) > > diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c > index e068ae2..03fa626 100644 > --- a/examples/vm_power_manager/channel_manager.c > +++ b/examples/vm_power_manager/channel_manager.c > @@ -574,6 +574,68 @@ set_channel_status(const char *vm_name, unsigned *channel_list, > return num_channels_changed; > } > > +void > +get_all_vm(int *num_vm, int *num_cpu) > +{ nits: s/*num_cpu/*num_vcpu > + > + virNodeInfo node_info; > + virDomainPtr *domptr; > + uint64_t mask; > + int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus; > + unsigned int jj; > + const char *vm_name; > + unsigned int flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | > + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; > + unsigned int flag = VIR_DOMAIN_VCPU_CONFIG; > + nits: Perhaps add more clear name example: s/flags/conn_flags s/flag/domain_flags > + > + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); > + if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) > + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n"); > + Should return from here.. since node info not retrieve ops errored out. 
> + /* Returns number of pcpus */ > + global_n_host_cpus = (unsigned int)node_info.cpus; > + > + /* Returns number of active domains */ > + *num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, flags); > + if (*num_vm <= 0) > + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); > + ditto.. > + for (i = 0; i < *num_vm; i++) { > + > + /* Get Domain Names */ > + vm_name = virDomainGetName(domptr[i]); > + lvm_info[i].vm_name = vm_name; > + > + /* Get Number of Vcpus */ > + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], flag); > + > + /* Get Number of VCpus & VcpuPinInfo */ > + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], > + numVcpus[i], global_cpumaps, > + global_maplen, flag); > + > + if ((int)n_vcpus > 0) { > + *num_cpu = n_vcpus; > + lvm_info[i].num_cpus = n_vcpus; > + } > + > + /* Save pcpu in use by libvirt VMs */ > + for (ii = 0; ii < n_vcpus; ii++) { > + mask = 0; > + for (jj = 0; jj < global_n_host_cpus; jj++) { > + if (VIR_CPU_USABLE(global_cpumaps, > + global_maplen, ii, jj) > 0) { > + mask |= 1ULL << jj; > + } > + } > + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { > + lvm_info[i].pcpus[ii] = cpu; > + } > + } > + } > +} > + > int > get_info_vm(const char *vm_name, struct vm_info *info) > { > diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h > index 47c3b9c..788c1e6 100644 > --- a/examples/vm_power_manager/channel_manager.h > +++ b/examples/vm_power_manager/channel_manager.h > @@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un; > #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) > #endif > > +#define MAX_VMS 4 > +#define MAX_VCPUS 20 > + > + > +struct libvirt_vm_info { > + const char *vm_name; > + unsigned int pcpus[MAX_VCPUS]; > + uint8_t num_cpus; > +}; > + > +struct libvirt_vm_info lvm_info[MAX_VMS]; > /* Communication Channel Status */ > enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, > CHANNEL_MGR_CHANNEL_CONNECTED, > @@ -319,6 +330,20 @@ int set_channel_status(const 
char *vm_name, unsigned *channel_list, > */ > int get_info_vm(const char *vm_name, struct vm_info *info); > > +/** > + * Populates a table with all domains running and their physical cpu. > + * All information is gathered through libvirt api. > + * > + * @param noVms > + * modified to store number of active VMs > + * > + * @param noVcpus > + modified to store number of vcpus active > + * > + * @return > + * void > + */ > +void get_all_vm(int *noVms, int *noVcpus); nits: perhaps, void get_all_vm(int *num_vm, int *num_vcpu) Thanks. > #ifdef __cplusplus > } > #endif ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-10-04 15:58 ` santosh @ 2017-10-05 8:44 ` Hunt, David 0 siblings, 0 replies; 105+ messages in thread From: Hunt, David @ 2017-10-05 8:44 UTC (permalink / raw) To: santosh, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi Santosh, On 4/10/2017 4:58 PM, santosh wrote: > Hi David, > > > On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: >> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >> Signed-off-by: David Hunt <david.hunt@intel.com> >> --- >> examples/vm_power_manager/channel_manager.c | 62 +++++++++++++++++++++++++++++ >> examples/vm_power_manager/channel_manager.h | 25 ++++++++++++ >> 2 files changed, 87 insertions(+) >> >> diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c >> index e068ae2..03fa626 100644 >> --- a/examples/vm_power_manager/channel_manager.c >> +++ b/examples/vm_power_manager/channel_manager.c >> @@ -574,6 +574,68 @@ set_channel_status(const char *vm_name, unsigned *channel_list, >> return num_channels_changed; >> } >> >> +void >> +get_all_vm(int *num_vm, int *num_cpu) >> +{ > nits: > s/*num_cpu/*num_vcpu Sure. Makes it more readable. >> + >> + virNodeInfo node_info; >> + virDomainPtr *domptr; >> + uint64_t mask; >> + int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus; >> + unsigned int jj; >> + const char *vm_name; >> + unsigned int flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | >> + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; >> + unsigned int flag = VIR_DOMAIN_VCPU_CONFIG; >> + > nits: > Perhaps add more clear name example: > s/flags/conn_flags > s/flag/domain_flags domain_flags sounds good to me. 
>> + >> + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); >> + if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) >> + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n"); >> + > Should return from here.. since node info not retrieve ops errored out. Sure. >> + /* Returns number of pcpus */ >> + global_n_host_cpus = (unsigned int)node_info.cpus; >> + >> + /* Returns number of active domains */ >> + *num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, flags); >> + if (*num_vm <= 0) >> + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); >> + > ditto.. Sure. >> + for (i = 0; i < *num_vm; i++) { >> + >> + /* Get Domain Names */ >> + vm_name = virDomainGetName(domptr[i]); >> + lvm_info[i].vm_name = vm_name; >> + >> + /* Get Number of Vcpus */ >> + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], flag); >> + >> + /* Get Number of VCpus & VcpuPinInfo */ >> + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], >> + numVcpus[i], global_cpumaps, >> + global_maplen, flag); >> + >> + if ((int)n_vcpus > 0) { >> + *num_cpu = n_vcpus; >> + lvm_info[i].num_cpus = n_vcpus; >> + } >> + >> + /* Save pcpu in use by libvirt VMs */ >> + for (ii = 0; ii < n_vcpus; ii++) { >> + mask = 0; >> + for (jj = 0; jj < global_n_host_cpus; jj++) { >> + if (VIR_CPU_USABLE(global_cpumaps, >> + global_maplen, ii, jj) > 0) { >> + mask |= 1ULL << jj; >> + } >> + } >> + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { >> + lvm_info[i].pcpus[ii] = cpu; >> + } >> + } >> + } >> +} >> + >> int >> get_info_vm(const char *vm_name, struct vm_info *info) >> { >> diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h >> index 47c3b9c..788c1e6 100644 >> --- a/examples/vm_power_manager/channel_manager.h >> +++ b/examples/vm_power_manager/channel_manager.h >> @@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un; >> #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) >> #endif >> >> +#define MAX_VMS 4 >> +#define MAX_VCPUS 20 >> + >> + 
>> +struct libvirt_vm_info { >> + const char *vm_name; >> + unsigned int pcpus[MAX_VCPUS]; >> + uint8_t num_cpus; >> +}; >> + >> +struct libvirt_vm_info lvm_info[MAX_VMS]; >> /* Communication Channel Status */ >> enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, >> CHANNEL_MGR_CHANNEL_CONNECTED, >> @@ -319,6 +330,20 @@ int set_channel_status(const char *vm_name, unsigned *channel_list, >> */ >> int get_info_vm(const char *vm_name, struct vm_info *info); >> >> +/** >> + * Populates a table with all domains running and their physical cpu. >> + * All information is gathered through libvirt api. >> + * >> + * @param noVms >> + * modified to store number of active VMs >> + * >> + * @param noVcpus >> + modified to store number of vcpus active >> + * >> + * @return >> + * void >> + */ >> +void get_all_vm(int *noVms, int *noVcpus); > nits: perhaps, > void > get_all_vm(int *num_vm, int *num_vcpu) > Thanks. Agreed, what you suggest is a more common naming convention. Thanks, Dave. >> #ifdef __cplusplus >> } >> #endif ^ permalink raw reply [flat|nested] 105+ messages in thread
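The inner loop of get_all_vm() turns the per-vcpu pinning bitmap into a 64-bit mask and then records a pcpu from it. A standalone sketch of that step, without libvirt, is shown below. It assumes the bitmap layout that VIR_CPU_USABLE() reads, to my understanding one maplen-byte map per vcpu with one bit per pcpu. vcpu_pin_mask() and lowest_pcpu() are hypothetical helper names, and taking the lowest set bit is a choice made for this sketch, not necessarily what ITERATIVE_BITMASK_CHECK_64 leaves in pcpus[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build a 64-bit mask of the physical CPUs that a vcpu may run on.
 * cpumaps holds one maplen-byte bitmap per vcpu, one bit per pcpu.
 * pcpus at index 64 or above are ignored so the shift stays defined.
 */
uint64_t
vcpu_pin_mask(const unsigned char *cpumaps, int maplen, int vcpu,
	      unsigned int n_host_cpus)
{
	const unsigned char *map;
	uint64_t mask = 0;
	unsigned int cpu;

	assert(cpumaps != NULL && maplen > 0 && vcpu >= 0);

	/* Start of the bitmap that belongs to this vcpu. */
	map = cpumaps + (size_t)vcpu * (size_t)maplen;

	for (cpu = 0; cpu < n_host_cpus && cpu < 64; cpu++) {
		if (cpu / 8 >= (unsigned int)maplen)
			break;	/* bitmap has no byte for this pcpu */
		if (map[cpu / 8] & (1u << (cpu % 8)))
			mask |= 1ULL << cpu;
	}

	return mask;
}

/* Lowest pinned pcpu in mask, or -1 if the mask is empty. */
int
lowest_pcpu(uint64_t mask)
{
	int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (mask & (1ULL << cpu))
			return cpu;
	}

	return -1;
}
```

Capping the loop at 64 matters on hosts with more than 64 pcpus, where an unbounded "1ULL << jj" would shift past the width of the mask.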
* [dpdk-dev] [PATCH v5 4/9] examples/vm_power_mgr: add scale to medium freq fn 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt ` (2 preceding siblings ...) 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 16:04 ` santosh 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 5/9] examples/vm_power_mgr: add policy to channels David Hunt ` (5 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/power_manager.c | 15 +++++++++++++++ examples/vm_power_manager/power_manager.h | 13 +++++++++++++ 2 files changed, 28 insertions(+) diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c index 80705f9..c021c1d 100644 --- a/examples/vm_power_manager/power_manager.c +++ b/examples/vm_power_manager/power_manager.c @@ -286,3 +286,18 @@ power_manager_disable_turbo_core(unsigned int core_num) POWER_SCALE_CORE(disable_turbo, core_num, ret); return ret; } + +int +power_manager_scale_core_med(unsigned int core_num) +{ + int ret = 0; + + if (core_num >= POWER_MGR_MAX_CPUS) + return -1; + if (!(global_enabled_cpus & (1ULL << core_num))) + return -1; + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); + ret = rte_power_set_freq(core_num, 5); + rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl); + return ret; +} diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h index b74d09b..b52fb4c 100644 --- a/examples/vm_power_manager/power_manager.h +++ b/examples/vm_power_manager/power_manager.h @@ -231,6 
+231,19 @@ int power_manager_disable_turbo_core(unsigned int core_num); */ uint32_t power_manager_get_current_frequency(unsigned core_num); +/** + * Scale to medium frequency for the core specified by core_num. + * It is thread-safe. + * + * @param core_num + * The core number to change frequency + * + * @return + * - 1 on success. + * - 0 if frequency not changed. + * - Negative on error. + */ +int power_manager_scale_core_med(unsigned int core_num); #ifdef __cplusplus } -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 4/9] examples/vm_power_mgr: add scale to medium freq fn 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt @ 2017-10-04 16:04 ` santosh 2017-10-05 8:47 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: santosh @ 2017-10-04 16:04 UTC (permalink / raw) To: David Hunt, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi David, On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- > examples/vm_power_manager/power_manager.c | 15 +++++++++++++++ > examples/vm_power_manager/power_manager.h | 13 +++++++++++++ > 2 files changed, 28 insertions(+) > > diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c > index 80705f9..c021c1d 100644 > --- a/examples/vm_power_manager/power_manager.c > +++ b/examples/vm_power_manager/power_manager.c > @@ -286,3 +286,18 @@ power_manager_disable_turbo_core(unsigned int core_num) > POWER_SCALE_CORE(disable_turbo, core_num, ret); > return ret; > } > + > +int > +power_manager_scale_core_med(unsigned int core_num) > +{ > + int ret = 0; > + > + if (core_num >= POWER_MGR_MAX_CPUS) > + return -1; > + if (!(global_enabled_cpus & (1ULL << core_num))) > + return -1; > + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); > + ret = rte_power_set_freq(core_num, 5); nits: what is 5? also should be enum or macro. Thanks. ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 4/9] examples/vm_power_mgr: add scale to medium freq fn 2017-10-04 16:04 ` santosh @ 2017-10-05 8:47 ` Hunt, David 2017-10-05 9:07 ` santosh 0 siblings, 1 reply; 105+ messages in thread From: Hunt, David @ 2017-10-05 8:47 UTC (permalink / raw) To: santosh, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi Santosh, On 4/10/2017 5:04 PM, santosh wrote: > Hi David, > > > On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: >> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >> Signed-off-by: David Hunt <david.hunt@intel.com> >> --- >> examples/vm_power_manager/power_manager.c | 15 +++++++++++++++ >> examples/vm_power_manager/power_manager.h | 13 +++++++++++++ >> 2 files changed, 28 insertions(+) >> >> diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c >> index 80705f9..c021c1d 100644 >> --- a/examples/vm_power_manager/power_manager.c >> +++ b/examples/vm_power_manager/power_manager.c >> @@ -286,3 +286,18 @@ power_manager_disable_turbo_core(unsigned int core_num) >> POWER_SCALE_CORE(disable_turbo, core_num, ret); >> return ret; >> } >> + >> +int >> +power_manager_scale_core_med(unsigned int core_num) >> +{ >> + int ret = 0; >> + >> + if (core_num >= POWER_MGR_MAX_CPUS) >> + return -1; >> + if (!(global_enabled_cpus & (1ULL << core_num))) >> + return -1; >> + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); >> + ret = rte_power_set_freq(core_num, 5); > nits: > what is 5? also should be enum or macro. > > Thanks. > This probably shouldn't be hard-coded. The intention is to select a middle frequency. I can add a helper function to get the value that is halfway between min and max, and use that instead. Thanks, Dave. ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 4/9] examples/vm_power_mgr: add scale to medium freq fn 2017-10-05 8:47 ` Hunt, David @ 2017-10-05 9:07 ` santosh 0 siblings, 0 replies; 105+ messages in thread From: santosh @ 2017-10-05 9:07 UTC (permalink / raw) To: Hunt, David, dev Cc: konstantin.ananyev, jingjing.wu, Nemanja Marjanovic, Rory Sexton Hi David, On Thursday 05 October 2017 02:17 PM, Hunt, David wrote: > Hi Santosh, > > > On 4/10/2017 5:04 PM, santosh wrote: >> Hi David, >> >> >> On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: >>> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >>> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >>> Signed-off-by: David Hunt <david.hunt@intel.com> >>> --- >>> examples/vm_power_manager/power_manager.c | 15 +++++++++++++++ >>> examples/vm_power_manager/power_manager.h | 13 +++++++++++++ >>> 2 files changed, 28 insertions(+) >>> >>> diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c >>> index 80705f9..c021c1d 100644 >>> --- a/examples/vm_power_manager/power_manager.c >>> +++ b/examples/vm_power_manager/power_manager.c >>> @@ -286,3 +286,18 @@ power_manager_disable_turbo_core(unsigned int core_num) >>> POWER_SCALE_CORE(disable_turbo, core_num, ret); >>> return ret; >>> } >>> + >>> +int >>> +power_manager_scale_core_med(unsigned int core_num) >>> +{ >>> + int ret = 0; >>> + >>> + if (core_num >= POWER_MGR_MAX_CPUS) >>> + return -1; >>> + if (!(global_enabled_cpus & (1ULL << core_num))) >>> + return -1; >>> + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); >>> + ret = rte_power_set_freq(core_num, 5); >> nits: >> what is 5? also should be enum or macro. >> >> Thanks. >> > > This probably shouldn't be hard-coded. The intention is to select a middle frequency. I can add a helper function to get the value > that is halfway between min and max, and use that instead. > I'm ok with your proposition. Thanks. > Thanks, > Dave. 
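The helper proposed in this exchange could look roughly like the sketch below. It is not the code from any later revision of the series: `mid_freq_index` is a hypothetical name, and the sketch assumes the per-core frequency table is indexed from 0 with `num_freqs` entries, so the halfway point is simply `num_freqs / 2`.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the index halfway through a frequency
 * table of num_freqs entries, or -1 if the table is empty.
 * Assumes valid indices run from 0 to num_freqs - 1.
 */
int
mid_freq_index(uint32_t num_freqs)
{
	if (num_freqs == 0)
		return -1;
	return (int)(num_freqs / 2);
}
```

The result would be passed to rte_power_set_freq() in place of the literal 5. For a table of 11 entries the sketch yields index 5, the same value the patch hard-codes; whether that is where the 5 came from is not stated in the thread.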
* [dpdk-dev] [PATCH v5 5/9] examples/vm_power_mgr: add policy to channels 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt ` (3 preceding siblings ...) 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 6/9] examples/vm_power_mgr: add port initialisation David Hunt ` (4 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/Makefile | 16 ++ examples/vm_power_manager/channel_monitor.c | 321 +++++++++++++++++++++++++++- examples/vm_power_manager/channel_monitor.h | 18 ++ 3 files changed, 348 insertions(+), 7 deletions(-) diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile index 59a9641..9cf20a2 100644 --- a/examples/vm_power_manager/Makefile +++ b/examples/vm_power_manager/Makefile @@ -54,6 +54,22 @@ CFLAGS += $(WERROR_FLAGS) LDLIBS += -lvirt +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) + +ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y) +LDLIBS += -lrte_pmd_ixgbe +endif + +ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y) +LDLIBS += -lrte_pmd_i40e +endif + +ifeq ($(CONFIG_RTE_LIBRTE_BNXT_PMD),y) +LDLIBS += -lrte_pmd_bnxt +endif + +endif + # workaround for a gcc bug with noreturn attribute # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index ac40dac..f16358d 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ 
b/examples/vm_power_manager/channel_monitor.c @@ -41,13 +41,17 @@ #include <sys/types.h> #include <sys/epoll.h> #include <sys/queue.h> +#include <sys/time.h> #include <rte_log.h> #include <rte_memory.h> #include <rte_malloc.h> #include <rte_atomic.h> +#include <rte_cycles.h> +#include <rte_ethdev.h> +#include <rte_pmd_i40e.h> - +#include <libvirt/libvirt.h> #include "channel_monitor.h" #include "channel_commands.h" #include "channel_manager.h" @@ -57,10 +61,15 @@ #define MAX_EVENTS 256 +uint64_t vsi_pkt_count_prev[384]; +uint64_t rdtsc_prev[384]; +double time_period_s = 1; static volatile unsigned run_loop = 1; static int global_event_fd; +static unsigned int policy_is_set; static struct epoll_event *global_events_list; +static struct policy policies[MAX_VMS]; void channel_monitor_exit(void) { @@ -68,6 +77,286 @@ void channel_monitor_exit(void) rte_free(global_events_list); } +static void +core_share(int pNo, int z, int x, int t) +{ + if (policies[pNo].core_share[z].pcpu == lvm_info[x].pcpus[t]) { + if (strcmp(policies[pNo].pkt.vm_name, + lvm_info[x].vm_name) != 0) { + policies[pNo].core_share[z].status = 1; + power_manager_scale_core_max( + policies[pNo].core_share[z].pcpu); + } + } +} + +static void +core_share_status(int pNo) +{ + + int noVms, noVcpus, z, x, t; + + get_all_vm(&noVms, &noVcpus); + + /* Reset Core Share Status. */ + for (z = 0; z < noVcpus; z++) + policies[pNo].core_share[z].status = 0; + + /* Foreach vcpu in a policy. */ + for (z = 0; z < policies[pNo].pkt.num_vcpu; z++) { + /* Foreach VM on the platform. */ + for (x = 0; x < noVms; x++) { + /* Foreach vcpu of VMs on platform. */ + for (t = 0; t < lvm_info[x].num_cpus; t++) + core_share(pNo, z, x, t); + } + } +} + +static void +get_pcpu_to_control(struct policy *pol) +{ + + /* Convert vcpu to pcpu. 
*/ + struct vm_info info; + int pcpu, count; + uint64_t mask_u64b; + + RTE_LOG(INFO, CHANNEL_MONITOR, "Looking for pcpu for %s\n", + pol->pkt.vm_name); + get_info_vm(pol->pkt.vm_name, &info); + + for (count = 0; count < pol->pkt.num_vcpu; count++) { + mask_u64b = info.pcpu_mask[pol->pkt.vcpu_to_control[count]]; + for (pcpu = 0; mask_u64b; mask_u64b &= ~(1ULL << pcpu++)) { + if ((mask_u64b >> pcpu) & 1) + pol->core_share[count].pcpu = pcpu; + } + } +} + +static int +get_pfid(struct policy *pol) +{ + + int i, x, ret = 0, nb_ports; + + nb_ports = rte_eth_dev_count(); + for (i = 0; i < pol->pkt.nb_mac_to_monitor; i++) { + + for (x = 0; x < nb_ports; x++) { + ret = rte_pmd_i40e_query_vfid_by_mac(x, + (struct ether_addr *)&(pol->pkt.vfid[i])); + if (ret != -EINVAL) { + pol->port[i] = x; + break; + } + } + if (ret == -EINVAL || ret == -ENOTSUP || ret == ENODEV) { + RTE_LOG(INFO, CHANNEL_MONITOR, + "Error with Policy. MAC not found on " + "attached ports "); + pol->enabled = 0; + return ret; + } + pol->pfid[i] = ret; + } + return 1; +} + +static int +update_policy(struct channel_packet *pkt) +{ + + unsigned int updated = 0; + + for (int i = 0; i < MAX_VMS; i++) { + if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) { + updated = 1; + break; + } + core_share_status(i); + policies[i].enabled = 1; + updated = 1; + } + } + if (!updated) { + for (int i = 0; i < MAX_VMS; i++) { + if (policies[i].enabled == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) + break; + core_share_status(i); + policies[i].enabled = 1; + break; + } + } + } + return 0; +} + +static uint64_t +get_pkt_diff(struct policy *pol) +{ + + uint64_t vsi_pkt_count, + vsi_pkt_total = 0, + vsi_pkt_count_prev_total = 0; + double rdtsc_curr, rdtsc_diff, diff; + int x; + struct rte_eth_stats vf_stats; + + for (x = 0; x < pol->pkt.nb_mac_to_monitor; x++) { + + 
/*Read vsi stats*/ + if (rte_pmd_i40e_get_vf_stats(x, pol->pfid[x], &vf_stats) == 0) + vsi_pkt_count = vf_stats.ipackets; + else + vsi_pkt_count = -1; + + vsi_pkt_total += vsi_pkt_count; + + vsi_pkt_count_prev_total += vsi_pkt_count_prev[pol->pfid[x]]; + vsi_pkt_count_prev[pol->pfid[x]] = vsi_pkt_count; + } + + rdtsc_curr = rte_rdtsc_precise(); + rdtsc_diff = rdtsc_curr - rdtsc_prev[pol->pfid[x-1]]; + rdtsc_prev[pol->pfid[x-1]] = rdtsc_curr; + + diff = (vsi_pkt_total - vsi_pkt_count_prev_total) * + ((double)rte_get_tsc_hz() / rdtsc_diff); + + return diff; +} + +static void +apply_traffic_profile(struct policy *pol) +{ + + int count; + uint64_t diff = 0; + + diff = get_pkt_diff(pol); + + RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n"); + + if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (diff >= (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (diff < (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_time_profile(struct policy *pol) +{ + + int count, x; + struct timeval tv; + struct tm *ptm; + char time_string[40]; + + /* Obtain the time of day, and convert it to a tm struct. */ + gettimeofday(&tv, NULL); + ptm = localtime(&tv.tv_sec); + /* Format the date and time, down to a single second. 
*/ + strftime(time_string, sizeof(time_string), "%Y-%m-%d %H:%M:%S", ptm); + + for (x = 0; x < HOURS; x++) { + + if (ptm->tm_hour == pol->pkt.timer_policy.busy_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_max( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling up core %d to max\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.quiet_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_min( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling down core %d to min\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.hours_to_use_traffic_profile[x]) { + apply_traffic_profile(pol); + break; + } + } +} + +static void +apply_workload_profile(struct policy *pol) +{ + + int count; + + if (pol->pkt.workload == HIGH) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == MEDIUM) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == LOW) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_policy(struct policy *pol) +{ + + struct channel_packet *pkt = &pol->pkt; + + /*Check policy to use*/ + if (pkt->policy_to_use == TRAFFIC) + apply_traffic_profile(pol); + else if (pkt->policy_to_use == TIME) + apply_time_profile(pol); + else if (pkt->policy_to_use == WORKLOAD) + apply_workload_profile(pol); +} + + static int 
process_request(struct channel_packet *pkt, struct channel_info *chan_info) { @@ -140,6 +429,13 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) } } + + if (pkt->command == PKT_POLICY) { + RTE_LOG(INFO, CHANNEL_MONITOR, "\nProcessing Policy request from Guest\n"); + update_policy(pkt); + policy_is_set = 1; + } + /* Return is not checked as channel status may have been set to DISABLED * from management thread */ @@ -209,9 +505,10 @@ run_channel_monitor(void) struct channel_info *chan_info = (struct channel_info *) global_events_list[i].data.ptr; if ((global_events_list[i].events & EPOLLERR) || - (global_events_list[i].events & EPOLLHUP)) { + (global_events_list[i].events & EPOLLHUP)) { RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for " - "channel '%s'\n", chan_info->channel_path); + "channel '%s'\n", + chan_info->channel_path); remove_channel(&chan_info); continue; } @@ -223,14 +520,17 @@ run_channel_monitor(void) int buffer_len = sizeof(pkt); while (buffer_len > 0) { - n_bytes = read(chan_info->fd, buffer, buffer_len); + n_bytes = read(chan_info->fd, + buffer, buffer_len); if (n_bytes == buffer_len) break; if (n_bytes == -1) { err = errno; - RTE_LOG(DEBUG, CHANNEL_MONITOR, "Received error on " - "channel '%s' read: %s\n", - chan_info->channel_path, strerror(err)); + RTE_LOG(DEBUG, CHANNEL_MONITOR, + "Received error on " + "channel '%s' read: %s\n", + chan_info->channel_path, + strerror(err)); remove_channel(&chan_info); break; } @@ -241,5 +541,12 @@ run_channel_monitor(void) process_request(&pkt, chan_info); } } + rte_delay_us(time_period_s*1000000); + if (policy_is_set) { + for (int j = 0; j < MAX_VMS; j++) { + if (policies[j].enabled == 1) + apply_policy(&policies[j]); + } + } } } diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h index c138607..b52c1fc 100644 --- a/examples/vm_power_manager/channel_monitor.h +++ b/examples/vm_power_manager/channel_monitor.h @@ -35,6 
+35,24 @@ #define CHANNEL_MONITOR_H_ #include "channel_manager.h" +#include "channel_commands.h" + +struct core_share { + unsigned int pcpu; + /* + * 1 CORE SHARE + * 0 NOT SHARED + */ + int status; +}; + +struct policy { + struct channel_packet pkt; + uint32_t pfid[MAX_VFS]; + uint32_t port[MAX_VFS]; + unsigned int enabled; + struct core_share core_share[MAX_VCPU_PER_VM]; +}; #ifdef __cplusplus extern "C" { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
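The traffic profile in this patch comes down to two steps: turn a packet-count delta into a packets-per-second rate using the TSC, then compare that rate against the policy thresholds. Below is a minimal standalone sketch of that logic, not the patch's code. The names `pkts_per_sec`, `level_for_rate` and `enum freq_level` are illustrative only, and the guards against a zero interval or a counter that went backwards are additions that the patch does not have.

```c
#include <stdint.h>

/* Illustrative frequency levels; the patch calls scale_core_min/med/max. */
enum freq_level { FREQ_MIN, FREQ_MED, FREQ_MAX };

/*
 * Convert a packet-count delta measured over tsc_diff TSC cycles into
 * packets per second. Returns 0 if no cycles elapsed or if the counter
 * appears to have gone backwards.
 */
uint64_t
pkts_per_sec(uint64_t curr, uint64_t prev, uint64_t tsc_diff, uint64_t tsc_hz)
{
	if (tsc_diff == 0 || curr < prev)
		return 0;
	return (uint64_t)((double)(curr - prev) *
			((double)tsc_hz / (double)tsc_diff));
}

/* Map a packet rate onto one of three frequency levels. */
enum freq_level
level_for_rate(uint64_t rate, uint64_t avg_thresh, uint64_t max_thresh)
{
	if (rate >= max_thresh)
		return FREQ_MAX;
	if (rate >= avg_thresh)
		return FREQ_MED;
	return FREQ_MIN;
}
```

With the example thresholds used later in this series (1800000 and 2000000), a rate of 1900000 packets per second would select the medium level. In this sketch, as in the patch, everything below the lower threshold selects the minimum level.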
* [dpdk-dev] [PATCH v5 6/9] examples/vm_power_mgr: add port initialisation 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt ` (4 preceding siblings ...) 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 5/9] examples/vm_power_mgr: add policy to channels David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 7/9] power: add send channel msg function to map file David Hunt ` (3 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, David Hunt, Nemanja Marjanovic We need to initialise the ports we're monitoring to be able to see the throughput. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 220 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 220 insertions(+) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index c33fcc9..698abca 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -49,6 +49,9 @@ #include <rte_log.h> #include <rte_per_lcore.h> #include <rte_lcore.h> +#include <rte_ethdev.h> +#include <getopt.h> +#include <rte_cycles.h> #include <rte_debug.h> #include "channel_manager.h" @@ -56,6 +59,192 @@ #include "power_manager.h" #include "vm_power_cli.h" +#define RX_RING_SIZE 512 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static uint32_t enabled_port_mask; +static volatile bool force_quit; + +/****************/ +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port >=
rte_eth_dev_count()) + return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned int)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. 
*/ + rte_eth_promiscuous_enable(port); + + + return 0; +} + +static int +parse_portmask(const char *portmask) +{ + char *end = NULL; + unsigned long pm; + + /* parse hexadecimal string */ + pm = strtoul(portmask, &end, 16); + if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0')) + return -1; + + if (pm == 0) + return -1; + + return pm; +} +/* Parse the argument given in the command line of the application */ +static int +parse_args(int argc, char **argv) +{ + int opt, ret; + char **argvopt; + int option_index; + char *prgname = argv[0]; + static struct option lgopts[] = { + { "mac-updating", no_argument, 0, 1}, + { "no-mac-updating", no_argument, 0, 0}, + {NULL, 0, 0, 0} + }; + argvopt = argv; + + while ((opt = getopt_long(argc, argvopt, "p:q:T:", + lgopts, &option_index)) != EOF) { + + switch (opt) { + /* portmask */ + case 'p': + enabled_port_mask = parse_portmask(optarg); + if (enabled_port_mask == 0) { + printf("invalid portmask\n"); + return -1; + } + break; + /* long options */ + case 0: + break; + + default: + return -1; + } + } + + if (optind >= 0) + argv[optind-1] = prgname; + + ret = optind-1; + optind = 0; /* reset getopt lib */ + return ret; +} + +static void +check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) +{ +#define CHECK_INTERVAL 100 /* 100ms */ +#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */ + uint8_t portid, count, all_ports_up, print_flag = 0; + struct rte_eth_link link; + + printf("\nChecking link status"); + fflush(stdout); + for (count = 0; count <= MAX_CHECK_TIME; count++) { + if (force_quit) + return; + all_ports_up = 1; + for (portid = 0; portid < port_num; portid++) { + if (force_quit) + return; + if ((port_mask & (1 << portid)) == 0) + continue; + memset(&link, 0, sizeof(link)); + rte_eth_link_get_nowait(portid, &link); + /* print link status if flag set */ + if (print_flag == 1) { + if (link.link_status) + printf("Port %d Link Up - speed %u " + "Mbps - %s\n", (uint8_t)portid, + (unsigned 
int)link.link_speed, + (link.link_duplex == ETH_LINK_FULL_DUPLEX) ? + ("full-duplex") : ("half-duplex\n")); + else + printf("Port %d Link Down\n", + (uint8_t)portid); + continue; + } + /* clear all_ports_up flag if any link down */ + if (link.link_status == ETH_LINK_DOWN) { + all_ports_up = 0; + break; + } + } + /* after finally printing all link status, get out */ + if (print_flag == 1) + break; + + if (all_ports_up == 0) { + printf("."); + fflush(stdout); + rte_delay_ms(CHECK_INTERVAL); + } + + /* set the print_flag if all ports up or timeout */ + if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) { + print_flag = 1; + printf("done\n"); + } + } +} static int run_monitor(__attribute__((unused)) void *arg) { @@ -82,6 +271,10 @@ main(int argc, char **argv) { int ret; unsigned lcore_id; + unsigned int nb_ports; + struct rte_mempool *mbuf_pool; + uint8_t portid; + ret = rte_eal_init(argc, argv); if (ret < 0) @@ -90,12 +283,39 @@ main(int argc, char **argv) signal(SIGINT, sig_handler); signal(SIGTERM, sig_handler); + argc -= ret; + argv += ret; + + /* parse application arguments (after the EAL ones) */ + ret = parse_args(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid arguments\n"); + + nb_ports = rte_eth_dev_count(); + + mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports, + MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id()); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize ports. 
*/ + for (portid = 0; portid < nb_ports; portid++) { + if ((enabled_port_mask & (1 << portid)) == 0) + continue; + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + } + lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { RTE_LOG(ERR, EAL, "A minimum of two cores are required to run " "application\n"); return 0; } + + check_all_ports_link_status(nb_ports, enabled_port_mask); rte_eal_remote_launch(run_monitor, NULL, lcore_id); if (power_manager_init() < 0) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
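The -p option in this patch takes a hexadecimal port mask in which bit N selects port N. In the patch, parse_portmask() returns -1 for an unparsable string while parse_args() stores the result in a uint32_t and only checks it against 0. The sketch below is one possible variant, not the submitted code: it reports every invalid input as 0, so a single `== 0` check in the caller is enough. The name `parse_portmask_u32` is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal port mask such as "3" or "ff".
 * Returns the mask, or 0 if the string is empty, is not fully
 * consumed as hex, does not fit in 32 bits, or selects no ports.
 */
uint32_t
parse_portmask_u32(const char *portmask)
{
	char *end = NULL;
	unsigned long pm;

	if (portmask == NULL || portmask[0] == '\0')
		return 0;

	errno = 0;
	pm = strtoul(portmask, &end, 16);
	/* Reject range errors, "no digits" and trailing characters. */
	if (errno != 0 || end == portmask || *end != '\0')
		return 0;
	if (pm > UINT32_MAX)
		return 0;

	return (uint32_t)pm;
}
```

A caller would then test a port with `(mask & (1u << portid)) != 0`, which mirrors the check in the port initialisation loop above.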
* [dpdk-dev] [PATCH v5 7/9] power: add send channel msg function to map file 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt ` (5 preceding siblings ...) 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 6/9] examples/vm_power_mgr: add port initialisation David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 16:20 ` santosh 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 8/9] examples/guest_cli: add send policy to host David Hunt ` (2 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, David Hunt Adding new wrapper function to existing private (but unused 'till now) function with an rte_power_ prefix. The plan is to clean up all the header files in the next release so that only the intended public functions are in the map file and only the relevant headers have the rte_ prefix so that only they are included in the documentation. Signed-off-by: David Hunt <david.hunt@intel.com> --- lib/librte_power/guest_channel.c | 7 +++++++ lib/librte_power/guest_channel.h | 15 +++++++++++++++ lib/librte_power/rte_power_version.map | 1 + 3 files changed, 23 insertions(+) diff --git a/lib/librte_power/guest_channel.c b/lib/librte_power/guest_channel.c index 85c92fa..fa5de0f 100644 --- a/lib/librte_power/guest_channel.c +++ b/lib/librte_power/guest_channel.c @@ -148,6 +148,13 @@ guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id) return 0; } +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id) +{ + return guest_channel_send_msg(pkt, lcore_id); +} + + void guest_channel_host_disconnect(unsigned lcore_id) { diff --git a/lib/librte_power/guest_channel.h b/lib/librte_power/guest_channel.h index 9e18af5..741339c 100644 --- a/lib/librte_power/guest_channel.h +++ b/lib/librte_power/guest_channel.h @@ -81,6 +81,21 @@ void guest_channel_host_disconnect(unsigned lcore_id); */ int 
guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); +/** + * Send a message contained in pkt over the Virtio-Serial to the host endpoint. + * + * @param pkt + * Pointer to a populated struct channel_packet + * + * @param lcore_id + * lcore_id. + * + * @return + * - 0 on success. + * - Negative on error. + */ +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id); #ifdef __cplusplus } diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map index 9ae0627..96dc42e 100644 --- a/lib/librte_power/rte_power_version.map +++ b/lib/librte_power/rte_power_version.map @@ -20,6 +20,7 @@ DPDK_2.0 { DPDK_17.11 { global: + rte_power_guest_channel_send_msg; rte_power_freq_disable_turbo; rte_power_freq_enable_turbo; rte_power_turbo_status; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v5 7/9] power: add send channel msg function to map file 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 7/9] power: add send channel msg function to map file David Hunt @ 2017-10-04 16:20 ` santosh 0 siblings, 0 replies; 105+ messages in thread From: santosh @ 2017-10-04 16:20 UTC (permalink / raw) To: David Hunt, dev; +Cc: konstantin.ananyev, jingjing.wu Hi David, On Wednesday 04 October 2017 08:55 PM, David Hunt wrote: > Adding new wrapper function to existing private (but unused 'till now) > function with an rte_power_ prefix. > > The plan is to clean up all the header files in the next release so > that only the intended public functions are in the map file and only > the relevant headers have the rte_ prefix so that only they are > included in the documentation. > > Signed-off-by: David Hunt <david.hunt@intel.com> > --- lgtm: Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Thanks. ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v5 8/9] examples/guest_cli: add send policy to host 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt ` (6 preceding siblings ...) 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 7/9] power: add send channel msg function to map file David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Here we're adding an example of setting up a policy, and allowing the vm_cli_guest app to send it to the host using the cli command "send_policy now" Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- .../guest_cli/vm_power_cli_guest.c | 97 ++++++++++++++++++++++ .../guest_cli/vm_power_cli_guest.h | 6 -- 2 files changed, 97 insertions(+), 6 deletions(-) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 4e982bd..dc9efc2 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -45,8 +45,10 @@ #include <cmdline.h> #include <rte_log.h> #include <rte_lcore.h> +#include <rte_ethdev.h> #include <rte_power.h> +#include <guest_channel.h> #include "vm_power_cli_guest.h" @@ -139,8 +141,103 @@ cmdline_parse_inst_t cmd_set_cpu_freq_set = { }, }; +struct cmd_send_policy_result { + cmdline_fixed_string_t send_policy; + cmdline_fixed_string_t cmd; +}; + +union PFID { + struct ether_addr addr; + uint64_t pfid; +}; + +static inline int +send_policy(void) 
+{ + struct channel_packet pkt; + int ret; + + union PFID pfid; + /* Use port MAC address as the vfid */ + rte_eth_macaddr_get(0, &pfid.addr); + printf("Port %u MAC: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":" + "%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n", + 1, + pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1], + pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3], + pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]); + pkt.vfid[0] = pfid.pfid; + + pkt.nb_mac_to_monitor = 1; + pkt.t_boost_status.tbEnabled = false; + + pkt.vcpu_to_control[0] = 0; + pkt.vcpu_to_control[1] = 1; + pkt.num_vcpu = 2; + /* Dummy Population. */ + pkt.traffic_policy.min_packet_thresh = 96000; + pkt.traffic_policy.avg_max_packet_thresh = 1800000; + pkt.traffic_policy.max_max_packet_thresh = 2000000; + + pkt.timer_policy.busy_hours[0] = 3; + pkt.timer_policy.busy_hours[1] = 4; + pkt.timer_policy.busy_hours[2] = 5; + pkt.timer_policy.quiet_hours[0] = 11; + pkt.timer_policy.quiet_hours[1] = 12; + pkt.timer_policy.quiet_hours[2] = 13; + + pkt.timer_policy.hours_to_use_traffic_profile[0] = 8; + pkt.timer_policy.hours_to_use_traffic_profile[1] = 10; + + pkt.workload = LOW; + pkt.policy_to_use = TIME; + pkt.command = PKT_POLICY; + strcpy(pkt.vm_name, "ubuntu2"); + ret = rte_power_guest_channel_send_msg(&pkt, 1); + if (ret == 0) + return 1; + RTE_LOG(DEBUG, POWER, "Error sending message: %s\n", + ret > 0 ? 
strerror(ret) : "channel not connected"); + return -1; +} + +static void +cmd_send_policy_parsed(void *parsed_result, struct cmdline *cl, + __attribute__((unused)) void *data) +{ + int ret = -1; + struct cmd_send_policy_result *res = parsed_result; + + if (!strcmp(res->cmd, "now")) { + printf("Sending Policy down now!\n"); + ret = send_policy(); + } + if (ret != 1) + cmdline_printf(cl, "Error sending message: %s\n", + strerror(ret)); +} + +cmdline_parse_token_string_t cmd_send_policy = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + send_policy, "send_policy"); +cmdline_parse_token_string_t cmd_send_policy_cmd_cmd = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + cmd, "now"); + +cmdline_parse_inst_t cmd_send_policy_set = { + .f = cmd_send_policy_parsed, + .data = NULL, + .help_str = "send_policy now", + .tokens = { + (void *)&cmd_send_policy, + (void *)&cmd_send_policy_cmd_cmd, + NULL, + }, +}; + cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_quit, + (cmdline_parse_inst_t *)&cmd_send_policy_set, (cmdline_parse_inst_t *)&cmd_set_cpu_freq_set, NULL, }; diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h index 0c4bdd5..277eab3 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h @@ -40,12 +40,6 @@ extern "C" { #include "channel_commands.h" -int guest_channel_host_connect(unsigned lcore_id); - -int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); - -void guest_channel_host_disconnect(unsigned lcore_id); - void run_cli(__attribute__((unused)) void *arg); #ifdef __cplusplus -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
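A note on the MAC-as-id trick used in send_policy() above: the union overlays a 6-byte ether_addr on a uint64_t, and since the union is not initialised, the two bytes the MAC does not cover are indeterminate when pfid.pfid is read. A minimal sketch of a zero-padded packing follows. It is not part of the patch, and the helper names mac_to_id() and id_to_mac() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Pack a 6-byte MAC into a 64-bit id (hypothetical helper, not in the patch).
 * The id starts at zero, so the two bytes the MAC does not cover are always 0.
 */
uint64_t
mac_to_id(const uint8_t mac[MAC_LEN])
{
	uint64_t id = 0;

	assert(mac != NULL);
	memcpy(&id, mac, MAC_LEN);
	return id;
}

/* Recover the 6 MAC bytes from an id produced by mac_to_id(). */
void
id_to_mac(uint64_t id, uint8_t mac[MAC_LEN])
{
	assert(mac != NULL);
	memcpy(mac, &id, MAC_LEN);
}
```

Because both directions copy the same leading six bytes of the 64-bit value, the round trip does not depend on host byte order, as long as guest and host share the same endianness.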
* [dpdk-dev] [PATCH v5 9/9] examples/vm_power_mgr: set MAC address of VF 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt ` (7 preceding siblings ...) 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 8/9] examples/guest_cli: add send policy to host David Hunt @ 2017-10-04 15:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-04 15:25 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, David Hunt We need to set vf mac from the host, so that they will be in sync on the guest and the host. Otherwise, we'll have a random mac on the guest, and a 00:00:00:00:00:00 mac on the host. Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 43 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index 698abca..5147789 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -58,6 +58,9 @@ #include "channel_monitor.h" #include "power_manager.h" #include "vm_power_cli.h" +#include <rte_pmd_ixgbe.h> +#include <rte_pmd_i40e.h> +#include <rte_pmd_bnxt.h> #define RX_RING_SIZE 512 #define TX_RING_SIZE 512 @@ -222,7 +225,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) (uint8_t)portid); continue; } - /* clear all_ports_up flag if any link down */ + /* clear all_ports_up flag if any link down */ if (link.link_status == ETH_LINK_DOWN) { all_ports_up = 0; break; @@ -301,11 +304,49 @@ main(int argc, char **argv) /* Initialize ports. 
*/ for (portid = 0; portid < nb_ports; portid++) { + struct ether_addr eth; + int w, j; + int ret = -ENOTSUP; + if ((enabled_port_mask & (1 << portid)) == 0) continue; + + eth.addr_bytes[0] = 0xe0; + eth.addr_bytes[1] = 0xe0; + eth.addr_bytes[2] = 0xe0; + eth.addr_bytes[3] = 0xe0; + eth.addr_bytes[4] = portid + 0xf0; + if (port_init(portid, mbuf_pool) != 0) rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid); + + for (w = 0; w < MAX_VFS; w++) { + eth.addr_bytes[5] = w + 0xf0; + + if (ret == -ENOTSUP) + ret = rte_pmd_ixgbe_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_i40e_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_bnxt_set_vf_mac_addr(portid, + w, &eth); + + switch (ret) { + case 0: + printf("Port %d VF %d MAC: ", + portid, w); + for (j = 0; j < 6; j++) { + printf("%02x", eth.addr_bytes[j]); + if (j < 5) + printf(":"); + } + printf("\n"); + break; + } + } } lcore_id = rte_get_next_lcore(-1, 1, 0); -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
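The MAC layout assigned in this patch (four 0xe0 bytes, then 0xf0 + port, then 0xf0 + VF index) can be sketched as a standalone helper. This is an illustration only: vf_mac_fill() and vf_mac_format() are hypothetical names, and the 16-entry limit is an assumption added here so that 0xf0 + n cannot overflow a byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Build the deterministic VF MAC e0:e0:e0:e0:<f0+port>:<f0+vf>.
 * Hypothetical helper; the bounds are an assumption of this sketch.
 */
void
vf_mac_fill(uint8_t mac[MAC_LEN], unsigned int port, unsigned int vf)
{
	assert(port < 16 && vf < 16);
	memset(mac, 0xe0, 4);
	mac[4] = (uint8_t)(0xf0 + port);
	mac[5] = (uint8_t)(0xf0 + vf);
}

/* Format a MAC as aa:bb:cc:dd:ee:ff; returns snprintf()'s result. */
int
vf_mac_format(const uint8_t mac[MAC_LEN], char *buf, size_t len)
{
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

With MAX_VFS at 10 the last byte stays within 0xf0..0xf9, so the bound only matters for the port byte on hosts with 16 or more ports.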
* [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 0/9] Policy Based Power Control for Guest David Hunt ` (8 preceding siblings ...) 2017-10-04 15:25 ` [dpdk-dev] [PATCH v5 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt ` (9 more replies) 9 siblings, 10 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla

Policy Based Power Control for Guest

This patchset adds the facility for a guest VM to send a policy down to the host that will allow the host to scale up/down cpu frequencies depending on the policy criteria independently of the DPDK app running in the guest. This differs from the previous vm_power implementation where individual scale up/down requests were sent from the guest to the host via virtio-serial.

V6 patchset changes:
* Fixed comments in header for rte_pmd_i40e_query_vfid_by_mac.
* Changed rte_pmd_i40e_query_vfid_by_mac return code from uint to int as it can return negative error codes.
* Removed bool enum from channel_commands.h, including stdbool.h instead.
* Added #define VM_MAX_NAME_SZ 32 to channel_commands.h
* Renamed a few variables to be more readable.
* Added returns in a few places if failed to get info on domain.
* Fixed power_manager_init to keep track of num_freqs for each core.
* In power_manager_scale_core_med(), changed a hardcoded '5' to instead be calculated from the centre of the frequency list (global_core_freq_info[core_num].num_freqs / 2)

V5 patchset changes:
* Removed most of the #ifdef I40_PMD as it will be applicable to other PMDs in the future.
* Changed the parameter of rte_pmd_i40e_query_vfid_by_mac from a uint64 to a const struct ether_addr *, rather than casting it later in the function.

V4 patchset changes:
* None, re-post to mailing list under the correct email thread.

V3 patchset changes:
* Changed to using is_same_ether_addr() instead of looping through the mac address bytes to compare them.
* Tweaked some comments and wording in the i40e patch after review.
* Added a patch to the set to add new i40e function to map file, so as to allow shared library builds. The power library API needs a cleanup in next release, so will add API/ABI warning for this cleanup in a separate patch.

V2 patchset changes:
* Removed APIs in ethdev layer.
* Now just a single new API in the i40e driver for mapping VF MAC to VF index.
* Moved new function from rte_rxtx.c to rte_pmd_i40e.c
* Removed function for reading i40e register, moved to using the standard stats API.
* Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac
* Cleaned up policy generation code.

It's a modification of the vm_power_manager app that runs in the host, and the guest_vm_power_app example app that runs in the guest. This allows the guest to send down a policy to the host via virtio-serial, which then allows the host to scale up/down based on the criteria in the policy, resulting in quicker scale up/down than individual requests coming from the guest. It also means that the DPDK application running in the guest does not need to be modified in any way: it is unaware that its cores are being scaled up/down, reducing the effort in implementing a power-aware infrastructure.

The usage model is as follows:
1. Set up the VFs and assign to the guest in the usual way.
2. Run vm_power_manager on the host, creating a channel to the guest.
3. Start the guest_vm_power_mgr app on the guest, which establishes a virtio-serial channel to the host.
4. Send down the policy for the guest using the "send_policy now" command. There is an example policy hard-coded into guest_vm_power_mgr.
5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
6. Send traffic into the VFs at varying traffic rates. Observe the frequency change on the host (turbostat -i 1)

The sequence of code changes is as follows:

A new function has been added to the i40e driver to allow mapping of a VF MAC to VF index.

Next we make an addition to librte_power that adds an extra command to allow the passing of a policy structure from the guest to the host. This struct contains information like busy/quiet hour, packet throughput thresholds, etc.

The next addition adds functionality to convert the virtual CPU (vcpu) IDs to physical CPU (pcpu) IDs so that the host can scale up/down the cores used in the guest.

The remaining patches are functionality to process the policy, and take action when the relevant trigger occurs to cause a frequency change.

[1/9] net/i40e: add API to convert VF MAC to VF id
[2/9] lib/librte_power: add extra msg type for policies
[3/9] examples/vm_power_mgr: add vcpu to pcpu mapping
[4/9] examples/vm_power_mgr: add scale to medium freq fn
[5/9] examples/vm_power_mgr: add policy to channels
[6/9] examples/vm_power_mgr: add port initialisation
[7/9] power: add send channel msg function to map file
[8/9] examples/guest_cli: add send policy to host
[9/9] examples/vm_power_mgr: set MAC address of VF

^ permalink raw reply [flat|nested] 105+ messages in thread
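The cover letter mentions packet throughput thresholds but does not spell out how they map to frequency steps. One plausible reading, sketched here purely as an assumption, is a three-way split: at or above the upper threshold scale to maximum, at or above the average threshold scale to medium, otherwise scale to minimum. The enum freq_step, struct traffic_thresholds and pick_freq_step() names are hypothetical; only the two threshold field names are borrowed from the policy struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical frequency steps for this sketch. */
enum freq_step { FREQ_MIN, FREQ_MED, FREQ_MAX };

/* Field names borrowed from the policy struct; the type is illustrative. */
struct traffic_thresholds {
	uint32_t avg_max_packet_thresh;
	uint32_t max_max_packet_thresh;
};

/*
 * Map a packet count observed over one sampling period to a step.
 * Assumed mapping, not taken from the patchset.
 */
enum freq_step
pick_freq_step(const struct traffic_thresholds *t, uint64_t pkts)
{
	assert(t->avg_max_packet_thresh <= t->max_max_packet_thresh);

	if (pkts >= t->max_max_packet_thresh)
		return FREQ_MAX;
	if (pkts >= t->avg_max_packet_thresh)
		return FREQ_MED;
	return FREQ_MIN;
}
```

With the example policy values from patch 8/9 (1800000 and 2000000), a period carrying 1.9 million packets would land on the medium step under this reading.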
* [dpdk-dev] [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:45 ` Ananyev, Konstantin 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 2/9] lib/librte_power: add extra msg type for policies David Hunt ` (8 subsequent siblings) 9 siblings, 1 reply; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Need a way to convert a vf id to a pf id on the host so as to query the pf for relevant statistics which are used for the frequency changes in the vm_power_manager app. Used when profiles are passed down from the guest to the host, allowing the host to map the vfs to pfs. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- drivers/net/i40e/rte_pmd_i40e.c | 30 ++++++++++++++++++++++++++++++ drivers/net/i40e/rte_pmd_i40e.h | 15 +++++++++++++++ drivers/net/i40e/rte_pmd_i40e_version.map | 7 +++++++ 3 files changed, 52 insertions(+) diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c index f12b7f4..541d575 100644 --- a/drivers/net/i40e/rte_pmd_i40e.c +++ b/drivers/net/i40e/rte_pmd_i40e.c @@ -2115,3 +2115,33 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, return 0; } + +int64_t +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, const struct ether_addr *vf_mac) +{ + struct rte_eth_dev *dev; + struct ether_addr *mac; + struct i40e_pf *pf; + int vf_id; + struct i40e_pf_vf *vf; + uint16_t vf_num; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); + dev = &rte_eth_devices[port]; + + if (!is_i40e_supported(dev)) + return -ENOTSUP; + + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); + vf_num = pf->vf_num; + + for 
(vf_id = 0; vf_id < vf_num; vf_id++) { + vf = &pf->vfs[vf_id]; + mac = &vf->mac_addr; + + if (is_same_ether_addr(mac, vf_mac)) + return vf_id; + } + + return -EINVAL; +} diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h index 356fa89..2952ab0 100644 --- a/drivers/net/i40e/rte_pmd_i40e.h +++ b/drivers/net/i40e/rte_pmd_i40e.h @@ -637,4 +637,19 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, uint8_t mask, uint32_t pkt_type); +/** + * On the PF, find VF index based on VF MAC address + * + * @param port + * pointer to port identifier of the device + * @param vf_mac + * the mac address of the vf to determine index of + * @return + * The index of vfid If successful. + * -EINVAL: vf mac address does not exist for this port + * -ENOTSUP: i40e not supported for this port. + */ +int64_t rte_pmd_i40e_query_vfid_by_mac(uint8_t port, + const struct ether_addr *vf_mac); + #endif /* _PMD_I40E_H_ */ diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map b/drivers/net/i40e/rte_pmd_i40e_version.map index 20cc980..d8b74bd 100644 --- a/drivers/net/i40e/rte_pmd_i40e_version.map +++ b/drivers/net/i40e/rte_pmd_i40e_version.map @@ -45,3 +45,10 @@ DPDK_17.08 { rte_pmd_i40e_get_ddp_info; } DPDK_17.05; + +DPDK_17.11 { + global: + + rte_pmd_i40e_query_vfid_by_mac; + +} DPDK_17.08; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
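The lookup in rte_pmd_i40e_query_vfid_by_mac() is a linear scan that returns either a VF index or a negative errno. A minimal standalone sketch of the same pattern is below, using a plain byte array instead of the i40e PF structures. struct vf_entry and find_vf_by_mac() are hypothetical stand-ins, and memcmp() takes the place of is_same_ether_addr().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for the per-VF state kept by the PF. */
struct vf_entry {
	uint8_t mac[MAC_LEN];
};

/*
 * Return the index of the VF whose MAC matches, or -EINVAL if none does.
 * A plain int is wide enough for both the index and the error code.
 */
int
find_vf_by_mac(const struct vf_entry *vfs, uint16_t vf_num,
	       const uint8_t mac[MAC_LEN])
{
	int vf_id;

	assert(vfs != NULL || vf_num == 0);

	for (vf_id = 0; vf_id < vf_num; vf_id++) {
		if (memcmp(vfs[vf_id].mac, mac, MAC_LEN) == 0)
			return vf_id;
	}
	return -EINVAL;
}
```

Callers can then treat any negative return as "not on this port" and move on to the next port.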
* Re: [dpdk-dev] [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt @ 2017-10-05 12:45 ` Ananyev, Konstantin 2017-10-05 12:51 ` Hunt, David 0 siblings, 1 reply; 105+ messages in thread From: Ananyev, Konstantin @ 2017-10-05 12:45 UTC (permalink / raw) To: Hunt, David, dev Cc: Wu, Jingjing, santosh.shukla, Sexton, Rory, Marjanovic, Nemanja Hi Dave, > -----Original Message----- > From: Hunt, David > Sent: Thursday, October 5, 2017 1:26 PM > To: dev@dpdk.org > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; santosh.shukla@caviumnetworks.com; > Sexton, Rory <rory.sexton@intel.com>; Marjanovic, Nemanja <nemanja.marjanovic@intel.com>; Hunt, David <david.hunt@intel.com> > Subject: [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id > > From: "Sexton, Rory" <rory.sexton@intel.com> > > Need a way to convert a vf id to a pf id on the host so as to query the pf > for relevant statistics which are used for the frequency changes in the > vm_power_manager app. Used when profiles are passed down from the guest > to the host, allowing the host to map the vfs to pfs. 
> > Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> > Signed-off-by: Rory Sexton <rory.sexton@intel.com> > Signed-off-by: David Hunt <david.hunt@intel.com> > --- > drivers/net/i40e/rte_pmd_i40e.c | 30 ++++++++++++++++++++++++++++++ > drivers/net/i40e/rte_pmd_i40e.h | 15 +++++++++++++++ > drivers/net/i40e/rte_pmd_i40e_version.map | 7 +++++++ > 3 files changed, 52 insertions(+) > > diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c > index f12b7f4..541d575 100644 > --- a/drivers/net/i40e/rte_pmd_i40e.c > +++ b/drivers/net/i40e/rte_pmd_i40e.c > @@ -2115,3 +2115,33 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, > > return 0; > } > + > +int64_t > +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, const struct ether_addr *vf_mac) I don't think you need int64_t as a return value here. Just 'int' seems good enough. Anyway vf_id is just an 'int'. Konstantin > +{ > + struct rte_eth_dev *dev; > + struct ether_addr *mac; > + struct i40e_pf *pf; > + int vf_id; > + struct i40e_pf_vf *vf; > + uint16_t vf_num; > + > + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); > + dev = &rte_eth_devices[port]; > + > + if (!is_i40e_supported(dev)) > + return -ENOTSUP; > + > + pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); > + vf_num = pf->vf_num; > + > + for (vf_id = 0; vf_id < vf_num; vf_id++) { > + vf = &pf->vfs[vf_id]; > + mac = &vf->mac_addr; > + > + if (is_same_ether_addr(mac, vf_mac)) > + return vf_id; > + } > + > + return -EINVAL; > +} > diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h > index 356fa89..2952ab0 100644 > --- a/drivers/net/i40e/rte_pmd_i40e.h > +++ b/drivers/net/i40e/rte_pmd_i40e.h > @@ -637,4 +637,19 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, > uint8_t mask, > uint32_t pkt_type); > > +/** > + * On the PF, find VF index based on VF MAC address > + * > + * @param port > + * pointer to port identifier of the device > + * @param vf_mac > + * the mac address of the vf to determine 
index of > + * @return > + * The index of vfid If successful. > + * -EINVAL: vf mac address does not exist for this port > + * -ENOTSUP: i40e not supported for this port. > + */ > +int64_t rte_pmd_i40e_query_vfid_by_mac(uint8_t port, > + const struct ether_addr *vf_mac); > + > #endif /* _PMD_I40E_H_ */ > diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map b/drivers/net/i40e/rte_pmd_i40e_version.map > index 20cc980..d8b74bd 100644 > --- a/drivers/net/i40e/rte_pmd_i40e_version.map > +++ b/drivers/net/i40e/rte_pmd_i40e_version.map > @@ -45,3 +45,10 @@ DPDK_17.08 { > rte_pmd_i40e_get_ddp_info; > > } DPDK_17.05; > + > +DPDK_17.11 { > + global: > + > + rte_pmd_i40e_query_vfid_by_mac; > + > +} DPDK_17.08; > -- > 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-05 12:45 ` Ananyev, Konstantin @ 2017-10-05 12:51 ` Hunt, David 0 siblings, 0 replies; 105+ messages in thread From: Hunt, David @ 2017-10-05 12:51 UTC (permalink / raw) To: Ananyev, Konstantin, dev Cc: Wu, Jingjing, santosh.shukla, Sexton, Rory, Marjanovic, Nemanja Hi Konstantin, On 5/10/2017 1:45 PM, Ananyev, Konstantin wrote: > Hi Dave, > >> -----Original Message----- >> From: Hunt, David >> Sent: Thursday, October 5, 2017 1:26 PM >> To: dev@dpdk.org >> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; santosh.shukla@caviumnetworks.com; >> Sexton, Rory <rory.sexton@intel.com>; Marjanovic, Nemanja <nemanja.marjanovic@intel.com>; Hunt, David <david.hunt@intel.com> >> Subject: [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id >> >> From: "Sexton, Rory" <rory.sexton@intel.com> >> >> Need a way to convert a vf id to a pf id on the host so as to query the pf >> for relevant statistics which are used for the frequency changes in the >> vm_power_manager app. Used when profiles are passed down from the guest >> to the host, allowing the host to map the vfs to pfs. 
>> >> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> >> Signed-off-by: Rory Sexton <rory.sexton@intel.com> >> Signed-off-by: David Hunt <david.hunt@intel.com> >> --- >> drivers/net/i40e/rte_pmd_i40e.c | 30 ++++++++++++++++++++++++++++++ >> drivers/net/i40e/rte_pmd_i40e.h | 15 +++++++++++++++ >> drivers/net/i40e/rte_pmd_i40e_version.map | 7 +++++++ >> 3 files changed, 52 insertions(+) >> >> diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c >> index f12b7f4..541d575 100644 >> --- a/drivers/net/i40e/rte_pmd_i40e.c >> +++ b/drivers/net/i40e/rte_pmd_i40e.c >> @@ -2115,3 +2115,33 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, >> >> return 0; >> } >> + >> +int64_t >> +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, const struct ether_addr *vf_mac) > I don't think you need int64_t as a return value here. > Just 'int' seems good enough. > Anyway vf_id is just an 'int'. > Konstantin OK. I'll push a v7 in the next couple of hours. Thanks, Dave. ---snip-- ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v6 2/9] lib/librte_power: add extra msg type for policies 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt ` (7 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- lib/librte_power/channel_commands.h | 42 +++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h index 484085b..f0f5f0a 100644 --- a/lib/librte_power/channel_commands.h +++ b/lib/librte_power/channel_commands.h @@ -39,6 +39,7 @@ extern "C" { #endif #include <stdint.h> +#include <stdbool.h> /* Maximum number of channels per VM */ #define CHANNEL_CMDS_MAX_VM_CHANNELS 64 @@ -46,6 +47,7 @@ extern "C" { /* Valid Commands */ #define CPU_POWER 1 #define CPU_POWER_CONNECT 2 +#define PKT_POLICY 3 /* CPU Power Command Scaling */ #define CPU_POWER_SCALE_UP 1 @@ -54,11 +56,51 @@ extern "C" { #define CPU_POWER_SCALE_MIN 4 #define CPU_POWER_ENABLE_TURBO 5 #define CPU_POWER_DISABLE_TURBO 6 +#define HOURS 24 + +#define MAX_VFS 10 +#define VM_MAX_NAME_SZ 32 + +#define MAX_VCPU_PER_VM 8 + +struct t_boost_status { + bool tbEnabled; +}; + +struct timer_profile { + int busy_hours[HOURS]; + int quiet_hours[HOURS]; + int hours_to_use_traffic_profile[HOURS]; +}; + +enum workload {HIGH, MEDIUM, LOW}; +enum policy_to_use { + TRAFFIC, + TIME, + WORKLOAD +}; + +struct traffic { + uint32_t 
min_packet_thresh; + uint32_t avg_max_packet_thresh; + uint32_t max_max_packet_thresh; +}; struct channel_packet { uint64_t resource_id; /**< core_num, device */ uint32_t unit; /**< scale down/up/min/max */ uint32_t command; /**< Power, IO, etc */ + char vm_name[VM_MAX_NAME_SZ]; + + uint64_t vfid[MAX_VFS]; + int nb_mac_to_monitor; + struct traffic traffic_policy; + uint8_t vcpu_to_control[MAX_VCPU_PER_VM]; + uint8_t num_vcpu; + struct timer_profile timer_policy; + enum workload workload; + enum policy_to_use policy_to_use; + struct t_boost_status t_boost_status; }; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
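The timer_profile arrays above appear to hold lists of hour values (the example policy in patch 8/9 sets busy_hours to 3, 4, 5 and quiet_hours to 11, 12, 13). A minimal sketch of how such lists could be checked against the current hour follows. It assumes the caller passes the number of valid entries explicitly, which the struct itself does not record; enum hour_class, hour_in_list() and classify_hour() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of checking an hour against a timer policy. */
enum hour_class { HOUR_NONE, HOUR_BUSY, HOUR_QUIET };

/* Return 1 if hour appears in the first n entries of hours[], else 0. */
int
hour_in_list(const int *hours, size_t n, int hour)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (hours[i] == hour)
			return 1;
	}
	return 0;
}

/*
 * Classify an hour (0..23) as busy, quiet or neither.
 * Busy wins if an hour is listed in both, an arbitrary choice of this sketch.
 */
enum hour_class
classify_hour(const int *busy, size_t nb_busy,
	      const int *quiet, size_t nb_quiet, int hour)
{
	assert(hour >= 0 && hour < 24);

	if (hour_in_list(busy, nb_busy, hour))
		return HOUR_BUSY;
	if (hour_in_list(quiet, nb_quiet, hour))
		return HOUR_QUIET;
	return HOUR_NONE;
}
```

Passing explicit counts avoids scanning array slots that were never filled in, which matters when the packet is built on the stack without being zeroed first.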
* [dpdk-dev] [PATCH v6 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 2/9] lib/librte_power: add extra msg type for policies David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt ` (6 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/channel_manager.c | 67 +++++++++++++++++++++++++++++ examples/vm_power_manager/channel_manager.h | 25 +++++++++++ 2 files changed, 92 insertions(+) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index e068ae2..ab856bd 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -574,6 +574,73 @@ set_channel_status(const char *vm_name, unsigned *channel_list, return num_channels_changed; } +void +get_all_vm(int *num_vm, int *num_vcpu) +{ + + virNodeInfo node_info; + virDomainPtr *domptr; + uint64_t mask; + int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus; + unsigned int jj; + const char *vm_name; + unsigned int domain_flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; + unsigned int domain_flag = VIR_DOMAIN_VCPU_CONFIG; + + + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); + if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node 
Info\n"); + return; + } + + /* Returns number of pcpus */ + global_n_host_cpus = (unsigned int)node_info.cpus; + + /* Returns number of active domains */ + *num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, + domain_flags); + if (*num_vm <= 0) { + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); + return; + } + + for (i = 0; i < *num_vm; i++) { + + /* Get Domain Names */ + vm_name = virDomainGetName(domptr[i]); + lvm_info[i].vm_name = vm_name; + + /* Get Number of Vcpus */ + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], domain_flag); + + /* Get Number of VCpus & VcpuPinInfo */ + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], + numVcpus[i], global_cpumaps, + global_maplen, domain_flag); + + if ((int)n_vcpus > 0) { + *num_vcpu = n_vcpus; + lvm_info[i].num_cpus = n_vcpus; + } + + /* Save pcpu in use by libvirt VMs */ + for (ii = 0; ii < n_vcpus; ii++) { + mask = 0; + for (jj = 0; jj < global_n_host_cpus; jj++) { + if (VIR_CPU_USABLE(global_cpumaps, + global_maplen, ii, jj) > 0) { + mask |= 1ULL << jj; + } + } + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { + lvm_info[i].pcpus[ii] = cpu; + } + } + } +} + int get_info_vm(const char *vm_name, struct vm_info *info) { diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h index 47c3b9c..358fb8f 100644 --- a/examples/vm_power_manager/channel_manager.h +++ b/examples/vm_power_manager/channel_manager.h @@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un; #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) #endif +#define MAX_VMS 4 +#define MAX_VCPUS 20 + + +struct libvirt_vm_info { + const char *vm_name; + unsigned int pcpus[MAX_VCPUS]; + uint8_t num_cpus; +}; + +struct libvirt_vm_info lvm_info[MAX_VMS]; /* Communication Channel Status */ enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, CHANNEL_MGR_CHANNEL_CONNECTED, @@ -319,6 +330,20 @@ int set_channel_status(const char *vm_name, unsigned *channel_list, */ int get_info_vm(const char *vm_name, 
struct vm_info *info); +/** + * Populates a table with all domains running and their physical cpu. + * All information is gathered through libvirt api. + * + * @param num_vm + * modified to store number of active VMs + * + * @param num_vcpu + modified to store number of vcpus active + * + * @return + * void + */ +void get_all_vm(int *num_vm, int *num_vcpu); #ifdef __cplusplus } #endif -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v6 4/9] examples/vm_power_mgr: add scale to medium freq fn 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt ` (2 preceding siblings ...) 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 5/9] examples/vm_power_mgr: add policy to channels David Hunt ` (5 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/power_manager.c | 32 ++++++++++++++++++++++++++----- examples/vm_power_manager/power_manager.h | 13 +++++++++++++ 2 files changed, 40 insertions(+), 5 deletions(-) diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c index 80705f9..1834a82 100644 --- a/examples/vm_power_manager/power_manager.c +++ b/examples/vm_power_manager/power_manager.c @@ -108,7 +108,7 @@ set_host_cpus_mask(void) int power_manager_init(void) { - unsigned i, num_cpus; + unsigned int i, num_cpus, num_freqs; uint64_t cpu_mask; int ret = 0; @@ -121,15 +121,21 @@ power_manager_init(void) rte_power_set_env(PM_ENV_ACPI_CPUFREQ); cpu_mask = global_enabled_cpus; for (i = 0; cpu_mask; cpu_mask &= ~(1 << i++)) { - if (rte_power_init(i) < 0 || rte_power_freqs(i, - global_core_freq_info[i].freqs, - RTE_MAX_LCORE_FREQS) == 0) { - RTE_LOG(ERR, POWER_MANAGER, "Unable to initialize power manager " + if (rte_power_init(i) < 0) + RTE_LOG(ERR, POWER_MANAGER, + "Unable to initialize power manager " "for core %u\n", i); + num_freqs = rte_power_freqs(i, global_core_freq_info[i].freqs, + RTE_MAX_LCORE_FREQS); + if 
(num_freqs == 0) { + RTE_LOG(ERR, POWER_MANAGER, + "Unable to get frequency list for core %u\n", + i); global_enabled_cpus &= ~(1 << i); num_cpus--; ret = -1; } + global_core_freq_info[i].num_freqs = num_freqs; rte_spinlock_init(&global_core_freq_info[i].power_sl); } RTE_LOG(INFO, POWER_MANAGER, "Detected %u host CPUs , enabled core mask:" @@ -286,3 +292,19 @@ power_manager_disable_turbo_core(unsigned int core_num) POWER_SCALE_CORE(disable_turbo, core_num, ret); return ret; } + +int +power_manager_scale_core_med(unsigned int core_num) +{ + int ret = 0; + + if (core_num >= POWER_MGR_MAX_CPUS) + return -1; + if (!(global_enabled_cpus & (1ULL << core_num))) + return -1; + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); + ret = rte_power_set_freq(core_num, + global_core_freq_info[core_num].num_freqs / 2); + rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl); + return ret; +} diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h index b74d09b..b52fb4c 100644 --- a/examples/vm_power_manager/power_manager.h +++ b/examples/vm_power_manager/power_manager.h @@ -231,6 +231,19 @@ int power_manager_disable_turbo_core(unsigned int core_num); */ uint32_t power_manager_get_current_frequency(unsigned core_num); +/** + * Scale to medium frequency for the core specified by core_num. + * It is thread-safe. + * + * @param core_num + * The core number to change frequency + * + * @return + * - 1 on success. + * - 0 if frequency not changed. + * - Negative on error. + */ +int power_manager_scale_core_med(unsigned int core_num); #ifdef __cplusplus } -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
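power_manager_scale_core_med() picks the centre of the frequency list as num_freqs / 2 after checking the core against a 64-bit enabled mask. A minimal sketch of those two steps in isolation is below. median_freq_index() and core_enabled() are hypothetical names, and the empty-list and out-of-range checks are assumptions added here rather than behaviour of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index of the middle entry in a frequency list of num_freqs entries,
 * or -1 if the list is empty (a guard added by this sketch).
 */
int
median_freq_index(unsigned int num_freqs)
{
	if (num_freqs == 0)
		return -1;
	return (int)(num_freqs / 2);
}

/* Return 1 if core is set in a 64-bit enabled mask, else 0. */
int
core_enabled(uint64_t enabled_mask, unsigned int core)
{
	if (core >= 64)
		return 0;
	return (int)((enabled_mask >> core) & 1);
}
```

Shifting the mask right avoids the int-width `1 << i` form used in the power_manager_init() loop, which cannot address cores 32 and above in a 64-bit mask.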
* [dpdk-dev] [PATCH v6 5/9] examples/vm_power_mgr: add policy to channels 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt ` (3 preceding siblings ...) 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 6/9] examples/vm_power_mgr: add port initialisation David Hunt ` (4 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/Makefile | 16 ++ examples/vm_power_manager/channel_monitor.c | 321 +++++++++++++++++++++++++++- examples/vm_power_manager/channel_monitor.h | 18 ++ 3 files changed, 348 insertions(+), 7 deletions(-) diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile index 59a9641..9cf20a2 100644 --- a/examples/vm_power_manager/Makefile +++ b/examples/vm_power_manager/Makefile @@ -54,6 +54,22 @@ CFLAGS += $(WERROR_FLAGS) LDLIBS += -lvirt +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) + +ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y) +LDLIBS += -lrte_pmd_ixgbe +endif + +ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y) +LDLIBS += -lrte_pmd_i40e +endif + +ifeq ($(CONFIG_RTE_LIBRTE_BNXT_PMD),y) +LDLIBS += -lrte_pmd_bnxt +endif + +endif + # workaround for a gcc bug with noreturn attribute # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index ac40dac..f16358d 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ 
b/examples/vm_power_manager/channel_monitor.c @@ -41,13 +41,17 @@ #include <sys/types.h> #include <sys/epoll.h> #include <sys/queue.h> +#include <sys/time.h> #include <rte_log.h> #include <rte_memory.h> #include <rte_malloc.h> #include <rte_atomic.h> +#include <rte_cycles.h> +#include <rte_ethdev.h> +#include <rte_pmd_i40e.h> - +#include <libvirt/libvirt.h> #include "channel_monitor.h" #include "channel_commands.h" #include "channel_manager.h" @@ -57,10 +61,15 @@ #define MAX_EVENTS 256 +uint64_t vsi_pkt_count_prev[384]; +uint64_t rdtsc_prev[384]; +double time_period_s = 1; static volatile unsigned run_loop = 1; static int global_event_fd; +static unsigned int policy_is_set; static struct epoll_event *global_events_list; +static struct policy policies[MAX_VMS]; void channel_monitor_exit(void) { @@ -68,6 +77,286 @@ void channel_monitor_exit(void) rte_free(global_events_list); } +static void +core_share(int pNo, int z, int x, int t) +{ + if (policies[pNo].core_share[z].pcpu == lvm_info[x].pcpus[t]) { + if (strcmp(policies[pNo].pkt.vm_name, + lvm_info[x].vm_name) != 0) { + policies[pNo].core_share[z].status = 1; + power_manager_scale_core_max( + policies[pNo].core_share[z].pcpu); + } + } +} + +static void +core_share_status(int pNo) +{ + + int noVms, noVcpus, z, x, t; + + get_all_vm(&noVms, &noVcpus); + + /* Reset Core Share Status. */ + for (z = 0; z < noVcpus; z++) + policies[pNo].core_share[z].status = 0; + + /* Foreach vcpu in a policy. */ + for (z = 0; z < policies[pNo].pkt.num_vcpu; z++) { + /* Foreach VM on the platform. */ + for (x = 0; x < noVms; x++) { + /* Foreach vcpu of VMs on platform. */ + for (t = 0; t < lvm_info[x].num_cpus; t++) + core_share(pNo, z, x, t); + } + } +} + +static void +get_pcpu_to_control(struct policy *pol) +{ + + /* Convert vcpu to pcpu. 
*/ + struct vm_info info; + int pcpu, count; + uint64_t mask_u64b; + + RTE_LOG(INFO, CHANNEL_MONITOR, "Looking for pcpu for %s\n", + pol->pkt.vm_name); + get_info_vm(pol->pkt.vm_name, &info); + + for (count = 0; count < pol->pkt.num_vcpu; count++) { + mask_u64b = info.pcpu_mask[pol->pkt.vcpu_to_control[count]]; + for (pcpu = 0; mask_u64b; mask_u64b &= ~(1ULL << pcpu++)) { + if ((mask_u64b >> pcpu) & 1) + pol->core_share[count].pcpu = pcpu; + } + } +} + +static int +get_pfid(struct policy *pol) +{ + + int i, x, ret = 0, nb_ports; + + nb_ports = rte_eth_dev_count(); + for (i = 0; i < pol->pkt.nb_mac_to_monitor; i++) { + + for (x = 0; x < nb_ports; x++) { + ret = rte_pmd_i40e_query_vfid_by_mac(x, + (struct ether_addr *)&(pol->pkt.vfid[i])); + if (ret != -EINVAL) { + pol->port[i] = x; + break; + } + } + if (ret == -EINVAL || ret == -ENOTSUP || ret == -ENODEV) { + RTE_LOG(INFO, CHANNEL_MONITOR, + "Error with Policy. MAC not found on " + "attached ports "); + pol->enabled = 0; + return ret; + } + pol->pfid[i] = ret; + } + return 1; +} + +static int +update_policy(struct channel_packet *pkt) +{ + + unsigned int updated = 0; + + for (int i = 0; i < MAX_VMS; i++) { + if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) < 0) { + updated = 1; + break; + } + core_share_status(i); + policies[i].enabled = 1; + updated = 1; + } + } + if (!updated) { + for (int i = 0; i < MAX_VMS; i++) { + if (policies[i].enabled == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) < 0) + break; + core_share_status(i); + policies[i].enabled = 1; + break; + } + } + } + return 0; +} + +static uint64_t +get_pkt_diff(struct policy *pol) +{ + + uint64_t vsi_pkt_count, + vsi_pkt_total = 0, + vsi_pkt_count_prev_total = 0; + double rdtsc_curr, rdtsc_diff, diff; + int x; + struct rte_eth_stats vf_stats; + + for (x = 0; x < pol->pkt.nb_mac_to_monitor; x++) { +
/*Read vsi stats*/ + if (rte_pmd_i40e_get_vf_stats(x, pol->pfid[x], &vf_stats) == 0) + vsi_pkt_count = vf_stats.ipackets; + else + vsi_pkt_count = -1; + + vsi_pkt_total += vsi_pkt_count; + + vsi_pkt_count_prev_total += vsi_pkt_count_prev[pol->pfid[x]]; + vsi_pkt_count_prev[pol->pfid[x]] = vsi_pkt_count; + } + + rdtsc_curr = rte_rdtsc_precise(); + rdtsc_diff = rdtsc_curr - rdtsc_prev[pol->pfid[x-1]]; + rdtsc_prev[pol->pfid[x-1]] = rdtsc_curr; + + diff = (vsi_pkt_total - vsi_pkt_count_prev_total) * + ((double)rte_get_tsc_hz() / rdtsc_diff); + + return diff; +} + +static void +apply_traffic_profile(struct policy *pol) +{ + + int count; + uint64_t diff = 0; + + diff = get_pkt_diff(pol); + + RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n"); + + if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (diff >= (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (diff < (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_time_profile(struct policy *pol) +{ + + int count, x; + struct timeval tv; + struct tm *ptm; + char time_string[40]; + + /* Obtain the time of day, and convert it to a tm struct. */ + gettimeofday(&tv, NULL); + ptm = localtime(&tv.tv_sec); + /* Format the date and time, down to a single second. 
*/ + strftime(time_string, sizeof(time_string), "%Y-%m-%d %H:%M:%S", ptm); + + for (x = 0; x < HOURS; x++) { + + if (ptm->tm_hour == pol->pkt.timer_policy.busy_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_max( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling up core %d to max\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.quiet_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_min( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling down core %d to min\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.hours_to_use_traffic_profile[x]) { + apply_traffic_profile(pol); + break; + } + } +} + +static void +apply_workload_profile(struct policy *pol) +{ + + int count; + + if (pol->pkt.workload == HIGH) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == MEDIUM) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == LOW) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_policy(struct policy *pol) +{ + + struct channel_packet *pkt = &pol->pkt; + + /*Check policy to use*/ + if (pkt->policy_to_use == TRAFFIC) + apply_traffic_profile(pol); + else if (pkt->policy_to_use == TIME) + apply_time_profile(pol); + else if (pkt->policy_to_use == WORKLOAD) + apply_workload_profile(pol); +} + + static int 
process_request(struct channel_packet *pkt, struct channel_info *chan_info) { @@ -140,6 +429,13 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) } } + + if (pkt->command == PKT_POLICY) { + RTE_LOG(INFO, CHANNEL_MONITOR, "\nProcessing Policy request from Guest\n"); + update_policy(pkt); + policy_is_set = 1; + } + /* Return is not checked as channel status may have been set to DISABLED * from management thread */ @@ -209,9 +505,10 @@ run_channel_monitor(void) struct channel_info *chan_info = (struct channel_info *) global_events_list[i].data.ptr; if ((global_events_list[i].events & EPOLLERR) || - (global_events_list[i].events & EPOLLHUP)) { + (global_events_list[i].events & EPOLLHUP)) { RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for " - "channel '%s'\n", chan_info->channel_path); + "channel '%s'\n", + chan_info->channel_path); remove_channel(&chan_info); continue; } @@ -223,14 +520,17 @@ run_channel_monitor(void) int buffer_len = sizeof(pkt); while (buffer_len > 0) { - n_bytes = read(chan_info->fd, buffer, buffer_len); + n_bytes = read(chan_info->fd, + buffer, buffer_len); if (n_bytes == buffer_len) break; if (n_bytes == -1) { err = errno; - RTE_LOG(DEBUG, CHANNEL_MONITOR, "Received error on " - "channel '%s' read: %s\n", - chan_info->channel_path, strerror(err)); + RTE_LOG(DEBUG, CHANNEL_MONITOR, + "Received error on " + "channel '%s' read: %s\n", + chan_info->channel_path, + strerror(err)); remove_channel(&chan_info); break; } @@ -241,5 +541,12 @@ run_channel_monitor(void) process_request(&pkt, chan_info); } } + rte_delay_us(time_period_s*1000000); + if (policy_is_set) { + for (int j = 0; j < MAX_VMS; j++) { + if (policies[j].enabled == 1) + apply_policy(&policies[j]); + } + } } } diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h index c138607..b52c1fc 100644 --- a/examples/vm_power_manager/channel_monitor.h +++ b/examples/vm_power_manager/channel_monitor.h @@ -35,6 
+35,24 @@ #define CHANNEL_MONITOR_H_ #include "channel_manager.h" +#include "channel_commands.h" + +struct core_share { + unsigned int pcpu; + /* + * 1 CORE SHARE + * 0 NOT SHARED + */ + int status; +}; + +struct policy { + struct channel_packet pkt; + uint32_t pfid[MAX_VFS]; + uint32_t port[MAX_VFS]; + unsigned int enabled; + struct core_share core_share[MAX_VCPU_PER_VM]; +}; #ifdef __cplusplus extern "C" { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
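The traffic profile in this patch boils down to two steps: turn a packet-counter delta into a packets-per-second rate using the TSC, then compare that rate against the policy thresholds to pick max, medium or min. A minimal standalone sketch of that logic follows, with no DPDK dependency; the names pkts_per_sec(), pick_level() and enum scale_level are hypothetical, and the zero-interval guard is an addition that the patch does not have.

```c
#include <stdint.h>

/* Hypothetical names for the three outcomes of the traffic profile. */
enum scale_level { SCALE_MIN, SCALE_MED, SCALE_MAX };

/*
 * Convert a packet-counter delta into packets per second.
 * tsc_diff is the number of TSC cycles between the two samples and
 * tsc_hz is the TSC frequency, as in get_pkt_diff() in the patch.
 */
static double
pkts_per_sec(uint64_t pkts_now, uint64_t pkts_prev,
		uint64_t tsc_diff, uint64_t tsc_hz)
{
	/* Guard added for the sketch: avoid dividing by a zero interval. */
	if (tsc_diff == 0)
		return 0.0;
	return (double)(pkts_now - pkts_prev) *
		((double)tsc_hz / (double)tsc_diff);
}

/*
 * Map a rate onto a scaling level using the same ordering of checks
 * as apply_traffic_profile(): max threshold first, then average.
 */
static enum scale_level
pick_level(double rate, uint64_t avg_thresh, uint64_t max_thresh)
{
	if (rate >= (double)max_thresh)
		return SCALE_MAX;
	if (rate >= (double)avg_thresh)
		return SCALE_MED;
	return SCALE_MIN;
}
```

With the example thresholds from the guest CLI patch (1800000 and 2000000), a measured rate of 1.9 Mpps would select the medium level.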
* [dpdk-dev] [PATCH v6 6/9] examples/vm_power_mgr: add port initialisation 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt ` (4 preceding siblings ...) 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 5/9] examples/vm_power_mgr: add policy to channels David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 7/9] power: add send channel msg function to map file David Hunt ` (3 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic We need to initialise the ports we're monitoring to be able to see the throughput. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 220 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 220 insertions(+) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index c33fcc9..698abca 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -49,6 +49,9 @@ #include <rte_log.h> #include <rte_per_lcore.h> #include <rte_lcore.h> +#include <rte_ethdev.h> +#include <getopt.h> +#include <rte_cycles.h> #include <rte_debug.h> #include "channel_manager.h" @@ -56,6 +59,192 @@ #include "power_manager.h" #include "vm_power_cli.h" +#define RX_RING_SIZE 512 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static uint32_t enabled_port_mask; +static volatile bool force_quit; + +/****************/ +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port
>= rte_eth_dev_count()) + return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned int)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. 
*/ + rte_eth_promiscuous_enable(port); + + + return 0; +} + +static int +parse_portmask(const char *portmask) +{ + char *end = NULL; + unsigned long pm; + + /* parse hexadecimal string */ + pm = strtoul(portmask, &end, 16); + if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0')) + return -1; + + if (pm == 0) + return -1; + + return pm; +} +/* Parse the argument given in the command line of the application */ +static int +parse_args(int argc, char **argv) +{ + int opt, ret; + char **argvopt; + int option_index; + char *prgname = argv[0]; + static struct option lgopts[] = { + { "mac-updating", no_argument, 0, 1}, + { "no-mac-updating", no_argument, 0, 0}, + {NULL, 0, 0, 0} + }; + argvopt = argv; + + while ((opt = getopt_long(argc, argvopt, "p:q:T:", + lgopts, &option_index)) != EOF) { + + switch (opt) { + /* portmask */ + case 'p': + enabled_port_mask = parse_portmask(optarg); + if (enabled_port_mask == 0) { + printf("invalid portmask\n"); + return -1; + } + break; + /* long options */ + case 0: + break; + + default: + return -1; + } + } + + if (optind >= 0) + argv[optind-1] = prgname; + + ret = optind-1; + optind = 0; /* reset getopt lib */ + return ret; +} + +static void +check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) +{ +#define CHECK_INTERVAL 100 /* 100ms */ +#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */ + uint8_t portid, count, all_ports_up, print_flag = 0; + struct rte_eth_link link; + + printf("\nChecking link status"); + fflush(stdout); + for (count = 0; count <= MAX_CHECK_TIME; count++) { + if (force_quit) + return; + all_ports_up = 1; + for (portid = 0; portid < port_num; portid++) { + if (force_quit) + return; + if ((port_mask & (1 << portid)) == 0) + continue; + memset(&link, 0, sizeof(link)); + rte_eth_link_get_nowait(portid, &link); + /* print link status if flag set */ + if (print_flag == 1) { + if (link.link_status) + printf("Port %d Link Up - speed %u " + "Mbps - %s\n", (uint8_t)portid, + (unsigned 
int)link.link_speed, + (link.link_duplex == ETH_LINK_FULL_DUPLEX) ? + ("full-duplex") : ("half-duplex\n")); + else + printf("Port %d Link Down\n", + (uint8_t)portid); + continue; + } + /* clear all_ports_up flag if any link down */ + if (link.link_status == ETH_LINK_DOWN) { + all_ports_up = 0; + break; + } + } + /* after finally printing all link status, get out */ + if (print_flag == 1) + break; + + if (all_ports_up == 0) { + printf("."); + fflush(stdout); + rte_delay_ms(CHECK_INTERVAL); + } + + /* set the print_flag if all ports up or timeout */ + if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) { + print_flag = 1; + printf("done\n"); + } + } +} static int run_monitor(__attribute__((unused)) void *arg) { @@ -82,6 +271,10 @@ main(int argc, char **argv) { int ret; unsigned lcore_id; + unsigned int nb_ports; + struct rte_mempool *mbuf_pool; + uint8_t portid; + ret = rte_eal_init(argc, argv); if (ret < 0) @@ -90,12 +283,39 @@ main(int argc, char **argv) signal(SIGINT, sig_handler); signal(SIGTERM, sig_handler); + argc -= ret; + argv += ret; + + /* parse application arguments (after the EAL ones) */ + ret = parse_args(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid arguments\n"); + + nb_ports = rte_eth_dev_count(); + + mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports, + MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id()); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize ports. 
*/ + for (portid = 0; portid < nb_ports; portid++) { + if ((enabled_port_mask & (1 << portid)) == 0) + continue; + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + } + lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { RTE_LOG(ERR, EAL, "A minimum of two cores are required to run " "application\n"); return 0; } + + check_all_ports_link_status(nb_ports, enabled_port_mask); rte_eal_remote_launch(run_monitor, NULL, lcore_id); if (power_manager_init() < 0) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
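One detail of the -p handling in this patch is worth illustrating: parse_portmask() signals failure with -1, but the caller stores the result in the unsigned enabled_port_mask and only tests it against 0, so a malformed mask would not be rejected. A minimal sketch of one way to keep the error separate from the value follows; parse_portmask_u32() is a hypothetical name and this is not the code in the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical variant of parse_portmask(): parse a hexadecimal port
 * mask into *mask. Returns 0 on success and -1 on any error, so the
 * status can never be confused with a valid mask value.
 */
static int
parse_portmask_u32(const char *arg, uint32_t *mask)
{
	char *end = NULL;
	unsigned long pm;

	if (arg == NULL || mask == NULL || arg[0] == '\0')
		return -1;

	errno = 0;
	pm = strtoul(arg, &end, 16);
	/* Reject overflow and trailing characters that are not hex digits. */
	if (errno != 0 || *end != '\0')
		return -1;
	/* An empty mask enables no ports; a wider value does not fit. */
	if (pm == 0 || pm > UINT32_MAX)
		return -1;

	*mask = (uint32_t)pm;
	return 0;
}
```

A caller would then test the return value rather than the mask, for example printing "invalid portmask" when the function returns -1.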
* [dpdk-dev] [PATCH v6 7/9] power: add send channel msg function to map file 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt ` (5 preceding siblings ...) 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 6/9] examples/vm_power_mgr: add port initialisation David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 8/9] examples/guest_cli: add send policy to host David Hunt ` (2 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt Adding new wrapper function to existing private (but unused 'till now) function with an rte_power_ prefix. The plan is to clean up all the header files in the next release so that only the intended public functions are in the map file and only the relevant headers have the rte_ prefix so that only they are included in the documentation. Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> --- lib/librte_power/guest_channel.c | 7 +++++++ lib/librte_power/guest_channel.h | 15 +++++++++++++++ lib/librte_power/rte_power_version.map | 1 + 3 files changed, 23 insertions(+) diff --git a/lib/librte_power/guest_channel.c b/lib/librte_power/guest_channel.c index 85c92fa..fa5de0f 100644 --- a/lib/librte_power/guest_channel.c +++ b/lib/librte_power/guest_channel.c @@ -148,6 +148,13 @@ guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id) return 0; } +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id) +{ + return guest_channel_send_msg(pkt, lcore_id); +} + + void guest_channel_host_disconnect(unsigned lcore_id) { diff --git a/lib/librte_power/guest_channel.h b/lib/librte_power/guest_channel.h index 9e18af5..741339c 100644 --- a/lib/librte_power/guest_channel.h +++ b/lib/librte_power/guest_channel.h @@ -81,6 +81,21 @@ void 
guest_channel_host_disconnect(unsigned lcore_id); */ int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); +/** + * Send a message contained in pkt over the Virtio-Serial to the host endpoint. + * + * @param pkt + * Pointer to a populated struct channel_packet + * + * @param lcore_id + * lcore_id. + * + * @return + * - 0 on success. + * - Negative on error. + */ +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id); #ifdef __cplusplus } diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map index 9ae0627..96dc42e 100644 --- a/lib/librte_power/rte_power_version.map +++ b/lib/librte_power/rte_power_version.map @@ -20,6 +20,7 @@ DPDK_2.0 { DPDK_17.11 { global: + rte_power_guest_channel_send_msg; rte_power_freq_disable_turbo; rte_power_freq_enable_turbo; rte_power_turbo_status; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v6 8/9] examples/guest_cli: add send policy to host 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt ` (6 preceding siblings ...) 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 7/9] power: add send channel msg function to map file David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Here we're adding an example of setting up a policy, and allowing the vm_cli_guest app to send it to the host using the cli command "send_policy now" Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- .../guest_cli/vm_power_cli_guest.c | 97 ++++++++++++++++++++++ .../guest_cli/vm_power_cli_guest.h | 6 -- 2 files changed, 97 insertions(+), 6 deletions(-) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 4e982bd..dc9efc2 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -45,8 +45,10 @@ #include <cmdline.h> #include <rte_log.h> #include <rte_lcore.h> +#include <rte_ethdev.h> #include <rte_power.h> +#include <guest_channel.h> #include "vm_power_cli_guest.h" @@ -139,8 +141,103 @@ cmdline_parse_inst_t cmd_set_cpu_freq_set = { }, }; +struct cmd_send_policy_result { + cmdline_fixed_string_t send_policy; + cmdline_fixed_string_t cmd; +}; + +union PFID { + struct ether_addr addr; + uint64_t pfid; +}; + +static inline int 
+send_policy(void) +{ + struct channel_packet pkt; + int ret; + + union PFID pfid; + /* Use port MAC address as the vfid */ + rte_eth_macaddr_get(0, &pfid.addr); + printf("Port %u MAC: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":" + "%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n", + 1, + pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1], + pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3], + pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]); + pkt.vfid[0] = pfid.pfid; + + pkt.nb_mac_to_monitor = 1; + pkt.t_boost_status.tbEnabled = false; + + pkt.vcpu_to_control[0] = 0; + pkt.vcpu_to_control[1] = 1; + pkt.num_vcpu = 2; + /* Dummy Population. */ + pkt.traffic_policy.min_packet_thresh = 96000; + pkt.traffic_policy.avg_max_packet_thresh = 1800000; + pkt.traffic_policy.max_max_packet_thresh = 2000000; + + pkt.timer_policy.busy_hours[0] = 3; + pkt.timer_policy.busy_hours[1] = 4; + pkt.timer_policy.busy_hours[2] = 5; + pkt.timer_policy.quiet_hours[0] = 11; + pkt.timer_policy.quiet_hours[1] = 12; + pkt.timer_policy.quiet_hours[2] = 13; + + pkt.timer_policy.hours_to_use_traffic_profile[0] = 8; + pkt.timer_policy.hours_to_use_traffic_profile[1] = 10; + + pkt.workload = LOW; + pkt.policy_to_use = TIME; + pkt.command = PKT_POLICY; + strcpy(pkt.vm_name, "ubuntu2"); + ret = rte_power_guest_channel_send_msg(&pkt, 1); + if (ret == 0) + return 1; + RTE_LOG(DEBUG, POWER, "Error sending message: %s\n", + ret > 0 ? 
strerror(ret) : "channel not connected"); + return -1; +} + +static void +cmd_send_policy_parsed(void *parsed_result, struct cmdline *cl, + __attribute__((unused)) void *data) +{ + int ret = -1; + struct cmd_send_policy_result *res = parsed_result; + + if (!strcmp(res->cmd, "now")) { + printf("Sending Policy down now!\n"); + ret = send_policy(); + } + if (ret != 1) + cmdline_printf(cl, "Error sending message: %s\n", + strerror(ret)); +} + +cmdline_parse_token_string_t cmd_send_policy = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + send_policy, "send_policy"); +cmdline_parse_token_string_t cmd_send_policy_cmd_cmd = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + cmd, "now"); + +cmdline_parse_inst_t cmd_send_policy_set = { + .f = cmd_send_policy_parsed, + .data = NULL, + .help_str = "send_policy now", + .tokens = { + (void *)&cmd_send_policy, + (void *)&cmd_send_policy_cmd_cmd, + NULL, + }, +}; + cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_quit, + (cmdline_parse_inst_t *)&cmd_send_policy_set, (cmdline_parse_inst_t *)&cmd_set_cpu_freq_set, NULL, }; diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h index 0c4bdd5..277eab3 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h @@ -40,12 +40,6 @@ extern "C" { #include "channel_commands.h" -int guest_channel_host_connect(unsigned lcore_id); - -int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); - -void guest_channel_host_disconnect(unsigned lcore_id); - void run_cli(__attribute__((unused)) void *arg); #ifdef __cplusplus -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
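The guest identifies each VF to the host by its MAC address, carried in a 64-bit vfid slot of the policy packet. The patch does this through a union of struct ether_addr and uint64_t, which leaves the two bytes beyond the MAC unset. A minimal sketch of the same packing with those bytes zeroed follows; mac_to_u64() and u64_to_mac() are hypothetical helpers, and a plain 6-byte array stands in for struct ether_addr so the example has no DPDK dependency.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* bytes in an Ethernet MAC address */

/*
 * Hypothetical helper: copy a 6-byte MAC into the first six bytes of a
 * zero-initialised 64-bit id, so the remaining two bytes are always 0.
 */
static uint64_t
mac_to_u64(const uint8_t mac[MAC_LEN])
{
	uint64_t id = 0;

	memcpy(&id, mac, MAC_LEN);
	return id;
}

/* Hypothetical inverse: recover the 6-byte MAC from the 64-bit id. */
static void
u64_to_mac(uint64_t id, uint8_t mac[MAC_LEN])
{
	memcpy(mac, &id, MAC_LEN);
}
```

Because both directions copy the same six bytes in memory order, the round trip is independent of host endianness as long as guest and host share a byte order, which holds for a VM running on its own host.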
* [dpdk-dev] [PATCH v6 9/9] examples/vm_power_mgr: set MAC address of VF 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt ` (7 preceding siblings ...) 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 8/9] examples/guest_cli: add send policy to host David Hunt @ 2017-10-05 12:25 ` David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 12:25 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt We need to set vf mac from the host, so that they will be in sync on the guest and the host. Otherwise, we'll have a random mac on the guest, and a 00:00:00:00:00:00 mac on the host. Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 43 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index 698abca..5147789 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -58,6 +58,9 @@ #include "channel_monitor.h" #include "power_manager.h" #include "vm_power_cli.h" +#include <rte_pmd_ixgbe.h> +#include <rte_pmd_i40e.h> +#include <rte_pmd_bnxt.h> #define RX_RING_SIZE 512 #define TX_RING_SIZE 512 @@ -222,7 +225,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) (uint8_t)portid); continue; } - /* clear all_ports_up flag if any link down */ + /* clear all_ports_up flag if any link down */ if (link.link_status == ETH_LINK_DOWN) { all_ports_up = 0; break; @@ -301,11 +304,49 @@ main(int argc, char **argv) /* Initialize ports. 
*/ for (portid = 0; portid < nb_ports; portid++) { + struct ether_addr eth; + int w, j; + int ret = -ENOTSUP; + if ((enabled_port_mask & (1 << portid)) == 0) continue; + + eth.addr_bytes[0] = 0xe0; + eth.addr_bytes[1] = 0xe0; + eth.addr_bytes[2] = 0xe0; + eth.addr_bytes[3] = 0xe0; + eth.addr_bytes[4] = portid + 0xf0; + if (port_init(portid, mbuf_pool) != 0) rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid); + + for (w = 0; w < MAX_VFS; w++) { + eth.addr_bytes[5] = w + 0xf0; + + if (ret == -ENOTSUP) + ret = rte_pmd_ixgbe_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_i40e_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_bnxt_set_vf_mac_addr(portid, + w, &eth); + + switch (ret) { + case 0: + printf("Port %d VF %d MAC: ", + portid, w); + for (j = 0; j < 6; j++) { + printf("%02x", eth.addr_bytes[j]); + if (j < 5) + printf(":"); + } + printf("\n"); + break; + } + } } lcore_id = rte_get_next_lcore(-1, 1, 0); -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
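The addressing scheme in this patch gives every VF a predictable MAC of the form e0:e0:e0:e0:(0xf0 + port):(0xf0 + vf). A minimal standalone sketch of that scheme follows; make_vf_mac() and format_mac() are hypothetical helpers, and a plain byte array is used instead of struct ether_addr.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the MAC the patch assigns to a VF.
 * The first four bytes are fixed at 0xe0; the last two encode the
 * port and VF index, each offset by 0xf0.
 */
static void
make_vf_mac(uint8_t port, uint8_t vf, uint8_t mac[6])
{
	mac[0] = 0xe0;
	mac[1] = 0xe0;
	mac[2] = 0xe0;
	mac[3] = 0xe0;
	/* Values above 15 wrap around in the 8-bit field. */
	mac[4] = (uint8_t)(port + 0xf0);
	mac[5] = (uint8_t)(vf + 0xf0);
}

/* Hypothetical helper: format a MAC as colon-separated lowercase hex. */
static int
format_mac(const uint8_t mac[6], char *buf, size_t len)
{
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

Port 1, VF 2 yields e0:e0:e0:e0:f1:f2. Since only 16 values fit above 0xf0 before the byte wraps, addresses stay unique only while port and VF indexes remain below 16.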
* [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 0/9] Policy Based Power Control for Guest David Hunt ` (8 preceding siblings ...) 2017-10-05 12:25 ` [dpdk-dev] [PATCH v6 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt @ 2017-10-05 13:28 ` David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt ` (11 more replies) 9 siblings, 12 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla

Policy Based Power Control for Guest

This patchset adds the facility for a guest VM to send a policy down to the host that will allow the host to scale up/down cpu frequencies depending on the policy criteria independently of the DPDK app running in the guest. This differs from the previous vm_power implementation where individual scale up/down requests were sent from the guest to the host via virtio-serial.

V7 patchset changes:
* Changed return code of rte_pmd_i40e_query_vfid_by_mac() from an int64_t to int

V6 patchset changes:
* Fixed comments in header for rte_pmd_i40e_query_vfid_by_mac.
* Changed rte_pmd_i40e_query_vfid_by_mac return code from uint to int as it can return negative error codes.
* Removed bool enum from channel_commands.h, including stdbool.h instead.
* Added #define VM_MAX_NAME_SZ 32 to channel_commands.h
* Renamed a few variables to be more readable.
* Added returns in a few places if failed to get info on domain.
* Fixed power_manager_init to keep track of num_freqs for each core.
* In power_manager_scale_core_med(), changed a hardcoded '5' to instead be calculated from the centre of the frequency list (global_core_freq_info[core_num].num_freqs / 2)

V5 patchset changes:
* Removed most of the #ifdef I40_PMD as it will be applicable to other PMDs in the future.
* Changed the parameter of rte_pmd_i40e_query_vfid_by_mac from a uint64 to a const struct ether_addr *, rather than casting it later in the function.

V4 patchset changes:
* None, re-post to mailing list under the correct email thread.

V3 patchset changes:
* Changed to using is_same_ether_addr() instead of looping through the mac address bytes to compare them.
* Tweaked some comments and wording in the i40e patch after review.
* Added a patch to the set to add new i40e function to map file, so as to allow shared library builds. The power library API needs a cleanup in next release, so will add API/ABI warning for this cleanup in a separate patch.

V2 patchset changes:
* Removed API's in ethdev layer.
* Now just a single new API in the i40e driver for mapping VF MAC to VF index.
* Moved new function from rte_rxtx.c to rte_pmd_i40e.c
* Removed function for reading i40e register, moved to using the standard stats API.
* Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac
* Cleaned up policy generation code.

It's a modification of the vm_power_manager app that runs in the host, and the guest_vm_power_app example app that runs in the guest. This allows the guest to send down a policy to the host via virtio-serial, which then allows the host to scale up/down based on the criteria in the policy, resulting in quicker scale up/down than individual requests coming from the guest. It also means that the DPDK application running in the guest does not need to be modified in any way; it is unaware that its cores are being scaled up/down, reducing the effort in implementing a power-aware infrastructure.

The usage model is as follows:
1. Set up the VFs and assign them to the guest in the usual way.
2. Run vm_power_manager on the host, creating a channel to the guest.
3. Start the guest_vm_power_mgr app on the guest, which establishes a virtio-serial channel to the host.
4. Send down the policy for the guest using the "send_policy now" command. There is an example policy hard-coded into guest_vm_power_mgr.
5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
6. Send traffic into the VFs at varying traffic rates. Observe the frequency change on the host (turbostat -i 1)

The sequence of code changes is as follows:

A new function has been added to the i40e driver to allow mapping of a VF MAC to VF index.

Next we make an addition to librte_power that adds an extra command to allow the passing of a policy structure from the guest to the host. This struct contains information like busy/quiet hour, packet throughput thresholds, etc.

The next addition adds functionality to convert the virtual CPU (vcpu) IDs to physical CPU (pcpu) IDs so that the host can scale up/down the cores used in the guest.

The remaining patches are functionality to process the policy, and take action when the relevant trigger occurs to cause a frequency change.

[1/9] net/i40e: add API to convert VF MAC to VF id
[2/9] lib/librte_power: add extra msg type for policies
[3/9] examples/vm_power_mgr: add vcpu to pcpu mapping
[4/9] examples/vm_power_mgr: add scale to medium freq fn
[5/9] examples/vm_power_mgr: add policy to channels
[6/9] examples/vm_power_mgr: add port initialisation
[7/9] power: add send channel msg function to map file
[8/9] examples/guest_cli: add send policy to host
[9/9] examples/vm_power_mgr: set MAC address of VF

^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v7 1/9] net/i40e: add API to convert VF MAC to VF id
  2017-10-05 13:28   ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt
@ 2017-10-05 13:28     ` David Hunt
  2017-10-05 13:28     ` [dpdk-dev] [PATCH v7 2/9] lib/librte_power: add extra msg type for policies David Hunt
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 105+ messages in thread
From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory,
	Nemanja Marjanovic, David Hunt

From: "Sexton, Rory" <rory.sexton@intel.com>

Need a way to convert a vf id to a pf id on the host so as to query the pf
for relevant statistics which are used for the frequency changes in the
vm_power_manager app. Used when profiles are passed down from the guest
to the host, allowing the host to map the vfs to pfs.

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 drivers/net/i40e/rte_pmd_i40e.c           | 30 ++++++++++++++++++++++++++++++
 drivers/net/i40e/rte_pmd_i40e.h           | 15 +++++++++++++++
 drivers/net/i40e/rte_pmd_i40e_version.map |  7 +++++++
 3 files changed, 52 insertions(+)

diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c
index f12b7f4..76d11dd 100644
--- a/drivers/net/i40e/rte_pmd_i40e.c
+++ b/drivers/net/i40e/rte_pmd_i40e.c
@@ -2115,3 +2115,33 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
 
 	return 0;
 }
+
+int
+rte_pmd_i40e_query_vfid_by_mac(uint8_t port, const struct ether_addr *vf_mac)
+{
+	struct rte_eth_dev *dev;
+	struct ether_addr *mac;
+	struct i40e_pf *pf;
+	int vf_id;
+	struct i40e_pf_vf *vf;
+	uint16_t vf_num;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+	dev = &rte_eth_devices[port];
+
+	if (!is_i40e_supported(dev))
+		return -ENOTSUP;
+
+	pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
+	vf_num = pf->vf_num;
+
+	for (vf_id = 0; vf_id < vf_num; vf_id++) {
+		vf = &pf->vfs[vf_id];
+		mac = &vf->mac_addr;
+
+		if (is_same_ether_addr(mac, vf_mac))
+			return vf_id;
+	}
+
+	return -EINVAL;
+}
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index 356fa89..a355896 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -637,4 +637,19 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port,
 				       uint8_t mask,
 				       uint32_t pkt_type);
 
+/**
+ * On the PF, find VF index based on VF MAC address
+ *
+ * @param port
+ *    The port identifier of the device
+ * @param vf_mac
+ *    the mac address of the vf to determine index of
+ * @return
+ *    The index of the VF if successful.
+ *    -EINVAL: vf mac address does not exist for this port
+ *    -ENOTSUP: i40e not supported for this port.
+ */
+int rte_pmd_i40e_query_vfid_by_mac(uint8_t port,
+					const struct ether_addr *vf_mac);
+
 #endif /* _PMD_I40E_H_ */
diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map b/drivers/net/i40e/rte_pmd_i40e_version.map
index 20cc980..d8b74bd 100644
--- a/drivers/net/i40e/rte_pmd_i40e_version.map
+++ b/drivers/net/i40e/rte_pmd_i40e_version.map
@@ -45,3 +45,10 @@ DPDK_17.08 {
 	rte_pmd_i40e_get_ddp_info;
 
 } DPDK_17.05;
+
+DPDK_17.11 {
+	global:
+
+	rte_pmd_i40e_query_vfid_by_mac;
+
+} DPDK_17.08;
-- 
2.7.4

^ permalink raw reply	[flat|nested] 105+ messages in thread
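For readers without the driver sources to hand, the lookup in patch 1/9 is a linear scan over the PF's VF table that compares MAC addresses. A minimal standalone sketch of that idea follows. The struct vf_entry type and the find_vf_by_mac() name are hypothetical stand-ins, not i40e driver types, and the sketch returns -1 where the patch returns -EINVAL.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for the driver's per-VF state (not an i40e type). */
struct vf_entry {
	uint8_t mac[MAC_LEN];
};

/*
 * Return the index of the first VF whose MAC address matches, or -1 if
 * no VF in the table has that address.
 */
static int
find_vf_by_mac(const struct vf_entry *vfs, int vf_num,
		const uint8_t mac[MAC_LEN])
{
	int vf_id;

	for (vf_id = 0; vf_id < vf_num; vf_id++) {
		if (memcmp(vfs[vf_id].mac, mac, MAC_LEN) == 0)
			return vf_id;
	}
	return -1;
}
```

The scan is linear in the number of VFs, which seems adequate for a lookup made only when a policy arrives.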
* [dpdk-dev] [PATCH v7 2/9] lib/librte_power: add extra msg type for policies
  2017-10-05 13:28   ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt
  2017-10-05 13:28     ` [dpdk-dev] [PATCH v7 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt
@ 2017-10-05 13:28     ` David Hunt
  2017-10-05 13:28     ` [dpdk-dev] [PATCH v7 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt
                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 105+ messages in thread
From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt,
	Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_power/channel_commands.h | 42 +++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h
index 484085b..f0f5f0a 100644
--- a/lib/librte_power/channel_commands.h
+++ b/lib/librte_power/channel_commands.h
@@ -39,6 +39,7 @@ extern "C" {
 #endif
 
 #include <stdint.h>
+#include <stdbool.h>
 
 /* Maximum number of channels per VM */
 #define CHANNEL_CMDS_MAX_VM_CHANNELS 64
@@ -46,6 +47,7 @@ extern "C" {
 /* Valid Commands */
 #define CPU_POWER               1
 #define CPU_POWER_CONNECT       2
+#define PKT_POLICY              3
 
 /* CPU Power Command Scaling */
 #define CPU_POWER_SCALE_UP      1
@@ -54,11 +56,51 @@ extern "C" {
 #define CPU_POWER_SCALE_MIN     4
 #define CPU_POWER_ENABLE_TURBO  5
 #define CPU_POWER_DISABLE_TURBO 6
+#define HOURS 24
+
+#define MAX_VFS 10
+#define VM_MAX_NAME_SZ 32
+
+#define MAX_VCPU_PER_VM         8
+
+struct t_boost_status {
+	bool tbEnabled;
+};
+
+struct timer_profile {
+	int busy_hours[HOURS];
+	int quiet_hours[HOURS];
+	int hours_to_use_traffic_profile[HOURS];
+};
+
+enum workload {HIGH, MEDIUM, LOW};
+enum policy_to_use {
+	TRAFFIC,
+	TIME,
+	WORKLOAD
+};
+
+struct traffic {
+	uint32_t min_packet_thresh;
+	uint32_t avg_max_packet_thresh;
+	uint32_t max_max_packet_thresh;
+};
 
 struct channel_packet {
 	uint64_t resource_id; /**< core_num, device */
 	uint32_t unit;        /**< scale down/up/min/max */
 	uint32_t command;     /**< Power, IO, etc */
+	char vm_name[VM_MAX_NAME_SZ];
+
+	uint64_t vfid[MAX_VFS];
+	int nb_mac_to_monitor;
+	struct traffic traffic_policy;
+	uint8_t vcpu_to_control[MAX_VCPU_PER_VM];
+	uint8_t num_vcpu;
+	struct timer_profile timer_policy;
+	enum workload workload;
+	enum policy_to_use policy_to_use;
+	struct t_boost_status t_boost_status;
 };
 
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 105+ messages in thread
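The policy packet carries each VF MAC address in a uint64_t slot (vfid[]), and patch 5/9 later reads a slot back by casting its address to a struct ether_addr pointer. That only works if the six MAC bytes occupy the first six bytes of the slot in memory. The guest-side packing code is not part of this section, so the following is only a sketch of one way to fill and read such a slot under that assumption; mac_to_slot() and slot_to_mac() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Copy a 6-byte MAC into the first six bytes (in memory order) of a
 * 64-bit slot. The remaining two bytes are left as zero.
 */
static uint64_t
mac_to_slot(const uint8_t mac[MAC_LEN])
{
	uint64_t slot = 0;

	memcpy(&slot, mac, MAC_LEN);
	return slot;
}

/* Recover the MAC bytes from a slot filled by mac_to_slot(). */
static void
slot_to_mac(uint64_t slot, uint8_t mac[MAC_LEN])
{
	memcpy(mac, &slot, MAC_LEN);
}
```

Using memcpy() rather than shifts keeps the bytes in memory order regardless of host endianness, which is what a pointer cast on the receiving side would see when guest and host share the same byte order.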
* [dpdk-dev] [PATCH v7 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 2/9] lib/librte_power: add extra msg type for policies David Hunt @ 2017-10-05 13:28 ` David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt ` (8 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/channel_manager.c | 67 +++++++++++++++++++++++++++++ examples/vm_power_manager/channel_manager.h | 25 +++++++++++ 2 files changed, 92 insertions(+) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index e068ae2..ab856bd 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -574,6 +574,73 @@ set_channel_status(const char *vm_name, unsigned *channel_list, return num_channels_changed; } +void +get_all_vm(int *num_vm, int *num_vcpu) +{ + + virNodeInfo node_info; + virDomainPtr *domptr; + uint64_t mask; + int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus; + unsigned int jj; + const char *vm_name; + unsigned int domain_flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; + unsigned int domain_flag = VIR_DOMAIN_VCPU_CONFIG; + + + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); + if (virNodeGetInfo(global_vir_conn_ptr, &node_info)) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node 
Info\n"); + return; + } + + /* Returns number of pcpus */ + global_n_host_cpus = (unsigned int)node_info.cpus; + + /* Returns number of active domains */ + *num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, + domain_flags); + if (*num_vm <= 0) { + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); + return; + } + + for (i = 0; i < *num_vm; i++) { + + /* Get Domain Names */ + vm_name = virDomainGetName(domptr[i]); + lvm_info[i].vm_name = vm_name; + + /* Get Number of Vcpus */ + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], domain_flag); + + /* Get Number of VCpus & VcpuPinInfo */ + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], + numVcpus[i], global_cpumaps, + global_maplen, domain_flag); + + if ((int)n_vcpus > 0) { + *num_vcpu = n_vcpus; + lvm_info[i].num_cpus = n_vcpus; + } + + /* Save pcpu in use by libvirt VMs */ + for (ii = 0; ii < n_vcpus; ii++) { + mask = 0; + for (jj = 0; jj < global_n_host_cpus; jj++) { + if (VIR_CPU_USABLE(global_cpumaps, + global_maplen, ii, jj) > 0) { + mask |= 1ULL << jj; + } + } + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { + lvm_info[i].pcpus[ii] = cpu; + } + } + } +} + int get_info_vm(const char *vm_name, struct vm_info *info) { diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h index 47c3b9c..358fb8f 100644 --- a/examples/vm_power_manager/channel_manager.h +++ b/examples/vm_power_manager/channel_manager.h @@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un; #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) #endif +#define MAX_VMS 4 +#define MAX_VCPUS 20 + + +struct libvirt_vm_info { + const char *vm_name; + unsigned int pcpus[MAX_VCPUS]; + uint8_t num_cpus; +}; + +struct libvirt_vm_info lvm_info[MAX_VMS]; /* Communication Channel Status */ enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, CHANNEL_MGR_CHANNEL_CONNECTED, @@ -319,6 +330,20 @@ int set_channel_status(const char *vm_name, unsigned *channel_list, */ int get_info_vm(const char *vm_name, 
struct vm_info *info); +/** + * Populates a table with all domains running and their physical cpu. + * All information is gathered through libvirt api. + * + * @param num_vm + * modified to store number of active VMs + * + * @param num_vcpu + modified to store number of vcpus active + * + * @return + * void + */ +void get_all_vm(int *num_vm, int *num_vcpu); #ifdef __cplusplus } #endif -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
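The core of get_all_vm() is turning a per-vcpu affinity bitmask into physical CPU numbers. As I read the patch, it stores a single pcpu per vcpu, so a vcpu pinned to several pcpus ends up recorded against only one of them. The sketch below shows the bit-walk in isolation and returns every set bit so the caller can decide. mask_to_pcpus() is a hypothetical helper, not part of vm_power_manager or libvirt.

```c
#include <stdint.h>

/*
 * Write the positions of the set bits in mask to out[], lowest first,
 * storing at most max entries. Returns the total number of set bits,
 * which may exceed max.
 */
static int
mask_to_pcpus(uint64_t mask, unsigned int *out, int max)
{
	int n = 0;
	unsigned int cpu;

	for (cpu = 0; mask != 0; cpu++, mask >>= 1) {
		if (mask & 1) {
			if (n < max)
				out[n] = cpu;
			n++;
		}
	}
	return n;
}
```

For a vcpu pinned to exactly one pcpu the result is a single entry, which matches the case the patch appears to target.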
* [dpdk-dev] [PATCH v7 4/9] examples/vm_power_mgr: add scale to medium freq fn
  2017-10-05 13:28   ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt
                       ` (2 preceding siblings ...)
  2017-10-05 13:28     ` [dpdk-dev] [PATCH v7 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt
@ 2017-10-05 13:28     ` David Hunt
  2017-10-05 13:28     ` [dpdk-dev] [PATCH v7 5/9] examples/vm_power_mgr: add policy to channels David Hunt
                       ` (7 subsequent siblings)
  11 siblings, 0 replies; 105+ messages in thread
From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt,
	Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/vm_power_manager/power_manager.c | 32 ++++++++++++++++++++++++++-----
 examples/vm_power_manager/power_manager.h | 13 +++++++++++++
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c
index 80705f9..1834a82 100644
--- a/examples/vm_power_manager/power_manager.c
+++ b/examples/vm_power_manager/power_manager.c
@@ -108,7 +108,7 @@ set_host_cpus_mask(void)
 int
 power_manager_init(void)
 {
-	unsigned i, num_cpus;
+	unsigned int i, num_cpus, num_freqs;
 	uint64_t cpu_mask;
 	int ret = 0;
 
@@ -121,15 +121,21 @@ power_manager_init(void)
 	rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
 	cpu_mask = global_enabled_cpus;
 	for (i = 0; cpu_mask; cpu_mask &= ~(1 << i++)) {
-		if (rte_power_init(i) < 0 || rte_power_freqs(i,
-				global_core_freq_info[i].freqs,
-				RTE_MAX_LCORE_FREQS) == 0) {
-			RTE_LOG(ERR, POWER_MANAGER, "Unable to initialize power manager "
+		if (rte_power_init(i) < 0)
+			RTE_LOG(ERR, POWER_MANAGER,
+					"Unable to initialize power manager "
 					"for core %u\n", i);
+		num_freqs = rte_power_freqs(i, global_core_freq_info[i].freqs,
+					RTE_MAX_LCORE_FREQS);
+		if (num_freqs == 0) {
+			RTE_LOG(ERR, POWER_MANAGER,
+				"Unable to get frequency list for core %u\n",
+				i);
 			global_enabled_cpus &= ~(1 << i);
 			num_cpus--;
 			ret = -1;
 		}
+		global_core_freq_info[i].num_freqs = num_freqs;
 		rte_spinlock_init(&global_core_freq_info[i].power_sl);
 	}
 	RTE_LOG(INFO, POWER_MANAGER, "Detected %u host CPUs , enabled core mask:"
@@ -286,3 +292,19 @@ power_manager_disable_turbo_core(unsigned int core_num)
 	POWER_SCALE_CORE(disable_turbo, core_num, ret);
 	return ret;
 }
+
+int
+power_manager_scale_core_med(unsigned int core_num)
+{
+	int ret = 0;
+
+	if (core_num >= POWER_MGR_MAX_CPUS)
+		return -1;
+	if (!(global_enabled_cpus & (1ULL << core_num)))
+		return -1;
+	rte_spinlock_lock(&global_core_freq_info[core_num].power_sl);
+	ret = rte_power_set_freq(core_num,
+				global_core_freq_info[core_num].num_freqs / 2);
+	rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl);
+	return ret;
+}
diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h
index b74d09b..b52fb4c 100644
--- a/examples/vm_power_manager/power_manager.h
+++ b/examples/vm_power_manager/power_manager.h
@@ -231,6 +231,19 @@ int power_manager_disable_turbo_core(unsigned int core_num);
  */
 uint32_t power_manager_get_current_frequency(unsigned core_num);
 
+/**
+ * Scale to medium frequency for the core specified by core_num.
+ * It is thread-safe.
+ *
+ * @param core_num
+ *  The core number to change frequency
+ *
+ * @return
+ *  - 1 on success.
+ *  - 0 if frequency not changed.
+ *  - Negative on error.
+ */
+int power_manager_scale_core_med(unsigned int core_num);
 
 #ifdef __cplusplus
 }
-- 
2.7.4

^ permalink raw reply	[flat|nested] 105+ messages in thread
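The "medium" frequency in patch 4/9 is the entry at index num_freqs / 2 of the per-core frequency list. A small sketch of that index calculation, with an explicit guard for an empty list, is given below. mid_freq_index() is a hypothetical helper; the patch itself passes num_freqs / 2 straight to rte_power_set_freq().

```c
#include <stdint.h>

/*
 * Index of the middle entry in a list of num_freqs frequencies.
 * Returns -1 when the list is empty, so the caller can skip scaling
 * instead of requesting index 0 on a core with no frequency data.
 */
static int
mid_freq_index(uint32_t num_freqs)
{
	if (num_freqs == 0)
		return -1;
	return (int)(num_freqs / 2);
}
```

With integer division, a list with an even number of entries selects the upper of the two central indices.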
* [dpdk-dev] [PATCH v7 5/9] examples/vm_power_mgr: add policy to channels 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (3 preceding siblings ...) 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt @ 2017-10-05 13:28 ` David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 6/9] examples/vm_power_mgr: add port initialisation David Hunt ` (6 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/Makefile | 16 ++ examples/vm_power_manager/channel_monitor.c | 321 +++++++++++++++++++++++++++- examples/vm_power_manager/channel_monitor.h | 18 ++ 3 files changed, 348 insertions(+), 7 deletions(-) diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile index 59a9641..9cf20a2 100644 --- a/examples/vm_power_manager/Makefile +++ b/examples/vm_power_manager/Makefile @@ -54,6 +54,22 @@ CFLAGS += $(WERROR_FLAGS) LDLIBS += -lvirt +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) + +ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y) +LDLIBS += -lrte_pmd_ixgbe +endif + +ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y) +LDLIBS += -lrte_pmd_i40e +endif + +ifeq ($(CONFIG_RTE_LIBRTE_BNXT_PMD),y) +LDLIBS += -lrte_pmd_bnxt +endif + +endif + # workaround for a gcc bug with noreturn attribute # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index ac40dac..f16358d 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ 
b/examples/vm_power_manager/channel_monitor.c @@ -41,13 +41,17 @@ #include <sys/types.h> #include <sys/epoll.h> #include <sys/queue.h> +#include <sys/time.h> #include <rte_log.h> #include <rte_memory.h> #include <rte_malloc.h> #include <rte_atomic.h> +#include <rte_cycles.h> +#include <rte_ethdev.h> +#include <rte_pmd_i40e.h> - +#include <libvirt/libvirt.h> #include "channel_monitor.h" #include "channel_commands.h" #include "channel_manager.h" @@ -57,10 +61,15 @@ #define MAX_EVENTS 256 +uint64_t vsi_pkt_count_prev[384]; +uint64_t rdtsc_prev[384]; +double time_period_s = 1; static volatile unsigned run_loop = 1; static int global_event_fd; +static unsigned int policy_is_set; static struct epoll_event *global_events_list; +static struct policy policies[MAX_VMS]; void channel_monitor_exit(void) { @@ -68,6 +77,286 @@ void channel_monitor_exit(void) rte_free(global_events_list); } +static void +core_share(int pNo, int z, int x, int t) +{ + if (policies[pNo].core_share[z].pcpu == lvm_info[x].pcpus[t]) { + if (strcmp(policies[pNo].pkt.vm_name, + lvm_info[x].vm_name) != 0) { + policies[pNo].core_share[z].status = 1; + power_manager_scale_core_max( + policies[pNo].core_share[z].pcpu); + } + } +} + +static void +core_share_status(int pNo) +{ + + int noVms, noVcpus, z, x, t; + + get_all_vm(&noVms, &noVcpus); + + /* Reset Core Share Status. */ + for (z = 0; z < noVcpus; z++) + policies[pNo].core_share[z].status = 0; + + /* Foreach vcpu in a policy. */ + for (z = 0; z < policies[pNo].pkt.num_vcpu; z++) { + /* Foreach VM on the platform. */ + for (x = 0; x < noVms; x++) { + /* Foreach vcpu of VMs on platform. */ + for (t = 0; t < lvm_info[x].num_cpus; t++) + core_share(pNo, z, x, t); + } + } +} + +static void +get_pcpu_to_control(struct policy *pol) +{ + + /* Convert vcpu to pcpu. 
*/ + struct vm_info info; + int pcpu, count; + uint64_t mask_u64b; + + RTE_LOG(INFO, CHANNEL_MONITOR, "Looking for pcpu for %s\n", + pol->pkt.vm_name); + get_info_vm(pol->pkt.vm_name, &info); + + for (count = 0; count < pol->pkt.num_vcpu; count++) { + mask_u64b = info.pcpu_mask[pol->pkt.vcpu_to_control[count]]; + for (pcpu = 0; mask_u64b; mask_u64b &= ~(1ULL << pcpu++)) { + if ((mask_u64b >> pcpu) & 1) + pol->core_share[count].pcpu = pcpu; + } + } +} + +static int +get_pfid(struct policy *pol) +{ + + int i, x, ret = 0, nb_ports; + + nb_ports = rte_eth_dev_count(); + for (i = 0; i < pol->pkt.nb_mac_to_monitor; i++) { + + for (x = 0; x < nb_ports; x++) { + ret = rte_pmd_i40e_query_vfid_by_mac(x, + (struct ether_addr *)&(pol->pkt.vfid[i])); + if (ret != -EINVAL) { + pol->port[i] = x; + break; + } + } + if (ret == -EINVAL || ret == -ENOTSUP || ret == ENODEV) { + RTE_LOG(INFO, CHANNEL_MONITOR, + "Error with Policy. MAC not found on " + "attached ports "); + pol->enabled = 0; + return ret; + } + pol->pfid[i] = ret; + } + return 1; +} + +static int +update_policy(struct channel_packet *pkt) +{ + + unsigned int updated = 0; + + for (int i = 0; i < MAX_VMS; i++) { + if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) { + updated = 1; + break; + } + core_share_status(i); + policies[i].enabled = 1; + updated = 1; + } + } + if (!updated) { + for (int i = 0; i < MAX_VMS; i++) { + if (policies[i].enabled == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) + break; + core_share_status(i); + policies[i].enabled = 1; + break; + } + } + } + return 0; +} + +static uint64_t +get_pkt_diff(struct policy *pol) +{ + + uint64_t vsi_pkt_count, + vsi_pkt_total = 0, + vsi_pkt_count_prev_total = 0; + double rdtsc_curr, rdtsc_diff, diff; + int x; + struct rte_eth_stats vf_stats; + + for (x = 0; x < pol->pkt.nb_mac_to_monitor; x++) { + + 
/*Read vsi stats*/ + if (rte_pmd_i40e_get_vf_stats(x, pol->pfid[x], &vf_stats) == 0) + vsi_pkt_count = vf_stats.ipackets; + else + vsi_pkt_count = -1; + + vsi_pkt_total += vsi_pkt_count; + + vsi_pkt_count_prev_total += vsi_pkt_count_prev[pol->pfid[x]]; + vsi_pkt_count_prev[pol->pfid[x]] = vsi_pkt_count; + } + + rdtsc_curr = rte_rdtsc_precise(); + rdtsc_diff = rdtsc_curr - rdtsc_prev[pol->pfid[x-1]]; + rdtsc_prev[pol->pfid[x-1]] = rdtsc_curr; + + diff = (vsi_pkt_total - vsi_pkt_count_prev_total) * + ((double)rte_get_tsc_hz() / rdtsc_diff); + + return diff; +} + +static void +apply_traffic_profile(struct policy *pol) +{ + + int count; + uint64_t diff = 0; + + diff = get_pkt_diff(pol); + + RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n"); + + if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (diff >= (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (diff < (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_time_profile(struct policy *pol) +{ + + int count, x; + struct timeval tv; + struct tm *ptm; + char time_string[40]; + + /* Obtain the time of day, and convert it to a tm struct. */ + gettimeofday(&tv, NULL); + ptm = localtime(&tv.tv_sec); + /* Format the date and time, down to a single second. 
*/ + strftime(time_string, sizeof(time_string), "%Y-%m-%d %H:%M:%S", ptm); + + for (x = 0; x < HOURS; x++) { + + if (ptm->tm_hour == pol->pkt.timer_policy.busy_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_max( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling up core %d to max\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.quiet_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_min( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling down core %d to min\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.hours_to_use_traffic_profile[x]) { + apply_traffic_profile(pol); + break; + } + } +} + +static void +apply_workload_profile(struct policy *pol) +{ + + int count; + + if (pol->pkt.workload == HIGH) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == MEDIUM) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == LOW) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_policy(struct policy *pol) +{ + + struct channel_packet *pkt = &pol->pkt; + + /*Check policy to use*/ + if (pkt->policy_to_use == TRAFFIC) + apply_traffic_profile(pol); + else if (pkt->policy_to_use == TIME) + apply_time_profile(pol); + else if (pkt->policy_to_use == WORKLOAD) + apply_workload_profile(pol); +} + + static int 
process_request(struct channel_packet *pkt, struct channel_info *chan_info) { @@ -140,6 +429,13 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) } } + + if (pkt->command == PKT_POLICY) { + RTE_LOG(INFO, CHANNEL_MONITOR, "\nProcessing Policy request from Guest\n"); + update_policy(pkt); + policy_is_set = 1; + } + /* Return is not checked as channel status may have been set to DISABLED * from management thread */ @@ -209,9 +505,10 @@ run_channel_monitor(void) struct channel_info *chan_info = (struct channel_info *) global_events_list[i].data.ptr; if ((global_events_list[i].events & EPOLLERR) || - (global_events_list[i].events & EPOLLHUP)) { + (global_events_list[i].events & EPOLLHUP)) { RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for " - "channel '%s'\n", chan_info->channel_path); + "channel '%s'\n", + chan_info->channel_path); remove_channel(&chan_info); continue; } @@ -223,14 +520,17 @@ run_channel_monitor(void) int buffer_len = sizeof(pkt); while (buffer_len > 0) { - n_bytes = read(chan_info->fd, buffer, buffer_len); + n_bytes = read(chan_info->fd, + buffer, buffer_len); if (n_bytes == buffer_len) break; if (n_bytes == -1) { err = errno; - RTE_LOG(DEBUG, CHANNEL_MONITOR, "Received error on " - "channel '%s' read: %s\n", - chan_info->channel_path, strerror(err)); + RTE_LOG(DEBUG, CHANNEL_MONITOR, + "Received error on " + "channel '%s' read: %s\n", + chan_info->channel_path, + strerror(err)); remove_channel(&chan_info); break; } @@ -241,5 +541,12 @@ run_channel_monitor(void) process_request(&pkt, chan_info); } } + rte_delay_us(time_period_s*1000000); + if (policy_is_set) { + for (int j = 0; j < MAX_VMS; j++) { + if (policies[j].enabled == 1) + apply_policy(&policies[j]); + } + } } } diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h index c138607..b52c1fc 100644 --- a/examples/vm_power_manager/channel_monitor.h +++ b/examples/vm_power_manager/channel_monitor.h @@ -35,6 
+35,24 @@ #define CHANNEL_MONITOR_H_ #include "channel_manager.h" +#include "channel_commands.h" + +struct core_share { + unsigned int pcpu; + /* + * 1 CORE SHARE + * 0 NOT SHARED + */ + int status; +}; + +struct policy { + struct channel_packet pkt; + uint32_t pfid[MAX_VFS]; + uint32_t port[MAX_VFS]; + unsigned int enabled; + struct core_share core_share[MAX_VCPU_PER_VM]; +}; #ifdef __cplusplus extern "C" { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
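The traffic profile in patch 5/9 boils down to two steps: convert a packet-count delta over a TSC interval into packets per second, then compare that rate against the policy thresholds to choose min, medium or max frequency. The sketch below isolates both steps so they can be reasoned about without DPDK. The enum freq_level type and both function names are hypothetical, and the zero-interval guard is an addition that the patch does not have.

```c
#include <stdint.h>

/* Hypothetical result type; the patch calls the scale functions directly. */
enum freq_level { FREQ_MIN, FREQ_MED, FREQ_MAX };

/*
 * Packets per second from two packet counter samples and the number of
 * TSC cycles between them. tsc_hz is the TSC frequency in cycles/second.
 * Returns 0.0 for a zero-length interval rather than dividing by zero.
 */
static double
pkts_per_sec(uint64_t pkts_now, uint64_t pkts_prev,
		uint64_t tsc_delta, uint64_t tsc_hz)
{
	if (tsc_delta == 0)
		return 0.0;
	return (double)(pkts_now - pkts_prev) *
		((double)tsc_hz / (double)tsc_delta);
}

/*
 * Map a packet rate to a frequency level using the two upper thresholds
 * from the policy: at or above max_thresh selects max, at or above
 * avg_thresh selects medium, anything lower selects min.
 */
static enum freq_level
classify_rate(double rate, uint32_t avg_thresh, uint32_t max_thresh)
{
	if (rate >= max_thresh)
		return FREQ_MAX;
	if (rate >= avg_thresh)
		return FREQ_MED;
	return FREQ_MIN;
}
```

The comparison order mirrors apply_traffic_profile(), where min_packet_thresh is carried in the policy but does not take part in the decision.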
* [dpdk-dev] [PATCH v7 6/9] examples/vm_power_mgr: add port initialisation 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (4 preceding siblings ...) 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 5/9] examples/vm_power_mgr: add policy to channels David Hunt @ 2017-10-05 13:28 ` David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 7/9] power: add send channel msg function to map file David Hunt ` (5 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic We need to initialise the ports we're monitoring to be able to see the throughput. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 220 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 220 insertions(+) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index c33fcc9..698abca 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -49,6 +49,9 @@ #include <rte_log.h> #include <rte_per_lcore.h> #include <rte_lcore.h> +#include <rte_ethdev.h> +#include <getopt.h> +#include <rte_cycles.h> #include <rte_debug.h> #include "channel_manager.h" @@ -56,6 +59,192 @@ #include "power_manager.h" #include "vm_power_cli.h" +#define RX_RING_SIZE 512 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static uint32_t enabled_port_mask; +static volatile bool force_quit; + +/****************/ +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port
>= rte_eth_dev_count()) + return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned int)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. 
*/ + rte_eth_promiscuous_enable(port); + + + return 0; +} + +static int +parse_portmask(const char *portmask) +{ + char *end = NULL; + unsigned long pm; + + /* parse hexadecimal string */ + pm = strtoul(portmask, &end, 16); + if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0')) + return -1; + + if (pm == 0) + return -1; + + return pm; +} +/* Parse the argument given in the command line of the application */ +static int +parse_args(int argc, char **argv) +{ + int opt, ret; + char **argvopt; + int option_index; + char *prgname = argv[0]; + static struct option lgopts[] = { + { "mac-updating", no_argument, 0, 1}, + { "no-mac-updating", no_argument, 0, 0}, + {NULL, 0, 0, 0} + }; + argvopt = argv; + + while ((opt = getopt_long(argc, argvopt, "p:q:T:", + lgopts, &option_index)) != EOF) { + + switch (opt) { + /* portmask */ + case 'p': + enabled_port_mask = parse_portmask(optarg); + if (enabled_port_mask == 0) { + printf("invalid portmask\n"); + return -1; + } + break; + /* long options */ + case 0: + break; + + default: + return -1; + } + } + + if (optind >= 0) + argv[optind-1] = prgname; + + ret = optind-1; + optind = 0; /* reset getopt lib */ + return ret; +} + +static void +check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) +{ +#define CHECK_INTERVAL 100 /* 100ms */ +#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */ + uint8_t portid, count, all_ports_up, print_flag = 0; + struct rte_eth_link link; + + printf("\nChecking link status"); + fflush(stdout); + for (count = 0; count <= MAX_CHECK_TIME; count++) { + if (force_quit) + return; + all_ports_up = 1; + for (portid = 0; portid < port_num; portid++) { + if (force_quit) + return; + if ((port_mask & (1 << portid)) == 0) + continue; + memset(&link, 0, sizeof(link)); + rte_eth_link_get_nowait(portid, &link); + /* print link status if flag set */ + if (print_flag == 1) { + if (link.link_status) + printf("Port %d Link Up - speed %u " + "Mbps - %s\n", (uint8_t)portid, + (unsigned 
int)link.link_speed, + (link.link_duplex == ETH_LINK_FULL_DUPLEX) ? + ("full-duplex") : ("half-duplex\n")); + else + printf("Port %d Link Down\n", + (uint8_t)portid); + continue; + } + /* clear all_ports_up flag if any link down */ + if (link.link_status == ETH_LINK_DOWN) { + all_ports_up = 0; + break; + } + } + /* after finally printing all link status, get out */ + if (print_flag == 1) + break; + + if (all_ports_up == 0) { + printf("."); + fflush(stdout); + rte_delay_ms(CHECK_INTERVAL); + } + + /* set the print_flag if all ports up or timeout */ + if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) { + print_flag = 1; + printf("done\n"); + } + } +} static int run_monitor(__attribute__((unused)) void *arg) { @@ -82,6 +271,10 @@ main(int argc, char **argv) { int ret; unsigned lcore_id; + unsigned int nb_ports; + struct rte_mempool *mbuf_pool; + uint8_t portid; + ret = rte_eal_init(argc, argv); if (ret < 0) @@ -90,12 +283,39 @@ main(int argc, char **argv) signal(SIGINT, sig_handler); signal(SIGTERM, sig_handler); + argc -= ret; + argv += ret; + + /* parse application arguments (after the EAL ones) */ + ret = parse_args(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid arguments\n"); + + nb_ports = rte_eth_dev_count(); + + mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports, + MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id()); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize ports. 
*/ + for (portid = 0; portid < nb_ports; portid++) { + if ((enabled_port_mask & (1 << portid)) == 0) + continue; + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + } + lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { RTE_LOG(ERR, EAL, "A minimum of two cores are required to run " "application\n"); return 0; } + + check_all_ports_link_status(nb_ports, enabled_port_mask); rte_eal_remote_launch(run_monitor, NULL, lcore_id); if (power_manager_init() < 0) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
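A note on the portmask handling in the patch above: as posted, parse_portmask() returns -1 on failure while parse_args() tests the result against 0, so a malformed mask does not appear to be rejected. The following is a minimal sketch of a stricter variant, not the code from the patch. The names parse_portmask_checked() and port_enabled() are hypothetical, and the sketch assumes a 32-bit mask.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal port mask such as "3" or "0x5".
 * Returns 0 and stores the mask in *mask on success; returns -1 for an
 * empty string, non-hex input, trailing characters, overflow or a zero mask.
 * (Hypothetical helper, not part of the posted patch.)
 */
static int
parse_portmask_checked(const char *arg, uint32_t *mask)
{
	char *end = NULL;
	unsigned long pm;

	assert(mask != NULL);
	if (arg == NULL || arg[0] == '\0')
		return -1;

	errno = 0;
	pm = strtoul(arg, &end, 16);
	if (errno != 0 || end == arg || *end != '\0')
		return -1;
	if (pm == 0 || pm > UINT32_MAX)
		return -1;

	*mask = (uint32_t)pm;
	return 0;
}

/* Return non-zero if port_id is selected in mask. */
static int
port_enabled(uint32_t mask, unsigned int port_id)
{
	return port_id < 32 && (mask & (UINT32_C(1) << port_id)) != 0;
}
```

Keeping the status code separate from the parsed value avoids the ambiguity of returning a mask and an error through the same int.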
* [dpdk-dev] [PATCH v7 7/9] power: add send channel msg function to map file 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (5 preceding siblings ...) 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 6/9] examples/vm_power_mgr: add port initialisation David Hunt @ 2017-10-05 13:28 ` David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 8/9] examples/guest_cli: add send policy to host David Hunt ` (4 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt Adding new wrapper function to existing private (but unused 'till now) function with an rte_power_ prefix. The plan is to clean up all the header files in the next release so that only the intended public functions are in the map file and only the relevant headers have the rte_ prefix so that only they are included in the documentation. Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> --- lib/librte_power/guest_channel.c | 7 +++++++ lib/librte_power/guest_channel.h | 15 +++++++++++++++ lib/librte_power/rte_power_version.map | 1 + 3 files changed, 23 insertions(+) diff --git a/lib/librte_power/guest_channel.c b/lib/librte_power/guest_channel.c index 85c92fa..fa5de0f 100644 --- a/lib/librte_power/guest_channel.c +++ b/lib/librte_power/guest_channel.c @@ -148,6 +148,13 @@ guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id) return 0; } +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id) +{ + return guest_channel_send_msg(pkt, lcore_id); +} + + void guest_channel_host_disconnect(unsigned lcore_id) { diff --git a/lib/librte_power/guest_channel.h b/lib/librte_power/guest_channel.h index 9e18af5..741339c 100644 --- a/lib/librte_power/guest_channel.h +++ b/lib/librte_power/guest_channel.h @@ -81,6 +81,21 @@ void 
guest_channel_host_disconnect(unsigned lcore_id); */ int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); +/** + * Send a message contained in pkt over the Virtio-Serial to the host endpoint. + * + * @param pkt + * Pointer to a populated struct channel_packet + * + * @param lcore_id + * lcore_id. + * + * @return + * - 0 on success. + * - Negative on error. + */ +int rte_power_guest_channel_send_msg(struct channel_packet *pkt, + unsigned int lcore_id); #ifdef __cplusplus } diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map index 9ae0627..96dc42e 100644 --- a/lib/librte_power/rte_power_version.map +++ b/lib/librte_power/rte_power_version.map @@ -20,6 +20,7 @@ DPDK_2.0 { DPDK_17.11 { global: + rte_power_guest_channel_send_msg; rte_power_freq_disable_turbo; rte_power_freq_enable_turbo; rte_power_turbo_status; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v7 8/9] examples/guest_cli: add send policy to host 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (6 preceding siblings ...) 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 7/9] power: add send channel msg function to map file David Hunt @ 2017-10-05 13:28 ` David Hunt 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt ` (3 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Here we're adding an example of setting up a policy, and allowing the vm_cli_guest app to send it to the host using the cli command "send_policy now" Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> --- .../guest_cli/vm_power_cli_guest.c | 97 ++++++++++++++++++++++ .../guest_cli/vm_power_cli_guest.h | 6 -- 2 files changed, 97 insertions(+), 6 deletions(-) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 4e982bd..dc9efc2 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -45,8 +45,10 @@ #include <cmdline.h> #include <rte_log.h> #include <rte_lcore.h> +#include <rte_ethdev.h> #include <rte_power.h> +#include <guest_channel.h> #include "vm_power_cli_guest.h" @@ -139,8 +141,103 @@ cmdline_parse_inst_t cmd_set_cpu_freq_set = { }, }; +struct cmd_send_policy_result { + cmdline_fixed_string_t send_policy; + cmdline_fixed_string_t cmd; +}; + +union PFID { + struct ether_addr addr; + uint64_t pfid; +}; + +static inline int +send_policy(void) +{ + struct channel_packet pkt; + int ret; + + 
union PFID pfid; + /* Use port MAC address as the vfid */ + rte_eth_macaddr_get(0, &pfid.addr); + printf("Port %u MAC: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":" + "%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n", + 1, + pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1], + pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3], + pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]); + pkt.vfid[0] = pfid.pfid; + + pkt.nb_mac_to_monitor = 1; + pkt.t_boost_status.tbEnabled = false; + + pkt.vcpu_to_control[0] = 0; + pkt.vcpu_to_control[1] = 1; + pkt.num_vcpu = 2; + /* Dummy Population. */ + pkt.traffic_policy.min_packet_thresh = 96000; + pkt.traffic_policy.avg_max_packet_thresh = 1800000; + pkt.traffic_policy.max_max_packet_thresh = 2000000; + + pkt.timer_policy.busy_hours[0] = 3; + pkt.timer_policy.busy_hours[1] = 4; + pkt.timer_policy.busy_hours[2] = 5; + pkt.timer_policy.quiet_hours[0] = 11; + pkt.timer_policy.quiet_hours[1] = 12; + pkt.timer_policy.quiet_hours[2] = 13; + + pkt.timer_policy.hours_to_use_traffic_profile[0] = 8; + pkt.timer_policy.hours_to_use_traffic_profile[1] = 10; + + pkt.workload = LOW; + pkt.policy_to_use = TIME; + pkt.command = PKT_POLICY; + strcpy(pkt.vm_name, "ubuntu2"); + ret = rte_power_guest_channel_send_msg(&pkt, 1); + if (ret == 0) + return 1; + RTE_LOG(DEBUG, POWER, "Error sending message: %s\n", + ret > 0 ? 
strerror(ret) : "channel not connected"); + return -1; +} + +static void +cmd_send_policy_parsed(void *parsed_result, struct cmdline *cl, + __attribute__((unused)) void *data) +{ + int ret = -1; + struct cmd_send_policy_result *res = parsed_result; + + if (!strcmp(res->cmd, "now")) { + printf("Sending Policy down now!\n"); + ret = send_policy(); + } + if (ret != 1) + cmdline_printf(cl, "Error sending message: %s\n", + strerror(ret)); +} + +cmdline_parse_token_string_t cmd_send_policy = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + send_policy, "send_policy"); +cmdline_parse_token_string_t cmd_send_policy_cmd_cmd = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + cmd, "now"); + +cmdline_parse_inst_t cmd_send_policy_set = { + .f = cmd_send_policy_parsed, + .data = NULL, + .help_str = "send_policy now", + .tokens = { + (void *)&cmd_send_policy, + (void *)&cmd_send_policy_cmd_cmd, + NULL, + }, +}; + cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_quit, + (cmdline_parse_inst_t *)&cmd_send_policy_set, (cmdline_parse_inst_t *)&cmd_set_cpu_freq_set, NULL, }; diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h index 0c4bdd5..277eab3 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h @@ -40,12 +40,6 @@ extern "C" { #include "channel_commands.h" -int guest_channel_host_connect(unsigned lcore_id); - -int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); - -void guest_channel_host_disconnect(unsigned lcore_id); - void run_cli(__attribute__((unused)) void *arg); #ifdef __cplusplus -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
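A note on the PFID union in the patch above: the union is declared without an initialiser and the MAC address covers only six of its eight bytes, so the remaining two bytes of the 64-bit id are not written by rte_eth_macaddr_get(). The following is a minimal sketch, under that reading, of packing a MAC into a zeroed 64-bit id. mac_to_id() and id_to_mac() are hypothetical names, and the byte layout follows host memory order, as the union does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Copy a 6-byte MAC address into a 64-bit id.
 * The id starts at zero, so the two bytes not covered by the MAC are
 * deterministic and two ids built from the same MAC compare equal.
 */
static uint64_t
mac_to_id(const uint8_t mac[MAC_LEN])
{
	uint64_t id = 0;

	assert(mac != NULL);
	memcpy(&id, mac, MAC_LEN);
	return id;
}

/* Recover the 6-byte MAC address from an id built by mac_to_id(). */
static void
id_to_mac(uint64_t id, uint8_t mac[MAC_LEN])
{
	assert(mac != NULL);
	memcpy(mac, &id, MAC_LEN);
}
```

Because the layout is memory order rather than a fixed numeric encoding, both ends of the channel need the same endianness, which holds for a guest and host on the same machine.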
* [dpdk-dev] [PATCH v7 9/9] examples/vm_power_mgr: set MAC address of VF 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (7 preceding siblings ...) 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 8/9] examples/guest_cli: add send policy to host David Hunt @ 2017-10-05 13:28 ` David Hunt 2017-10-05 13:54 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest Ananyev, Konstantin ` (2 subsequent siblings) 11 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 13:28 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt We need to set vf mac from the host, so that they will be in sync on the guest and the host. Otherwise, we'll have a random mac on the guest, and a 00:00:00:00:00:00 mac on the host. Signed-off-by: David Hunt <david.hunt@intel.com> --- examples/vm_power_manager/main.c | 43 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index 698abca..5147789 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -58,6 +58,9 @@ #include "channel_monitor.h" #include "power_manager.h" #include "vm_power_cli.h" +#include <rte_pmd_ixgbe.h> +#include <rte_pmd_i40e.h> +#include <rte_pmd_bnxt.h> #define RX_RING_SIZE 512 #define TX_RING_SIZE 512 @@ -222,7 +225,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) (uint8_t)portid); continue; } - /* clear all_ports_up flag if any link down */ + /* clear all_ports_up flag if any link down */ if (link.link_status == ETH_LINK_DOWN) { all_ports_up = 0; break; @@ -301,11 +304,49 @@ main(int argc, char **argv) /* Initialize ports. 
*/ + for (portid = 0; portid < nb_ports; portid++) { + struct ether_addr eth; + int w, j; + int ret = -ENOTSUP; + if ((enabled_port_mask & (1 << portid)) == 0) continue; + + eth.addr_bytes[0] = 0xe0; + eth.addr_bytes[1] = 0xe0; + eth.addr_bytes[2] = 0xe0; + eth.addr_bytes[3] = 0xe0; + eth.addr_bytes[4] = portid + 0xf0; + if (port_init(portid, mbuf_pool) != 0) rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid); + + for (w = 0; w < MAX_VFS; w++) { + eth.addr_bytes[5] = w + 0xf0; + + if (ret == -ENOTSUP) + ret = rte_pmd_ixgbe_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_i40e_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_bnxt_set_vf_mac_addr(portid, + w, &eth); + + switch (ret) { + case 0: + printf("Port %d VF %d MAC: ", + portid, w); + for (j = 0; j < 6; j++) { + printf("%02x", eth.addr_bytes[j]); + if (j < 5) + printf(":"); + } + printf("\n"); + break; + } + } } lcore_id = rte_get_next_lcore(-1, 1, 0); -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
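The address scheme used in the patch above (e0:e0:e0:e0, then 0xf0 plus the port id, then 0xf0 plus the VF index) can be sketched in isolation as follows. make_vf_mac() and format_mac() are hypothetical helper names, not functions from the patch. Note that the last two bytes wrap around for ids of 16 and above; with MAX_VFS set to 10 the VF byte stays in range.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAC_LEN 6

/* Build the per-VF MAC address e0:e0:e0:e0:<0xf0+port>:<0xf0+vf>. */
static void
make_vf_mac(uint8_t port_id, uint8_t vf_id, uint8_t mac[MAC_LEN])
{
	int i;

	assert(mac != NULL);
	for (i = 0; i < 4; i++)
		mac[i] = 0xe0;
	mac[4] = (uint8_t)(port_id + 0xf0);
	mac[5] = (uint8_t)(vf_id + 0xf0);
}

/*
 * Format a MAC address as "xx:xx:xx:xx:xx:xx".
 * Returns the snprintf() result, which is 17 when buf is large enough.
 */
static int
format_mac(const uint8_t mac[MAC_LEN], char *buf, size_t len)
{
	assert(mac != NULL && buf != NULL);
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

For port 1 and VF 2 this yields e0:e0:e0:e0:f1:f2, which is the value the guest would then report as its port MAC.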
* Re: [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (8 preceding siblings ...) 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt @ 2017-10-05 13:54 ` Ananyev, Konstantin 2017-10-05 14:12 ` santosh 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt 11 siblings, 0 replies; 105+ messages in thread
From: Ananyev, Konstantin @ 2017-10-05 13:54 UTC (permalink / raw)
To: Hunt, David, dev; +Cc: Wu, Jingjing, santosh.shukla

> -----Original Message-----
> From: Hunt, David
> Sent: Thursday, October 5, 2017 2:28 PM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; santosh.shukla@caviumnetworks.com
> Subject: [PATCH v7 0/9] Policy Based Power Control for Guest
>
> Policy Based Power Control for Guest
>
> This patchset adds the facility for a guest VM to send a policy down to the
> host that will allow the host to scale up/down cpu frequencies
> depending on the policy criteria independently of the DPDK app running in
> the guest. This differs from the previous vm_power implementation where
> individual scale up/down requests were sent from the guest to the host via
> virtio-serial.
>
> V7 patchset changes:
> * Changed return code of rte_pmd_i40e_query_vfid_by_mac() from an
>   int64_t to int
>
> V6 patchset changes:
> * Fixed comments in header for rte_pmd_i40e_query_vfid_by_mac.
> * changed rte_pmd_i40e_query_vfid_by_mac return code from uint to int
>   as it can return negative error codes.
> * Removed bool enum from channel_commands.h, including stdbool.h instead.
> * Added #define VM_MAX_NAME_SZ 32 to channel_commands.h
> * Renamed a few variables to be more readable.
> * Added returns in a few places if failed to get info on domain.
> * Fixed power_manager_init to keep track of num_freqs for each core.
> * In power_manager_scale_core_med(), changed a hardcoded '5' to instead
>   be calculated from the centre of the frequency list
>   (global_core_freq_info[core_num].num_freqs / 2)
>
> V5 patchset changes:
> * Removed most of the #ifdef I40_PMD as it will be applicable to
>   other PMDs in the future.
> * Changed the parameter of rte_pmd_i40e_query_vfid_by_mac from a uint64
>   to a const struct ether_addr *, rather than casting it later in the
>   function.
>
> V4 patchset changes:
> * None, re-post to mailing list under the correct email thread.
>
> V3 patchset changes:
> * Changed to using is_same_ether_addr() instead of looping through
>   the mac address bytes to compare them.
> * Tweaked some comments and wording in the i40e patch after review.
> * Added a patch to the set to add new i40e function to map file, so
>   as to allow shared library builds. The power library API needs a cleanup
>   in next release, so will add API/ABI warning for this cleanup in a
>   separate patch.
>
> V2 patchset changes:
> * Removed APIs in ethdev layer.
> * Now just a single new API in the i40e driver for mapping VF MAC to
>   VF index.
> * Moved new function from rte_rxtx.c to rte_pmd_i40e.c
> * Removed function for reading i40e register, moved to using the
>   standard stats API.
> * Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac
> * Cleaned up policy generation code.
>
> It's a modification of the vm_power_manager app that runs in the host, and
> the guest_vm_power_app example app that runs in the guest. This allows the
> guest to send down a policy to the host via virtio-serial, which then allows
> the host to scale up/down based on the criteria in the policy, resulting in
> quicker scale up/down than individual requests coming from the guest.
> It also means that the DPDK application running in the guest does not need
> to be modified in any way; it is unaware that its cores are being scaled
> up/down, reducing the effort in implementing a power-aware infrastructure.
>
> The usage model is as follows:
> 1. Set up the VFs and assign to the guest in the usual way.
> 2. Run vm_power_manager on the host, creating a channel to the guest.
> 3. Start the guest_vm_power_mgr app on the guest, which establishes
>    a virtio-serial channel to the host.
> 4. Send down the profile for the guest using the "send_policy now" command.
>    There is an example profile hard-coded into guest_vm_power_mgr.
> 5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
> 6. Send traffic into the VFs at varying traffic rates.
>    Observe the frequency change on the host (turbostat -i 1)
>
> The sequence of code changes is as follows:
>
> A new function has been added to the i40e driver to allow mapping of
> a VF MAC to VF index.
>
> Next we make an addition to librte_power that adds an extra command to allow
> the passing of a policy structure from the guest to the host. This struct
> contains information such as busy/quiet hours, packet throughput thresholds, etc.
>
> The next addition adds functionality to convert the virtual CPU (vcpu) IDs to
> physical CPU (pcpu) IDs so that the host can scale up/down the cores used
> in the guest.
>
> The remaining patches are functionality to process the policy, and take action
> when the relevant trigger occurs to cause a frequency change.
>
> [1/9] net/i40e: add API to convert VF MAC to VF id
> [2/9] lib/librte_power: add extra msg type for policies
> [3/9] examples/vm_power_mgr: add vcpu to pcpu mapping
> [4/9] examples/vm_power_mgr: add scale to medium freq fn
> [5/9] examples/vm_power_mgr: add policy to channels
> [6/9] examples/vm_power_mgr: add port initialisation
> [7/9] power: add send channel msg function to map file
> [8/9] examples/guest_cli: add send policy to host
> [9/9] examples/vm_power_mgr: set MAC address of VF

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

^ permalink raw reply [flat|nested] 105+ messages in thread
* Re: [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (9 preceding siblings ...) 2017-10-05 13:54 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest Ananyev, Konstantin @ 2017-10-05 14:12 ` santosh 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt 11 siblings, 0 replies; 105+ messages in thread
From: santosh @ 2017-10-05 14:12 UTC (permalink / raw)
To: David Hunt, dev; +Cc: konstantin.ananyev, jingjing.wu

On Thursday 05 October 2017 06:58 PM, David Hunt wrote:
> Policy Based Power Control for Guest
>
> This patchset adds the facility for a guest VM to send a policy down to the
> host that will allow the host to scale up/down cpu frequencies
> depending on the policy criteria independently of the DPDK app running in
> the guest. This differs from the previous vm_power implementation where
> individual scale up/down requests were sent from the guest to the host via
> virtio-serial.
>
> V7 patchset changes:
> * Changed return code of rte_pmd_i40e_query_vfid_by_mac() from an
>   int64_t to int

Series:
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>

^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v8 0/9] Policy Based Power Control for Guest 2017-10-05 13:28 ` [dpdk-dev] [PATCH v7 0/9] Policy Based Power Control for Guest David Hunt ` (10 preceding siblings ...) 2017-10-05 14:12 ` santosh @ 2017-10-05 14:34 ` David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt ` (9 more replies) 11 siblings, 10 replies; 105+ messages in thread
From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw)
To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla

Policy Based Power Control for Guest

This patchset adds the facility for a guest VM to send a policy down to the
host that will allow the host to scale up/down cpu frequencies
depending on the policy criteria independently of the DPDK app running in
the guest. This differs from the previous vm_power implementation where
individual scale up/down requests were sent from the guest to the host via
virtio-serial.

V8 patchset changes:
* Added Ack's and Reviewed-by's to individual patches in the set so as
  to keep patchwork A/R/T flags properly in sync.

V7 patchset changes:
* Changed return code of rte_pmd_i40e_query_vfid_by_mac() from an
  int64_t to int

V6 patchset changes:
* Fixed comments in header for rte_pmd_i40e_query_vfid_by_mac.
* changed rte_pmd_i40e_query_vfid_by_mac return code from uint to int
  as it can return negative error codes.
* Removed bool enum from channel_commands.h, including stdbool.h instead.
* Added #define VM_MAX_NAME_SZ 32 to channel_commands.h
* Renamed a few variables to be more readable.
* Added returns in a few places if failed to get info on domain.
* Fixed power_manager_init to keep track of num_freqs for each core.
* In power_manager_scale_core_med(), changed a hardcoded '5' to instead
  be calculated from the centre of the frequency list
  (global_core_freq_info[core_num].num_freqs / 2)

V5 patchset changes:
* Removed most of the #ifdef I40_PMD as it will be applicable to
  other PMDs in the future.
* Changed the parameter of rte_pmd_i40e_query_vfid_by_mac from a uint64
  to a const struct ether_addr *, rather than casting it later in the
  function.

V4 patchset changes:
* None, re-post to mailing list under the correct email thread.

V3 patchset changes:
* Changed to using is_same_ether_addr() instead of looping through
  the mac address bytes to compare them.
* Tweaked some comments and wording in the i40e patch after review.
* Added a patch to the set to add new i40e function to map file, so
  as to allow shared library builds. The power library API needs a cleanup
  in next release, so will add API/ABI warning for this cleanup in a
  separate patch.

V2 patchset changes:
* Removed APIs in ethdev layer.
* Now just a single new API in the i40e driver for mapping VF MAC to
  VF index.
* Moved new function from rte_rxtx.c to rte_pmd_i40e.c
* Removed function for reading i40e register, moved to using the
  standard stats API.
* Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac
* Cleaned up policy generation code.

It's a modification of the vm_power_manager app that runs in the host, and
the guest_vm_power_app example app that runs in the guest. This allows the
guest to send down a policy to the host via virtio-serial, which then allows
the host to scale up/down based on the criteria in the policy, resulting in
quicker scale up/down than individual requests coming from the guest.
It also means that the DPDK application running in the guest does not need
to be modified in any way; it is unaware that its cores are being scaled
up/down, reducing the effort in implementing a power-aware infrastructure.

The usage model is as follows:
1. Set up the VFs and assign to the guest in the usual way.
2. Run vm_power_manager on the host, creating a channel to the guest.
3. Start the guest_vm_power_mgr app on the guest, which establishes
   a virtio-serial channel to the host.
4. Send down the profile for the guest using the "send_policy now" command.
   There is an example profile hard-coded into guest_vm_power_mgr.
5. Stop the guest_vm_power_mgr and run your normal power-unaware application.
6. Send traffic into the VFs at varying traffic rates.
   Observe the frequency change on the host (turbostat -i 1)

The sequence of code changes is as follows:

A new function has been added to the i40e driver to allow mapping of
a VF MAC to VF index.

Next we make an addition to librte_power that adds an extra command to allow
the passing of a policy structure from the guest to the host. This struct
contains information such as busy/quiet hours, packet throughput thresholds, etc.

The next addition adds functionality to convert the virtual CPU (vcpu) IDs to
physical CPU (pcpu) IDs so that the host can scale up/down the cores used
in the guest.

The remaining patches are functionality to process the policy, and take action
when the relevant trigger occurs to cause a frequency change.

[1/9] net/i40e: add API to convert VF MAC to VF id
[2/9] lib/librte_power: add extra msg type for policies
[3/9] examples/vm_power_mgr: add vcpu to pcpu mapping
[4/9] examples/vm_power_mgr: add scale to medium freq fn
[5/9] examples/vm_power_mgr: add policy to channels
[6/9] examples/vm_power_mgr: add port initialisation
[7/9] power: add send channel msg function to map file
[8/9] examples/guest_cli: add send policy to host
[9/9] examples/vm_power_mgr: set MAC address of VF

^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v8 1/9] net/i40e: add API to convert VF MAC to VF id 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt @ 2017-10-05 14:34 ` David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 2/9] lib/librte_power: add extra msg type for policies David Hunt ` (8 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Need a way to convert a vf id to a pf id on the host so as to query the pf for relevant statistics which are used for the frequency changes in the vm_power_manager app. Used when profiles are passed down from the guest to the host, allowing the host to map the vfs to pfs. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- drivers/net/i40e/rte_pmd_i40e.c | 30 ++++++++++++++++++++++++++++++ drivers/net/i40e/rte_pmd_i40e.h | 15 +++++++++++++++ drivers/net/i40e/rte_pmd_i40e_version.map | 7 +++++++ 3 files changed, 52 insertions(+) diff --git a/drivers/net/i40e/rte_pmd_i40e.c b/drivers/net/i40e/rte_pmd_i40e.c index f12b7f4..76d11dd 100644 --- a/drivers/net/i40e/rte_pmd_i40e.c +++ b/drivers/net/i40e/rte_pmd_i40e.c @@ -2115,3 +2115,33 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, return 0; } + +int +rte_pmd_i40e_query_vfid_by_mac(uint8_t port, const struct ether_addr *vf_mac) +{ + struct rte_eth_dev *dev; + struct ether_addr *mac; + struct i40e_pf *pf; + int vf_id; + struct i40e_pf_vf *vf; + uint16_t vf_num; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV); + dev = &rte_eth_devices[port]; + + if (!is_i40e_supported(dev)) + return -ENOTSUP; + + pf = 
I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private); + vf_num = pf->vf_num; + + for (vf_id = 0; vf_id < vf_num; vf_id++) { + vf = &pf->vfs[vf_id]; + mac = &vf->mac_addr; + + if (is_same_ether_addr(mac, vf_mac)) + return vf_id; + } + + return -EINVAL; +} diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h index 356fa89..a355896 100644 --- a/drivers/net/i40e/rte_pmd_i40e.h +++ b/drivers/net/i40e/rte_pmd_i40e.h @@ -637,4 +637,19 @@ int rte_pmd_i40e_ptype_mapping_replace(uint8_t port, uint8_t mask, uint32_t pkt_type); +/** + * On the PF, find VF index based on VF MAC address + * + * @param port + * pointer to port identifier of the device + * @param vf_mac + * the mac address of the vf to determine index of + * @return + * The index of vfid If successful. + * -EINVAL: vf mac address does not exist for this port + * -ENOTSUP: i40e not supported for this port. + */ +int rte_pmd_i40e_query_vfid_by_mac(uint8_t port, + const struct ether_addr *vf_mac); + #endif /* _PMD_I40E_H_ */ diff --git a/drivers/net/i40e/rte_pmd_i40e_version.map b/drivers/net/i40e/rte_pmd_i40e_version.map index 20cc980..d8b74bd 100644 --- a/drivers/net/i40e/rte_pmd_i40e_version.map +++ b/drivers/net/i40e/rte_pmd_i40e_version.map @@ -45,3 +45,10 @@ DPDK_17.08 { rte_pmd_i40e_get_ddp_info; } DPDK_17.05; + +DPDK_17.11 { + global: + + rte_pmd_i40e_query_vfid_by_mac; + +} DPDK_17.08; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
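The lookup added in the patch above is a linear scan over the PF's VF table, comparing each VF's MAC address with the one supplied. The same pattern can be sketched without any DPDK types, as below. struct vf_entry and find_vf_by_mac() are hypothetical stand-ins for the driver's private structures, and memcmp() stands in for is_same_ether_addr().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for the per-VF state kept by the PF driver. */
struct vf_entry {
	uint8_t mac[MAC_LEN];
};

/*
 * Return the index of the VF whose MAC address matches, or -EINVAL if
 * no VF in the table has that address.
 */
static int
find_vf_by_mac(const struct vf_entry *vfs, uint16_t vf_num,
	       const uint8_t mac[MAC_LEN])
{
	uint16_t i;

	assert(mac != NULL);
	assert(vfs != NULL || vf_num == 0);

	for (i = 0; i < vf_num; i++) {
		if (memcmp(vfs[i].mac, mac, MAC_LEN) == 0)
			return i;
	}
	return -EINVAL;
}
```

Returning a signed int lets one value carry either a valid index or a negative errno, which matches the v6 and v7 change notes about the return type.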
* [dpdk-dev] [PATCH v8 2/9] lib/librte_power: add extra msg type for policies 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt @ 2017-10-05 14:34 ` David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt ` (7 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- lib/librte_power/channel_commands.h | 42 +++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h index 484085b..f0f5f0a 100644 --- a/lib/librte_power/channel_commands.h +++ b/lib/librte_power/channel_commands.h @@ -39,6 +39,7 @@ extern "C" { #endif #include <stdint.h> +#include <stdbool.h> /* Maximum number of channels per VM */ #define CHANNEL_CMDS_MAX_VM_CHANNELS 64 @@ -46,6 +47,7 @@ extern "C" { /* Valid Commands */ #define CPU_POWER 1 #define CPU_POWER_CONNECT 2 +#define PKT_POLICY 3 /* CPU Power Command Scaling */ #define CPU_POWER_SCALE_UP 1 @@ -54,11 +56,51 @@ extern "C" { #define CPU_POWER_SCALE_MIN 4 #define CPU_POWER_ENABLE_TURBO 5 #define CPU_POWER_DISABLE_TURBO 6 +#define HOURS 24 + +#define MAX_VFS 10 +#define VM_MAX_NAME_SZ 32 + +#define MAX_VCPU_PER_VM 8 + +struct t_boost_status { + bool tbEnabled; +}; + +struct timer_profile { + int busy_hours[HOURS]; + int quiet_hours[HOURS]; + int hours_to_use_traffic_profile[HOURS]; +}; + +enum workload {HIGH, MEDIUM, LOW}; +enum 
policy_to_use { + TRAFFIC, + TIME, + WORKLOAD +}; + +struct traffic { + uint32_t min_packet_thresh; + uint32_t avg_max_packet_thresh; + uint32_t max_max_packet_thresh; +}; struct channel_packet { uint64_t resource_id; /**< core_num, device */ uint32_t unit; /**< scale down/up/min/max */ uint32_t command; /**< Power, IO, etc */ + char vm_name[VM_MAX_NAME_SZ]; + + uint64_t vfid[MAX_VFS]; + int nb_mac_to_monitor; + struct traffic traffic_policy; + uint8_t vcpu_to_control[MAX_VCPU_PER_VM]; + uint8_t num_vcpu; + struct timer_profile timer_policy; + enum workload workload; + enum policy_to_use policy_to_use; + struct t_boost_status t_boost_status; }; -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v8 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 1/9] net/i40e: add API to convert VF MAC to VF id David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 2/9] lib/librte_power: add extra msg type for policies David Hunt @ 2017-10-05 14:34 ` David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt ` (6 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic, Rory Sexton Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- examples/vm_power_manager/channel_manager.c | 67 +++++++++++++++++++++++++++++ examples/vm_power_manager/channel_manager.h | 25 +++++++++++ 2 files changed, 92 insertions(+) diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c index e068ae2..ab856bd 100644 --- a/examples/vm_power_manager/channel_manager.c +++ b/examples/vm_power_manager/channel_manager.c @@ -574,6 +574,73 @@ set_channel_status(const char *vm_name, unsigned *channel_list, return num_channels_changed; } +void +get_all_vm(int *num_vm, int *num_vcpu) +{ + + virNodeInfo node_info; + virDomainPtr *domptr; + uint64_t mask; + int i, ii, numVcpus[MAX_VCPUS], cpu, n_vcpus; + unsigned int jj; + const char *vm_name; + unsigned int domain_flags = VIR_CONNECT_LIST_DOMAINS_RUNNING | + VIR_CONNECT_LIST_DOMAINS_PERSISTENT; + unsigned int domain_flag = VIR_DOMAIN_VCPU_CONFIG; + + + memset(global_cpumaps, 0, CHANNEL_CMDS_MAX_CPUS*global_maplen); + if 
(virNodeGetInfo(global_vir_conn_ptr, &node_info)) { + RTE_LOG(ERR, CHANNEL_MANAGER, "Unable to retrieve node Info\n"); + return; + } + + /* Returns number of pcpus */ + global_n_host_cpus = (unsigned int)node_info.cpus; + + /* Returns number of active domains */ + *num_vm = virConnectListAllDomains(global_vir_conn_ptr, &domptr, + domain_flags); + if (*num_vm <= 0) { + RTE_LOG(ERR, CHANNEL_MANAGER, "No Active Domains Running\n"); + return; + } + + for (i = 0; i < *num_vm; i++) { + + /* Get Domain Names */ + vm_name = virDomainGetName(domptr[i]); + lvm_info[i].vm_name = vm_name; + + /* Get Number of Vcpus */ + numVcpus[i] = virDomainGetVcpusFlags(domptr[i], domain_flag); + + /* Get Number of VCpus & VcpuPinInfo */ + n_vcpus = virDomainGetVcpuPinInfo(domptr[i], + numVcpus[i], global_cpumaps, + global_maplen, domain_flag); + + if ((int)n_vcpus > 0) { + *num_vcpu = n_vcpus; + lvm_info[i].num_cpus = n_vcpus; + } + + /* Save pcpu in use by libvirt VMs */ + for (ii = 0; ii < n_vcpus; ii++) { + mask = 0; + for (jj = 0; jj < global_n_host_cpus; jj++) { + if (VIR_CPU_USABLE(global_cpumaps, + global_maplen, ii, jj) > 0) { + mask |= 1ULL << jj; + } + } + ITERATIVE_BITMASK_CHECK_64(mask, cpu) { + lvm_info[i].pcpus[ii] = cpu; + } + } + } +} + int get_info_vm(const char *vm_name, struct vm_info *info) { diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h index 47c3b9c..358fb8f 100644 --- a/examples/vm_power_manager/channel_manager.h +++ b/examples/vm_power_manager/channel_manager.h @@ -66,6 +66,17 @@ struct sockaddr_un _sockaddr_un; #define UNIX_PATH_MAX sizeof(_sockaddr_un.sun_path) #endif +#define MAX_VMS 4 +#define MAX_VCPUS 20 + + +struct libvirt_vm_info { + const char *vm_name; + unsigned int pcpus[MAX_VCPUS]; + uint8_t num_cpus; +}; + +struct libvirt_vm_info lvm_info[MAX_VMS]; /* Communication Channel Status */ enum channel_status { CHANNEL_MGR_CHANNEL_DISCONNECTED = 0, CHANNEL_MGR_CHANNEL_CONNECTED, @@ -319,6 +330,20 @@ int 
set_channel_status(const char *vm_name, unsigned *channel_list, */ int get_info_vm(const char *vm_name, struct vm_info *info); +/** + * Populates a table with all domains running and their physical cpu. + * All information is gathered through libvirt api. + * + * @param num_vm + * modified to store number of active VMs + * + * @param num_vcpu + modified to store number of vcpus active + * + * @return + * void + */ +void get_all_vm(int *num_vm, int *num_vcpu); #ifdef __cplusplus } #endif -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
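Editorial note on 3/9: the inner loop of get_all_vm() builds a 64-bit mask of usable pcpus for each vcpu and then stores a single pcpu per vcpu. Assuming ITERATIVE_BITMASK_CHECK_64 walks the set bits in ascending order, a vcpu pinned to several pcpus ends up recorded against the highest-numbered one, because each iteration overwrites lvm_info[i].pcpus[ii]. A minimal sketch of that behaviour, using a hypothetical helper name (last_pcpu_in_mask is not part of the patch):

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return the highest-numbered
 * pcpu set in an affinity mask, or -1 if the mask is empty. This mirrors
 * what repeated assignment inside an ascending bit walk would leave behind.
 */
static int
last_pcpu_in_mask(uint64_t mask)
{
	int cpu, last = -1;

	/* Shift the mask right one bit per step; cpu tracks the bit index. */
	for (cpu = 0; mask != 0; mask >>= 1, cpu++) {
		if (mask & 1)
			last = cpu;
	}
	return last;
}
```

If only one pcpu per vcpu is ever expected (strict 1:1 pinning), the result is the same whichever bit is taken; the distinction only matters for vcpus pinned to a range.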
* [dpdk-dev] [PATCH v8 4/9] examples/vm_power_mgr: add scale to medium freq fn
  2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt
                   ` (2 preceding siblings ...)
  2017-10-05 14:34   ` [dpdk-dev] [PATCH v8 3/9] examples/vm_power_mgr: add vcpu to pcpu mapping David Hunt
@ 2017-10-05 14:34   ` David Hunt
  2017-10-05 14:34   ` [dpdk-dev] [PATCH v8 5/9] examples/vm_power_mgr: add policy to channels David Hunt
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw)
  To: dev
  Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt,
	Nemanja Marjanovic, Rory Sexton

Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com>
Signed-off-by: Rory Sexton <rory.sexton@intel.com>
Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 examples/vm_power_manager/power_manager.c | 32 ++++++++++++++++++++++++++-----
 examples/vm_power_manager/power_manager.h | 13 +++++++++++++
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c
index 80705f9..1834a82 100644
--- a/examples/vm_power_manager/power_manager.c
+++ b/examples/vm_power_manager/power_manager.c
@@ -108,7 +108,7 @@ set_host_cpus_mask(void)
 int
 power_manager_init(void)
 {
-	unsigned i, num_cpus;
+	unsigned int i, num_cpus, num_freqs;
 	uint64_t cpu_mask;
 	int ret = 0;
 
@@ -121,15 +121,21 @@ power_manager_init(void)
 	rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
 	cpu_mask = global_enabled_cpus;
 	for (i = 0; cpu_mask; cpu_mask &= ~(1 << i++)) {
-		if (rte_power_init(i) < 0 || rte_power_freqs(i,
-				global_core_freq_info[i].freqs,
-				RTE_MAX_LCORE_FREQS) == 0) {
-			RTE_LOG(ERR, POWER_MANAGER, "Unable to initialize power manager "
+		if (rte_power_init(i) < 0)
+			RTE_LOG(ERR, POWER_MANAGER,
+					"Unable to initialize power manager "
 					"for core %u\n", i);
+		num_freqs = rte_power_freqs(i, global_core_freq_info[i].freqs,
+					RTE_MAX_LCORE_FREQS);
+		if (num_freqs == 0) {
+			RTE_LOG(ERR, POWER_MANAGER,
+				"Unable to get frequency list for core %u\n",
+				i);
 			global_enabled_cpus &= ~(1 << i);
 			num_cpus--;
 			ret = -1;
 		}
+		global_core_freq_info[i].num_freqs = num_freqs;
 		rte_spinlock_init(&global_core_freq_info[i].power_sl);
 	}
 	RTE_LOG(INFO, POWER_MANAGER, "Detected %u host CPUs , enabled core mask:"
@@ -286,3 +292,19 @@ power_manager_disable_turbo_core(unsigned int core_num)
 	POWER_SCALE_CORE(disable_turbo, core_num, ret);
 	return ret;
 }
+
+int
+power_manager_scale_core_med(unsigned int core_num)
+{
+	int ret = 0;
+
+	if (core_num >= POWER_MGR_MAX_CPUS)
+		return -1;
+	if (!(global_enabled_cpus & (1ULL << core_num)))
+		return -1;
+	rte_spinlock_lock(&global_core_freq_info[core_num].power_sl);
+	ret = rte_power_set_freq(core_num,
+				global_core_freq_info[core_num].num_freqs / 2);
+	rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl);
+	return ret;
+}
diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h
index b74d09b..b52fb4c 100644
--- a/examples/vm_power_manager/power_manager.h
+++ b/examples/vm_power_manager/power_manager.h
@@ -231,6 +231,19 @@ int power_manager_disable_turbo_core(unsigned int core_num);
  */
 uint32_t power_manager_get_current_frequency(unsigned core_num);
 
+/**
+ * Scale to medium frequency for the core specified by core_num.
+ * It is thread-safe.
+ *
+ * @param core_num
+ *  The core number to change frequency
+ *
+ * @return
+ *  - 1 on success.
+ *  - 0 if frequency not changed.
+ *  - Negative on error.
+ */
+int power_manager_scale_core_med(unsigned int core_num);
 
 #ifdef __cplusplus
 }
-- 
2.7.4

^ permalink raw reply [flat|nested] 105+ messages in thread
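Editorial note on 4/9: power_manager_scale_core_med() passes the index num_freqs / 2 to rte_power_set_freq(), so "medium" means the middle entry of the per-core frequency table rather than an arithmetic mean. The sketch below shows which entry that picks, assuming (as I understand the ACPI cpufreq backend) that the table is sorted from highest to lowest frequency. The helper name pick_medium_freq and the sample values are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: return the frequency that
 * sits at index num_freqs / 2, which is the index the patch hands to
 * rte_power_set_freq(). Returns 0 for an empty or missing table.
 */
static uint32_t
pick_medium_freq(const uint32_t *freqs, uint32_t num_freqs)
{
	if (freqs == NULL || num_freqs == 0)
		return 0;
	/* Integer division: an even-sized table picks the lower-frequency middle entry. */
	return freqs[num_freqs / 2];
}
```

With a five-entry table the third entry is chosen; with a four-entry table the third entry is chosen as well, so even-sized tables lean towards the lower half of the range.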
* [dpdk-dev] [PATCH v8 5/9] examples/vm_power_mgr: add policy to channels 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt ` (3 preceding siblings ...) 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 4/9] examples/vm_power_mgr: add scale to medium freq fn David Hunt @ 2017-10-05 14:34 ` David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 6/9] examples/vm_power_mgr: add port initialisation David Hunt ` (4 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- examples/vm_power_manager/Makefile | 16 ++ examples/vm_power_manager/channel_monitor.c | 321 +++++++++++++++++++++++++++- examples/vm_power_manager/channel_monitor.h | 18 ++ 3 files changed, 348 insertions(+), 7 deletions(-) diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile index 59a9641..9cf20a2 100644 --- a/examples/vm_power_manager/Makefile +++ b/examples/vm_power_manager/Makefile @@ -54,6 +54,22 @@ CFLAGS += $(WERROR_FLAGS) LDLIBS += -lvirt +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) + +ifeq ($(CONFIG_RTE_LIBRTE_IXGBE_PMD),y) +LDLIBS += -lrte_pmd_ixgbe +endif + +ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y) +LDLIBS += -lrte_pmd_i40e +endif + +ifeq ($(CONFIG_RTE_LIBRTE_BNXT_PMD),y) +LDLIBS += -lrte_pmd_bnxt +endif + +endif + # workaround for a gcc bug with noreturn attribute # http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index 
ac40dac..f16358d 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ b/examples/vm_power_manager/channel_monitor.c @@ -41,13 +41,17 @@ #include <sys/types.h> #include <sys/epoll.h> #include <sys/queue.h> +#include <sys/time.h> #include <rte_log.h> #include <rte_memory.h> #include <rte_malloc.h> #include <rte_atomic.h> +#include <rte_cycles.h> +#include <rte_ethdev.h> +#include <rte_pmd_i40e.h> - +#include <libvirt/libvirt.h> #include "channel_monitor.h" #include "channel_commands.h" #include "channel_manager.h" @@ -57,10 +61,15 @@ #define MAX_EVENTS 256 +uint64_t vsi_pkt_count_prev[384]; +uint64_t rdtsc_prev[384]; +double time_period_s = 1; static volatile unsigned run_loop = 1; static int global_event_fd; +static unsigned int policy_is_set; static struct epoll_event *global_events_list; +static struct policy policies[MAX_VMS]; void channel_monitor_exit(void) { @@ -68,6 +77,286 @@ void channel_monitor_exit(void) rte_free(global_events_list); } +static void +core_share(int pNo, int z, int x, int t) +{ + if (policies[pNo].core_share[z].pcpu == lvm_info[x].pcpus[t]) { + if (strcmp(policies[pNo].pkt.vm_name, + lvm_info[x].vm_name) != 0) { + policies[pNo].core_share[z].status = 1; + power_manager_scale_core_max( + policies[pNo].core_share[z].pcpu); + } + } +} + +static void +core_share_status(int pNo) +{ + + int noVms, noVcpus, z, x, t; + + get_all_vm(&noVms, &noVcpus); + + /* Reset Core Share Status. */ + for (z = 0; z < noVcpus; z++) + policies[pNo].core_share[z].status = 0; + + /* Foreach vcpu in a policy. */ + for (z = 0; z < policies[pNo].pkt.num_vcpu; z++) { + /* Foreach VM on the platform. */ + for (x = 0; x < noVms; x++) { + /* Foreach vcpu of VMs on platform. */ + for (t = 0; t < lvm_info[x].num_cpus; t++) + core_share(pNo, z, x, t); + } + } +} + +static void +get_pcpu_to_control(struct policy *pol) +{ + + /* Convert vcpu to pcpu. 
*/ + struct vm_info info; + int pcpu, count; + uint64_t mask_u64b; + + RTE_LOG(INFO, CHANNEL_MONITOR, "Looking for pcpu for %s\n", + pol->pkt.vm_name); + get_info_vm(pol->pkt.vm_name, &info); + + for (count = 0; count < pol->pkt.num_vcpu; count++) { + mask_u64b = info.pcpu_mask[pol->pkt.vcpu_to_control[count]]; + for (pcpu = 0; mask_u64b; mask_u64b &= ~(1ULL << pcpu++)) { + if ((mask_u64b >> pcpu) & 1) + pol->core_share[count].pcpu = pcpu; + } + } +} + +static int +get_pfid(struct policy *pol) +{ + + int i, x, ret = 0, nb_ports; + + nb_ports = rte_eth_dev_count(); + for (i = 0; i < pol->pkt.nb_mac_to_monitor; i++) { + + for (x = 0; x < nb_ports; x++) { + ret = rte_pmd_i40e_query_vfid_by_mac(x, + (struct ether_addr *)&(pol->pkt.vfid[i])); + if (ret != -EINVAL) { + pol->port[i] = x; + break; + } + } + if (ret == -EINVAL || ret == -ENOTSUP || ret == ENODEV) { + RTE_LOG(INFO, CHANNEL_MONITOR, + "Error with Policy. MAC not found on " + "attached ports "); + pol->enabled = 0; + return ret; + } + pol->pfid[i] = ret; + } + return 1; +} + +static int +update_policy(struct channel_packet *pkt) +{ + + unsigned int updated = 0; + + for (int i = 0; i < MAX_VMS; i++) { + if (strcmp(policies[i].pkt.vm_name, pkt->vm_name) == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) { + updated = 1; + break; + } + core_share_status(i); + policies[i].enabled = 1; + updated = 1; + } + } + if (!updated) { + for (int i = 0; i < MAX_VMS; i++) { + if (policies[i].enabled == 0) { + policies[i].pkt = *pkt; + get_pcpu_to_control(&policies[i]); + if (get_pfid(&policies[i]) == -1) + break; + core_share_status(i); + policies[i].enabled = 1; + break; + } + } + } + return 0; +} + +static uint64_t +get_pkt_diff(struct policy *pol) +{ + + uint64_t vsi_pkt_count, + vsi_pkt_total = 0, + vsi_pkt_count_prev_total = 0; + double rdtsc_curr, rdtsc_diff, diff; + int x; + struct rte_eth_stats vf_stats; + + for (x = 0; x < pol->pkt.nb_mac_to_monitor; x++) { + + 
/*Read vsi stats*/ + if (rte_pmd_i40e_get_vf_stats(x, pol->pfid[x], &vf_stats) == 0) + vsi_pkt_count = vf_stats.ipackets; + else + vsi_pkt_count = -1; + + vsi_pkt_total += vsi_pkt_count; + + vsi_pkt_count_prev_total += vsi_pkt_count_prev[pol->pfid[x]]; + vsi_pkt_count_prev[pol->pfid[x]] = vsi_pkt_count; + } + + rdtsc_curr = rte_rdtsc_precise(); + rdtsc_diff = rdtsc_curr - rdtsc_prev[pol->pfid[x-1]]; + rdtsc_prev[pol->pfid[x-1]] = rdtsc_curr; + + diff = (vsi_pkt_total - vsi_pkt_count_prev_total) * + ((double)rte_get_tsc_hz() / rdtsc_diff); + + return diff; +} + +static void +apply_traffic_profile(struct policy *pol) +{ + + int count; + uint64_t diff = 0; + + diff = get_pkt_diff(pol); + + RTE_LOG(INFO, CHANNEL_MONITOR, "Applying traffic profile\n"); + + if (diff >= (pol->pkt.traffic_policy.max_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (diff >= (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (diff < (pol->pkt.traffic_policy.avg_max_packet_thresh)) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_time_profile(struct policy *pol) +{ + + int count, x; + struct timeval tv; + struct tm *ptm; + char time_string[40]; + + /* Obtain the time of day, and convert it to a tm struct. */ + gettimeofday(&tv, NULL); + ptm = localtime(&tv.tv_sec); + /* Format the date and time, down to a single second. 
*/ + strftime(time_string, sizeof(time_string), "%Y-%m-%d %H:%M:%S", ptm); + + for (x = 0; x < HOURS; x++) { + + if (ptm->tm_hour == pol->pkt.timer_policy.busy_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_max( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling up core %d to max\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.quiet_hours[x]) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) { + power_manager_scale_core_min( + pol->core_share[count].pcpu); + RTE_LOG(INFO, CHANNEL_MONITOR, + "Scaling down core %d to min\n", + pol->core_share[count].pcpu); + } + } + break; + } else if (ptm->tm_hour == + pol->pkt.timer_policy.hours_to_use_traffic_profile[x]) { + apply_traffic_profile(pol); + break; + } + } +} + +static void +apply_workload_profile(struct policy *pol) +{ + + int count; + + if (pol->pkt.workload == HIGH) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_max( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == MEDIUM) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_med( + pol->core_share[count].pcpu); + } + } else if (pol->pkt.workload == LOW) { + for (count = 0; count < pol->pkt.num_vcpu; count++) { + if (pol->core_share[count].status != 1) + power_manager_scale_core_min( + pol->core_share[count].pcpu); + } + } +} + +static void +apply_policy(struct policy *pol) +{ + + struct channel_packet *pkt = &pol->pkt; + + /*Check policy to use*/ + if (pkt->policy_to_use == TRAFFIC) + apply_traffic_profile(pol); + else if (pkt->policy_to_use == TIME) + apply_time_profile(pol); + else if (pkt->policy_to_use == WORKLOAD) + apply_workload_profile(pol); +} + + static int 
process_request(struct channel_packet *pkt, struct channel_info *chan_info) { @@ -140,6 +429,13 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) } } + + if (pkt->command == PKT_POLICY) { + RTE_LOG(INFO, CHANNEL_MONITOR, "\nProcessing Policy request from Guest\n"); + update_policy(pkt); + policy_is_set = 1; + } + /* Return is not checked as channel status may have been set to DISABLED * from management thread */ @@ -209,9 +505,10 @@ run_channel_monitor(void) struct channel_info *chan_info = (struct channel_info *) global_events_list[i].data.ptr; if ((global_events_list[i].events & EPOLLERR) || - (global_events_list[i].events & EPOLLHUP)) { + (global_events_list[i].events & EPOLLHUP)) { RTE_LOG(DEBUG, CHANNEL_MONITOR, "Remote closed connection for " - "channel '%s'\n", chan_info->channel_path); + "channel '%s'\n", + chan_info->channel_path); remove_channel(&chan_info); continue; } @@ -223,14 +520,17 @@ run_channel_monitor(void) int buffer_len = sizeof(pkt); while (buffer_len > 0) { - n_bytes = read(chan_info->fd, buffer, buffer_len); + n_bytes = read(chan_info->fd, + buffer, buffer_len); if (n_bytes == buffer_len) break; if (n_bytes == -1) { err = errno; - RTE_LOG(DEBUG, CHANNEL_MONITOR, "Received error on " - "channel '%s' read: %s\n", - chan_info->channel_path, strerror(err)); + RTE_LOG(DEBUG, CHANNEL_MONITOR, + "Received error on " + "channel '%s' read: %s\n", + chan_info->channel_path, + strerror(err)); remove_channel(&chan_info); break; } @@ -241,5 +541,12 @@ run_channel_monitor(void) process_request(&pkt, chan_info); } } + rte_delay_us(time_period_s*1000000); + if (policy_is_set) { + for (int j = 0; j < MAX_VMS; j++) { + if (policies[j].enabled == 1) + apply_policy(&policies[j]); + } + } } } diff --git a/examples/vm_power_manager/channel_monitor.h b/examples/vm_power_manager/channel_monitor.h index c138607..b52c1fc 100644 --- a/examples/vm_power_manager/channel_monitor.h +++ b/examples/vm_power_manager/channel_monitor.h @@ -35,6 
+35,24 @@ #define CHANNEL_MONITOR_H_ #include "channel_manager.h" +#include "channel_commands.h" + +struct core_share { + unsigned int pcpu; + /* + * 1 CORE SHARE + * 0 NOT SHARED + */ + int status; +}; + +struct policy { + struct channel_packet pkt; + uint32_t pfid[MAX_VFS]; + uint32_t port[MAX_VFS]; + unsigned int enabled; + struct core_share core_share[MAX_VCPU_PER_VM]; +}; #ifdef __cplusplus extern "C" { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
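Editorial note on 5/9: get_pkt_diff() turns two packet-counter samples into a packets-per-second figure by scaling the counter delta by tsc_hz / tsc_delta, and apply_traffic_profile() then compares that rate against two thresholds. As far as I can see, min_packet_thresh is carried in the packet but not consulted. A minimal sketch of the same arithmetic and decision, with hypothetical names (packets_per_second, choose_action and enum scale_action are not in the patch) and with guards for a non-advancing clock added as an assumption:

```c
#include <stdint.h>

/* Hypothetical stand-in for the three scaling calls used by the patch. */
enum scale_action { SCALE_MIN, SCALE_MED, SCALE_MAX };

/*
 * Rate = packet delta * (TSC ticks per second / TSC ticks elapsed).
 * Returns 0.0 if the clock did not advance or the counter went backwards.
 */
static double
packets_per_second(uint64_t pkts_now, uint64_t pkts_prev,
		uint64_t tsc_now, uint64_t tsc_prev, uint64_t tsc_hz)
{
	if (tsc_now <= tsc_prev || pkts_now < pkts_prev)
		return 0.0;
	return (double)(pkts_now - pkts_prev) *
		((double)tsc_hz / (double)(tsc_now - tsc_prev));
}

/* Same ordering as the patch: max threshold first, then avg, else min. */
static enum scale_action
choose_action(double pps, uint32_t avg_thresh, uint32_t max_thresh)
{
	if (pps >= max_thresh)
		return SCALE_MAX;
	if (pps >= avg_thresh)
		return SCALE_MED;
	return SCALE_MIN;
}
```

For example, 2000 packets seen over half a second of TSC time works out to 4000 packets per second, which with the example thresholds from the guest CLI (1800000 and 2000000) would select the minimum frequency.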
* [dpdk-dev] [PATCH v8 6/9] examples/vm_power_mgr: add port initialisation 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt ` (4 preceding siblings ...) 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 5/9] examples/vm_power_mgr: add policy to channels David Hunt @ 2017-10-05 14:34 ` David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 7/9] power: add send channel msg function to map file David Hunt ` (3 subsequent siblings) 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt, Nemanja Marjanovic We need to initialise the ports we're monitoring to be able to see the throughput. Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- examples/vm_power_manager/main.c | 220 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 220 insertions(+) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index c33fcc9..698abca 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -49,6 +49,9 @@ #include <rte_log.h> #include <rte_per_lcore.h> #include <rte_lcore.h> +#include <rte_ethdev.h> +#include <getopt.h> +#include <rte_cycles.h> #include <rte_debug.h> #include "channel_manager.h" @@ -56,6 +59,192 @@ #include "power_manager.h" #include "vm_power_cli.h" +#define RX_RING_SIZE 512 +#define TX_RING_SIZE 512 + +#define NUM_MBUFS 8191 +#define MBUF_CACHE_SIZE 250 +#define BURST_SIZE 32 + +static uint32_t enabled_port_mask; +static volatile bool force_quit; + +/****************/ +static const struct rte_eth_conf port_conf_default = { + .rxmode = { .max_rx_pkt_len = ETHER_MAX_LEN } +}; + +static inline int +port_init(uint8_t port, struct rte_mempool *mbuf_pool) +{ + struct rte_eth_conf port_conf = port_conf_default; + 
const uint16_t rx_rings = 1, tx_rings = 1; + int retval; + uint16_t q; + + if (port >= rte_eth_dev_count()) + return -1; + + /* Configure the Ethernet device. */ + retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf); + if (retval != 0) + return retval; + + /* Allocate and set up 1 RX queue per Ethernet port. */ + for (q = 0; q < rx_rings; q++) { + retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL, mbuf_pool); + if (retval < 0) + return retval; + } + + /* Allocate and set up 1 TX queue per Ethernet port. */ + for (q = 0; q < tx_rings; q++) { + retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE, + rte_eth_dev_socket_id(port), NULL); + if (retval < 0) + return retval; + } + + /* Start the Ethernet port. */ + retval = rte_eth_dev_start(port); + if (retval < 0) + return retval; + + /* Display the port MAC address. */ + struct ether_addr addr; + rte_eth_macaddr_get(port, &addr); + printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8 + " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n", + (unsigned int)port, + addr.addr_bytes[0], addr.addr_bytes[1], + addr.addr_bytes[2], addr.addr_bytes[3], + addr.addr_bytes[4], addr.addr_bytes[5]); + + /* Enable RX in promiscuous mode for the Ethernet device. 
*/ + rte_eth_promiscuous_enable(port); + + + return 0; +} + +static int +parse_portmask(const char *portmask) +{ + char *end = NULL; + unsigned long pm; + + /* parse hexadecimal string */ + pm = strtoul(portmask, &end, 16); + if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0')) + return -1; + + if (pm == 0) + return -1; + + return pm; +} +/* Parse the argument given in the command line of the application */ +static int +parse_args(int argc, char **argv) +{ + int opt, ret; + char **argvopt; + int option_index; + char *prgname = argv[0]; + static struct option lgopts[] = { + { "mac-updating", no_argument, 0, 1}, + { "no-mac-updating", no_argument, 0, 0}, + {NULL, 0, 0, 0} + }; + argvopt = argv; + + while ((opt = getopt_long(argc, argvopt, "p:q:T:", + lgopts, &option_index)) != EOF) { + + switch (opt) { + /* portmask */ + case 'p': + enabled_port_mask = parse_portmask(optarg); + if (enabled_port_mask == 0) { + printf("invalid portmask\n"); + return -1; + } + break; + /* long options */ + case 0: + break; + + default: + return -1; + } + } + + if (optind >= 0) + argv[optind-1] = prgname; + + ret = optind-1; + optind = 0; /* reset getopt lib */ + return ret; +} + +static void +check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) +{ +#define CHECK_INTERVAL 100 /* 100ms */ +#define MAX_CHECK_TIME 90 /* 9s (90 * 100ms) in total */ + uint8_t portid, count, all_ports_up, print_flag = 0; + struct rte_eth_link link; + + printf("\nChecking link status"); + fflush(stdout); + for (count = 0; count <= MAX_CHECK_TIME; count++) { + if (force_quit) + return; + all_ports_up = 1; + for (portid = 0; portid < port_num; portid++) { + if (force_quit) + return; + if ((port_mask & (1 << portid)) == 0) + continue; + memset(&link, 0, sizeof(link)); + rte_eth_link_get_nowait(portid, &link); + /* print link status if flag set */ + if (print_flag == 1) { + if (link.link_status) + printf("Port %d Link Up - speed %u " + "Mbps - %s\n", (uint8_t)portid, + (unsigned 
int)link.link_speed, + (link.link_duplex == ETH_LINK_FULL_DUPLEX) ? + ("full-duplex") : ("half-duplex\n")); + else + printf("Port %d Link Down\n", + (uint8_t)portid); + continue; + } + /* clear all_ports_up flag if any link down */ + if (link.link_status == ETH_LINK_DOWN) { + all_ports_up = 0; + break; + } + } + /* after finally printing all link status, get out */ + if (print_flag == 1) + break; + + if (all_ports_up == 0) { + printf("."); + fflush(stdout); + rte_delay_ms(CHECK_INTERVAL); + } + + /* set the print_flag if all ports up or timeout */ + if (all_ports_up == 1 || count == (MAX_CHECK_TIME - 1)) { + print_flag = 1; + printf("done\n"); + } + } +} static int run_monitor(__attribute__((unused)) void *arg) { @@ -82,6 +271,10 @@ main(int argc, char **argv) { int ret; unsigned lcore_id; + unsigned int nb_ports; + struct rte_mempool *mbuf_pool; + uint8_t portid; + ret = rte_eal_init(argc, argv); if (ret < 0) @@ -90,12 +283,39 @@ main(int argc, char **argv) signal(SIGINT, sig_handler); signal(SIGTERM, sig_handler); + argc -= ret; + argv += ret; + + /* parse application arguments (after the EAL ones) */ + ret = parse_args(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid arguments\n"); + + nb_ports = rte_eth_dev_count(); + + mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports, + MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id()); + + if (mbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n"); + + /* Initialize ports. 
*/ + for (portid = 0; portid < nb_ports; portid++) { + if ((enabled_port_mask & (1 << portid)) == 0) + continue; + if (port_init(portid, mbuf_pool) != 0) + rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", + portid); + } + lcore_id = rte_get_next_lcore(-1, 1, 0); if (lcore_id == RTE_MAX_LCORE) { RTE_LOG(ERR, EAL, "A minimum of two cores are required to run " "application\n"); return 0; } + + check_all_ports_link_status(nb_ports, enabled_port_mask); rte_eal_remote_launch(run_monitor, NULL, lcore_id); if (power_manager_init() < 0) { -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
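Editorial note on 6/9: parse_portmask() returns -1 on a malformed mask, but parse_args() stores the result in the unsigned enabled_port_mask and only checks for 0, so an invalid string would appear to enable every port rather than be rejected. One possible variant, sketched here under my own assumptions, returns 0 for any invalid input so that the existing "== 0" check catches it. The names parse_portmask_u32 and port_is_enabled are hypothetical and not part of the patch:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical variant, not part of the patch: parse a hexadecimal port
 * mask and return 0 for anything invalid (empty string, trailing junk,
 * overflow, or a value wider than 32 bits).
 */
static uint32_t
parse_portmask_u32(const char *portmask)
{
	char *end = NULL;
	unsigned long pm;

	if (portmask == NULL || portmask[0] == '\0')
		return 0;
	errno = 0;
	pm = strtoul(portmask, &end, 16);
	if (errno != 0 || end == portmask || *end != '\0')
		return 0;
	if (pm > UINT32_MAX)
		return 0;
	return (uint32_t)pm;
}

/* True if bit portid is set in mask; ports beyond bit 31 are never enabled. */
static int
port_is_enabled(uint32_t mask, unsigned int portid)
{
	return portid < 32 && ((mask >> portid) & 1U);
}
```

The bounds check in port_is_enabled() also avoids shifting by 32 or more, which matters if the number of detected ports ever exceeds the width of the mask.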
* [dpdk-dev] [PATCH v8 7/9] power: add send channel msg function to map file
  2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt
                   ` (5 preceding siblings ...)
  2017-10-05 14:34   ` [dpdk-dev] [PATCH v8 6/9] examples/vm_power_mgr: add port initialisation David Hunt
@ 2017-10-05 14:34   ` David Hunt
  2017-10-05 14:34   ` [dpdk-dev] [PATCH v8 8/9] examples/guest_cli: add send policy to host David Hunt
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 105+ messages in thread
From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw)
  To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt

Add a new wrapper function, with an rte_power_ prefix, around an
existing private function that was unused until now.

The plan is to clean up all the header files in the next release so
that only the intended public functions are in the map file, and only
the relevant headers carry the rte_ prefix and are therefore included
in the documentation.

Signed-off-by: David Hunt <david.hunt@intel.com>
Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_power/guest_channel.c       |  7 +++++++
 lib/librte_power/guest_channel.h       | 15 +++++++++++++++
 lib/librte_power/rte_power_version.map |  1 +
 3 files changed, 23 insertions(+)

diff --git a/lib/librte_power/guest_channel.c b/lib/librte_power/guest_channel.c
index 85c92fa..fa5de0f 100644
--- a/lib/librte_power/guest_channel.c
+++ b/lib/librte_power/guest_channel.c
@@ -148,6 +148,13 @@ guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id)
 	return 0;
 }
 
+int rte_power_guest_channel_send_msg(struct channel_packet *pkt,
+			unsigned int lcore_id)
+{
+	return guest_channel_send_msg(pkt, lcore_id);
+}
+
+
 void
 guest_channel_host_disconnect(unsigned lcore_id)
 {
diff --git a/lib/librte_power/guest_channel.h b/lib/librte_power/guest_channel.h
index 9e18af5..741339c 100644
--- a/lib/librte_power/guest_channel.h
+++ b/lib/librte_power/guest_channel.h
@@ -81,6 +81,21 @@ void guest_channel_host_disconnect(unsigned lcore_id);
  */
 int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id);
 
+/**
+ * Send a message contained in pkt over the Virtio-Serial to the host endpoint.
+ *
+ * @param pkt
+ *  Pointer to a populated struct channel_packet
+ *
+ * @param lcore_id
+ *  lcore_id.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+int rte_power_guest_channel_send_msg(struct channel_packet *pkt,
+			unsigned int lcore_id);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map
index 9ae0627..96dc42e 100644
--- a/lib/librte_power/rte_power_version.map
+++ b/lib/librte_power/rte_power_version.map
@@ -20,6 +20,7 @@ DPDK_2.0 {
 DPDK_17.11 {
 	global:
 
+	rte_power_guest_channel_send_msg;
 	rte_power_freq_disable_turbo;
 	rte_power_freq_enable_turbo;
 	rte_power_turbo_status;
-- 
2.7.4

^ permalink raw reply [flat|nested] 105+ messages in thread
* [dpdk-dev] [PATCH v8 8/9] examples/guest_cli: add send policy to host 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt ` (6 preceding siblings ...) 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 7/9] power: add send channel msg function to map file David Hunt @ 2017-10-05 14:34 ` David Hunt 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt 2017-10-09 22:34 ` [dpdk-dev] [PATCH v8 0/9] Policy Based Power Control for Guest Ferruh Yigit 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw) To: dev Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, Sexton, Rory, Nemanja Marjanovic, David Hunt From: "Sexton, Rory" <rory.sexton@intel.com> Here we're adding an example of setting up a policy, and allowing the vm_cli_guest app to send it to the host using the cli command "send_policy now" Signed-off-by: Nemanja Marjanovic <nemanja.marjanovic@intel.com> Signed-off-by: Rory Sexton <rory.sexton@intel.com> Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- .../guest_cli/vm_power_cli_guest.c | 97 ++++++++++++++++++++++ .../guest_cli/vm_power_cli_guest.h | 6 -- 2 files changed, 97 insertions(+), 6 deletions(-) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 4e982bd..dc9efc2 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -45,8 +45,10 @@ #include <cmdline.h> #include <rte_log.h> #include <rte_lcore.h> +#include <rte_ethdev.h> #include <rte_power.h> +#include <guest_channel.h> #include "vm_power_cli_guest.h" @@ -139,8 +141,103 @@ cmdline_parse_inst_t cmd_set_cpu_freq_set = { }, }; +struct cmd_send_policy_result { + cmdline_fixed_string_t send_policy; + cmdline_fixed_string_t cmd; +}; + 
+union PFID { + struct ether_addr addr; + uint64_t pfid; +}; + +static inline int +send_policy(void) +{ + struct channel_packet pkt; + int ret; + + union PFID pfid; + /* Use port MAC address as the vfid */ + rte_eth_macaddr_get(0, &pfid.addr); + printf("Port %u MAC: %02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 ":" + "%02" PRIx8 ":%02" PRIx8 ":%02" PRIx8 "\n", + 1, + pfid.addr.addr_bytes[0], pfid.addr.addr_bytes[1], + pfid.addr.addr_bytes[2], pfid.addr.addr_bytes[3], + pfid.addr.addr_bytes[4], pfid.addr.addr_bytes[5]); + pkt.vfid[0] = pfid.pfid; + + pkt.nb_mac_to_monitor = 1; + pkt.t_boost_status.tbEnabled = false; + + pkt.vcpu_to_control[0] = 0; + pkt.vcpu_to_control[1] = 1; + pkt.num_vcpu = 2; + /* Dummy Population. */ + pkt.traffic_policy.min_packet_thresh = 96000; + pkt.traffic_policy.avg_max_packet_thresh = 1800000; + pkt.traffic_policy.max_max_packet_thresh = 2000000; + + pkt.timer_policy.busy_hours[0] = 3; + pkt.timer_policy.busy_hours[1] = 4; + pkt.timer_policy.busy_hours[2] = 5; + pkt.timer_policy.quiet_hours[0] = 11; + pkt.timer_policy.quiet_hours[1] = 12; + pkt.timer_policy.quiet_hours[2] = 13; + + pkt.timer_policy.hours_to_use_traffic_profile[0] = 8; + pkt.timer_policy.hours_to_use_traffic_profile[1] = 10; + + pkt.workload = LOW; + pkt.policy_to_use = TIME; + pkt.command = PKT_POLICY; + strcpy(pkt.vm_name, "ubuntu2"); + ret = rte_power_guest_channel_send_msg(&pkt, 1); + if (ret == 0) + return 1; + RTE_LOG(DEBUG, POWER, "Error sending message: %s\n", + ret > 0 ? 
strerror(ret) : "channel not connected"); + return -1; +} + +static void +cmd_send_policy_parsed(void *parsed_result, struct cmdline *cl, + __attribute__((unused)) void *data) +{ + int ret = -1; + struct cmd_send_policy_result *res = parsed_result; + + if (!strcmp(res->cmd, "now")) { + printf("Sending Policy down now!\n"); + ret = send_policy(); + } + if (ret != 1) + cmdline_printf(cl, "Error sending message: %s\n", + strerror(ret)); +} + +cmdline_parse_token_string_t cmd_send_policy = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + send_policy, "send_policy"); +cmdline_parse_token_string_t cmd_send_policy_cmd_cmd = + TOKEN_STRING_INITIALIZER(struct cmd_send_policy_result, + cmd, "now"); + +cmdline_parse_inst_t cmd_send_policy_set = { + .f = cmd_send_policy_parsed, + .data = NULL, + .help_str = "send_policy now", + .tokens = { + (void *)&cmd_send_policy, + (void *)&cmd_send_policy_cmd_cmd, + NULL, + }, +}; + cmdline_parse_ctx_t main_ctx[] = { (cmdline_parse_inst_t *)&cmd_quit, + (cmdline_parse_inst_t *)&cmd_send_policy_set, (cmdline_parse_inst_t *)&cmd_set_cpu_freq_set, NULL, }; diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h index 0c4bdd5..277eab3 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.h @@ -40,12 +40,6 @@ extern "C" { #include "channel_commands.h" -int guest_channel_host_connect(unsigned lcore_id); - -int guest_channel_send_msg(struct channel_packet *pkt, unsigned lcore_id); - -void guest_channel_host_disconnect(unsigned lcore_id); - void run_cli(__attribute__((unused)) void *arg); #ifdef __cplusplus -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
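The union PFID in the patch above overlays the 6-byte port MAC on a 64-bit integer so the MAC can travel as pkt.vfid[0]. A minimal sketch of that idea follows, outside of DPDK. The names struct mac_addr, union mac_id, mac_to_id() and id_to_mac() are hypothetical stand-ins and not part of the patch. The sketch zeroes the union before copying the MAC, on the assumption that the two bytes the MAC does not cover should have a defined value. The patch's send_policy() does not initialise them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's struct ether_addr (6 raw bytes). */
struct mac_addr {
	uint8_t addr_bytes[6];
};

/* Same layout idea as the patch's union PFID: a MAC overlaid on 64 bits. */
union mac_id {
	struct mac_addr addr;
	uint64_t id;
};

/* Pack a MAC into a 64-bit id; the two bytes beyond the MAC are zero. */
static uint64_t
mac_to_id(const uint8_t mac[6])
{
	union mac_id u;

	assert(mac != NULL);
	u.id = 0;				/* clear all 8 bytes first */
	memcpy(u.addr.addr_bytes, mac, 6);	/* then overlay the MAC */
	return u.id;
}

/* Recover the 6 MAC bytes from an id produced by mac_to_id(). */
static void
id_to_mac(uint64_t id, uint8_t mac[6])
{
	union mac_id u;

	u.id = id;
	memcpy(mac, u.addr.addr_bytes, 6);
}
```

With the padding cleared, two ids built from the same MAC compare equal as plain integers, which may matter if the receiving side ever compares the 64-bit value rather than the six bytes.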
* [dpdk-dev] [PATCH v8 9/9] examples/vm_power_mgr: set MAC address of VF 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt ` (7 preceding siblings ...) 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 8/9] examples/guest_cli: add send policy to host David Hunt @ 2017-10-05 14:34 ` David Hunt 2017-10-09 22:34 ` [dpdk-dev] [PATCH v8 0/9] Policy Based Power Control for Guest Ferruh Yigit 9 siblings, 0 replies; 105+ messages in thread From: David Hunt @ 2017-10-05 14:34 UTC (permalink / raw) To: dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla, David Hunt We need to set vf mac from the host, so that they will be in sync on the guest and the host. Otherwise, we'll have a random mac on the guest, and a 00:00:00:00:00:00 mac on the host. Signed-off-by: David Hunt <david.hunt@intel.com> Reviewed-by: Santosh Shukla <santosh.shukla@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- examples/vm_power_manager/main.c | 43 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c index 698abca..5147789 100644 --- a/examples/vm_power_manager/main.c +++ b/examples/vm_power_manager/main.c @@ -58,6 +58,9 @@ #include "channel_monitor.h" #include "power_manager.h" #include "vm_power_cli.h" +#include <rte_pmd_ixgbe.h> +#include <rte_pmd_i40e.h> +#include <rte_pmd_bnxt.h> #define RX_RING_SIZE 512 #define TX_RING_SIZE 512 @@ -222,7 +225,7 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask) (uint8_t)portid); continue; } - /* clear all_ports_up flag if any link down */ + /* clear all_ports_up flag if any link down */ if (link.link_status == ETH_LINK_DOWN) { all_ports_up = 0; break; @@ -301,11 +304,49 @@ main(int argc, char **argv) /* Initialize ports. 
*/ for (portid = 0; portid < nb_ports; portid++) { + struct ether_addr eth; + int w, j; + int ret = -ENOTSUP; + if ((enabled_port_mask & (1 << portid)) == 0) continue; + + eth.addr_bytes[0] = 0xe0; + eth.addr_bytes[1] = 0xe0; + eth.addr_bytes[2] = 0xe0; + eth.addr_bytes[3] = 0xe0; + eth.addr_bytes[4] = portid + 0xf0; + if (port_init(portid, mbuf_pool) != 0) rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid); + + for (w = 0; w < MAX_VFS; w++) { + eth.addr_bytes[5] = w + 0xf0; + + if (ret == -ENOTSUP) + ret = rte_pmd_ixgbe_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_i40e_set_vf_mac_addr(portid, + w, &eth); + if (ret == -ENOTSUP) + ret = rte_pmd_bnxt_set_vf_mac_addr(portid, + w, &eth); + + switch (ret) { + case 0: + printf("Port %d VF %d MAC: ", + portid, w); + for (j = 0; j < 6; j++) { + printf("%02x", eth.addr_bytes[j]); + if (j < 5) + printf(":"); + } + printf("\n"); + break; + } + } } lcore_id = rte_get_next_lcore(-1, 1, 0); -- 2.7.4 ^ permalink raw reply [flat|nested] 105+ messages in thread
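The patch above gives each VF the MAC e0:e0:e0:e0:(f0+port):(f0+vf) and tries the ixgbe, i40e and bnxt setters in turn while the result is -ENOTSUP. A rough, DPDK-free sketch of that pattern follows. The set_vf_mac_fn hook type, the two stub functions and the helper names are hypothetical and only stand in for the rte_pmd_*_set_vf_mac_addr() calls. One assumption is labelled in the code: the sketch starts from -ENOTSUP for every VF, so that each VF is attempted even after an earlier one succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical driver hook: 0 on success, -ENOTSUP if the port is not
 * handled by that driver. */
typedef int (*set_vf_mac_fn)(uint16_t port, uint16_t vf, const uint8_t mac[6]);

/* Build e0:e0:e0:e0:(f0+port):(f0+vf), the scheme used in the patch. */
static void
build_vf_mac(uint16_t port, uint16_t vf, uint8_t mac[6])
{
	mac[0] = mac[1] = mac[2] = mac[3] = 0xe0;
	mac[4] = (uint8_t)(port + 0xf0);
	mac[5] = (uint8_t)(vf + 0xf0);
}

/* Try each hook in order until one returns something other than -ENOTSUP. */
static int
set_vf_mac_any(const set_vf_mac_fn *fns, size_t nb_fns,
	       uint16_t port, uint16_t vf, const uint8_t mac[6])
{
	int ret = -ENOTSUP;	/* assumption: start fresh for every VF */
	size_t i;

	assert(fns != NULL);
	for (i = 0; i < nb_fns && ret == -ENOTSUP; i++)
		ret = fns[i](port, vf, mac);
	return ret;
}

/* Format as aa:bb:cc:dd:ee:ff; buf needs room for at least 18 bytes. */
static int
format_mac(const uint8_t mac[6], char *buf, size_t len)
{
	return snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
			mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}

/* Hypothetical stub: a driver that does not own the port. */
static int
stub_not_mine(uint16_t port, uint16_t vf, const uint8_t mac[6])
{
	(void)port; (void)vf; (void)mac;
	return -ENOTSUP;
}

/* Hypothetical stub: a driver that accepts the request. */
static int
stub_accepts(uint16_t port, uint16_t vf, const uint8_t mac[6])
{
	(void)port; (void)vf; (void)mac;
	return 0;
}
```

For example, port 1 and VF 2 give e0:e0:e0:e0:f1:f2. Passing the hooks as a table keeps the fallback order in one place if more drivers gain an equivalent setter later.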
* Re: [dpdk-dev] [PATCH v8 0/9] Policy Based Power Control for Guest 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 " David Hunt ` (8 preceding siblings ...) 2017-10-05 14:34 ` [dpdk-dev] [PATCH v8 9/9] examples/vm_power_mgr: set MAC address of VF David Hunt @ 2017-10-09 22:34 ` Ferruh Yigit 9 siblings, 0 replies; 105+ messages in thread From: Ferruh Yigit @ 2017-10-09 22:34 UTC (permalink / raw) To: David Hunt, dev; +Cc: konstantin.ananyev, jingjing.wu, santosh.shukla On 10/5/2017 3:34 PM, David Hunt wrote: > Policy Based Power Control for Guest > > This patchset adds the facility for a guest VM to send a policy down to the > host that will allow the host to scale up/down cpu frequencies > depending on the policy criteria independently of the DPDK app running in > the guest. This differs from the previous vm_power implementation where > individual scale up/down requests were sent from the guest to the host via > virtio-serial. > > V8 patchset changes: > * Added Ack's and Reviewed-by's to individual patches in the set so as to > keep patchwork A/R/T flags properly in sync. > > V7 patchset changes: > * Changed return code of rte_pmd_i40e_query_vfid_by_mac() from an > int64_t to int > > V6 patchset changes: > * Fixed comments in header for rte_pmd_i40e_query_vfid_by_mac. > * changed rte_pmd_i40e_query_vfid_by_mac return code from uint to int > as it can return negative error codes. > * Removed bool enum from channel_commands.h, including stdbool.h instead. > * Added #define VM_MAX_NAME_SZ 32 to channel_commands.h > * Renamed a few variables to be more readable. > * Added returns in a few places if failed to get info on domain. > * Fixed power_manager_init to keep track of num_freqs for each core.
> * In power_manager_scale_core_med(), changed a hardcoded '5' to instead > be calculated from the centre of the frequency list > (global_core_freq_info[core_num].num_freqs / 2) > > V5 patchset changes: > * Removed most of the #ifdef I40_PMD as it will be applicable to > other PMDs in the future. > * Changed the parameter of rte_pmd_i40e_query_vfid_by_mac from a uint64 > to a const struct ether_addr *, rather than casting it later in the > function. > > V4 patchset changes: > * None, re-post to mailing list under the correct email thread. > > V3 patchset changes: > * Changed to using is_same_ether_addr() instead of looping through > the mac address bytes to compare them. > * Tweaked some comments and wording in the i40e patch after review. > * Added a patch to the set to add new i40e function to map file, so > as to allow shared library builds. The power library API needs a cleanup > in next release, so will add API/ABI warning for this cleanup in a > separate patch. > > V2 patchset changes: > * Removed APIs in ethdev layer. > * Now just a single new API in the i40e driver for mapping VF MAC to > VF index. > * Moved new function from rte_rxtx.c to rte_pmd_i40e.c > * Removed function for reading i40e register, moved to using the > standard stats API. > * Renamed i40e function to rte_pmd_i40e_query_vfid_by_mac > * Cleaned up policy generation code. > > It's a modification of the vm_power_manager app that runs in the host, and > the guest_vm_power_app example app that runs in the guest. This allows the > guest to send down a policy to the host via virtio-serial, which then allows > the host to scale up/down based on the criteria in the policy, resulting in > quicker scale up/down than individual requests coming from the guest. > It also means that the DPDK application running in the guest does not need > to be modified in any way; it is unaware that its cores are being scaled > up/down, reducing the effort in implementing a power-aware infrastructure.
> > The usage model is as follows: > 1. Set up the VFs and assign to the guest in the usual way. > 2. Run vm_power_manager on the host, creating a channel to the guest. > 3. Start the guest_vm_power_mgr app on the guest, which establishes > a virtio-serial channel to the host. > 4. Send down the profile for the guest using the "send_policy now" command. > There is an example profile hard-coded into guest_vm_power_mgr. > 5. Stop the guest_vm_power_mgr and run your normal power-unaware application. > 6. Send traffic into the VFs at varying traffic rates. > Observe the frequency change on the host (turbostat -i 1) > > The sequence of code changes is as follows: > > A new function has been added to the i40e driver to allow mapping of > a VF MAC to VF index. > > Next we make an addition to librte_power that adds an extra command to allow > the passing of a policy structure from the guest to the host. This struct > contains information like busy/quiet hour, packet throughput thresholds, etc. > > The next addition adds functionality to convert the virtual CPU (vcpu) IDs to > physical CPU (pcpu) IDs so that the host can scale up/down the cores used > in the guest. > > The remaining patches are functionality to process the policy, and take action > when the relevant trigger occurs to cause a frequency change. Hi Dave, Can you please rebase the set on top of the latest main repo? There are two sets of changes: 1- There are more i40e updates in the main repo, which conflict with this one. 2- Port id increased to 16 bits; some code in this patch is still using uint8_t. Thanks, ferruh ^ permalink raw reply [flat|nested] 105+ messages in thread
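The second review point above concerns port ids that no longer fit in uint8_t. The short sketch below illustrates, under stated assumptions, why a leftover uint8_t matters. Both helpers, port_id_fits_u8() and port_enabled(), are hypothetical names and not taken from the patch set or from DPDK. The second helper assumes a 32-bit enabled-port mask like the one tested in main.c and treats ids outside the mask width as not enabled.

```c
#include <stdint.h>

/* Returns 1 if a 16-bit port id survives being stored in a uint8_t. */
static int
port_id_fits_u8(uint16_t port_id)
{
	uint8_t narrow = (uint8_t)port_id;	/* what a uint8_t variable would hold */

	return narrow == port_id;
}

/* Port-mask test written for 16-bit ids. Ids that do not fit in the
 * 32-bit mask are reported as not enabled rather than shifted out of range. */
static int
port_enabled(uint32_t enabled_port_mask, uint16_t port_id)
{
	if (port_id >= 32)
		return 0;
	return (enabled_port_mask & (UINT32_C(1) << port_id)) != 0;
}
```

Port id 256, for instance, would wrap to 0 in a uint8_t and silently alias the first port, which is the kind of mismatch the rebase request is meant to avoid.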