From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id 8FF0F9A81 for ; Wed, 11 Feb 2015 13:14:00 +0100 (CET) Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga103.jf.intel.com with ESMTP; 11 Feb 2015 04:08:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.09,558,1418112000"; d="scan'208";a="676299868" Received: from pgsmsx105.gar.corp.intel.com ([10.221.44.96]) by fmsmga002.fm.intel.com with ESMTP; 11 Feb 2015 04:13:55 -0800 Received: from shsmsx102.ccr.corp.intel.com (10.239.4.154) by PGSMSX105.gar.corp.intel.com (10.221.44.96) with Microsoft SMTP Server (TLS) id 14.3.195.1; Wed, 11 Feb 2015 20:13:54 +0800 Received: from shsmsx101.ccr.corp.intel.com ([169.254.1.192]) by shsmsx102.ccr.corp.intel.com ([169.254.2.62]) with mapi id 14.03.0195.001; Wed, 11 Feb 2015 20:13:53 +0800 From: "Qiu, Michael" To: Tetsuya Mukawa , "Iremonger, Bernard" , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs Thread-Index: AQHQRELTSxWi0iDBGE+tIPrk61ioFA== Date: Wed, 11 Feb 2015 12:13:53 +0000 Message-ID: <533710CFB86FA344BFBF2D6802E60286CE821D@SHSMSX101.ccr.corp.intel.com> References: <1422763322-13742-4-git-send-email-mukawa@igel.co.jp> <1423470639-15744-1-git-send-email-mukawa@igel.co.jp> <1423470639-15744-5-git-send-email-mukawa@igel.co.jp> <533710CFB86FA344BFBF2D6802E60286CE71EA@SHSMSX101.ccr.corp.intel.com> <8CEF83825BEC744B83065625E567D7C2049DF5CA@IRSMSX108.ger.corp.intel.com> <533710CFB86FA344BFBF2D6802E60286CE7D25@SHSMSX101.ccr.corp.intel.com> <54DAE045.6000208@igel.co.jp> <54DAE142.6090204@igel.co.jp> <533710CFB86FA344BFBF2D6802E60286CE7E63@SHSMSX101.ccr.corp.intel.com> <54DB0F5F.90806@igel.co.jp> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Feb 2015 12:14:01 -0000 On 2/11/2015 4:14 PM, Tetsuya Mukawa wrote:=0A= > On 2015/02/11 15:29, Qiu, Michael wrote:=0A= >> On 2/11/2015 12:57 PM, Tetsuya Mukawa wrote:=0A= >>> On 2015/02/11 13:53, Tetsuya Mukawa wrote:=0A= >>>> On 2015/02/11 12:27, Qiu, Michael wrote:=0A= >>>>> On 2/10/2015 11:11 PM, Iremonger, Bernard wrote:=0A= >>>>>>> -----Original Message-----=0A= >>>>>>> From: Qiu, Michael=0A= >>>>>>> Sent: Monday, February 9, 2015 1:10 PM=0A= >>>>>>> To: Tetsuya Mukawa; dev@dpdk.org=0A= >>>>>>> Cc: Iremonger, Bernard=0A= >>>>>>> Subject: Re: [PATCH v7 04/14] eal/pci: Consolidate pci address comp= arison APIs=0A= >>>>>>>=0A= >>>>>>> On 2/9/2015 4:31 PM, Tetsuya Mukawa wrote:=0A= >>>>>>>> This patch replaces pci_addr_comparison() and memcmp() of pci=0A= >>>>>>>> addresses by eal_compare_pci_addr().=0A= >>>>>>>>=0A= >>>>>>>> v5:=0A= >>>>>>>> - Fix pci_scan_one to handle pt_driver correctly.=0A= >>>>>>>> v4:=0A= >>>>>>>> - Fix calculation method of eal_compare_pci_addr().=0A= >>>>>>>> - Add parameter checking.=0A= >>>>>>>>=0A= >>>>>>>> Signed-off-by: Tetsuya Mukawa =0A= >>>>>>>> ---=0A= >>>>>>>> lib/librte_eal/bsdapp/eal/eal_pci.c | 25 ++++++++----------= -----=0A= >>>>>>>> lib/librte_eal/common/eal_common_pci.c | 2 +-=0A= >>>>>>>> lib/librte_eal/common/include/rte_pci.h | 34 ++++++++++++++++++= +++++++++++++=0A= >>>>>>>> lib/librte_eal/linuxapp/eal/eal_pci.c | 25 ++++++++----------= -----=0A= >>>>>>>> lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 2 +-=0A= >>>>>>>> 5 files changed, 54 insertions(+), 34 deletions(-)=0A= >>>>>>>>=0A= >>>>>>>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c=0A= >>>>>>>> b/lib/librte_eal/bsdapp/eal/eal_pci.c=0A= >>>>>>>> index 74ecce7..c844d58 100644=0A= >>>>>>>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c=0A= >>>>>>>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c=0A= >>>>>>>> @@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *d= ev)=0A= >>>>>>>> return (0);=0A= >>>>>>>> }=0A= >>>>>>>>=0A= >>>>>>>> -/* Compare two PCI device addresses. */ -static int=0A= >>>>>>>> -pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_add= r=0A= >>>>>>>> *addr2) -{=0A= >>>>>>>> - uint64_t dev_addr =3D (addr->domain << 24) + (addr->bus << 16) += (addr->devid << 8) + addr-=0A= >>>>>>>> function;=0A= >>>>>>>> - uint64_t dev_addr2 =3D (addr2->domain << 24) + (addr2->bus << 16= ) + (addr2->devid << 8) +=0A= >>>>>>> addr2->function;=0A= >>>>>>>> -=0A= >>>>>>>> - if (dev_addr > dev_addr2)=0A= >>>>>>>> - return 1;=0A= >>>>>>>> - else=0A= >>>>>>>> - return 0;=0A= >>>>>>>> -}=0A= >>>>>>>> -=0A= >>>>>>>> -=0A= >>>>>>>> /* Scan one pci sysfs entry, and fill the devices list from it. *= /=0A= >>>>>>>> static int pci_scan_one(int dev_pci_fd, struct pci_conf *conf) @@= =0A= >>>>>>>> -356,13 +342,20 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *c= onf)=0A= >>>>>>>> }=0A= >>>>>>>> else {=0A= >>>>>>>> struct rte_pci_device *dev2 =3D NULL;=0A= >>>>>>>> + int ret;=0A= >>>>>>>>=0A= >>>>>>>> TAILQ_FOREACH(dev2, &pci_device_list, next) {=0A= >>>>>>>> - if (pci_addr_comparison(&dev->addr, &dev2->addr))=0A= >>>>>>>> + ret =3D eal_compare_pci_addr(&dev->addr, &dev2->addr);=0A= >>>>>>>> + if (ret > 0)=0A= >>>>>>>> continue;=0A= >>>>>>>> - else {=0A= >>>>>>>> + else if (ret < 0) {=0A= >>>>>>>> TAILQ_INSERT_BEFORE(dev2, dev, next);=0A= >>>>>>>> return 0;=0A= >>>>>>>> + } else { /* already registered */=0A= >>>>>>>> + /* update pt_driver */=0A= >>>>>>>> + dev2->pt_driver =3D dev->pt_driver;=0A= >>>>>>>> + free(dev);=0A= >>>>>>>> + return 0;=0A= >>>>>>>> }=0A= >>>>>>>> }=0A= >>>>>>>> TAILQ_INSERT_TAIL(&pci_device_list, dev, next); diff --git=0A= >>>>>>>> a/lib/librte_eal/common/eal_common_pci.c=0A= >>>>>>>> b/lib/librte_eal/common/eal_common_pci.c=0A= >>>>>>>> index f3c7f71..a89f5c3 100644=0A= >>>>>>>> --- a/lib/librte_eal/common/eal_common_pci.c=0A= >>>>>>>> +++ b/lib/librte_eal/common/eal_common_pci.c=0A= >>>>>>>> @@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(st= ruct rte_pci_device *dev)=0A= >>>>>>>> if (devargs->type !=3D RTE_DEVTYPE_BLACKLISTED_PCI &&=0A= >>>>>>>> devargs->type !=3D RTE_DEVTYPE_WHITELISTED_PCI)=0A= >>>>>>>> continue;=0A= >>>>>>>> - if (!memcmp(&dev->addr, &devargs->pci.addr, sizeof(dev->addr)))= =0A= >>>>>>>> + if (!eal_compare_pci_addr(&dev->addr, &devargs->pci.addr))=0A= >>>>>>>> return devargs;=0A= >>>>>>>> }=0A= >>>>>>>> return NULL;=0A= >>>>>>>> diff --git a/lib/librte_eal/common/include/rte_pci.h=0A= >>>>>>>> b/lib/librte_eal/common/include/rte_pci.h=0A= >>>>>>>> index 7f2d699..4814cd7 100644=0A= >>>>>>>> --- a/lib/librte_eal/common/include/rte_pci.h=0A= >>>>>>>> +++ b/lib/librte_eal/common/include/rte_pci.h=0A= >>>>>>>> @@ -269,6 +269,40 @@ eal_parse_pci_DomBDF(const char *input, struc= t=0A= >>>>>>>> rte_pci_addr *dev_addr) } #undef GET_PCIADDR_FIELD=0A= >>>>>>>>=0A= >>>>>>>> +/* Compare two PCI device addresses. */=0A= >>>>>>>> +/**=0A= >>>>>>>> + * Utility function to compare two PCI device addresses.=0A= >>>>>>>> + *=0A= >>>>>>>> + * @param addr=0A= >>>>>>>> + * The PCI Bus-Device-Function address to compare=0A= >>>>>>>> + * @param addr2=0A= >>>>>>>> + * The PCI Bus-Device-Function address to compare=0A= >>>>>>>> + * @return=0A= >>>>>>>> + * 0 on equal PCI address.=0A= >>>>>>>> + * Positive on addr is greater than addr2.=0A= >>>>>>>> + * Negative on addr is less than addr2, or error.=0A= >>>>>>>> + */=0A= >>>>>>>> +static inline int=0A= >>>>>>>> +eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_ad= dr=0A= >>>>>>>> +*addr2) {=0A= >>>>>>>> + uint64_t dev_addr, dev_addr2;=0A= >>>>>>>> +=0A= >>>>>>>> + if ((addr =3D=3D NULL) || (addr2 =3D=3D NULL))=0A= >>>>>>>> + return -1;=0A= >>>>>>>> +=0A= >>>>>>>> + dev_addr =3D (addr->domain << 24) | (addr->bus << 16) |=0A= >>>>>>>> + (addr->devid << 8) | addr->function;=0A= >>>>>>>> + dev_addr2 =3D (addr2->domain << 24) | (addr2->bus << 16) |=0A= >>>>>>>> + (addr2->devid << 8) | addr2->function;=0A= >>>>>>>> +=0A= >>>>>>>> + if (dev_addr > dev_addr2)=0A= >>>>>>>> + return 1;=0A= >>>>>>>> + else if (dev_addr < dev_addr2)=0A= >>>>>>>> + return -1;=0A= >>>>>>>> + else=0A= >>>>>>>> + return 0;=0A= >>>>>>>> +}=0A= >>>>>>>> +=0A= >>>>>>>> /**=0A= >>>>>>>> * Probe the PCI bus for registered drivers.=0A= >>>>>>>> *=0A= >>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c=0A= >>>>>>>> b/lib/librte_eal/linuxapp/eal/eal_pci.c=0A= >>>>>>>> index c0ca5a5..d847102 100644=0A= >>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c=0A= >>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c=0A= >>>>>>>> @@ -229,20 +229,6 @@ error:=0A= >>>>>>>> return -1;=0A= >>>>>>>> }=0A= >>>>>>>>=0A= >>>>>>>> -/* Compare two PCI device addresses. */ -static int=0A= >>>>>>>> -pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_add= r=0A= >>>>>>>> *addr2) -{=0A= >>>>>>>> - uint64_t dev_addr =3D (addr->domain << 24) + (addr->bus << 16) += (addr->devid << 8) + addr-=0A= >>>>>>>> function;=0A= >>>>>>>> - uint64_t dev_addr2 =3D (addr2->domain << 24) + (addr2->bus << 16= ) + (addr2->devid << 8) +=0A= >>>>>>> addr2->function;=0A= >>>>>>>> -=0A= >>>>>>>> - if (dev_addr > dev_addr2)=0A= >>>>>>>> - return 1;=0A= >>>>>>>> - else=0A= >>>>>>>> - return 0;=0A= >>>>>>>> -}=0A= >>>>>>>> -=0A= >>>>>>>> -=0A= >>>>>>>> /* Scan one pci sysfs entry, and fill the devices list from it. *= /=0A= >>>>>>>> static int pci_scan_one(const char *dirname, uint16_t domain, uin= t8_t=0A= >>>>>>>> bus, @@ -353,13 +339,20 @@ pci_scan_one(const char *dirname, uint1= 6_t=0A= >>>>>>>> domain, uint8_t bus,=0A= >>>>>>>> }=0A= >>>>>>>> else {=0A= >>>>>>>> struct rte_pci_device *dev2 =3D NULL;=0A= >>>>>>>> + int ret;=0A= >>>>>>>>=0A= >>>>>>>> TAILQ_FOREACH(dev2, &pci_device_list, next) {=0A= >>>>>>>> - if (pci_addr_comparison(&dev->addr, &dev2->addr))=0A= >>>>>>>> + ret =3D eal_compare_pci_addr(&dev->addr, &dev2->addr);=0A= >>>>>>>> + if (ret > 0)=0A= >>>>>>>> continue;=0A= >>>>>>>> - else {=0A= >>>>>>>> + else if (ret < 0) {=0A= >>>>>>>> TAILQ_INSERT_BEFORE(dev2, dev, next);=0A= >>>>>>>> return 0;=0A= >>>>>>>> + } else { /* already registered */=0A= >>>>>>>> + /* update pt_driver */=0A= >>>>>>>> + dev2->pt_driver =3D dev->pt_driver;=0A= >>>>>> Hi Tetsuya,=0A= >>>>>>=0A= >>>>>> I am seeing a problem with the librte_pmd_ixgbe code where dev->max_= vfs is being lost in some scenarios.=0A= >>>>>> The following line should be added here:=0A= >>>>>> dev2->max_vfs =3D dev->max_vfs;=0A= >>>>>>=0A= >>>>>> numa_mode should probably be updated too (although it is not causing= a problem at present).=0A= >>>>>> dev2->numa_mode =3D dev->numa_mode;=0A= >>>>> I'm very curious, why those field miss? I haven't see any places clea= r=0A= >>>>> this field.=0A= >>>>>=0A= >>>>> What is the root cause?=0A= >>>> Hi Michael,=0A= >>>>=0A= >>>> Here is my guess.=0A= >>>> The above function creates pci device list.=0A= >>> I am sorry. I forgot to add below information.=0A= >>>=0A= >>> "max_vfs" or "numa_node" value is came from sysfs when the above=0A= >>> function is processed.=0A= >> Yes, but it has already been registered, why it missed?=0A= > Yes, it has been registered already, but probably should be updated.=0A= > I guess sysfs value will be changed when igb_uio starts managing the devi= ce.=0A= >=0A= > ex)=0A= > 1. Boot linux=0A= > 2. start a dpdk application with no port.=0A= > 3. pci device list is registered.=0A= > - Here, "max_vfs" is came from sysfs. Or there is no such a entry.=0A= > 4. igb_uio binds the device.=0A= > 5. I guess max_vfs value of sysfs is changed. Or max_vfs entry is create= d.=0A= > 6. The dpdk application calls hotplug function.=0A= =0A= Yes, agree.=0A= =0A= But numa node can be changed?=0A= =0A= Bernard, does your issue occur after max_vfs changed in igb_uio?=0A= =0A= If not, I think must be figure out the reason.=0A= =0A= Thanks,=0A= Michael=0A= > - Here, I guess we need to update "max_vfs" value.=0A= >=0A= > Above is a just my assumption.=0A= > It may be good to wait for Bernard's reply.=0A= >=0A= > Thanks,=0A= > Tetsuya=0A= >=0A= >> Thanks,=0A= >> Michael=0A= >>>> And current DPDK implementation assumes all devices needed to be manag= ed=0A= >>>> are under igb_uio or vfio when above code is processed.=0A= >>>> To add hotplug function, we also need to think some devices will start= =0A= >>>> to be managed under igb_uio or vfio after initializing pci device list= .=0A= >>>> Anyway, I guess "max_vfs" value will be changed when igb_uio or vfio= =0A= >>>> manages the device.=0A= >>>>=0A= >>>> Hi Bernard,=0A= >>>>=0A= >>>> Could you please check "max_vfs" and "num_node" values, then check the= =0A= >>>> values again after the device is managed by igb_uio or vfio?=0A= >>>> In my environment, it seems max_vfs is created by igb_uio.=0A= >>>> But my NIC doesn't have VF, so behavior might be different in your=0A= >>>> environment.=0A= >>>> I guess "numa_node" should not be changed theoretically.=0A= >>>>=0A= >>>> If my guess is correct, how about replacing following values?=0A= >>>> - driver=0A= >>>> - max_vfs=0A= >>>> - resource=0A= >>>> - (numa_node)=0A= >>>> Except for above value, I guess other value shouldn't be changed even= =0A= >>>> after the device is managed by igb_uio or vfio.=0A= >>>>=0A= >>>> Thanks,=0A= >>>> Tetsuya=0A= >>>>=0A= >>>>> Thanks,=0A= >>>>> Michael=0A= >>>>>=0A= >>>>>> Regards,=0A= >>>>>>=0A= >>>>>> Bernard.=0A= >>>>>>=0A= >>>>>>=0A= >>>>>>=0A= >>>>>>=0A= >>>>>>>> + free(dev);=0A= >>>>>>>> + return 0;=0A= >>>>>>>> }=0A= >>>>>>>> }=0A= >>>>>>>> TAILQ_INSERT_TAIL(&pci_device_list, dev, next); diff --git=0A= >>>>>>>> a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c=0A= >>>>>>>> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c=0A= >>>>>>>> index e53f06b..1da3507 100644=0A= >>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c=0A= >>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c=0A= >>>>>>>> @@ -123,7 +123,7 @@ pci_uio_map_secondary(struct rte_pci_device *d= ev)=0A= >>>>>>>> TAILQ_FOREACH(uio_res, pci_res_list, next) {=0A= >>>>>>>>=0A= >>>>>>>> /* skip this element if it doesn't match our PCI address */=0A= >>>>>>>> - if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))= =0A= >>>>>>>> + if (eal_compare_pci_addr(&uio_res->pci_addr, &dev->addr))=0A= >>>>>>>> continue;=0A= >>>>>>>>=0A= >>>>>>>> for (i =3D 0; i !=3D uio_res->nb_maps; i++) {=0A= >>>>>>> Acked-by: Michael Qiu =0A= >=0A= >=0A= =0A=