From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 9D8759A8D for ; Thu, 12 Feb 2015 14:55:08 +0100 (CET) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga101.fm.intel.com with ESMTP; 12 Feb 2015 05:55:06 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.09,565,1418112000"; d="scan'208";a="453692009" Received: from irsmsx152.ger.corp.intel.com ([163.33.192.66]) by FMSMGA003.fm.intel.com with ESMTP; 12 Feb 2015 05:40:24 -0800 Received: from irsmsx108.ger.corp.intel.com ([169.254.11.64]) by IRSMSX152.ger.corp.intel.com ([169.254.6.35]) with mapi id 14.03.0195.001; Thu, 12 Feb 2015 13:55:05 +0000 From: "Iremonger, Bernard" To: Tetsuya Mukawa , "Qiu, Michael" , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs Thread-Index: AQHQRELJadkSvYPLqkeLS8tHXgQYqpzsQlEAgACzNUA= Date: Thu, 12 Feb 2015 13:55:04 +0000 Message-ID: <8CEF83825BEC744B83065625E567D7C2049DFE2F@IRSMSX108.ger.corp.intel.com> References: <1422763322-13742-4-git-send-email-mukawa@igel.co.jp> <1423470639-15744-1-git-send-email-mukawa@igel.co.jp> <1423470639-15744-5-git-send-email-mukawa@igel.co.jp> <533710CFB86FA344BFBF2D6802E60286CE71EA@SHSMSX101.ccr.corp.intel.com> <8CEF83825BEC744B83065625E567D7C2049DF5CA@IRSMSX108.ger.corp.intel.com> <533710CFB86FA344BFBF2D6802E60286CE7D25@SHSMSX101.ccr.corp.intel.com> <54DAE045.6000208@igel.co.jp> <54DAE142.6090204@igel.co.jp> <533710CFB86FA344BFBF2D6802E60286CE7E63@SHSMSX101.ccr.corp.intel.com> <54DB0F5F.90806@igel.co.jp> <533710CFB86FA344BFBF2D6802E60286CE821D@SHSMSX101.ccr.corp.intel.com> <54DC0574.6000006@igel.co.jp> In-Reply-To: <54DC0574.6000006@igel.co.jp> Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Feb 2015 13:55:09 -0000 > >>>>>>>>> /* Scan one pci sysfs entry, and fill the devices list from > >>>>>>>>> it. */ static int pci_scan_one(const char *dirname, uint16_t > >>>>>>>>> domain, uint8_t bus, @@ -353,13 +339,20 @@ pci_scan_one(const > >>>>>>>>> char *dirname, uint16_t domain, uint8_t bus, > >>>>>>>>> } > >>>>>>>>> else { > >>>>>>>>> struct rte_pci_device *dev2 =3D NULL; > >>>>>>>>> + int ret; > >>>>>>>>> > >>>>>>>>> TAILQ_FOREACH(dev2, &pci_device_list, next) { > >>>>>>>>> - if (pci_addr_comparison(&dev->addr, &dev2->addr)) > >>>>>>>>> + ret =3D eal_compare_pci_addr(&dev->addr, &dev2->addr); > >>>>>>>>> + if (ret > 0) > >>>>>>>>> continue; > >>>>>>>>> - else { > >>>>>>>>> + else if (ret < 0) { > >>>>>>>>> TAILQ_INSERT_BEFORE(dev2, dev, next); > >>>>>>>>> return 0; > >>>>>>>>> + } else { /* already registered */ > >>>>>>>>> + /* update pt_driver */ > >>>>>>>>> + dev2->pt_driver =3D dev->pt_driver; > >>>>>>> Hi Tetsuya, > >>>>>>> > >>>>>>> I am seeing a problem with the librte_pmd_ixgbe code where dev->m= ax_vfs is being lost in > some scenarios. > >>>>>>> The following line should be added here: > >>>>>>> dev2->max_vfs =3D dev->max_vfs; > >>>>>>> > >>>>>>> numa_mode should probably be updated too (although it is not caus= ing a problem at > present). > >>>>>>> dev2->numa_mode =3D dev->numa_mode; Hi Tetsuya, Michael, The "already registered path" in the code is being executed. dev->numa_mode does not change so the above line is not needed. dev2->max_vfs has a value of 0 and dev->max_vfs has a value of 2 ( 2 VF's= have been configured previously). dev2->mem_resource is different from dev->mem_resource so the following lin= e needs to be added: memmove(dev2->mem_resource, dev->mem_resource, sizeof(dev->mem_resource)); More replies below. =20 > >>>>>> I'm very curious, why those field miss? I haven't see any places > >>>>>> clear this field. > >>>>>> > >>>>>> What is the root cause? The max_vfs entry is created when igb_uio binds the device. dev2 is the device already in the pci_device_list so dev2->max_vfs is 0. dev is the new device being parsed, in this case dev->max_vfs is 2. So the dev2->max_vfs value needs to be updated. Similarly dev2-> pt_driver needs to be updated as its value is 0 (this is = already done). Similarly dev2->mem_resource needs to be updated as it is different from de= v->mem_resource.=20 > >>>>> Hi Michael, > >>>>> > >>>>> Here is my guess. > >>>>> The above function creates pci device list. > >>>> I am sorry. I forgot to add below information. > >>>> > >>>> "max_vfs" or "numa_node" value is came from sysfs when the above > >>>> function is processed. > >>> Yes, but it has already been registered, why it missed? > >> Yes, it has been registered already, but probably should be updated. > >> I guess sysfs value will be changed when igb_uio starts managing the d= evice. > >> > >> ex) > >> 1. Boot linux > >> 2. start a dpdk application with no port. > >> 3. pci device list is registered. > >> - Here, "max_vfs" is came from sysfs. Or there is no such a entry. > >> 4. igb_uio binds the device. > >> 5. I guess max_vfs value of sysfs is changed. Or max_vfs entry is cre= ated. > >> 6. The dpdk application calls hotplug function. > > Yes, agree. > > > > But numa node can be changed? >=20 > Hi Michael, >=20 > I may misunderstand meaning of numa_node. > I thought it indicated which numa node was nearest from the pci device, s= o it could not be > configurable. > BTW, I will be out of office tomorrow. So, I will submit v8 patches next = Monday. >=20 > Thanks, > Tetsuya >=20 > > > > Bernard, does your issue occur after max_vfs changed in igb_uio? > > > > If not, I think must be figure out the reason. > > > > Thanks, > > Michael > >> - Here, I guess we need to update "max_vfs" value. > >> > >> Above is a just my assumption. > >> It may be good to wait for Bernard's reply. > >> > >> Thanks, > >> Tetsuya > >> > >>> Thanks, > >>> Michael > >>>>> And current DPDK implementation assumes all devices needed to be > >>>>> managed are under igb_uio or vfio when above code is processed. > >>>>> To add hotplug function, we also need to think some devices will > >>>>> start to be managed under igb_uio or vfio after initializing pci de= vice list. > >>>>> Anyway, I guess "max_vfs" value will be changed when igb_uio or > >>>>> vfio manages the device. > >>>>> > >>>>> Hi Bernard, > >>>>> > >>>>> Could you please check "max_vfs" and "num_node" values, then check > >>>>> the values again after the device is managed by igb_uio or vfio? > >>>>> In my environment, it seems max_vfs is created by igb_uio. Yes, max_vfs is created by igb_uio > >>>>> But my NIC doesn't have VF, so behavior might be different in your > >>>>> environment. > >>>>> I guess "numa_node" should not be changed theoretically. > >>>>> > >>>>> If my guess is correct, how about replacing following values? > >>>>> - driver Agreed=20 > >>>>> - max_vfs Agreed > >>>>> - resource Agreed > >>>>> - (numa_node) Not necessary, as it does not change. > >>>>> Except for above value, I guess other value shouldn't be changed > >>>>> even after the device is managed by igb_uio or vfio. > >>>>> > >>>>> Thanks, > >>>>> Tetsuya > >>>>> > >>>>>> Thanks, > >>>>>> Michael > >>>>>> The scenario I am using is as follows: Terminal1: rmmod ixgbe Terminal1: rmmod ixgbevf Terminal1: rmmod igb_uio=20 Terminal2: ./testpmd -c f -n 1 -d ../../librte_pmd_null.so -- -i --no-flush= -rx Terminal1: insmod ./igb_uio.ko Terminal3: ./dpdk_nic_bind.py -b igb_uio 0000:05:00.0 Terminal2: testpmd> port attach 0000:05:00.0 Regards, Bernard.