Message-ID: <54DC0574.6000006@igel.co.jp>
Date: Thu, 12 Feb 2015 10:44:20 +0900
From: Tetsuya Mukawa
To: "Qiu, Michael", "Iremonger, Bernard", "dev@dpdk.org"
In-Reply-To: <533710CFB86FA344BFBF2D6802E60286CE821D@SHSMSX101.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs

On 2015/02/11 21:13, Qiu, Michael wrote:
> On 2/11/2015 4:14 PM, Tetsuya Mukawa wrote:
>> On 2015/02/11 15:29, Qiu, Michael wrote:
>>> On 2/11/2015 12:57 PM, Tetsuya Mukawa wrote:
>>>> On 2015/02/11 13:53, Tetsuya Mukawa wrote:
>>>>> On 2015/02/11 12:27, Qiu, Michael wrote:
>>>>>> On 2/10/2015 11:11 PM, Iremonger, Bernard wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Qiu, Michael
>>>>>>>> Sent: Monday, February 9, 2015 1:10 PM
>>>>>>>> To: Tetsuya Mukawa; dev@dpdk.org
>>>>>>>> Cc: Iremonger, Bernard
>>>>>>>> Subject: Re: [PATCH v7 04/14] eal/pci: Consolidate pci address comparison APIs
>>>>>>>>
>>>>>>>> On 2/9/2015 4:31 PM, Tetsuya Mukawa wrote:
>>>>>>>>> This patch replaces pci_addr_comparison() and memcmp() of pci
>>>>>>>>> addresses by eal_compare_pci_addr().
>>>>>>>>>
>>>>>>>>> v5:
>>>>>>>>> - Fix pci_scan_one to handle pt_driver correctly.
>>>>>>>>> v4:
>>>>>>>>> - Fix calculation method of eal_compare_pci_addr().
>>>>>>>>> - Add parameter checking.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Tetsuya Mukawa
>>>>>>>>> ---
>>>>>>>>>  lib/librte_eal/bsdapp/eal/eal_pci.c       | 25 ++++++++---------------
>>>>>>>>>  lib/librte_eal/common/eal_common_pci.c    |  2 +-
>>>>>>>>>  lib/librte_eal/common/include/rte_pci.h   | 34 +++++++++++++++++++++++++++++++
>>>>>>>>>  lib/librte_eal/linuxapp/eal/eal_pci.c     | 25 ++++++++---------------
>>>>>>>>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  2 +-
>>>>>>>>>  5 files changed, 54 insertions(+), 34 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
>>>>>>>>> index 74ecce7..c844d58 100644
>>>>>>>>> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
>>>>>>>>> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
>>>>>>>>> @@ -270,20 +270,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>>>>>>>>>      return (0);
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> -/* Compare two PCI device addresses. */
>>>>>>>>> -static int
>>>>>>>>> -pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
>>>>>>>>> -{
>>>>>>>>> -    uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + (addr->devid << 8) + addr->function;
>>>>>>>>> -    uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + (addr2->devid << 8) + addr2->function;
>>>>>>>>> -
>>>>>>>>> -    if (dev_addr > dev_addr2)
>>>>>>>>> -        return 1;
>>>>>>>>> -    else
>>>>>>>>> -        return 0;
>>>>>>>>> -}
>>>>>>>>> -
>>>>>>>>> -
>>>>>>>>>  /* Scan one pci sysfs entry, and fill the devices list from it. */
>>>>>>>>>  static int
>>>>>>>>>  pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
>>>>>>>>> @@ -356,13 +342,20 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
>>>>>>>>>      }
>>>>>>>>>      else {
>>>>>>>>>          struct rte_pci_device *dev2 = NULL;
>>>>>>>>> +        int ret;
>>>>>>>>>
>>>>>>>>>          TAILQ_FOREACH(dev2, &pci_device_list, next) {
>>>>>>>>> -            if (pci_addr_comparison(&dev->addr, &dev2->addr))
>>>>>>>>> +            ret = eal_compare_pci_addr(&dev->addr, &dev2->addr);
>>>>>>>>> +            if (ret > 0)
>>>>>>>>>                  continue;
>>>>>>>>> -            else {
>>>>>>>>> +            else if (ret < 0) {
>>>>>>>>>                  TAILQ_INSERT_BEFORE(dev2, dev, next);
>>>>>>>>>                  return 0;
>>>>>>>>> +            } else { /* already registered */
>>>>>>>>> +                /* update pt_driver */
>>>>>>>>> +                dev2->pt_driver = dev->pt_driver;
>>>>>>>>> +                free(dev);
>>>>>>>>> +                return 0;
>>>>>>>>>              }
>>>>>>>>>          }
>>>>>>>>>          TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
>>>>>>>>> diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c
>>>>>>>>> index f3c7f71..a89f5c3 100644
>>>>>>>>> --- a/lib/librte_eal/common/eal_common_pci.c
>>>>>>>>> +++ b/lib/librte_eal/common/eal_common_pci.c
>>>>>>>>> @@ -93,7 +93,7 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
>>>>>>>>>          if (devargs->type != RTE_DEVTYPE_BLACKLISTED_PCI &&
>>>>>>>>>              devargs->type != RTE_DEVTYPE_WHITELISTED_PCI)
>>>>>>>>>              continue;
>>>>>>>>> -        if (!memcmp(&dev->addr, &devargs->pci.addr, sizeof(dev->addr)))
>>>>>>>>> +        if (!eal_compare_pci_addr(&dev->addr, &devargs->pci.addr))
>>>>>>>>>              return devargs;
>>>>>>>>>      }
>>>>>>>>>      return NULL;
>>>>>>>>> diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h
>>>>>>>>> index 7f2d699..4814cd7 100644
>>>>>>>>> --- a/lib/librte_eal/common/include/rte_pci.h
>>>>>>>>> +++ b/lib/librte_eal/common/include/rte_pci.h
>>>>>>>>> @@ -269,6 +269,40 @@ eal_parse_pci_DomBDF(const char *input, struct rte_pci_addr *dev_addr)
>>>>>>>>>  }
>>>>>>>>>  #undef GET_PCIADDR_FIELD
>>>>>>>>>
>>>>>>>>> +/* Compare two PCI device addresses. */
>>>>>>>>> +/**
>>>>>>>>> + * Utility function to compare two PCI device addresses.
>>>>>>>>> + *
>>>>>>>>> + * @param addr
>>>>>>>>> + *   The PCI Bus-Device-Function address to compare
>>>>>>>>> + * @param addr2
>>>>>>>>> + *   The PCI Bus-Device-Function address to compare
>>>>>>>>> + * @return
>>>>>>>>> + *   0 on equal PCI address.
>>>>>>>>> + *   Positive on addr is greater than addr2.
>>>>>>>>> + *   Negative on addr is less than addr2, or error.
>>>>>>>>> + */
>>>>>>>>> +static inline int
>>>>>>>>> +eal_compare_pci_addr(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
>>>>>>>>> +{
>>>>>>>>> +    uint64_t dev_addr, dev_addr2;
>>>>>>>>> +
>>>>>>>>> +    if ((addr == NULL) || (addr2 == NULL))
>>>>>>>>> +        return -1;
>>>>>>>>> +
>>>>>>>>> +    dev_addr = (addr->domain << 24) | (addr->bus << 16) |
>>>>>>>>> +                (addr->devid << 8) | addr->function;
>>>>>>>>> +    dev_addr2 = (addr2->domain << 24) | (addr2->bus << 16) |
>>>>>>>>> +                (addr2->devid << 8) | addr2->function;
>>>>>>>>> +
>>>>>>>>> +    if (dev_addr > dev_addr2)
>>>>>>>>> +        return 1;
>>>>>>>>> +    else if (dev_addr < dev_addr2)
>>>>>>>>> +        return -1;
>>>>>>>>> +    else
>>>>>>>>> +        return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>  /**
>>>>>>>>>   * Probe the PCI bus for registered drivers.
>>>>>>>>>   *
>>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>>>>>>> index c0ca5a5..d847102 100644
>>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
>>>>>>>>> @@ -229,20 +229,6 @@ error:
>>>>>>>>>      return -1;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> -/* Compare two PCI device addresses. */
>>>>>>>>> -static int
>>>>>>>>> -pci_addr_comparison(struct rte_pci_addr *addr, struct rte_pci_addr *addr2)
>>>>>>>>> -{
>>>>>>>>> -    uint64_t dev_addr = (addr->domain << 24) + (addr->bus << 16) + (addr->devid << 8) + addr->function;
>>>>>>>>> -    uint64_t dev_addr2 = (addr2->domain << 24) + (addr2->bus << 16) + (addr2->devid << 8) + addr2->function;
>>>>>>>>> -
>>>>>>>>> -    if (dev_addr > dev_addr2)
>>>>>>>>> -        return 1;
>>>>>>>>> -    else
>>>>>>>>> -        return 0;
>>>>>>>>> -}
>>>>>>>>> -
>>>>>>>>> -
>>>>>>>>>  /* Scan one pci sysfs entry, and fill the devices list from it. */
>>>>>>>>>  static int
>>>>>>>>>  pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
>>>>>>>>> @@ -353,13 +339,20 @@ pci_scan_one(const char *dirname, uint16_t domain, uint8_t bus,
>>>>>>>>>      }
>>>>>>>>>      else {
>>>>>>>>>          struct rte_pci_device *dev2 = NULL;
>>>>>>>>> +        int ret;
>>>>>>>>>
>>>>>>>>>          TAILQ_FOREACH(dev2, &pci_device_list, next) {
>>>>>>>>> -            if (pci_addr_comparison(&dev->addr, &dev2->addr))
>>>>>>>>> +            ret = eal_compare_pci_addr(&dev->addr, &dev2->addr);
>>>>>>>>> +            if (ret > 0)
>>>>>>>>>                  continue;
>>>>>>>>> -            else {
>>>>>>>>> +            else if (ret < 0) {
>>>>>>>>>                  TAILQ_INSERT_BEFORE(dev2, dev, next);
>>>>>>>>>                  return 0;
>>>>>>>>> +            } else { /* already registered */
>>>>>>>>> +                /* update pt_driver */
>>>>>>>>> +                dev2->pt_driver = dev->pt_driver;
>>>>>>> Hi Tetsuya,
>>>>>>>
>>>>>>> I am seeing a problem with the librte_pmd_ixgbe code where dev->max_vfs is being lost in some scenarios.
>>>>>>> The following line should be added here:
>>>>>>> dev2->max_vfs = dev->max_vfs;
>>>>>>>
>>>>>>> numa_mode should probably be updated too (although it is not causing a problem at present).
>>>>>>> dev2->numa_mode = dev->numa_mode;
>>>>>> I'm very curious, why those field miss? I haven't see any places clear
>>>>>> this field.
>>>>>>
>>>>>> What is the root cause?
>>>>> Hi Michael,
>>>>>
>>>>> Here is my guess.
>>>>> The above function creates pci device list.
>>>> I am sorry. I forgot to add below information.
>>>>
>>>> "max_vfs" or "numa_node" value is came from sysfs when the above
>>>> function is processed.
>>> Yes, but it has already been registered, why it missed?
>> Yes, it has been registered already, but probably should be updated.
>> I guess sysfs value will be changed when igb_uio starts managing the device.
>>
>> ex)
>> 1. Boot linux
>> 2. start a dpdk application with no port.
>> 3. pci device list is registered.
>> - Here, "max_vfs" is came from sysfs. Or there is no such a entry.
>> 4. igb_uio binds the device.
>> 5. I guess max_vfs value of sysfs is changed. Or max_vfs entry is created.
>> 6. The dpdk application calls hotplug function.
> Yes, agree.
>
> But numa node can be changed?

Hi Michael,

I may be misunderstanding the meaning of numa_node.
I thought it indicated which NUMA node is nearest to the PCI device, so
it would not be configurable.

BTW, I will be out of the office tomorrow, so I will submit the v8
patches next Monday.

Thanks,
Tetsuya

>
> Bernard, does your issue occur after max_vfs changed in igb_uio?
>
> If not, I think must be figure out the reason.
>
> Thanks,
> Michael
>> - Here, I guess we need to update "max_vfs" value.
>>
>> Above is a just my assumption.
>> It may be good to wait for Bernard's reply.
>>
>> Thanks,
>> Tetsuya
>>
>>> Thanks,
>>> Michael
>>>>> And current DPDK implementation assumes all devices needed to be managed
>>>>> are under igb_uio or vfio when above code is processed.
>>>>> To add hotplug function, we also need to think some devices will start
>>>>> to be managed under igb_uio or vfio after initializing pci device list.
>>>>> Anyway, I guess "max_vfs" value will be changed when igb_uio or vfio
>>>>> manages the device.
>>>>>
>>>>> Hi Bernard,
>>>>>
>>>>> Could you please check "max_vfs" and "num_node" values, then check the
>>>>> values again after the device is managed by igb_uio or vfio?
>>>>> In my environment, it seems max_vfs is created by igb_uio.
>>>>> But my NIC doesn't have VF, so behavior might be different in your
>>>>> environment.
>>>>> I guess "numa_node" should not be changed theoretically.
>>>>>
>>>>> If my guess is correct, how about replacing following values?
>>>>> - driver
>>>>> - max_vfs
>>>>> - resource
>>>>> - (numa_node)
>>>>> Except for above value, I guess other value shouldn't be changed even
>>>>> after the device is managed by igb_uio or vfio.
>>>>>
>>>>> Thanks,
>>>>> Tetsuya
>>>>>
>>>>>> Thanks,
>>>>>> Michael
>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Bernard.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> +                free(dev);
>>>>>>>>> +                return 0;
>>>>>>>>>              }
>>>>>>>>>          }
>>>>>>>>>          TAILQ_INSERT_TAIL(&pci_device_list, dev, next);
>>>>>>>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>>>>>>>>> index e53f06b..1da3507 100644
>>>>>>>>> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>>>>>>>>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>>>>>>>>> @@ -123,7 +123,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
>>>>>>>>>      TAILQ_FOREACH(uio_res, pci_res_list, next) {
>>>>>>>>>
>>>>>>>>>          /* skip this element if it doesn't match our PCI address */
>>>>>>>>> -        if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
>>>>>>>>> +        if (eal_compare_pci_addr(&uio_res->pci_addr, &dev->addr))
>>>>>>>>>              continue;
>>>>>>>>>
>>>>>>>>>          for (i = 0; i != uio_res->nb_maps; i++) {
>>>>>>>> Acked-by: Michael Qiu
>>
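
For anyone following the thread, the three-way address comparison and the sorted insert with "already registered" handling discussed in this thread can be sketched as standalone C. This is only an illustration under assumptions, not the DPDK code: `struct pci_addr`, `struct pci_dev`, `pci_addr_key()`, `compare_pci_addr()` and `pci_dev_insert_sorted()` are hypothetical stand-ins for `rte_pci_addr`, `rte_pci_device`, `eal_compare_pci_addr()` and the loop in `pci_scan_one()`. The explicit `uint64_t` casts before shifting are a choice made for this sketch and are not in the patch, and only `max_vfs` is refreshed on a duplicate, as Bernard suggested.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct rte_pci_addr; field names follow the patch. */
struct pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Hypothetical, trimmed-down device record (not the real rte_pci_device). */
struct pci_dev {
	TAILQ_ENTRY(pci_dev) next;
	struct pci_addr addr;
	unsigned int max_vfs;
};

TAILQ_HEAD(pci_dev_list, pci_dev);

/* Pack domain/bus/devid/function into one integer key.
 * The uint64_t casts are an assumption of this sketch. */
static uint64_t
pci_addr_key(const struct pci_addr *a)
{
	return ((uint64_t)a->domain << 24) | ((uint64_t)a->bus << 16) |
	       ((uint64_t)a->devid << 8) | (uint64_t)a->function;
}

/* Three-way compare: positive, negative or 0; -1 if either pointer is NULL. */
static int
compare_pci_addr(const struct pci_addr *a, const struct pci_addr *b)
{
	uint64_t ka, kb;

	if (a == NULL || b == NULL)
		return -1;
	ka = pci_addr_key(a);
	kb = pci_addr_key(b);
	return (ka > kb) - (ka < kb);
}

/* Keep the list sorted by address. If the address is already present,
 * refresh max_vfs on the existing entry and free the new record.
 * Returns 1 if dev was inserted, 0 if an existing entry was updated. */
static int
pci_dev_insert_sorted(struct pci_dev_list *list, struct pci_dev *dev)
{
	struct pci_dev *cur;

	assert(list != NULL && dev != NULL);
	TAILQ_FOREACH(cur, list, next) {
		int ret = compare_pci_addr(&dev->addr, &cur->addr);

		if (ret > 0)
			continue;
		if (ret < 0) {
			TAILQ_INSERT_BEFORE(cur, dev, next);
			return 1;
		}
		/* already registered: update the value that may have changed */
		cur->max_vfs = dev->max_vfs;
		free(dev);
		return 0;
	}
	TAILQ_INSERT_TAIL(list, dev, next);
	return 1;
}
```

A caller would `TAILQ_INIT()` the list once and pass each heap-allocated record to `pci_dev_insert_sorted()`. Because the old two-way `pci_addr_comparison()` returned 0 for both "less than" and "equal", it could not detect the duplicate case that the hotplug rescan in this thread depends on.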