From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 92D8F1B81C for ; Mon, 17 Dec 2018 11:38:54 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Dec 2018 02:38:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,365,1539673200"; d="scan'208";a="130582026" Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.251.86.46]) ([10.251.86.46]) by fmsmga001.fm.intel.com with ESMTP; 17 Dec 2018 02:38:52 -0800 To: Hideyuki Yamashita , dev@dpdk.org Cc: Yasufumi Ogawa , Nakamura Hioryuki References: <201812100754.wBA7s5ZE022173@ccmail04.silk.ntt-tx.co.jp> <201812110055.wBB0tCpA023963@ccmail04.silk.ntt-tx.co.jp> <201812171004.wBHA432F013027@ccmail04.silk.ntt-tx.co.jp> <0eb59f12-0192-a234-a773-7f3061abea3a@intel.com> From: "Burakov, Anatoly" Message-ID: Date: Mon, 17 Dec 2018 10:38:51 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.3.3 MIME-Version: 1.0 In-Reply-To: <0eb59f12-0192-a234-a773-7f3061abea3a@intel.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] rte_eal_hotplug_remove() generates error message X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Dec 2018 10:38:55 -0000 On 17-Dec-18 10:23 AM, Burakov, Anatoly wrote: > On 17-Dec-18 10:02 AM, Hideyuki Yamashita wrote: >> Dear Thomas and all, >> >> I took a look on dpdk code. >> I firstly write qustions and my analisys >> on the current dpdk code follows after that. >> >> [1.Questions] >> I have several questions to ask again. >> Is my understanding correct about followings >> >> Q1: "EAL:ERROR, Invalid memory" is ignorable >> >> Q2: there is no big difference between calling >> rte_eal_hotplug_remove(bus->name, dev->name) >> and >> rte_dev_remove(dev) because anyway it >> reaches to rte_pmd_vhost_remove and encounter >> the same error. >> >> [2.snip from my code] >> ..... >>          rte_eth_dev_close(port_id); >>          ret = rte_dev_remove(dev); >>          if (ret < 0) >>                  return ret; >>          rte_eth_dev_release_port(&rte_eth_devices[port_id]); >> >> [3. My analysis on dpdk code] >> static int >>    rte_pmd_vhost_remove(struct rte_vdev_device *dev) >>    { >>     ........... >>           eth_dev_close(eth_dev); >> >>            rte_free(vring_states[eth_dev->data->port_id]); >>            vring_states[eth_dev->data->port_id] = NULL; >> >>            rte_eth_dev_release_port(eth_dev); >> >> As you can see in rte_eth_vhost.c >> It calls both eth_dev_close and rte_eth_dev_release_port. >> And inside both functions, it tries to free mac_addrs. >>          rte_free(dev->data->mac_addrs);       //in rth_dev_close >>          rte_free(eth_dev->data->mac_addrs);  //in >> rte_eth_dev_release_port >> >> I understand that is the reason why >> /* Free the memory space back to heap */ >> void rte_free(void *addr) >> { >>          if (addr == NULL) return; >>          if (malloc_heap_free(malloc_elem_from_data(addr)) < 0) >>                  RTE_LOG(ERR, EAL, "Error: Invalid memory\n"); >> } >> encounter the error. >> As an experiment, I commented out one of them, "ERR, Invalid memory" >> disappered. >> >> Thanks and BR, >> Hideyuki Yamashita >> NTT TechnoCross >> >>> Adding my colleague Yasufumi and Hiroyuki as CC. >>> >>> We are waiting valuable advice from you. >>> >>> Thanks in advance, >>> Hideyuki Yamashita >>> NTT TechnoCross >>> >>>> >>>> Dear Thomas and all, >>>> >>>> I hope you all get safely back home after DPDK summit. >>>> (When I get back Japan, it is chilling. (start of winter)) >>>> >>>> On DPDK 18.11.0, we tried to remove vhost device by using >>>> rte_eal_hotplug_remove(). >>>> However, following syslog message is printed. >>>> “EAL: Error: Invalid memory” >>>> >>>> At DPDK summit San Jose, we had chance to ask Thomas how to handle >>>> the error message, and he gave us following advice: >>>> Replace “rte_eal_hotplug_add()” to “rte_dev_probe(devargs)” and >>>> “rte_eal_hotplug_remove()” to “rte_eth_dev_close() and >>>> rte_dev_remove(rte_dev)” >>>> >>>> We tested above changes, but the result is the same (i.e., same >>>> error message is printed). >>>> The debug log message says: >>>> --- >>>> [primary] >>>> VHOST_CONFIG: vhost-user server: socket created, fd: 17 >>>> VHOST_CONFIG: bind to /tmp/sock0 >>>> EAL: Error: Invalid memory >>>> VHOST_CONFIG: vhost-user server: socket created, fd: 17 >>>> VHOST_CONFIG: bind to /tmp/sock0 >>>> >>>> [secondary] >>>> APP: devargs=eth_vhost0,iface=/tmp/sock0,queues=1 >>>> EAL: request: eal_dev_mp_request >>>> EAL: msg: eal_dev_mp_request >>>> EAL: request: bus_vdev_mp >>>> EAL: msg: bus_vdev_mp >>>> EAL: msg: bus_vdev_mp >>>> EAL: reply: eal_dev_mp_request >>>> EAL: msg: eal_dev_mp_request >>>> rte_eth_promiscuous_disable: Function not supported >>>> rte_eth_allmulticast_disable: Function not supported >>>> APP: To Server: add >>>> EAL: request: eal_dev_mp_request >>>> EAL: msg: eal_dev_mp_request >>>> EAL: reply: eal_dev_mp_request >>>> EAL: msg: eal_dev_mp_request >>>> APP: To Server: del >>>> APP: devargs=eth_vhost0,iface=/tmp/sock0,queues=1 >>>> EAL: request: eal_dev_mp_request >>>> EAL: msg: eal_dev_mp_request >>>> EAL: request: bus_vdev_mp >>>> EAL: msg: bus_vdev_mp >>>> EAL: msg: bus_vdev_mp >>>> EAL: reply: eal_dev_mp_request >>>> EAL: msg: eal_dev_mp_request >>>> rte_eth_promiscuous_disable: Function not supported >>>> rte_eth_allmulticast_disable: Function not supported >>>> APP: To Server: add >>>> --- >>>> >>>> We would like to ask: >>>> 1)    Is the message “EAL: Error: Invalid memory” ignorable or not? >>>> There is no obstacle except this message to re-add the vhost device. >>>> 2)    Which is the better(best?) way to add/del vhost device >>>> “rte_eal_hotplug_add/remove()” or the way Thomas suggested? >>>> >>>> Thanks in advance and have a nice day. >>>> >>>> BR, >>>> Hideyuki Yamashita >>>> NTT TechnoCross >>> >> >> >> > > Hi Hideyuki, > > The error you're referring to (about invalid memory) means that you're > trying to free a pointer that points to invalid memory. Meaning, either > the pointer itself is not pointing to an allocated area, or it points to > memory that has already been freed. > > If dev->data->mac_addrs and eth_dev->data->mac_addrs point to the same > area, this is a bug, because this would lead to double free, and > rte_malloc will rightly complain about invalid memory. Now, malloc won't > try to do anything with the invalid memory, so the error itself is > harmless *as far as malloc is concerned*, but these attempts to free the > memory twice should be fixed whereever they happen. > > I'm not well-versed in dev infrastructure, so i wouldn't be able to say > which one of the rte_free calls is an extra, unneeded one. This is > something e.g. Thomas could help with, or the driver maintainer. > ...with that said, to me, this looks suspicious: > rte_eth_dev_close(port_id); > ret = rte_dev_remove(dev); > if (ret < 0) > return ret; > rte_eth_dev_release_port(&rte_eth_devices[port_id]); So, you're closing the port, then you remove the device, then you release port. From your own analysis, what dev_remove does is the following: > rte_pmd_vhost_remove(struct rte_vdev_device *dev) > { > ........... >  eth_dev_close(eth_dev); > > rte_free(vring_states[eth_dev->data->port_id]); > vring_states[eth_dev->data->port_id] = NULL; > > rte_eth_dev_release_port(eth_dev); So, you have duplicate call to eth_dev_close() (which is likely not a problem), and a duplicate call to rte_eth_dev_release_port(). So, if rte_dev_remove() already calls rte_eth_dev_release_port(), why are you calling it in your code? Now, as i said above, i'm not well-versed enough in the dev infrastructure to know which of the calls should go where, but i'm pretty sure you shouldn't call rte_eth_dev_release_port twice - this is likely what's causing your double-free. -- Thanks, Anatoly