From: Yuanhan Liu
To: "Tan, Jianfeng"
Cc: dev@dpdk.org, Victor Kaplansky, "Michael S. Tsirkin"
Date: Fri, 19 Feb 2016 15:03:26 +0800
Subject: Re: [dpdk-dev] [PATCH v3 6/8] vhost: handle VHOST_USER_SEND_RARP request
Message-ID: <20160219070326.GR21426@yliu-dev.sh.intel.com>
In-Reply-To: <56C6B218.6080501@intel.com>

On Fri, Feb 19, 2016 at 02:11:36PM +0800, Tan, Jianfeng wrote:
> Hi Yuanhan,
>
> On 1/29/2016 12:58 PM, Yuanhan Liu wrote:
> >While in former patch we enabled GUEST_ANNOUNCE feature, so that the
> >guest OS will broadcast a GARP message after migration to notify the
> >switch about the new location of migrated VM, the thing is that
> >GUEST_ANNOUNCE is enabled since kernel v3.5 only. For older kernel,
> >VHOST_USER_SEND_RARP request comes to rescue.
> >
> >The payload of this new request is the mac address of the migrated VM,
> >with that, we could construct a RARP message, and then broadcast it
> >to host interfaces.
> >
> >That's how this patch works:
> >
> >- list all interfaces, with the help of SIOCGIFCONF ioctl command
> >
> >- construct an RARP message and broadcast it
> >
> >Cc: Thibaut Collet
> >Signed-off-by: Yuanhan Liu
> >---
> ...
> >+
> >+/*
> >+ * Broadcast a RARP message to all interfaces, to update
> >+ * switch's mac table
> >+ */
> >+int
> >+user_send_rarp(struct VhostUserMsg *msg)
> >+{
> >+	uint8_t *mac = (uint8_t *)&msg->payload.u64;
> >+	uint8_t rarp[RARP_BUF_SIZE];
> >+	struct ifconf ifc = {0, };
> >+	struct ifreq *ifr;
> >+	int nr = 16;
> >+	int fd;
> >+	uint32_t i;
> >+
> >+	RTE_LOG(DEBUG, VHOST_CONFIG,
> >+		":: mac: %02x:%02x:%02x:%02x:%02x:%02x\n",
> >+		mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
> >+
> >+	make_rarp_packet(rarp, mac);
> >+
> >+	/*
> >+	 * Get all interfaces
> >+	 */
> >+	fd = socket(AF_INET, SOCK_DGRAM, 0);
> >+	if (fd < 0) {
> >+		perror("failed to create AF_INET socket");
> >+		return -1;
> >+	}
> >+
> >+again:
> >+	ifc.ifc_len = sizeof(*ifr) * nr;
> >+	ifc.ifc_buf = realloc(ifc.ifc_buf, ifc.ifc_len);
> >+
> >+	if (ioctl(fd, SIOCGIFCONF, &ifc) < 0) {
> >+		perror("failed at SIOCGIFCONF");
> >+		close(fd);
> >+		return -1;
> >+	}
> >+
> >+	if (ifc.ifc_len == (int)sizeof(struct ifreq) * nr) {
> >+		/*
> >+		 * current ifc_buf is not big enough to hold
> >+		 * all interfaces; double it and try again.
> >+		 */
> >+		nr *= 2;
> >+		goto again;
> >+	}
> >+
> >+	ifr = (struct ifreq *)ifc.ifc_buf;
> >+	for (i = 0; i < ifc.ifc_len / sizeof(struct ifreq); i++)
> >+		send_rarp(ifr[i].ifr_name, rarp);
> >+
> >+	close(fd);
> >+
> >+	return 0;
> >+}
>
> From how you implement user_send_rarp(), if I understand it correctly, it
> broadcasts this RARP packet to all host interfaces, which I don't think is
> appropriate. The packet should be sent only to its own L2 network.
> You should not assume that all interfaces maintained in the kernel
> are in the same L2 network. Even worse, this could cause problems when
> used in overlay networking, where two VMs in two different overlay
> networks can have the same MAC address.
>
> What I suggest here is to move user_send_rarp() to rte_vhost_dequeue_burst(),
> using a flag to control it, so that this RARP packet is broadcast in its
> own L2 network.

I have thought of that, too. I gave it up because the SEND_RARP request
is handled in a different thread from rte_vhost_dequeue_burst(), which
means the RARP packet will not be broadcast immediately after migration
is done: it will be broadcast only when rte_vhost_dequeue_burst() is
invoked. I was thinking the delay might be a problem.

On second thought, it doesn't look like one. The GUEST_ANNOUNCE packet
is also broadcast by rte_vhost_dequeue_burst(); it's enqueued by the
guest kernel, though. And given that we are a poll mode driver, the
delay won't be an issue.

So, thanks. I will give it a quick try; it should work.

	--yliu
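
For reference, a minimal sketch of the flag-based hand-off discussed above: the thread handling VHOST_USER_SEND_RARP records the MAC and raises a flag, and the polling data path picks it up on its next dequeue call. This is an illustration only, not the actual patch. The names demo_dev, demo_request_rarp() and demo_take_rarp() are hypothetical, and it assumes at most one outstanding request per device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAC_LEN 6

/* Hypothetical per-device state; NOT the real struct virtio_net. */
struct demo_dev {
	uint8_t mac[DEMO_MAC_LEN]; /* written before the flag is raised */
	atomic_bool rarp_pending;  /* set by control thread, cleared by data path */
};

/*
 * Control thread: would be called when a SEND_RARP request arrives.
 * The release store makes the MAC visible to whoever observes the flag.
 */
static void
demo_request_rarp(struct demo_dev *dev, const uint8_t mac[DEMO_MAC_LEN])
{
	assert(dev != NULL);
	memcpy(dev->mac, mac, DEMO_MAC_LEN);
	atomic_store_explicit(&dev->rarp_pending, true, memory_order_release);
}

/*
 * Data path: would be called at the start of each dequeue burst.
 * Returns true exactly once per request and copies out the MAC, so the
 * caller can build a RARP frame and hand it to the application along
 * with the guest's own packets.
 */
static bool
demo_take_rarp(struct demo_dev *dev, uint8_t mac_out[DEMO_MAC_LEN])
{
	assert(dev != NULL);
	if (!atomic_exchange_explicit(&dev->rarp_pending, false,
				      memory_order_acquire))
		return false;
	memcpy(mac_out, dev->mac, DEMO_MAC_LEN);
	return true;
}
```

Because the data path polls, the flag is normally consumed on the very next burst, which is why the extra delay is not expected to matter.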