From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58])
 by dpdk.org (Postfix) with ESMTP id 00EE8B3C6
 for ; Thu, 18 Sep 2014 17:56:53 +0200 (CEST)
Received: from hmsreliant.think-freely.org ([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost)
 by smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63)
 (envelope-from )
 id 1XUeAF-0002l6-6v; Thu, 18 Sep 2014 12:02:37 -0400
Date: Thu, 18 Sep 2014 12:02:34 -0400
From: Neil Horman
To: "Wodkowski, PawelX"
Message-ID: <20140918160234.GJ20389@hmsreliant.think-freely.org>
References: <1410963713-13837-1-git-send-email-pawelx.wodkowski@intel.com>
 <1410963713-13837-3-git-send-email-pawelx.wodkowski@intel.com>
 <20140917151304.GD4213@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Score: -2.9 (--)
X-Spam-Status: No
Cc: "dev@dpdk.org" , "Jastrzebski, MichalX K"
Subject: Re: [dpdk-dev] [PATCH 2/2] bond: add mode 4 support
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 18 Sep 2014 15:56:53 -0000

On Thu, Sep 18, 2014 at 08:07:31AM +0000, Wodkowski, PawelX wrote:
> > > +int
> > > +bond_mode_8023ad_deactivate_slave(struct rte_eth_dev *bond_dev,
> > > +		uint8_t slave_pos)
> > > +{
> > > +	struct bond_dev_private *internals = bond_dev->data->dev_private;
> > > +	struct mode8023ad_data *data = &internals->mode4;
> > > +	struct port *port;
> > > +	uint8_t i;
> > > +
> > > +	bond_mode_8023ad_stop(bond_dev);
> > > +
> > > +	/* Exclude slave from transmit policy. If this slave is an aggregator
> > > +	 * make all aggregated slaves unselected to force sellection logic
> > > +	 * to select suitable aggregator for this port */
> > > +	for (i = 0; i < internals->active_slave_count; i++) {
> > > +		port = &data->port_list[slave_pos];
> > > +		if (port->used_agregator_idx == slave_pos) {
> > > +			port->selected = UNSELECTED;
> > > +			port->actor_state &= ~(STATE_SYNCHRONIZATION | STATE_DISTRIBUTING |
> > > +				STATE_COLLECTING);
> > > +
> > > +			/* Use default aggregator */
> > > +			port->used_agregator_idx = i;
> > > +		}
> > > +	}
> > > +
> > > +	port = &data->port_list[slave_pos];
> > > +	timer_cancel(&port->current_while_timer);
> > > +	timer_cancel(&port->periodic_timer);
> > > +	timer_cancel(&port->wait_while_timer);
> > > +	timer_cancel(&port->tx_machine_timer);
> > > +
> > These all seem rather racy. Alarm callbacks are executed with the alarm list
> > locks not held. So there is every possibility that you could execute these (or
> > any timer_cancel calls in this PMD in parallel with the internal state machine
> > timer callback, and leave either with a corrupted timer list (resulting from a
> > double free between here, and the actual callback site),
>
> I don't think so. Yes, callbacks are executed with alarm list locks not held, but
> this is not the issue because access to list itself is guarded by lock and
> ap->executing variable. So list will not be trashed. Check source of
> eal_alarm_callback(), rte_eal_alarm_set() and rte_eal_alarm_cancel().
>
Yes, you're right, the list is probably safe with the executing bit.

> > or a timer that is
> > actually still pending when a slave is removed.
> >
> This is not the issue also, but problem might be similar. I assumed that alarms
> are atomic but when I looked at rte alarms closer I saw a race condition
> between and rte_eal_alarm_cancel() from bond_mode_8023ad_stop()
> and rte_eal_alarm_set() from state machines callback. This need to be
> reworked in some way.
>
Yes, this is what I was referring to:

CPU0                            CPU1
rte_eal_alarm_callback          bond_8023ad_deactivate_slave
-bond_8023_ad_periodic_cb       timer_cancel
timer_set

If those timer functions operate on the same timer, the result is that you can
leave the stop/deactivate slave paths with a timer function for that slave still
pending. The bonding mode needs some internal state to serialize those
operations and determine if the timer should be reactivated.

Neil
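
For illustration only, here is a rough sketch of the kind of internal state
described above. It is not taken from the patch: struct sm_timer and the
sm_timer_*() helpers are hypothetical names, and a plain pthread mutex plus a
boolean stand in for whatever lock and alarm calls the PMD would really use.
The idea is that the stop path sets a flag under the lock, and the callback
checks that flag under the same lock before it re-arms, so a single cancel
issued after the flag is set cannot be undone by a late timer_set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-port timer state; none of these names come from the patch. */
struct sm_timer {
	pthread_mutex_t lock;	/* serializes the stop path against re-arm */
	bool stopping;		/* set once by the stop/deactivate path */
	bool armed;		/* stands in for "an alarm is pending" */
};

static void
sm_timer_init(struct sm_timer *t)
{
	pthread_mutex_init(&t->lock, NULL);
	t->stopping = false;
	t->armed = false;
}

/* Callback side: re-arm only if the stop path has not claimed the timer.
 * Returns true if the timer was re-armed. */
static bool
sm_timer_rearm(struct sm_timer *t)
{
	bool rearmed = false;

	pthread_mutex_lock(&t->lock);
	if (!t->stopping) {
		/* A real PMD would set its alarm here. */
		t->armed = true;
		rearmed = true;
	}
	pthread_mutex_unlock(&t->lock);
	return rearmed;
}

/* Stop/deactivate side: forbid further re-arms, then cancel once. */
static void
sm_timer_stop(struct sm_timer *t)
{
	pthread_mutex_lock(&t->lock);
	t->stopping = true;
	pthread_mutex_unlock(&t->lock);

	/* No callback can re-arm past this point, so one cancel is final.
	 * A real PMD would cancel its pending alarm here. */
	t->armed = false;
}

/* Small self-check of the ordering guarantee described above. */
static void
sm_timer_selftest(void)
{
	struct sm_timer t;

	sm_timer_init(&t);
	assert(sm_timer_rearm(&t));	/* running: re-arm is allowed */
	sm_timer_stop(&t);
	assert(!sm_timer_rearm(&t));	/* stopped: re-arm is refused */
	assert(!t.armed);		/* nothing left pending */
}
```

The cancel is done after dropping the lock on purpose: if the alarm API waits
for a running callback, cancelling while holding a lock that the callback also
takes could deadlock. Whether that applies depends on the alarm implementation
in use, so treat the ordering here as an assumption to verify.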