From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 3D6ECA04B7; Thu, 17 Sep 2020 15:28:46 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 901B81D643; Thu, 17 Sep 2020 15:28:45 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 6219D1BEB2; Thu, 17 Sep 2020 15:28:43 +0200 (CEST) IronPort-SDR: Q2YIsMCm4Yi02jtoQwzAl7XJkSapSEaQ49PJcuCMXzNX1+0C9/5D/O3RxQTU+VQ4H2pXI97fiT ZNpSOd7qTbUg== X-IronPort-AV: E=McAfee;i="6000,8403,9746"; a="158995215" X-IronPort-AV: E=Sophos;i="5.76,437,1592895600"; d="scan'208";a="158995215" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Sep 2020 06:28:40 -0700 IronPort-SDR: QSp+MJEjnJ376vONsf2NGlORBSbQnUu60oosWwQ4fpR1NxTSbj55At0fFBiUNwfVBkeWySiJs+ tcApYFyVM1vA== X-IronPort-AV: E=Sophos;i="5.76,437,1592895600"; d="scan'208";a="483745515" Received: from dhunt5-mobl5.ger.corp.intel.com (HELO [10.213.228.49]) ([10.213.228.49]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Sep 2020 06:28:39 -0700 To: Lukasz Wojciechowski , Bruce Richardson Cc: dev@dpdk.org, stable@dpdk.org References: <20200915193449.13310-1-l.wojciechow@partner.samsung.com> <20200915193449.13310-7-l.wojciechow@partner.samsung.com> From: David Hunt Message-ID: <8720f8b9-bd25-29fb-e80a-4de81eec980c@intel.com> Date: Thu, 17 Sep 2020 14:28:37 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: <20200915193449.13310-7-l.wojciechow@partner.samsung.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB Subject: Re: [dpdk-dev] [PATCH v1 6/6] distributor: fix handshake deadlock X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Lukasz, On 15/9/2020 8:34 PM, Lukasz Wojciechowski wrote: > Synchronization of data exchange between distributor and worker cores > is based on 2 handshakes: retptr64 for returning mbufs from workers > to distributor and bufptr64 for passing mbufs to workers. > > Without proper order of verifying those 2 handshakes a deadlock may > occur. This can happen when worker core want to return back mbufs > and waits for retptr handshake to be cleared and distributor core > wait for bufptr to send mbufs to worker. > > This can happen as worker core first returns mbufs to distributor > and later gets new mbufs, while distributor first release mbufs > to worker and later handle returning packets. > > This patch fixes possibility of the deadlock by always taking care > of returning packets first on the distributor side and handling > packets while waiting to release new. > > Fixes: 775003ad2f96 ("distributor: add new burst-capable library") > Cc: david.hunt@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Lukasz Wojciechowski > --- > lib/librte_distributor/rte_distributor.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c > index 89493c331..12b3db33c 100644 > --- a/lib/librte_distributor/rte_distributor.c > +++ b/lib/librte_distributor/rte_distributor.c > @@ -321,12 +321,14 @@ release(struct rte_distributor *d, unsigned int wkr) > struct rte_distributor_buffer *buf = &(d->bufs[wkr]); > unsigned int i; > > + handle_returns(d, wkr); > + > /* Sync with worker on GET_BUF flag */ > while (!(__atomic_load_n(&(d->bufs[wkr].bufptr64[0]), __ATOMIC_ACQUIRE) > - & RTE_DISTRIB_GET_BUF)) > + & RTE_DISTRIB_GET_BUF)) { > + handle_returns(d, wkr); > rte_pause(); > - > - handle_returns(d, wkr); > + } > > buf->count = 0; > > @@ -376,6 +378,7 @@ rte_distributor_process(struct rte_distributor *d, > /* Flush out all non-full cache-lines to workers. */ > for (wid = 0 ; wid < d->num_workers; wid++) { > /* Sync with worker on GET_BUF flag. */ > + handle_returns(d, wid); > if (__atomic_load_n(&(d->bufs[wid].bufptr64[0]), > __ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF) { > release(d, wid); Makes sense. Thanks for the series.  Again, no degradation in performance on my systems. Acked-by: David Hunt