From: "Burakov, Anatoly"
To: Thomas Monjalon, Stephen Hemminger
Cc: dev@dpdk.org, Stephen Hemminger
Date: Thu, 25 Oct 2018 15:04:42 +0100
Subject: Re: [dpdk-dev] [PATCH 3/4] eal: don't crash if alarm set fails
References: <20180725182019.31518-1-stephen@networkplumber.org> <1622402.chDL49Ktjv@xps> <1682249.U5QVuGPMnJ@xps>
In-Reply-To: <1682249.U5QVuGPMnJ@xps>

On 25-Oct-18 12:51 AM, Thomas Monjalon wrote:
> 18/09/2018 12:16, Burakov, Anatoly:
>> On 18-Sep-18 10:43 AM, Thomas Monjalon wrote:
>>> 26/07/2018 11:41, Burakov, Anatoly:
>>>> On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
>>>>> There is no need to call rte_exit and crash the application here;
>>>>> better to let the application handle the error itself.
>>>>>
>>>>> Remove the gratuitous profanity which would be visible if
>>>>> the rte_exit was still there.
>>>>>
>>>>> Signed-off-by: Stephen Hemminger
>>>>> ---
>>>>> --- a/lib/librte_eal/common/eal_common_proc.c
>>>>> +++ b/lib/librte_eal/common/eal_common_proc.c
>>>>> @@ -841,14 +841,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
>>>>>
>>>>>          param->user_reply.nb_sent++;
>>>>>
>>>>> -        if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>>>> -                              async_reply_handle, pending_req) < 0) {
>>>>> +        ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>>>> +                                async_reply_handle, pending_req);
>>>>> +        if (ret < 0)
>>>>>                  RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
>>>>>                          dst, req->name);
>>>>> -                rte_panic("Fix the above shit to properly free all memory\n");
>>>>
>>>> Profanity aside, i think the message was trying to tell me something -
>>>> namely, that if alarm_set fails, we're risking to leak this memory if
>>>> reply from the peer never comes, and we're risking leaving the
>>>> application hanging because the timeout never triggers. I'm not sure if
>>>> leaving this "to the user" is the right choice, because there is no way
>>>> for the user to free IPC-internal memory if it leaks.
>>>>
>>>> So i think the proper way to handle this would've been to set the alarm
>>>> first, then, if it fails, don't send the message in the first place.
>>>
>>> What should be done here? OK to remove rte_panic for now?
>>>
>>
>> As i said, the above fix is wrong because it leaks memory (however
>> unlikely it may be).
>>
>> The alarm set call should be moved to before we do send_msg() call (and
>> goto fail; on failure). That way, even if alarm triggers too early (i.e.
>> immediately), the requests tailq will still be locked until we complete
>> our request sends - so we appropriately free memory on response, on
>> timeout or in our failure handler if alarm set has failed.
>
> Someone to fix it, please?
>

I'll do it.

--
Thanks,
Anatoly
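
For illustration only, here is a rough sketch of the ordering described above: arm the timeout first, send only if that succeeded, and free the request in a single failure path. This is not the actual eal_common_proc.c code. The names set_alarm_stub(), cancel_alarm_stub(), send_msg_stub() and request_async_sketch() are hypothetical stand-ins, the tailq locking is omitted, and cancelling the alarm when the send fails is an assumption of this sketch rather than something stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the IPC-internal pending request; not DPDK code. */
struct pending_request {
	int id;
};

/* Knobs and counters so the failure paths can be exercised in a test. */
static bool alarm_should_fail;
static bool send_should_fail;
static int alarms_armed;
static int messages_sent;

/* Stand-in for rte_eal_alarm_set(): returns a negative value on failure. */
static int set_alarm_stub(struct pending_request *req)
{
	(void)req;
	if (alarm_should_fail)
		return -1;
	alarms_armed++;
	return 0;
}

/* Assumption of this sketch: an armed alarm is cancelled if the send fails. */
static void cancel_alarm_stub(struct pending_request *req)
{
	(void)req;
	alarms_armed--;
}

/* Stand-in for send_msg(): returns a negative value on failure. */
static int send_msg_stub(const char *dst)
{
	(void)dst;
	if (send_should_fail)
		return -1;
	messages_sent++;
	return 0;
}

/*
 * Arm the timeout before sending. If the alarm cannot be set, nothing has
 * been sent yet, so the request can be freed here without waiting for a
 * reply or a timeout that would never come.
 */
static int request_async_sketch(const char *dst, struct pending_request **queued)
{
	struct pending_request *req;

	assert(dst != NULL && queued != NULL);
	*queued = NULL;

	req = calloc(1, sizeof(*req));
	if (req == NULL)
		return -1;

	if (set_alarm_stub(req) < 0)
		goto fail;

	if (send_msg_stub(dst) < 0) {
		cancel_alarm_stub(req);
		goto fail;
	}

	/* Success: the request is now owned by the reply/timeout handling. */
	*queued = req;
	return 0;

fail:
	free(req);
	return -1;
}
```

With alarm_should_fail set, request_async_sketch() returns -1 without sending anything and without leaving a request behind; otherwise the caller gets the queued request back and the timeout is already armed by the time the message goes out.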