From mboxrd@z Thu Jan  1 00:00:00 1970
To: Thomas Monjalon <thomas@monjalon.net>,
 Stephen Hemminger <stephen@networkplumber.org>
Cc: dev@dpdk.org, Stephen Hemminger <sthemmin@microsoft.com>
References: <20180725182019.31518-1-stephen@networkplumber.org>
 <1622402.chDL49Ktjv@xps> <facd66b5-1e99-0cdc-748e-174b06d3113f@intel.com>
 <1682249.U5QVuGPMnJ@xps>
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Message-ID: <cb645bbd-de30-c2ee-e29e-bc93a03597d6@intel.com>
Date: Thu, 25 Oct 2018 15:04:42 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <1682249.U5QVuGPMnJ@xps>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH 3/4] eal: don't crash if alarm set fails

On 25-Oct-18 12:51 AM, Thomas Monjalon wrote:
> 18/09/2018 12:16, Burakov, Anatoly:
>> On 18-Sep-18 10:43 AM, Thomas Monjalon wrote:
>>> 26/07/2018 11:41, Burakov, Anatoly:
>>>> On 25-Jul-18 7:20 PM, Stephen Hemminger wrote:
>>>>> There is no need to call rte_exit and crash the application here;
>>>>> better to let the application handle the error itself.
>>>>>
>>>>> Remove the gratuitous profanity which would be visible if
>>>>> the rte_exit was still there.
>>>>>
>>>>> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>>>>> ---
>>>>> --- a/lib/librte_eal/common/eal_common_proc.c
>>>>> +++ b/lib/librte_eal/common/eal_common_proc.c
>>>>> @@ -841,14 +841,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
>>>>>     
>>>>>     	param->user_reply.nb_sent++;
>>>>>     
>>>>> -	if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>>>> -			      async_reply_handle, pending_req) < 0) {
>>>>> +	ret = rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
>>>>> +				async_reply_handle, pending_req);
>>>>> +	if (ret < 0)
>>>>>     		RTE_LOG(ERR, EAL, "Fail to set alarm for request %s:%s\n",
>>>>>     			dst, req->name);
>>>>> -		rte_panic("Fix the above shit to properly free all memory\n");
>>>>
>>>> Profanity aside, I think the message was trying to tell me something:
>>>> namely, that if alarm_set fails, we risk leaking this memory if the
>>>> reply from the peer never comes, and we risk leaving the
>>>> application hanging because the timeout never triggers. I'm not sure
>>>> leaving this "to the user" is the right choice, because there is no way
>>>> for the user to free IPC-internal memory if it leaks.
>>>>
>>>> So I think the proper way to handle this would've been to set the alarm
>>>> first, then, if it fails, not send the message in the first place.
>>>
>>> What should be done here? OK to remove rte_panic for now?
>>>
>>
>> As I said, the above fix is wrong because it leaks memory (however
>> unlikely that may be).
>>
>> The alarm set call should be moved to before the send_msg() call (with a
>> goto fail; on failure). That way, even if the alarm triggers too early
>> (i.e. immediately), the requests tailq will still be locked until we
>> complete our request sends, so we appropriately free memory on response,
>> on timeout, or in our failure handler if the alarm set has failed.
> 
> Could someone fix it, please?
> 

I'll do it.
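
For reference, the ordering described above could look roughly like the
sketch below. This is not the actual patch and not DPDK code: the struct,
stub_alarm_set(), stub_send_msg() and request_async_sketch() are all
hypothetical stand-ins, and the locking of the requests tailq is only
indicated by a comment. It only illustrates "arm the alarm first, and
don't send anything if that fails".

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these are DPDK APIs. */
struct pending_request {
	int id;
};

static bool alarm_should_fail; /* knob: force the alarm stub to fail */
static int messages_sent;      /* number of sends the stub has seen */
static int requests_freed;     /* number of requests freed on the failure path */

/* Stub for arming a timeout: 0 on success, negative on failure. */
static int
stub_alarm_set(struct pending_request *req)
{
	(void)req;
	return alarm_should_fail ? -1 : 0;
}

/* Stub for sending the request to a peer: always succeeds here. */
static int
stub_send_msg(struct pending_request *req)
{
	(void)req;
	messages_sent++;
	return 0;
}

/*
 * Arm the timeout before sending. If arming fails, nothing has been sent
 * yet, so the request can be freed locally and the error returned.
 */
static int
request_async_sketch(int id)
{
	struct pending_request *req = malloc(sizeof(*req));

	if (req == NULL)
		return -1;
	req->id = id;

	/* Real code would hold the pending-requests lock from here on. */
	if (stub_alarm_set(req) < 0) {
		fprintf(stderr, "Fail to set alarm for request %d\n", id);
		goto fail;
	}

	if (stub_send_msg(req) < 0)
		goto fail; /* real code would also have to cancel the alarm */

	/*
	 * In real code the reply or timeout handler would own and free req.
	 * The sketch frees it right away so that it does not leak.
	 */
	free(req);
	return 0;

fail:
	free(req);
	requests_freed++;
	return -1;
}
```

With alarm_should_fail set, request_async_sketch() returns an error without
ever calling the send stub, and the request is freed in the failure path.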

-- 
Thanks,
Anatoly