From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id A8757A04E0; Thu, 28 Nov 2019 14:15:02 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C44282C52; Thu, 28 Nov 2019 14:15:00 +0100 (CET) Received: from mail-ed1-f66.google.com (mail-ed1-f66.google.com [209.85.208.66]) by dpdk.org (Postfix) with ESMTP id 4297F2BF9 for ; Thu, 28 Nov 2019 14:14:59 +0100 (CET) Received: by mail-ed1-f66.google.com with SMTP id s10so22706668edi.5 for ; Thu, 28 Nov 2019 05:14:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=zqLW8y6XuY/eze5TPu8yoer29B64iVvTSIGRLfMwXKw=; b=L2UrByn/hIEY3GuApMigydf4E+7ct2CFRISj+lKxZheIiQ44eVNRpIuKHGk9pOJUQ7 SA6WDP0dSLndQHTTJioCYX2BrcqxO2fKJ5Wktx8qPOAIZb3ZrnB9Lj3bPMgz4mLZukV6 ka12xMyCmckkMep8k3Z+F8/HrME2OV8YTCZTXJBTtCBwY2QXRhRCmgB/7b2rjOpb7LiQ Kltwy++1m2keel4BUyUvL1JzGp4s3JY5TMbohWkP8k3y1azumBqwMUAcI/IVz9uerPl5 qLsiGFjZSc0XlXZA9wiVBcNGGg8LmfH9ZY2lyu72k9z87EZPNVLLEtpbdjr4m6xpSUSm OCxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=zqLW8y6XuY/eze5TPu8yoer29B64iVvTSIGRLfMwXKw=; b=sp7++yKSsfGR3JuUHql+DVsUM64Ytq4cvAxHV9UPgdVEmMpBj1yeCuIQKZk5ah+z16 H+O7AQKF8FvJQB7aL13kp0605GjogyqhIeBAAv/LW5Y9tz462ViyR9UAecZXR8ZcuT6H 5bo/DuU38jn0Wx7F2+eoNG2uLwjzpiBF7ej2oLNDH2MUHgG+oTFh6m1FrUtm9F4YNvdZ 6uPA3uomyrniTss1tZ20JgAy069sCd0N3zC+yNT/JQyEMgWyZUxqNAictUYfsKi8MHQ8 nf1eoZSLw5kFEfcpTLMoGzoUMi3XtK15yi62pD5FJbljnS+Qy/sIoMhcRgTugxalMcPe OS2A== X-Gm-Message-State: APjAAAXrrbsWmUVAoSzARQPiFWj4Tmvkv5cKDm9Vduiz+vE3Fi8lY+WQ F6vCEld2D/jFN/O4b9lrsbAGCGmuOkSolcBgwfBFSQ== X-Google-Smtp-Source: APXvYqwNC7wI48lEA0mIEDry+Ye7zKZ4aygcXCzxtqFShZvVB/gtHsp7o68LdHcUX37vFe2SKQkoBfGgws8iPIiXyz8= X-Received: by 2002:aa7:d6c8:: with SMTP id x8mr8313806edr.300.1574946898914; Thu, 28 Nov 2019 05:14:58 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?Q?Micha=C5=82_Krawczyk?= Date: Thu, 28 Nov 2019 14:14:47 +0100 Message-ID: To: kumaraparameshwaran rathinavel Cc: dev@dpdk.org, "Chauskin, Igor" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: Re: [dpdk-dev] Admin Queue ENA X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Param, first of all - you are using very old ena_com. This code comes from the DPDK version before v18.08. If you have any doubts, please check the newer version of the driver and DPDK as the potential bug could be already fixed there. Anyway, if you will look at the function get_comp_ctxt() which is called by __ena_com_submit_admin_cmd() to get the completion context, there is a check for the context if it's not occupied - in case it is (which will be true until comp_ctxt_release() will clear it), the new command using the same context cannot be used. So there shouldn't be two consumers using the same completion contexts. In addition, drivers that are using ena_com are sending admin commands one at a time during the init, so there shouldn't be even 2 commands at a time. The only exception is ena_com_get_dev_basic_stats(), which is called from rte_eth_stats_get() context - but if you consider DPDK application, it should use it on the management lcore after init, so it'll also be serialized. Thanks, Michal pt., 8 lis 2019 o 07:02 kumaraparameshwaran rathinavel napisa=C5=82(a): > > Hi Micha=C5=82, > > Please look at the below function, > > static int > ena_com_wait_and_process_admin_cq_polling( > struct ena_comp_ctx *comp_ctx, > struct ena_com_admin_queue *admin_queue) > { > unsigned long flags =3D 0; > u64 start_time; > int ret; > > start_time =3D ENA_GET_SYSTEM_USECS(); > > while (comp_ctx->status =3D=3D ENA_CMD_SUBMITTED) { > if ((ENA_GET_SYSTEM_USECS() - start_time) > > ADMIN_CMD_TIMEOUT_US) { > ena_trc_err("Wait for completion (polling) timeout\n"); > /* ENA didn't have any completion */ > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > admin_queue->stats.no_completion++; > admin_queue->running_state =3D false; > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > > ret =3D ENA_COM_TIMER_EXPIRED; > goto err; > } > > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > ena_com_handle_admin_completion(admin_queue); > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > } > > if (unlikely(comp_ctx->status =3D=3D ENA_CMD_ABORTED)) { > ena_trc_err("Command was aborted\n"); > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > admin_queue->stats.aborted_cmd++; > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > ret =3D ENA_COM_NO_DEVICE; > goto err; > } > > ENA_ASSERT(comp_ctx->status =3D=3D ENA_CMD_COMPLETED, > "Invalid comp status %d\n", comp_ctx->status); > > ret =3D ena_com_comp_status_to_errno(comp_ctx->comp_status); > err: > comp_ctxt_release(admin_queue, comp_ctx); > return ret; > } > > This is a case where there are two threads executing admin commands. > > The occupied flag is set to false in the function comp_ctxt_release. Let= us say there are two consumers of completion context and C1 has a completi= on context and the same completion context can be used by another consumer = C2 even before the C1 is resetting the occupied flag. > > This is because the ena_com_handle_admin_completion is done under spin lo= ck and comp_ctxt_release is not under this spin lock. > > Thanks, > Param > > On Thu, Oct 24, 2019 at 2:09 PM Micha=C5=82 Krawczyk wr= ote: >> >> sob., 19 pa=C5=BA 2019 o 20:26 kumaraparameshwaran rathinavel >> napisa=C5=82(a): >> > >> > Hi All, >> > >> > In the ENA poll mode driver I see that every request in the admin queu= e is >> > associated with a completion context and this is preallocated during t= he >> > device initialisation. When the completion context is used we check fo= r >> > occupied to be true in the 16.X version if the occupied flag is set to= true >> > we assert and in the latest version I see that this is an error log. B= ut >> > there is a time window where if the completion context would be availa= ble >> > to the other consumer but still the old consumer did not set the occup= ied >> > to false. The new consumer holds the admin queue lock to get the compl= etion >> > context but the update by the old consumer to set the the occupied fla= g is >> > not done under lock. So should we make sure that the new consumer shou= ld >> > get the completion context only when the occupied flag is set to false= . Any >> > thoughts on this? >> >> Hi Param, >> >> Both the producer and the consumer are holding the spinlock while >> getting the completion context. If you see any situation where it >> isn't (besides the release function), please let me know. >> As it is protected by the lock, returning error while completion >> context is occupied (and it shouldn't) it fine, as it will stop the >> admin queue and allow the DPDK user application to execute the reset >> of the device. >> >> Thanks, >> Michal >> >> > If required I can try to make a patch where the completion context wou= ld be >> > available only after setting the occupied flag to false. >> > >> > Thanks, >> > Param.