From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 44093A04E0; Fri, 29 Nov 2019 13:01:44 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CC2449E4; Fri, 29 Nov 2019 13:01:42 +0100 (CET) Received: from mail-io1-f52.google.com (mail-io1-f52.google.com [209.85.166.52]) by dpdk.org (Postfix) with ESMTP id 66D7D23D for ; Fri, 29 Nov 2019 13:01:41 +0100 (CET) Received: by mail-io1-f52.google.com with SMTP id z26so28676681iot.8 for ; Fri, 29 Nov 2019 04:01:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=A5duopXXCZNmx+e9aPmECQTGPALfExQKCmeCj/L/GMs=; b=iEVJ6D2y62DYsAm5ZZwVowARSeewfc2tE2wDV1wWNk2sVnYZbupPx1WFXEN8bsnajo 7TKj3uz3S0mtbayTXZwwb0jIanfePwoOBoV05l0ZNPLymVzr7JM60OMh+WFnxN+5YH8M ufUTPEcGzlNQpfhBdUsx0S4z7k2P8fqURBhhegFQL6GaxXvcDFgp0n3FA6IO7Wp+F0qk EjoNaOyg2rSmDtWR7qGVJyptxljg2t0i7exNq5pFD070+BiT18hBZLxOem4K4KSfWxnB vBS9Rz5SzDI4ZYg+o4HoHG5jluqDcJRmmbwcfYigagKkLMtP6c5+p4MibP8/TtEGzDmH z0HQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=A5duopXXCZNmx+e9aPmECQTGPALfExQKCmeCj/L/GMs=; b=NdY+BYFOJIIGrNZ3rdFTN4YRRggHzUivIMy0dywrpOO0wTQpXMXvdTqKe4bChTPfUI VuepVdxLQaHKo2aPcZXVbtHKFgLrmsVCaPZ+EG1C0guXM6dYiiuH0dETG80fRQiliMwO ztbdNPinDTHykC1zz3K2yMh0oM6YOcCgo5X166wN2bJm8Z5+eG/BTSP7woPu3Gm3UrDX 7zcWw+EULq0axmYzOTU1hi75utU7B0WJWEJXfceMdPqdm+51Uf136M25GbZWVg5jbzGP FV1SE8mN/nm4rtKHpLEZS/HjMbkR9J0h9PWYAmtE9XWggnizxPN9Chb2HaL0EVd+bDS3 QRFw== X-Gm-Message-State: APjAAAX/XnH4X+vhmxhETfwEWg4YRqIsPjO0GNCAtMC0s90xzZpA999O aPeshBfa0okAgscrJ1dhtTjMhjI+zz+FeqMx+6Y= X-Google-Smtp-Source: APXvYqyQMQL4VryiPOTULcEjjT7hzE3ChO0Timj0xaoXsa2+qwQChU5Ok/aVqkfGihmTHyFmisRTA4J8sgqMWcqz6WE= X-Received: by 2002:a5d:9354:: with SMTP id i20mr44669002ioo.234.1575028900564; Fri, 29 Nov 2019 04:01:40 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: kumaraparameshwaran rathinavel Date: Fri, 29 Nov 2019 17:31:29 +0530 Message-ID: To: =?UTF-8?Q?Micha=C5=82_Krawczyk?= Cc: dev@dpdk.org, "Chauskin, Igor" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] Admin Queue ENA X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Micha=C5=82, Thanks for getting back on this. In our design we are using multiple cores requesting for rte_eth_stats_get, it is not from one process and hence not serialized. Since in our design this is not serialized, and hence in get_comp_ctxt() checking for occupied flag and comp_ctxt_release() are not done atomically which is causing this issue. Please let me know if my understanding is correct, so that I will fix the application in such a way that it is done from one process and not multiple. Thanks, Param. On Thu, Nov 28, 2019 at 6:44 PM Micha=C5=82 Krawczyk wrot= e: > Hi Param, > > first of all - you are using very old ena_com. This code comes from > the DPDK version before v18.08. If you have any doubts, please check > the newer version of the driver and DPDK as the potential bug could be > already fixed there. > > Anyway, if you will look at the function get_comp_ctxt() which is > called by __ena_com_submit_admin_cmd() to get the completion context, > there is a check for the context if it's not occupied - in case it is > (which will be true until comp_ctxt_release() will clear it), the new > command using the same context cannot be used. So there shouldn't be > two consumers using the same completion contexts. > > In addition, drivers that are using ena_com are sending admin commands > one at a time during the init, so there shouldn't be even 2 commands > at a time. The only exception is ena_com_get_dev_basic_stats(), which > is called from rte_eth_stats_get() context - but if you consider DPDK > application, it should use it on the management lcore after init, so > it'll also be serialized. > > Thanks, > Michal > > > > pt., 8 lis 2019 o 07:02 kumaraparameshwaran rathinavel > napisa=C5=82(a): > > > > Hi Micha=C5=82, > > > > Please look at the below function, > > > > static int > > ena_com_wait_and_process_admin_cq_polling( > > struct ena_comp_ctx *comp_ctx, > > struct ena_com_admin_queue *admin_queue) > > { > > unsigned long flags =3D 0; > > u64 start_time; > > int ret; > > > > start_time =3D ENA_GET_SYSTEM_USECS(); > > > > while (comp_ctx->status =3D=3D ENA_CMD_SUBMITTED) { > > if ((ENA_GET_SYSTEM_USECS() - start_time) > > > ADMIN_CMD_TIMEOUT_US) { > > ena_trc_err("Wait for completion (polling) timeout\n"); > > /* ENA didn't have any completion */ > > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > > admin_queue->stats.no_completion++; > > admin_queue->running_state =3D false; > > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > > > > ret =3D ENA_COM_TIMER_EXPIRED; > > goto err; > > } > > > > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > > ena_com_handle_admin_completion(admin_queue); > > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > > } > > > > if (unlikely(comp_ctx->status =3D=3D ENA_CMD_ABORTED)) { > > ena_trc_err("Command was aborted\n"); > > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > > admin_queue->stats.aborted_cmd++; > > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > > ret =3D ENA_COM_NO_DEVICE; > > goto err; > > } > > > > ENA_ASSERT(comp_ctx->status =3D=3D ENA_CMD_COMPLETED, > > "Invalid comp status %d\n", comp_ctx->status); > > > > ret =3D ena_com_comp_status_to_errno(comp_ctx->comp_status); > > err: > > comp_ctxt_release(admin_queue, comp_ctx); > > return ret; > > } > > > > This is a case where there are two threads executing admin commands. > > > > The occupied flag is set to false in the function comp_ctxt_release. > Let us say there are two consumers of completion context and C1 has a > completion context and the same completion context can be used by another > consumer C2 even before the C1 is resetting the occupied flag. > > > > This is because the ena_com_handle_admin_completion is done under spin > lock and comp_ctxt_release is not under this spin lock. > > > > Thanks, > > Param > > > > On Thu, Oct 24, 2019 at 2:09 PM Micha=C5=82 Krawczyk = wrote: > >> > >> sob., 19 pa=C5=BA 2019 o 20:26 kumaraparameshwaran rathinavel > >> napisa=C5=82(a): > >> > > >> > Hi All, > >> > > >> > In the ENA poll mode driver I see that every request in the admin > queue is > >> > associated with a completion context and this is preallocated during > the > >> > device initialisation. When the completion context is used we check > for > >> > occupied to be true in the 16.X version if the occupied flag is set > to true > >> > we assert and in the latest version I see that this is an error log. > But > >> > there is a time window where if the completion context would be > available > >> > to the other consumer but still the old consumer did not set the > occupied > >> > to false. The new consumer holds the admin queue lock to get the > completion > >> > context but the update by the old consumer to set the the occupied > flag is > >> > not done under lock. So should we make sure that the new consumer > should > >> > get the completion context only when the occupied flag is set to > false. Any > >> > thoughts on this? > >> > >> Hi Param, > >> > >> Both the producer and the consumer are holding the spinlock while > >> getting the completion context. If you see any situation where it > >> isn't (besides the release function), please let me know. > >> As it is protected by the lock, returning error while completion > >> context is occupied (and it shouldn't) it fine, as it will stop the > >> admin queue and allow the DPDK user application to execute the reset > >> of the device. > >> > >> Thanks, > >> Michal > >> > >> > If required I can try to make a patch where the completion context > would be > >> > available only after setting the occupied flag to false. > >> > > >> > Thanks, > >> > Param. >