From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E6452A04F1; Sun, 8 Dec 2019 20:03:25 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 9B3F41B994; Sun, 8 Dec 2019 20:03:24 +0100 (CET) Received: from mail-io1-f45.google.com (mail-io1-f45.google.com [209.85.166.45]) by dpdk.org (Postfix) with ESMTP id 134401B13C for ; Sun, 8 Dec 2019 20:03:22 +0100 (CET) Received: by mail-io1-f45.google.com with SMTP id f82so12398585ioa.9 for ; Sun, 08 Dec 2019 11:03:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NI0RctpkKIpzz5brNQmHSAQUMkd0E6Vb56l/64NU5Zk=; b=pZWHN1Gzp6vrJ+3IpB3g6qLkX8eBTzzBkO8RgAD3ePEXYVM7tEt0yF1MdMa7NUxmQ5 hWJANOebTm9/9FLQm1/SsewdkIsOvMPgQwpE50I47vbkib4t1eFXe3dqmo9lEKJv28t9 x6yIP5+BbbVIJEO4KW7hNuCbmAEPhBJQOeVCsswZawR0shnfxyBjlExuQc/U5GdMZFuI mH6Ad/j2vxBXrYPBuiAbm6046XFW3X3pVmSfXsWZpmL+ivJsuOjmrm+LBTqFFjEWZj0W 1k3wpiTLGaVzlxRmfKLDFf6TdAheW0f3ix4zYPsN3UxqAHC8wx96BIACNOLN6Kmq7yOW mHCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NI0RctpkKIpzz5brNQmHSAQUMkd0E6Vb56l/64NU5Zk=; b=WzV2Ocpt//U5qF1J70ZZ1qrL2PKcqqheeJq09cmXhIxpCGLo35076G0uQIw2+czLUR J8pRBDhHYtkGo7YmRSmFce0eKqjvXdNN3jH1UoXRhvGCv+sjbs86F1NH5bFGmSQ5pCeM 4HYw6yOa/ARDKD9E1j3/8KnLmTmOsZOknWEQxtYVu++IbX+sLfOTgwyfnVsVb5SpdlCM NIJPMWRl69OYsbZ8OErbCJBeb3vxBqHrEpq6I6M+1Hb4HFJfp7hd1zNfC3coScmSZD6V u27erfLmgpAk+nXaK9oLkxlj8d08DV1QGjjr5FGafzKi5ASOf8cKe8+gkJ7TSWYlunTX sSNw== X-Gm-Message-State: APjAAAVYXVJaDze4qbU35JC5zHKA6NlizRkVN0bnbcebMxIDBLdTw6ab D7romwlGSzyHFA+4TGn4HNS17j1XrueUHTR21Zo= X-Google-Smtp-Source: APXvYqzFNpA84rGlGqzd8AhGOXKxmwLjrKj2o6rYWNv2jMuvgXXV82mhhE32MYrXxMnWyUMgCyFhr2P9t18JiL17Y9Y= X-Received: by 2002:a5d:9e0a:: with SMTP id h10mr18159188ioh.276.1575831801217; Sun, 08 Dec 2019 11:03:21 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: kumaraparameshwaran rathinavel Date: Mon, 9 Dec 2019 00:33:09 +0530 Message-ID: To: =?UTF-8?Q?Micha=C5=82_Krawczyk?= Cc: dev@dpdk.org, "Chauskin, Igor" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] Admin Queue ENA X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Thanks a lot Michal. Will follow approach that you have suggested. Also I see that in case if TSO is enabled we set, 336 /* this param needed only for TSO */ 337 ena_meta->l3_outer_hdr_len =3D 0; 338 ena_meta->l3_outer_hdr_offset =3D 0; So even if TSO is enabled should these values be zero. Thanks, Param. On Wed, Dec 4, 2019 at 7:24 PM Micha=C5=82 Krawczyk wrote= : > Hi Param, > > Adding atomic operations to setting/clearing comp ctxt won't help, as > there is no race there. The admin queue is designed this way, that > only single completion context can be held, so you should serialize > access to the rte_eth_stats_get(). > If you won't do that, the 2nd thread will try to hold already occupied > context and this will result in disabling admin queue by the ena > communication layer - you won't be able to send further admin > commands. > That's intended behavior and it is caused because you are trying to > get the context with the occupied flag being set to true. Adding > atomic operations there won't change anything, as there will still be > a race between the thread that is waiting for the completion (occupied > flag already send to true) and another thread, that is trying to send > the same command using the same context (can't set occupied to true, > as it's already true) - that should never happen. > > Without totally reworking ena_com admin queue design, we could add > lock in ena_stats_get() - but that'll cause unnecessary locking in all > of the applications that are using it from the main lcore context and > as your design seems to be unique by doing it from multiple threads, > maybe you could add a lock to your calls to the rte_eth_stats_get()? > > Another solution might be using xstats API, which should let you to > get statistics from multiple threads as it's not using admin queue for > that - all stats are being counter internally in the PMD. > > Thanks, > Michal > > > pt., 29 lis 2019 o 13:01 kumaraparameshwaran rathinavel > napisa=C5=82(a): > > > > Hi Micha=C5=82, > > > > Thanks for getting back on this. > > > > In our design we are using multiple cores requesting for > rte_eth_stats_get, it is not from one process and hence not serialized. > Since in our design this is not serialized, and hence in get_comp_ctxt() > checking for occupied flag and comp_ctxt_release() are not done atomicall= y > which is causing this issue. Please let me know if my understanding is > correct, so that I will fix the application in such a way that it is done > from one process and not multiple. > > > > Thanks, > > Param. > > > > On Thu, Nov 28, 2019 at 6:44 PM Micha=C5=82 Krawczyk = wrote: > >> > >> Hi Param, > >> > >> first of all - you are using very old ena_com. This code comes from > >> the DPDK version before v18.08. If you have any doubts, please check > >> the newer version of the driver and DPDK as the potential bug could be > >> already fixed there. > >> > >> Anyway, if you will look at the function get_comp_ctxt() which is > >> called by __ena_com_submit_admin_cmd() to get the completion context, > >> there is a check for the context if it's not occupied - in case it is > >> (which will be true until comp_ctxt_release() will clear it), the new > >> command using the same context cannot be used. So there shouldn't be > >> two consumers using the same completion contexts. > >> > >> In addition, drivers that are using ena_com are sending admin commands > >> one at a time during the init, so there shouldn't be even 2 commands > >> at a time. The only exception is ena_com_get_dev_basic_stats(), which > >> is called from rte_eth_stats_get() context - but if you consider DPDK > >> application, it should use it on the management lcore after init, so > >> it'll also be serialized. > >> > >> Thanks, > >> Michal > >> > >> > >> > >> pt., 8 lis 2019 o 07:02 kumaraparameshwaran rathinavel > >> napisa=C5=82(a): > >> > > >> > Hi Micha=C5=82, > >> > > >> > Please look at the below function, > >> > > >> > static int > >> > ena_com_wait_and_process_admin_cq_polling( > >> > struct ena_comp_ctx *comp_ctx, > >> > struct ena_com_admin_queue *admin_queue) > >> > { > >> > unsigned long flags =3D 0; > >> > u64 start_time; > >> > int ret; > >> > > >> > start_time =3D ENA_GET_SYSTEM_USECS(); > >> > > >> > while (comp_ctx->status =3D=3D ENA_CMD_SUBMITTED) { > >> > if ((ENA_GET_SYSTEM_USECS() - start_time) > > >> > ADMIN_CMD_TIMEOUT_US) { > >> > ena_trc_err("Wait for completion (polling) timeout\n"); > >> > /* ENA didn't have any completion */ > >> > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > >> > admin_queue->stats.no_completion++; > >> > admin_queue->running_state =3D false; > >> > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > >> > > >> > ret =3D ENA_COM_TIMER_EXPIRED; > >> > goto err; > >> > } > >> > > >> > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > >> > ena_com_handle_admin_completion(admin_queue); > >> > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > >> > } > >> > > >> > if (unlikely(comp_ctx->status =3D=3D ENA_CMD_ABORTED)) { > >> > ena_trc_err("Command was aborted\n"); > >> > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); > >> > admin_queue->stats.aborted_cmd++; > >> > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); > >> > ret =3D ENA_COM_NO_DEVICE; > >> > goto err; > >> > } > >> > > >> > ENA_ASSERT(comp_ctx->status =3D=3D ENA_CMD_COMPLETED, > >> > "Invalid comp status %d\n", comp_ctx->status); > >> > > >> > ret =3D ena_com_comp_status_to_errno(comp_ctx->comp_status); > >> > err: > >> > comp_ctxt_release(admin_queue, comp_ctx); > >> > return ret; > >> > } > >> > > >> > This is a case where there are two threads executing admin commands. > >> > > >> > The occupied flag is set to false in the function comp_ctxt_release. > Let us say there are two consumers of completion context and C1 has a > completion context and the same completion context can be used by another > consumer C2 even before the C1 is resetting the occupied flag. > >> > > >> > This is because the ena_com_handle_admin_completion is done under > spin lock and comp_ctxt_release is not under this spin lock. > >> > > >> > Thanks, > >> > Param > >> > > >> > On Thu, Oct 24, 2019 at 2:09 PM Micha=C5=82 Krawczyk > wrote: > >> >> > >> >> sob., 19 pa=C5=BA 2019 o 20:26 kumaraparameshwaran rathinavel > >> >> napisa=C5=82(a): > >> >> > > >> >> > Hi All, > >> >> > > >> >> > In the ENA poll mode driver I see that every request in the admin > queue is > >> >> > associated with a completion context and this is preallocated > during the > >> >> > device initialisation. When the completion context is used we > check for > >> >> > occupied to be true in the 16.X version if the occupied flag is > set to true > >> >> > we assert and in the latest version I see that this is an error > log. But > >> >> > there is a time window where if the completion context would be > available > >> >> > to the other consumer but still the old consumer did not set the > occupied > >> >> > to false. The new consumer holds the admin queue lock to get the > completion > >> >> > context but the update by the old consumer to set the the occupie= d > flag is > >> >> > not done under lock. So should we make sure that the new consumer > should > >> >> > get the completion context only when the occupied flag is set to > false. Any > >> >> > thoughts on this? > >> >> > >> >> Hi Param, > >> >> > >> >> Both the producer and the consumer are holding the spinlock while > >> >> getting the completion context. If you see any situation where it > >> >> isn't (besides the release function), please let me know. > >> >> As it is protected by the lock, returning error while completion > >> >> context is occupied (and it shouldn't) it fine, as it will stop the > >> >> admin queue and allow the DPDK user application to execute the rese= t > >> >> of the device. > >> >> > >> >> Thanks, > >> >> Michal > >> >> > >> >> > If required I can try to make a patch where the completion contex= t > would be > >> >> > available only after setting the occupied flag to false. > >> >> > > >> >> > Thanks, > >> >> > Param. >