From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 33EF6A04B3; Mon, 16 Dec 2019 11:37:26 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 8A2931BFD8; Mon, 16 Dec 2019 11:37:24 +0100 (CET) Received: from mail-ed1-f54.google.com (mail-ed1-f54.google.com [209.85.208.54]) by dpdk.org (Postfix) with ESMTP id E71F81BFD7 for ; Mon, 16 Dec 2019 11:37:22 +0100 (CET) Received: by mail-ed1-f54.google.com with SMTP id r21so4619477edq.0 for ; Mon, 16 Dec 2019 02:37:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=x7W/wBa5VnbWE8uIVx8OyqTYSCEroO35XgcyeWKbnZM=; b=nt6IICHZqkid0VNQTFYOLhOkhmnt4O6jdgmMR4CnX5JJLt6v68v+wpS/yre9uDi7iq g0mon6HUG9BeJVmzXLAIKtYHRSUyoextiju1cQkilBjPwUY9WXK0yDUyBHqK9NUpuOpH 41oQJjX/mkvQasuwaA8FZuF+UaPGJahEvXGEtnpp2mVGu5TDiGdVZ7pzKUNj2I3SlyLn FayDaUHagYi36D1R74IyoAtlcL9EKdZevz25rwu71aOHnj39akipg0KNl+TD9aKUcLjt FOgQmSumL7FctCS1WoMIZZiVeRwZ4Vpy3ni1x4JIM+RWD/nCISKa1Q7tzMGVvgLcvYkg gJTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=x7W/wBa5VnbWE8uIVx8OyqTYSCEroO35XgcyeWKbnZM=; b=ehG0MuKDJYUnjLK/xFGtLmZSJeed/R7hBc+PKVYLcmDFIR5aHKT6YYPJbIyWDnrrOy YETFVav5Mn7lWoEg01bREg6VInfrq+urTUB1r2vcekMR2/U7/s4tHE0hzJOMQssh9zve /XjmX8e4BK6a6qyqcqklVnsBBmY5V5jrXsxeGcfjAWHow7BBuoQLRoAW5dK+dJNCTCLI FRYtsV4pWk95d2rTK3KUwL0tpnmg1cbLmCWbe0bVcj79YG2xtUurjnwImRAgAEa/ZuCT LF9Nw0a8IxNmSF9Av6pv1z/KMx4VtzzuVFMwD7u3japxsr/2P1Jz5XWf/HKJMzL7qCr2 2qAA== X-Gm-Message-State: APjAAAWs2FX8jxgTv2+MM5m1pS0reEwW90yyu2zXQJfgLCn5k1E/ZxTL ba1KSC2bW1hrTFXFY6OK5nG8LT53G/S0rFL4ycgjPQ== X-Google-Smtp-Source: APXvYqwU/rSr1FKGLQm2DxZfh/3C+8xl/IhWR9vMqMnpngEgmUZOvEBKJ/eTlHmzrGc89ETmeqcpV0Ijju/3Ua9T/wY= X-Received: by 2002:a05:6402:55a:: with SMTP id i26mr5913848edx.16.1576492642534; Mon, 16 Dec 2019 02:37:22 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?Q?Micha=C5=82_Krawczyk?= Date: Mon, 16 Dec 2019 11:37:11 +0100 Message-ID: To: kumaraparameshwaran rathinavel Cc: dev@dpdk.org, "Chauskin, Igor" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] Admin Queue ENA X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Param, Those fields are no longer existing in the newer version of the ena_com (and newer DPDK), so please upgrade if you are encountering any issues. But I don't think that TSO is supported on any of the ENA devices on the AWS. Thanks, Michal niedz., 8 gru 2019 o 20:03 kumaraparameshwaran rathinavel < kumaraparamesh92@gmail.com> napisa=C5=82(a): > Thanks a lot Michal. Will follow approach that you have suggested. > > Also I see that in case if TSO is enabled we set, > 336 /* this param needed only for TSO */ > 337 ena_meta->l3_outer_hdr_len =3D 0; > 338 ena_meta->l3_outer_hdr_offset =3D 0; > > > So even if TSO is enabled should these values be zero. > > Thanks, > Param. > > On Wed, Dec 4, 2019 at 7:24 PM Micha=C5=82 Krawczyk wro= te: > >> Hi Param, >> >> Adding atomic operations to setting/clearing comp ctxt won't help, as >> there is no race there. The admin queue is designed this way, that >> only single completion context can be held, so you should serialize >> access to the rte_eth_stats_get(). >> If you won't do that, the 2nd thread will try to hold already occupied >> context and this will result in disabling admin queue by the ena >> communication layer - you won't be able to send further admin >> commands. >> That's intended behavior and it is caused because you are trying to >> get the context with the occupied flag being set to true. Adding >> atomic operations there won't change anything, as there will still be >> a race between the thread that is waiting for the completion (occupied >> flag already send to true) and another thread, that is trying to send >> the same command using the same context (can't set occupied to true, >> as it's already true) - that should never happen. >> >> Without totally reworking ena_com admin queue design, we could add >> lock in ena_stats_get() - but that'll cause unnecessary locking in all >> of the applications that are using it from the main lcore context and >> as your design seems to be unique by doing it from multiple threads, >> maybe you could add a lock to your calls to the rte_eth_stats_get()? >> >> Another solution might be using xstats API, which should let you to >> get statistics from multiple threads as it's not using admin queue for >> that - all stats are being counter internally in the PMD. >> >> Thanks, >> Michal >> >> >> pt., 29 lis 2019 o 13:01 kumaraparameshwaran rathinavel >> napisa=C5=82(a): >> > >> > Hi Micha=C5=82, >> > >> > Thanks for getting back on this. >> > >> > In our design we are using multiple cores requesting for >> rte_eth_stats_get, it is not from one process and hence not serialized. >> Since in our design this is not serialized, and hence in get_comp_ctxt() >> checking for occupied flag and comp_ctxt_release() are not done atomical= ly >> which is causing this issue. Please let me know if my understanding is >> correct, so that I will fix the application in such a way that it is don= e >> from one process and not multiple. >> > >> > Thanks, >> > Param. >> > >> > On Thu, Nov 28, 2019 at 6:44 PM Micha=C5=82 Krawczyk >> wrote: >> >> >> >> Hi Param, >> >> >> >> first of all - you are using very old ena_com. This code comes from >> >> the DPDK version before v18.08. If you have any doubts, please check >> >> the newer version of the driver and DPDK as the potential bug could b= e >> >> already fixed there. >> >> >> >> Anyway, if you will look at the function get_comp_ctxt() which is >> >> called by __ena_com_submit_admin_cmd() to get the completion context, >> >> there is a check for the context if it's not occupied - in case it is >> >> (which will be true until comp_ctxt_release() will clear it), the new >> >> command using the same context cannot be used. So there shouldn't be >> >> two consumers using the same completion contexts. >> >> >> >> In addition, drivers that are using ena_com are sending admin command= s >> >> one at a time during the init, so there shouldn't be even 2 commands >> >> at a time. The only exception is ena_com_get_dev_basic_stats(), which >> >> is called from rte_eth_stats_get() context - but if you consider DPDK >> >> application, it should use it on the management lcore after init, so >> >> it'll also be serialized. >> >> >> >> Thanks, >> >> Michal >> >> >> >> >> >> >> >> pt., 8 lis 2019 o 07:02 kumaraparameshwaran rathinavel >> >> napisa=C5=82(a): >> >> > >> >> > Hi Micha=C5=82, >> >> > >> >> > Please look at the below function, >> >> > >> >> > static int >> >> > ena_com_wait_and_process_admin_cq_polling( >> >> > struct ena_comp_ctx *comp_ctx, >> >> > struct ena_com_admin_queue *admin_queue) >> >> > { >> >> > unsigned long flags =3D 0; >> >> > u64 start_time; >> >> > int ret; >> >> > >> >> > start_time =3D ENA_GET_SYSTEM_USECS(); >> >> > >> >> > while (comp_ctx->status =3D=3D ENA_CMD_SUBMITTED) { >> >> > if ((ENA_GET_SYSTEM_USECS() - start_time) > >> >> > ADMIN_CMD_TIMEOUT_US) { >> >> > ena_trc_err("Wait for completion (polling) timeout\n"); >> >> > /* ENA didn't have any completion */ >> >> > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); >> >> > admin_queue->stats.no_completion++; >> >> > admin_queue->running_state =3D false; >> >> > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); >> >> > >> >> > ret =3D ENA_COM_TIMER_EXPIRED; >> >> > goto err; >> >> > } >> >> > >> >> > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); >> >> > ena_com_handle_admin_completion(admin_queue); >> >> > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); >> >> > } >> >> > >> >> > if (unlikely(comp_ctx->status =3D=3D ENA_CMD_ABORTED)) { >> >> > ena_trc_err("Command was aborted\n"); >> >> > ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags); >> >> > admin_queue->stats.aborted_cmd++; >> >> > ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags); >> >> > ret =3D ENA_COM_NO_DEVICE; >> >> > goto err; >> >> > } >> >> > >> >> > ENA_ASSERT(comp_ctx->status =3D=3D ENA_CMD_COMPLETED, >> >> > "Invalid comp status %d\n", comp_ctx->status); >> >> > >> >> > ret =3D ena_com_comp_status_to_errno(comp_ctx->comp_status); >> >> > err: >> >> > comp_ctxt_release(admin_queue, comp_ctx); >> >> > return ret; >> >> > } >> >> > >> >> > This is a case where there are two threads executing admin commands= . >> >> > >> >> > The occupied flag is set to false in the function >> comp_ctxt_release. Let us say there are two consumers of completion >> context and C1 has a completion context and the same completion context = can >> be used by another consumer C2 even before the C1 is resetting the occup= ied >> flag. >> >> > >> >> > This is because the ena_com_handle_admin_completion is done under >> spin lock and comp_ctxt_release is not under this spin lock. >> >> > >> >> > Thanks, >> >> > Param >> >> > >> >> > On Thu, Oct 24, 2019 at 2:09 PM Micha=C5=82 Krawczyk >> wrote: >> >> >> >> >> >> sob., 19 pa=C5=BA 2019 o 20:26 kumaraparameshwaran rathinavel >> >> >> napisa=C5=82(a): >> >> >> > >> >> >> > Hi All, >> >> >> > >> >> >> > In the ENA poll mode driver I see that every request in the admi= n >> queue is >> >> >> > associated with a completion context and this is preallocated >> during the >> >> >> > device initialisation. When the completion context is used we >> check for >> >> >> > occupied to be true in the 16.X version if the occupied flag is >> set to true >> >> >> > we assert and in the latest version I see that this is an error >> log. But >> >> >> > there is a time window where if the completion context would be >> available >> >> >> > to the other consumer but still the old consumer did not set the >> occupied >> >> >> > to false. The new consumer holds the admin queue lock to get the >> completion >> >> >> > context but the update by the old consumer to set the the >> occupied flag is >> >> >> > not done under lock. So should we make sure that the new consume= r >> should >> >> >> > get the completion context only when the occupied flag is set to >> false. Any >> >> >> > thoughts on this? >> >> >> >> >> >> Hi Param, >> >> >> >> >> >> Both the producer and the consumer are holding the spinlock while >> >> >> getting the completion context. If you see any situation where it >> >> >> isn't (besides the release function), please let me know. >> >> >> As it is protected by the lock, returning error while completion >> >> >> context is occupied (and it shouldn't) it fine, as it will stop th= e >> >> >> admin queue and allow the DPDK user application to execute the res= et >> >> >> of the device. >> >> >> >> >> >> Thanks, >> >> >> Michal >> >> >> >> >> >> > If required I can try to make a patch where the completion >> context would be >> >> >> > available only after setting the occupied flag to false. >> >> >> > >> >> >> > Thanks, >> >> >> > Param. >> >