DPDK patches and discussions
From: "Michał Krawczyk" <mk@semihalf.com>
To: kumaraparameshwaran rathinavel <kumaraparamesh92@gmail.com>
Cc: dev@dpdk.org, "Chauskin, Igor" <igorch@amazon.com>
Subject: Re: [dpdk-dev] Admin Queue ENA
Date: Thu, 28 Nov 2019 14:14:47 +0100
Message-ID: <CAJMMOfNM6BY75ybvO_LOq-LaP3tcj60OMfie69U+jFp4GfzGeA@mail.gmail.com>
In-Reply-To: <CANxNyattLh4HWgHx_XU3rYLLe1R4ic1D2T5dhZKAov+=aqxfzw@mail.gmail.com>

Hi Param,

First of all, you are using a very old ena_com. This code comes from
a DPDK version before v18.08. If you have any doubts, please check
the newer version of the driver and DPDK, as the potential bug could
already be fixed there.

Anyway, if you look at the function get_comp_ctxt(), which is
called by __ena_com_submit_admin_cmd() to get the completion context,
there is a check whether the context is occupied. If it is (which
will be true until comp_ctxt_release() clears it), a new command
cannot use the same context. So there shouldn't be two consumers
using the same completion context.
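
The occupied check described above can be sketched roughly as follows.
This is a simplified illustration, not the actual ena_com code: every
name prefixed with sketch_ or SKETCH_ is hypothetical, the queue depth
is an arbitrary value, and locking is left out on the assumption that
the caller serializes access to the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_Q_DEPTH 4  /* hypothetical queue depth, for illustration only */

/* Hypothetical, stripped-down completion context. */
struct sketch_comp_ctx {
    bool occupied;  /* set when captured, cleared on release */
};

/* Hypothetical admin queue holding preallocated contexts. */
struct sketch_admin_queue {
    struct sketch_comp_ctx comp_ctx[SKETCH_Q_DEPTH];
};

/*
 * Return the context for command_id. Returns NULL if the id is out of
 * range, or if capture is requested while the context is still occupied.
 * Assumes the caller serializes access to the queue.
 */
static struct sketch_comp_ctx *
sketch_get_comp_ctx(struct sketch_admin_queue *q, uint16_t command_id,
                    bool capture)
{
    if (command_id >= SKETCH_Q_DEPTH)
        return NULL;

    if (capture && q->comp_ctx[command_id].occupied)
        return NULL;  /* a previous command has not released it yet */

    if (capture)
        q->comp_ctx[command_id].occupied = true;

    return &q->comp_ctx[command_id];
}

/* Mark the context as free again; releasing a free context is a bug. */
static void
sketch_release_comp_ctx(struct sketch_comp_ctx *ctx)
{
    assert(ctx->occupied);
    ctx->occupied = false;
}
```

In this sketch a second capture of the same command id fails until
sketch_release_comp_ctx() has run, which is the behaviour the check is
meant to guarantee.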

In addition, drivers that use ena_com send admin commands one at a
time during init, so there shouldn't even be 2 commands in flight at
once. The only exception is ena_com_get_dev_basic_stats(), which is
called from the rte_eth_stats_get() context - but in a DPDK
application it should be called on the management lcore after init,
so it will also be serialized.

Thanks,
Michal



On Fri, 8 Nov 2019 at 07:02, kumaraparameshwaran rathinavel
<kumaraparamesh92@gmail.com> wrote:
>
> Hi Michał,
>
> Please look at the below function,
>
> static int
> ena_com_wait_and_process_admin_cq_polling(
>         struct ena_comp_ctx *comp_ctx,
>         struct ena_com_admin_queue *admin_queue)
> {
>     unsigned long flags = 0;
>     u64 start_time;
>     int ret;
>
>     start_time = ENA_GET_SYSTEM_USECS();
>
>     while (comp_ctx->status == ENA_CMD_SUBMITTED) {
>         if ((ENA_GET_SYSTEM_USECS() - start_time) >
>             ADMIN_CMD_TIMEOUT_US) {
>             ena_trc_err("Wait for completion (polling) timeout\n");
>             /* ENA didn't have any completion */
>             ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
>             admin_queue->stats.no_completion++;
>             admin_queue->running_state = false;
>             ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
>
>             ret = ENA_COM_TIMER_EXPIRED;
>             goto err;
>         }
>
>         ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
>         ena_com_handle_admin_completion(admin_queue);
>         ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
>     }
>
>     if (unlikely(comp_ctx->status == ENA_CMD_ABORTED)) {
>         ena_trc_err("Command was aborted\n");
>         ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
>         admin_queue->stats.aborted_cmd++;
>         ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
>         ret = ENA_COM_NO_DEVICE;
>         goto err;
>     }
>
>     ENA_ASSERT(comp_ctx->status == ENA_CMD_COMPLETED,
>            "Invalid comp status %d\n", comp_ctx->status);
>
>     ret = ena_com_comp_status_to_errno(comp_ctx->comp_status);
> err:
>     comp_ctxt_release(admin_queue, comp_ctx);
>     return ret;
> }
>
> This is a case where there are two threads executing admin commands.
>
> The occupied flag is set to false in the function comp_ctxt_release. Suppose
> there are two consumers of completion contexts: C1 holds a completion context,
> and the same context can be picked up by another consumer C2 before C1 has
> reset the occupied flag.
>
> This is because ena_com_handle_admin_completion is called under the spin lock,
> while comp_ctxt_release is not.
>
> Thanks,
> Param
>
> On Thu, Oct 24, 2019 at 2:09 PM Michał Krawczyk <mk@semihalf.com> wrote:
>>
>> On Sat, 19 Oct 2019 at 20:26, kumaraparameshwaran rathinavel
>> <kumaraparamesh92@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > In the ENA poll mode driver I see that every request in the admin queue is
>> > associated with a completion context, and these are preallocated during
>> > device initialisation. When a completion context is taken, the occupied
>> > flag is checked: in the 16.X version we assert if it is already true, and
>> > in the latest version I see that this is an error log. But there is a time
>> > window in which the completion context is already available to another
>> > consumer while the old consumer has not yet set occupied to false. The new
>> > consumer holds the admin queue lock to get the completion context, but the
>> > old consumer's update that clears the occupied flag is not done under the
>> > lock. So should we make sure that the new consumer gets the completion
>> > context only once the occupied flag is set to false? Any thoughts on
>> > this?
>>
>> Hi Param,
>>
>> Both the producer and the consumer hold the spinlock while
>> getting the completion context. If you see any situation where that
>> isn't the case (besides the release function), please let me know.
>> As it is protected by the lock, returning an error when the completion
>> context is occupied (and it shouldn't be) is fine, as it will stop the
>> admin queue and allow the DPDK user application to execute a reset
>> of the device.
>>
>> Thanks,
>> Michal
>>
>> > If required I can try to make a patch where the completion context would be
>> > available only after setting the occupied flag to false.
>> >
>> > Thanks,
>> > Param.


Thread overview: 8+ messages
2019-10-19 18:26 kumaraparameshwaran rathinavel
2019-10-24  8:38 ` Michał Krawczyk
2019-11-08  6:02   ` kumaraparameshwaran rathinavel
2019-11-28 13:14     ` Michał Krawczyk [this message]
2019-11-29 12:01       ` kumaraparameshwaran rathinavel
2019-12-04 13:54         ` Michał Krawczyk
2019-12-08 19:03           ` kumaraparameshwaran rathinavel
2019-12-16 10:37             ` Michał Krawczyk
