DPDK patches and discussions
From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	Jerin Jacob <jerin.jacob@caviumnetworks.com>
Cc: "Phil Yang (Arm Technology China)" <Phil.Yang@arm.com>,
	"dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
	"kkokkilagadda@caviumnetworks.com"
	<kkokkilagadda@caviumnetworks.com>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>
Subject: Re: [dpdk-dev] [PATCH v2 2/3] kni: fix kni fifo synchronization
Date: Wed, 26 Sep 2018 12:42:37 +0100
Message-ID: <43c8a93b-a540-f8b2-d810-81d3bbdef3c4@intel.com>
In-Reply-To: <AM6PR08MB3672606A5866D22C3C727AA498120@AM6PR08MB3672.eurprd08.prod.outlook.com>

On 9/21/2018 7:37 AM, Honnappa Nagarahalli wrote:
>>>>>>>
>>>>>>> @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, unsigned num)
>>>>>>>  static inline uint32_t
>>>>>>>  kni_fifo_count(struct rte_kni_fifo *fifo)
>>>>>>>  {
>>>>>>> +#ifdef RTE_USE_C11_MEM_MODEL
>>>>>>> +       unsigned fifo_write = __atomic_load_n(&fifo->write,
>>>>>>> +                                                 __ATOMIC_ACQUIRE);
>>>>>>> +       unsigned fifo_read = __atomic_load_n(&fifo->read,
>>>>>>> +                                                 __ATOMIC_ACQUIRE);
>>>>>>
>>>>>> Isn't it too heavy to have two __ATOMIC_ACQUIREs? A simple
>>>>>> rte_smp_rmb() would be enough here, right?
>>>>>> Or do we need __ATOMIC_ACQUIRE for the fifo_write case?
>>>>>>
>>>>> We also had some amount of debate internally on this:
>>>>> 1) We do not want to use rte_smp_rmb() as we want to keep the
>>>>> memory models separated (for ex: while using C11, use C11
>>>>> everywhere). It is also not sufficient, please see 3) below.
>>>>
>>>> But there is nothing technically wrong with using rte_smp_rmb() here,
>>>> in terms of functionality or the code generated by the compiler.
>>>
>>> rte_smp_rmb() generates 'DMB ISHLD'. This works fine, but it is not optimal.
>>> 'LDAR' is a better option which is generated when C11 atomics are used.
>>
>> Yes. But which one is optimal 1 x DMB ISHLD vs 2 x LDAR ?
> 
> Good point. I am not sure which one is optimal, it needs to be measured. 'DMB ISHLD' orders 'all' earlier loads against 'all' later loads and stores. 'LDAR' orders the 'specific' load with 'all' later loads and stores.
> 
>>
>>>
>>>>
>>>>> 2) This API can get called from writer or reader, so both the
>>>>> loads have to be __ATOMIC_ACQUIRE
>>>>> 3) Other option is to use __ATOMIC_RELAXED. That would allow any
>>>>> loads/stores around this API to get reordered, especially since
>>>>> this is an inline function. This would put the burden on the
>>>>> application to manage the ordering depending on its usage. It will
>>>>> also require the application to understand the implementation of
>>>>> this API.
>>>>
>>>> __ATOMIC_RELAXED may be fine too for the _count() case, as it may not
>>>> be very important to get the exact count at that exact moment; the
>>>> application can retry.
>>>>
>>>> I am in favor of a performance-effective implementation.
>>>
>>> The requirement on the correctness of the count depends on the usage of
>>> this function. I see the following usage:
>>>
>>> In the file kni_net.c, function: kni_net_tx:
>>>
>>>        if (kni_fifo_free_count(kni->tx_q) == 0 ||
>>>                         kni_fifo_count(kni->alloc_q) == 0) {
>>>                 /**
>>>                  * If no free entry in tx_q or no entry in alloc_q,
>>>                  * drops skb and goes out.
>>>                  */
>>>                 goto drop;
>>>         }
>>>
>>> There is no retry here, the packet is dropped.
>>
>> OK. Then pick an implementation which is optimal for this case.
>> I think rte_smp_rmb() makes sense here then, as
>> a) no #ifdef clutter
>> b) it is optimal compared to 2 x LDAR
>>
> As I understand it, one of the principles of using the C11 model is to match the store-releases and load-acquires. IMO, combining the C11 memory model with barrier-based functions makes the code unreadable.
> I realized rte_smp_rmb() is required for x86 as well to prevent compiler reordering. We can add that in the non-C11 case. This way, we will have clean code for both the options (similar to rte_ring).
> So, if 'RTE_USE_C11_MEM_MODEL' is set to 'n', then the 'rte_smp_rmb' would be used.
> 
> We can look at handling the #ifdef clutter based on Ferruh's feedback.

Hi Honnappa, Jerin,

Sorry for the delay; I missed that this was waiting on my input.

+1 to removing the #ifdefs, but I don't think a separate file is required for
this, especially when it would duplicate the same implementation with nothing
arch-specific in it.
+1 to Honnappa's suggestion to hide the #ifdefs behind APIs; those APIs can
also be reused later.
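
To illustrate what hiding the #ifdefs behind APIs could look like, here is a
minimal sketch. Every name in it (sketch_fifo, sketch_load_acquire,
sketch_smp_rmb, SKETCH_USE_C11_MEM_MODEL) is a hypothetical stand-in and not a
DPDK symbol; in DPDK the config option would be RTE_USE_C11_MEM_MODEL and the
barrier would be rte_smp_rmb(). It assumes a GCC or Clang toolchain for the
__atomic builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_smp_rmb(); an acquire fence is assumed
 * here only so the sketch builds without DPDK headers. */
static inline void sketch_smp_rmb(void)
{
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
}

/* Simplified, hypothetical FIFO header; len must be a power of two. */
struct sketch_fifo {
	volatile uint32_t write; /* next slot to fill */
	volatile uint32_t read;  /* next slot to drain */
	uint32_t len;            /* number of slots */
};

static inline void sketch_fifo_init(struct sketch_fifo *f, uint32_t len)
{
	assert(len != 0 && (len & (len - 1)) == 0);
	f->write = 0;
	f->read = 0;
	f->len = len;
}

/* The only place that knows which memory model is in use. */
static inline uint32_t sketch_load_acquire(const volatile uint32_t *p)
{
#ifdef SKETCH_USE_C11_MEM_MODEL
	return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#else
	uint32_t v = *p;

	sketch_smp_rmb(); /* keep later accesses after this load */
	return v;
#endif
}

/* The caller carries no #ifdef at all. */
static inline uint32_t sketch_fifo_count(struct sketch_fifo *f)
{
	uint32_t w = sketch_load_acquire(&f->write);
	uint32_t r = sketch_load_acquire(&f->read);

	return (f->len + w - r) & (f->len - 1);
}
```

Note that in the barrier-based build this helper issues one barrier per load;
reading both indices and issuing a single barrier is also possible, and which
is cheaper would need to be measured.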

And +1 to splitting this into two patches: one fixing the current code and one
adding C11 atomic support.
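
As a rough illustration of what each of the two patches would cover, the
sketch below shows a simplified single-producer put and single-consumer get.
The barrier-based path publishes an index only after a write barrier (the kind
of fix the current code needs), while the C11 path pairs a store-release with a
load-acquire. All names are hypothetical; sketch_smp_wmb() and sketch_smp_rmb()
stand in for rte_smp_wmb() and rte_smp_rmb(), and this is not the actual
kni_fifo code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FIFO_LEN 8u /* hypothetical size */
static_assert((SKETCH_FIFO_LEN & (SKETCH_FIFO_LEN - 1)) == 0,
	      "FIFO length must be a power of two");

struct sketch_fifo {
	volatile uint32_t write; /* written by the producer only */
	volatile uint32_t read;  /* written by the consumer only */
	void *volatile buffer[SKETCH_FIFO_LEN];
};

/* Hypothetical stand-ins for rte_smp_wmb() / rte_smp_rmb(). */
static inline void sketch_smp_wmb(void) { __atomic_thread_fence(__ATOMIC_RELEASE); }
static inline void sketch_smp_rmb(void) { __atomic_thread_fence(__ATOMIC_ACQUIRE); }

/* Read the index owned by the other side. */
static inline uint32_t sketch_load_index(const volatile uint32_t *idx)
{
#ifdef SKETCH_USE_C11_MEM_MODEL
	return __atomic_load_n(idx, __ATOMIC_ACQUIRE);
#else
	uint32_t v = *idx;

	sketch_smp_rmb();
	return v;
#endif
}

/* Publish our own index after the slot accesses are done. */
static inline void sketch_publish_index(volatile uint32_t *idx, uint32_t v)
{
#ifdef SKETCH_USE_C11_MEM_MODEL
	__atomic_store_n(idx, v, __ATOMIC_RELEASE);
#else
	sketch_smp_wmb(); /* slot accesses must be ordered before the index */
	*idx = v;
#endif
}

/* Producer side: returns how many pointers were stored. */
static inline unsigned sketch_fifo_put(struct sketch_fifo *f, void **data,
				       unsigned num)
{
	unsigned i;
	uint32_t w = f->write;
	uint32_t r = sketch_load_index(&f->read);

	for (i = 0; i < num; i++) {
		uint32_t next = (w + 1) & (SKETCH_FIFO_LEN - 1);

		if (next == r)
			break; /* full: one slot is always left empty */
		f->buffer[w] = data[i];
		w = next;
	}
	sketch_publish_index(&f->write, w);
	return i;
}

/* Consumer side: returns how many pointers were fetched. */
static inline unsigned sketch_fifo_get(struct sketch_fifo *f, void **data,
				       unsigned num)
{
	unsigned i;
	uint32_t r = f->read;
	uint32_t w = sketch_load_index(&f->write);

	for (i = 0; i < num; i++) {
		if (r == w)
			break; /* empty */
		data[i] = f->buffer[r];
		r = (r + 1) & (SKETCH_FIFO_LEN - 1);
	}
	sketch_publish_index(&f->read, r);
	return i;
}
```

Building with -DSKETCH_USE_C11_MEM_MODEL selects the acquire/release path;
without it the barrier-based path is used, and put/get stay identical in both
cases.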

I have some basic questions on the patch, which I will send in a separate thread.

Thanks,
ferruh

> 
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Other than that, I prefer to avoid ifdef clutter by introducing
>>>>>> two separate files, just like the ring C11 implementation.
>>>>>>
>>>>>> I don't have a strong opinion on this part; I'll let the KNI
>>>>>> MAINTAINER decide how to accommodate this change.
>>>>>
>>>>> I prefer to change this as well, I am open for suggestions.
>>>>> Introducing two separate files would be too much for this library.
>>>>> A better way would be to have something similar to
>>>>> 'smp_store_release' provided by the kernel, i.e. create #defines
>>>>> for loads/stores. Hide the clutter behind the #defines.
>>>>
>>>> No Strong opinion on this, leaving to KNI Maintainer.
>>> Will wait on this before re-spinning the patch
>>>
>>>>
>>>> This patch needs to be split in two:
>>>> a) Fixes for the non-C11 implementation (i.e. the new addition of
>>>> rte_smp_wmb())
>>>> b) add support for the C11 implementation.
>>> Agree
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> +       return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
>>>>>>> +#else
>>>>>>>         return (fifo->len + fifo->write - fifo->read) & (fifo->len - 1);
> Requires rte_smp_rmb() for x86 to prevent compiler reordering.
> 
>>>>>>> +#endif
>>>>>>>  }
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>

Thread overview: 30+ messages
2018-09-19 13:30 [dpdk-dev] [PATCH 1/3] config: use one single config option for C11 memory model Phil Yang
2018-09-19 13:30 ` [dpdk-dev] [PATCH 2/3] kni: fix kni fifo synchronization Phil Yang
2018-09-19 13:30 ` [dpdk-dev] [PATCH 3/3] kni: fix kni kernel " Phil Yang
2018-09-19 13:42 ` [dpdk-dev] [PATCH v2 1/3] config: use one single config option for C11 memory model Phil Yang
2018-09-19 13:42   ` [dpdk-dev] [PATCH v2 2/3] kni: fix kni fifo synchronization Phil Yang
2018-09-20  8:28     ` Jerin Jacob
2018-09-20 15:20       ` Honnappa Nagarahalli
2018-09-20 15:37         ` Jerin Jacob
2018-09-21  5:48           ` Honnappa Nagarahalli
2018-09-21  5:55             ` Jerin Jacob
2018-09-21  6:37               ` Honnappa Nagarahalli
2018-09-21  9:00                 ` Phil Yang (Arm Technology China)
2018-09-25  4:44                 ` Honnappa Nagarahalli
2018-09-26 11:42                 ` Ferruh Yigit [this message]
2018-09-27  9:06                   ` Phil Yang (Arm Technology China)
2018-09-26 11:45     ` Ferruh Yigit
2018-10-01  4:52       ` Honnappa Nagarahalli
2018-09-19 13:42   ` [dpdk-dev] [PATCH v2 3/3] kni: fix kni kernel " Phil Yang
2018-09-20  8:21   ` [dpdk-dev] [PATCH v2 1/3] config: use one single config option for C11 memory model Jerin Jacob
2018-10-08  9:11   ` [dpdk-dev] [PATCH v3 1/4] " Phil Yang
2018-10-08  9:11     ` [dpdk-dev] [PATCH v3 2/4] kni: fix kni fifo synchronization Phil Yang
2018-10-08 21:53       ` Stephen Hemminger
2018-10-10  9:58         ` Phil Yang (Arm Technology China)
2018-10-10 10:06           ` Gavin Hu (Arm Technology China)
2018-10-10 14:42             ` Ferruh Yigit
2018-10-08  9:11     ` [dpdk-dev] [PATCH v3 3/4] kni: fix kni kernel " Phil Yang
2018-10-08  9:11     ` [dpdk-dev] [PATCH v3 4/4] kni: introduce c11 atomic into kni " Phil Yang
2018-10-10 14:48     ` [dpdk-dev] [PATCH v3 1/4] config: use one single config option for C11 memory model Ferruh Yigit
2018-10-12  9:17       ` Phil Yang (Arm Technology China)
2018-10-26 15:56       ` Thomas Monjalon
