DPDK patches and discussions
 help / color / Atom feed
From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
To: Michel Machado <michel@digirati.com.br>,
	Kevin Traynor <ktraynor@redhat.com>,
	Ruifeng Wang <Ruifeng.Wang@arm.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	Cody Doucette <doucette@bu.edu>,
	Andre Nathan <andre@digirati.com.br>,
	Qiaobin Fu <mengxiang0811@gmail.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH 2/2] lpm: hide internal data
Date: Wed, 14 Oct 2020 14:10:15 +0100
Message-ID: <9647f80d-53c3-33aa-b6d0-301aef34ca0a@intel.com> (raw)
In-Reply-To: <f95bf756-70be-9df5-1d9d-e8c2b0addbf9@digirati.com.br>



On 13/10/2020 20:48, Michel Machado wrote:
> On 10/13/20 3:06 PM, Medvedkin, Vladimir wrote:
>>
>>
>> On 13/10/2020 18:46, Michel Machado wrote:
>>> On 10/13/20 11:41 AM, Medvedkin, Vladimir wrote:
>>>> Hi Michel,
>>>>
>>>> Could you please describe a condition when LPM gets inconsistent? As 
>>>> I can see if there is no free tbl8 it will return -ENOSPC.
>>>
>>>     Consider this simple example, we need to add the following two 
>>> prefixes with different next hops: 10.99.0.0/16, 18.99.99.128/25. If 
>>> the LPM table is out of tbl8s, the second prefix is not added and 
>>> Gatekeeper will make decisions in violation of the policy. The data 
>>> structure of the LPM table is consistent, but its content 
>>> inconsistent with the policy.
>>
>> Aha, thanks. So do I understand correctly that you need to add a set 
>> of routes atomically (either the entire set is installed or nothing)?
> 
>     Yes.
> 
>> If so, then I would suggest having 2 lpm and switching them atomically 
>> after a successful addition. As for now, even if you have enough 
>> tbl8's, routes are installed non atomically, i.e. there will be a time 
>> gap between adding two routes, so in this time interval the table will 
>> be inconsistent with the policy.
>> Also, if new lpm algorithms are added to the DPDK, they won't have 
>> such a thing as tbl8.
> 
>     Our code already deals with synchronization.

OK, so my suggestion here would be to add new routes to the shadow copy 
of the lpm, and if it returns -ENOSPC, than create a new LPM with double 
amount of tbl8's and add all the routes to it. Then switch the 
active-shadow LPM pointers. In this case you'll always add a bulk of 
routes atomically.

> 
>>>     We minimize the need of replacing a LPM table by allocating LPM 
>>> tables with the double of what we need (see example here 
>>> https://github.com/AltraMayor/gatekeeper/blob/95d1d6e8201861a0d0c698bfd06ad606674f1e07/lua/examples/policy.lua#L172-L183), 
>>> but the code must be ready for unexpected needs that may arise in 
>>> production.
>>>
>>
>> Usually, the table is initialized with a large enough number of 
>> entries, enough to add a possible number of routes. One tbl8 group 
>> takes up 1Kb of memory which is nothing comparing to the size of tbl24 
>> which is 64Mb.
> 
>     When the prefixes come from BGP, initializing a large enough table 
> is fine. But when prefixes come from threat intelligence, the number of 
> prefixes can vary wildly and the number of prefixes above 24 bits are 
> way more common.
> 
>> P.S. consider using rte_fib library, it has a number of advantages 
>> over LPM. You can replace the loop in __lookup_fib_bulk() with a bulk 
>> lookup call and this will probably increase the speed.
> 
>     I'm not aware of the rte_fib library. The only documentation that I 
> found on Google was https://doc.dpdk.org/api/rte__fib_8h.html and it 
> just says "FIB (Forwarding information base) implementation for IPv4 
> Longest Prefix Match".

That's true, I'm going to add programmer's guide soon.
Although the fib API is very similar to existing LPM.

> 
>>>>
>>>> On 13/10/2020 15:58, Michel Machado wrote:
>>>>> Hi Kevin,
>>>>>
>>>>>     We do need fields max_rules and number_tbl8s of struct rte_lpm, 
>>>>> so the removal would force us to have another patch to our local 
>>>>> copy of DPDK. We'd rather avoid this new local patch because we 
>>>>> wish to eventually be in sync with the stock DPDK.
>>>>>
>>>>>     Those fields are needed in Gatekeeper because we found a 
>>>>> condition in an ongoing deployment in which the entries of some LPM 
>>>>> tables may suddenly change a lot to reflect policy changes. To 
>>>>> avoid getting into a state in which the LPM table is inconsistent 
>>>>> because it cannot fit all the new entries, we compute the needed 
>>>>> parameters to support the new entries, and compare with the current 
>>>>> parameters. If the current table doesn't fit everything, we have to 
>>>>> replace it with a new LPM table.
>>>>>
>>>>>     If there were a way to obtain the struct rte_lpm_config of a 
>>>>> given LPM table, it would cleanly address our need. We have the 
>>>>> same need in IPv6 and have a local patch to work around it (see 
>>>>> https://github.com/cjdoucette/dpdk/commit/3eaf124a781349b8ec8cd880db26a78115cb8c8f). 
>>>>> Thus, an IPv4 and IPv6 solution would be best.
>>>>>
>>>>>     PS: I've added Qiaobin Fu, another Gatekeeper maintainer, to 
>>>>> this disscussion.
>>>>>
>>>>> [ ]'s
>>>>> Michel Machado
>>>>>
>>>>> On 10/13/20 9:53 AM, Kevin Traynor wrote:
>>>>>> Hi Gatekeeper maintainers (I think),
>>>>>>
>>>>>> fyi - there is a proposal to remove some members of a struct in 
>>>>>> DPDK LPM
>>>>>> API that Gatekeeper is using [1]. It would be only from DPDK 20.11 
>>>>>> but
>>>>>> as it's an LTS I guess it would probably hit Debian in a few months.
>>>>>>
>>>>>> The full thread is here:
>>>>>> http://inbox.dpdk.org/dev/20200907081518.46350-1-ruifeng.wang@arm.com/ 
>>>>>>
>>>>>>
>>>>>> Maybe you can take a look and tell us if they are needed in 
>>>>>> Gatekeeper
>>>>>> or you can workaround it?
>>>>>>
>>>>>> thanks,
>>>>>> Kevin.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/AltraMayor/gatekeeper/blob/master/gt/lua_lpm.c#L235-L248 
>>>>>>
>>>>>>
>>>>>> On 09/10/2020 07:54, Ruifeng Wang wrote:
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Kevin Traynor <ktraynor@redhat.com>
>>>>>>>> Sent: Wednesday, September 30, 2020 4:46 PM
>>>>>>>> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Medvedkin, Vladimir
>>>>>>>> <vladimir.medvedkin@intel.com>; Bruce Richardson
>>>>>>>> <bruce.richardson@intel.com>
>>>>>>>> Cc: dev@dpdk.org; Honnappa Nagarahalli
>>>>>>>> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
>>>>>>>> Subject: Re: [dpdk-dev] [PATCH 2/2] lpm: hide internal data
>>>>>>>>
>>>>>>>> On 16/09/2020 04:17, Ruifeng Wang wrote:
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>
>>>>>>>>>> Sent: Wednesday, September 16, 2020 12:28 AM
>>>>>>>>>> To: Bruce Richardson <bruce.richardson@intel.com>; Ruifeng Wang
>>>>>>>>>> <Ruifeng.Wang@arm.com>
>>>>>>>>>> Cc: dev@dpdk.org; Honnappa Nagarahalli
>>>>>>>>>> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
>>>>>>>>>> Subject: Re: [PATCH 2/2] lpm: hide internal data
>>>>>>>>>>
>>>>>>>>>> Hi Ruifeng,
>>>>>>>>>>
>>>>>>>>>> On 15/09/2020 17:02, Bruce Richardson wrote:
>>>>>>>>>>> On Mon, Sep 07, 2020 at 04:15:17PM +0800, Ruifeng Wang wrote:
>>>>>>>>>>>> Fields except tbl24 and tbl8 in rte_lpm structure have no 
>>>>>>>>>>>> need to
>>>>>>>>>>>> be exposed to the user.
>>>>>>>>>>>> Hide the unneeded exposure of structure fields for better ABI
>>>>>>>>>>>> maintainability.
>>>>>>>>>>>>
>>>>>>>>>>>> Suggested-by: David Marchand <david.marchand@redhat.com>
>>>>>>>>>>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>>>>>>>>> Reviewed-by: Phil Yang <phil.yang@arm.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>    lib/librte_lpm/rte_lpm.c | 152
>>>>>>>>>>>> +++++++++++++++++++++++---------------
>>>>>>>>>> -
>>>>>>>>>>>>    lib/librte_lpm/rte_lpm.h |   7 --
>>>>>>>>>>>>    2 files changed, 91 insertions(+), 68 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>> <snip>
>>>>>>>>>>>> diff --git a/lib/librte_lpm/rte_lpm.h 
>>>>>>>>>>>> b/lib/librte_lpm/rte_lpm.h
>>>>>>>>>>>> index 03da2d37e..112d96f37 100644
>>>>>>>>>>>> --- a/lib/librte_lpm/rte_lpm.h
>>>>>>>>>>>> +++ b/lib/librte_lpm/rte_lpm.h
>>>>>>>>>>>> @@ -132,17 +132,10 @@ struct rte_lpm_rule_info {
>>>>>>>>>>>>
>>>>>>>>>>>>    /** @internal LPM structure. */
>>>>>>>>>>>>    struct rte_lpm {
>>>>>>>>>>>> -    /* LPM metadata. */
>>>>>>>>>>>> -    char name[RTE_LPM_NAMESIZE];        /**< Name of the 
>>>>>>>>>>>> lpm. */
>>>>>>>>>>>> -    uint32_t max_rules; /**< Max. balanced rules per lpm. */
>>>>>>>>>>>> -    uint32_t number_tbl8s; /**< Number of tbl8s. */
>>>>>>>>>>>> -    struct rte_lpm_rule_info rule_info[RTE_LPM_MAX_DEPTH]; 
>>>>>>>>>>>> /**<
>>>>>>>>>> Rule info table. */
>>>>>>>>>>>> -
>>>>>>>>>>>>        /* LPM Tables. */
>>>>>>>>>>>>        struct rte_lpm_tbl_entry 
>>>>>>>>>>>> tbl24[RTE_LPM_TBL24_NUM_ENTRIES]
>>>>>>>>>>>>                __rte_cache_aligned; /**< LPM tbl24 table. */
>>>>>>>>>>>>        struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>>>>>>>>>>>> -    struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
>>>>>>>>>>>>    };
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Since this changes the ABI, does it not need advance notice?
>>>>>>>>>>>
>>>>>>>>>>> [Basically the return value point from rte_lpm_create() will be
>>>>>>>>>>> different, and that return value could be used by 
>>>>>>>>>>> rte_lpm_lookup()
>>>>>>>>>>> which as a static inline function will be in the binary and 
>>>>>>>>>>> using
>>>>>>>>>>> the old structure offsets.]
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Agree with Bruce, this patch breaks ABI, so it can't be accepted
>>>>>>>>>> without prior notice.
>>>>>>>>>>
>>>>>>>>> So if the change wants to happen in 20.11, a deprecation notice 
>>>>>>>>> should
>>>>>>>>> have been added in 20.08.
>>>>>>>>> I should have added a deprecation notice. This change will have 
>>>>>>>>> to wait for
>>>>>>>> next ABI update window.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Do you plan to extend? or is this just speculative?
>>>>>>> It is speculative.
>>>>>>>
>>>>>>>>
>>>>>>>> A quick scan and there seems to be several projects using some 
>>>>>>>> of these
>>>>>>>> members that you are proposing to hide. e.g. BESS, NFF-Go, DPVS,
>>>>>>>> gatekeeper. I didn't look at the details to see if they are 
>>>>>>>> really needed.
>>>>>>>>
>>>>>>>> Not sure how much notice they'd need or if they update DPDK 
>>>>>>>> much, but I
>>>>>>>> think it's worth having a closer look as to how they use lpm and 
>>>>>>>> what the
>>>>>>>> impact to them is.
>>>>>>> Checked the projects listed above. BESS, NFF-Go and DPVS don't 
>>>>>>> access the members to be hided.
>>>>>>> They will not be impacted by this patch.
>>>>>>> But Gatekeeper accesses the rte_lpm internal members that to be 
>>>>>>> hided. Its compilation will be broken with this patch.
>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Ruifeng
>>>>>>>>>>>>    /** LPM RCU QSBR configuration structure. */
>>>>>>>>>>>> -- 
>>>>>>>>>>>> 2.17.1
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>> Regards,
>>>>>>>>>> Vladimir
>>>>>>>
>>>>>>
>>>>
>>

-- 
Regards,
Vladimir

  reply index

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07  8:15 [dpdk-dev] [PATCH 0/2] LPM changes Ruifeng Wang
2020-09-07  8:15 ` [dpdk-dev] [PATCH 1/2] lpm: fix free of data structure Ruifeng Wang
2020-09-15 15:55   ` Bruce Richardson
2020-09-15 16:25   ` Medvedkin, Vladimir
2020-09-07  8:15 ` [dpdk-dev] [PATCH 2/2] lpm: hide internal data Ruifeng Wang
2020-09-15 16:02   ` Bruce Richardson
2020-09-15 16:28     ` Medvedkin, Vladimir
2020-09-16  3:17       ` Ruifeng Wang
2020-09-30  8:45         ` Kevin Traynor
2020-10-09  6:54           ` Ruifeng Wang
2020-10-13 13:53             ` Kevin Traynor
2020-10-13 14:58               ` Michel Machado
2020-10-13 15:41                 ` Medvedkin, Vladimir
2020-10-13 17:46                   ` Michel Machado
2020-10-13 19:06                     ` Medvedkin, Vladimir
2020-10-13 19:48                       ` Michel Machado
2020-10-14 13:10                         ` Medvedkin, Vladimir [this message]
2020-10-14 23:57                           ` Honnappa Nagarahalli
2020-10-15 13:39                             ` Michel Machado
2020-10-15 17:38                               ` Honnappa Nagarahalli
2020-10-15 19:30                                 ` Medvedkin, Vladimir
2020-10-15 22:54                                   ` Honnappa Nagarahalli
2020-10-16 11:39                                     ` Kevin Traynor
2020-10-16 13:55                                       ` Michel Machado
2020-10-19 14:53                                     ` David Marchand
2020-10-20 14:22                                       ` Thomas Monjalon
2020-10-20 14:32                                         ` Medvedkin, Vladimir
2020-10-19 17:53             ` Honnappa Nagarahalli
2020-09-15 14:41 ` [dpdk-dev] [PATCH 0/2] LPM changes David Marchand
2020-10-19 13:37   ` Kevin Traynor
2020-10-21  3:02 ` [dpdk-dev] [PATCH v2 " Ruifeng Wang
2020-10-21  3:02   ` [dpdk-dev] [PATCH v2 1/2] lpm: fix free of data structure Ruifeng Wang
2020-10-21  3:02   ` [dpdk-dev] [PATCH v2 2/2] lpm: hide internal data Ruifeng Wang
2020-10-21  7:58     ` Thomas Monjalon
2020-10-21  8:15       ` Ruifeng Wang
2020-10-22 15:14     ` David Marchand
2020-10-23  6:13       ` Ruifeng Wang
2020-10-23 16:08         ` Medvedkin, Vladimir
2020-10-23  9:38 ` [dpdk-dev] [PATCH v3 0/2] LPM changes David Marchand
2020-10-23  9:38   ` [dpdk-dev] [PATCH v3 1/2] lpm: fix free of data structure David Marchand
2020-10-23  9:38   ` [dpdk-dev] [PATCH v3 2/2] lpm: hide internal data David Marchand
2020-10-26  8:26   ` [dpdk-dev] [PATCH v3 0/2] LPM changes David Marchand

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9647f80d-53c3-33aa-b6d0-301aef34ca0a@intel.com \
    --to=vladimir.medvedkin@intel.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=Ruifeng.Wang@arm.com \
    --cc=andre@digirati.com.br \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=doucette@bu.edu \
    --cc=ktraynor@redhat.com \
    --cc=mengxiang0811@gmail.com \
    --cc=michel@digirati.com.br \
    --cc=nd@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox