From: Luca Vizzarro
To: Juraj Linkeš
Cc: dev@dpdk.org, Jeremy Spewock, Paul Szczepanek
Date: Thu, 18 Apr 2024 11:52:25 +0100
Subject: Re: [PATCH 4/5] dts: add `show port info` command to TestPmdShell

On 18/04/2024 07:41, Juraj Linkeš wrote:
>> The equivalent /\n\*.+?(?=\n\*|$)/gs (but slightly more optimised) takes
>> approximately 3*input_length steps to run (according to regex101 at
>> least).
>> If that's reasonable enough, I can do this:
>>
>>     matches = re.finditer(r"\n\*.+?(?=\n\*|$)", input, re.S)
>>     return [TestPmdPortInfo.parse(match.group(0)) for match in matches]
>>
>> Another optimization is artificially adding a `\n*` delimiter at the end
>> before feeding it to the regex, thus removing the alternative case (|$),
>> and making it 2*len steps:
>>
>>     input += "\n*"
>>     matches = re.finditer(r"\n\*.+?(?=\n\*)", input, re.S)
>>     return [TestPmdPortInfo.parse(match.group(0)) for match in matches]
>>
>
> I like this second one a bit more. How does the performance change if
> we try to match four asterisks "\n\****.+?(?=\n\****)"? Four asterisks
> shouldn't randomly be in the output as that's basically another
> delimiter.

The difference is negligible: as long as the match succeeds, the regex
walks every character anyway, and either \* or \*{4} will match. The
real cost is backtracking, which happens whenever a match attempt fails.
Of course, if we have already matched 4 asterisks, backtracking can skip
the whole sequence at once instead of one character at a time, but that
only saves 3 steps per **** found. The saving would be bigger if we
matched the whole run of asterisks. The lookahead construct also adds
backtracking.

In the meantime, still by amending the output, I've found a solution
that doesn't perform any lookaheads:

    \*{21}.+?\n{2}

Instead of treating blocks as they are (\n******\n), we can add an extra
\n at the end and treat the blocks as ******\n\n. Basically, this
assumes that an empty line is the end delimiter. This takes
input_length/2 steps!

Of course, in reality every \n is \r\n: I've discovered that when shells
are invoked using paramiko, the stream becomes CRLF, for a reason I
haven't explored yet. I thought this was worth mentioning to everybody,
in case surprise carriage returns prove disruptive.

> And we should document this in the docstring - sample output, then
> explain the extra characters and the regex itself.
> We shouldn't forget this in the other commit as well (show port stats).

Ack.
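For reference, a minimal sketch comparing the sentinel-plus-lookahead pattern with the blank-line-delimited one. It assumes that `show port info all` prints one block per port, each opening with a 21-asterisk banner, that blocks are separated by exactly one empty line, and that no empty line appears inside a block. The sample text and the names `normalise`, `split_with_lookahead` and `split_on_blank_line` are hypothetical and heavily abbreviated; they are not real testpmd output or existing DTS code. Both functions normalise CRLF to LF first, which is this sketch's way of sidestepping the paramiko carriage returns.

```python
import re

# Hypothetical, heavily abbreviated sample of `show port info all` output.
# Real testpmd output has many more fields per port. Line endings are CRLF
# to mimic the stream returned by a paramiko-invoked shell.
BANNER = "*" * 21
SAMPLE = (
    "\r\n"
    f"{BANNER} Infos for port 0  {BANNER}\r\n"
    "MAC address: 00:11:22:33:44:55\r\n"
    "Device name: 0000:00:08.0\r\n"
    "\r\n"
    f"{BANNER} Infos for port 1  {BANNER}\r\n"
    "MAC address: 66:77:88:99:AA:BB\r\n"
    "Device name: 0000:00:09.0\r\n"
)


def normalise(output: str) -> str:
    # Turn CRLF into LF so the patterns below only have to handle LF.
    return output.replace("\r\n", "\n")


def split_with_lookahead(output: str) -> list[str]:
    # Sentinel approach: append a newline plus an asterisk so every block,
    # including the last one, is followed by the same delimiter. Each block
    # starts at a newline followed by an asterisk and extends lazily up to,
    # but not including, the next such pair. Assumes the output begins with
    # a newline, as the sample does.
    text = normalise(output) + "\n*"
    return [m.group(0) for m in re.finditer(r"\n\*.+?(?=\n\*)", text, re.S)]


def split_on_blank_line(output: str) -> list[str]:
    # Blank-line approach: append one extra newline so the last block also
    # ends with an empty line. Each block starts at the 21-asterisk banner
    # and ends at the first empty line, so no lookahead is needed.
    text = normalise(output) + "\n"
    return re.findall(r"\*{21}.+?\n{2}", text, re.S)


if __name__ == "__main__":
    for block in split_on_blank_line(SAMPLE):
        # Show only the banner line of each block.
        print(block.splitlines()[0])
```

Normalising line endings up front keeps both patterns readable; the alternative is to leave the stream untouched and spell the delimiter as `(?:\r\n){2}` instead of `\n{2}`.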