From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id D1183A0543; Thu, 22 Sep 2022 16:32:37 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 63D7340156; Thu, 22 Sep 2022 16:32:37 +0200 (CEST) Received: from mail-lf1-f49.google.com (mail-lf1-f49.google.com [209.85.167.49]) by mails.dpdk.org (Postfix) with ESMTP id B5CF0400D7 for ; Thu, 22 Sep 2022 16:32:35 +0200 (CEST) Received: by mail-lf1-f49.google.com with SMTP id m22so296720lfg.0 for ; Thu, 22 Sep 2022 07:32:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=semihalf.com; s=google; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date; bh=ASV+ygcwlpk3d1eU+Gsa//Ilflqk7jloyPtBbKMsmqo=; b=gFE6GhNBk5PvulQkUK1mcRjm/lQTO7QEciMWmjvg0HP06vdWrv7Mhj3jxoElUM35iG vFV0kCeW/qD1JVGyv0X4RcN0ZUDvisH8bJoNCUUpo/943qEZ+EaMW8LV/FNGEWswSJMr 6ssD0OJOaWu1p+e/62w5rxY4cHrrSMR3De+Suq4fINUmOYxW9m3IGbYRzdXOAm9yAvtX 3Kz8YXLKQhHDKoDlxe6zUR+9Nlg4xE6wA97K/0OM1L7DBSxYgkV6P66n2+fL6jbovWTR 6g31fmttpCnr0qXM8qg7x6a7GfF5QXKZYEj0zINgTfGzbUxG/tfHCP4KYizCtlvoozpP avrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date; bh=ASV+ygcwlpk3d1eU+Gsa//Ilflqk7jloyPtBbKMsmqo=; b=vNEKamGzsKFsmrX8TzVNno32bUosx925TWmYlNHtFh6y9Td1GynZNlls8aHLV2OmLn UzUEJdw8n8V4jBCLLG1SrZ5Swue6FH8AV8ASfJYwT4XZ8HhPIPkvq4D8YeMSVZBILHpc MSrppti+9+Kd0zfa+YRBTd0BvUyBG40filH5Xb/P494jrVPonzGjmFmiwVCxUZKF9G5d TEXwQTaFT3QS43ol+4m/tjaQnzWflQe4Lbq4PSEKxNGw8OIWt3DBmY4c0ZdOrGsxFa5k No0u7w25qWcOwzToOM3Fw+/1mEHr8vMvzs8gn2FkHs63sH6RNYCAQ0UyDauhlbadxQIg 3Jhw== X-Gm-Message-State: ACrzQf1iAL/Mghr2oD/LFZ6ZFQ5sJtr9xkStlJxbRWlQRQkr+MLpbCTP ReWfOTFJXcZ8KLpkLU94A2YfcA== X-Google-Smtp-Source: AMsMyM58y6YYBVxC2uR6YPwUlX/DxpLSzpQeH8dhoiADF2gxoPLv5qPIH4y75Anj+YM2a+E09DV94g== X-Received: by 2002:a05:6512:118d:b0:49f:543d:67e0 with SMTP id g13-20020a056512118d00b0049f543d67e0mr1253768lfr.686.1663857154890; Thu, 22 Sep 2022 07:32:34 -0700 (PDT) Received: from toster ([83.142.187.84]) by smtp.gmail.com with ESMTPSA id g4-20020a056512118400b004979db5aa5bsm957291lfr.223.2022.09.22.07.32.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 22 Sep 2022 07:32:34 -0700 (PDT) Date: Thu, 22 Sep 2022 16:32:26 +0200 From: Stanislaw Kardach To: Juraj =?utf-8?Q?Linke=C5=A1?= Cc: "thomas@monjalon.net" , "david.marchand@redhat.com" , "ronan.randles@intel.com" , "Honnappa.Nagarahalli@arm.com" , "ohilyard@iol.unh.edu" , "lijuan.tu@intel.com" , "dev@dpdk.org" Subject: Re: [PATCH v4 4/9] dts: add ssh pexpect library Message-ID: <20220922143226.3d5e552cylcie665@toster> References: <20220728100044.1318484-1-juraj.linkes@pantheon.tech> <20220729105550.1382664-1-juraj.linkes@pantheon.tech> <20220729105550.1382664-5-juraj.linkes@pantheon.tech> <20220914094222.vprahs3kfhnsklyh@toster> <7f360965ca2c47b29e2413e3207234db@pantheon.tech> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <7f360965ca2c47b29e2413e3207234db@pantheon.tech> X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Thu, Sep 22, 2022 at 09:41:40AM +0000, Juraj Linkeš wrote: > Hi Stanislaw, First a preface. I understand that the DTS rework is sponsored by someone and there may be people waiting with their labs for this job to be done. Everything that I'll write below is from a point of view of a developer who'd like to utilize DTS as a test suite for DPDK when adding support for new PMDs or architectures/boards. This might be in conflict with time-to-market metric at this point in time but I'm more focused on the state of DPDK+DTS in the long run. So feel free to disregard my comments if there are higher priorities. > > Neither of the current DTS maintainer nor me are the author of the code, so we can only speculate as to why certain parts were implemented the way they are. > > We've thought a lot about replacing pexpect with something else, such as Fabric. Fabric is supposedly faster, which is the biggest draw and instead of fixing/reworking pexpect code, it makes sense to switch our focus on Fabric. For this PoC version though, we'd like to stay with this pexpect code and work on other appoaches in the next release cycle. The code is well tested so there's not much point in poking in it if it's to be replaced. I have a nasty experience of code staying without re-factoring for long "because it works". When it comes to DTS my experience is that it works only if used exactly on the setups it was meant for. Adapting it to a new setup, new PMD or "even" running in the cloud shows that parts of it are held together with a string. I'm not blaming DTS devs here. Such approach is often needed for various reasons (usually time-to-market) and it's hard to be forward-compatible. That said I would suggest to use this opportunity to refactor DTS while it's still not merged. Otherwise we'll be left with code that we're uncertain why it works. That's not a quality-first approach and it'll bite us in the backside in the future. Let's do things right, not fast. > > With that said, some more comments inline. > > > > +class SSHPexpect: > > > + username: str > > > + password: str > > > + node: str > > > + logger: DTSLOG > > > + magic_prompt: str > > > + > > > + def __init__( > > > + self, > > > + node: str, > > > + username: str, > > > + password: Optional[str], > > > + logger: DTSLOG, > > > + ): > > > + self.magic_prompt = "MAGIC PROMPT" > > Why is this necessary? pxssh is already setting target prompt to > > pxssh.UNIQUE_PROMPT in the session constructor, to be specific: > > > > self.UNIQUE_PROMPT = r"\[PEXPECT\][\$\#] " > > self.PROMPT = self.UNIQUE_PROMPT > > > > Also session.login() will change target prompt to that, exactly for the reason of > > achieving a unique prompt that can be easily matched by pxssh. > > > > So if "MAGIC PROMPT is the prompt that you'd like to have on the remote host, > > then the following should be run after opening the session: > > > > I believe this is here to have a prompt that won't be matched anywhere, to induce a timeout a collect all data up to that point. The only reason I can think of for doing this is to get the intermediate output updates while the test is running. Are there any others? If I'm right, how would that be different from matching an actual prompt with a timeout? pexpect is already setting an unusual prompt that will not be matched by anything else ("[PEXPECT] #"). Just use that with a small timeout and you get the same effect, including timeout triggering an exception. > > > self.session.PROMPT = self.magic_prompt > > if not self.session.set_unique_prompt(): > > do_some_error_handling() > > > > Otherwise it's unnecessary. > > > + self.logger = logger > > > + > > > + self.node = node > > > + self.username = username > > > + self.password = password or "" > > > + self.logger.info(f"ssh {self.username}@{self.node}") > > > + > > > + self._connect_host() > > > + > > > + def _connect_host(self) -> None: > > > + """ > > > + Create connection to assigned node. > > > + """ > > > + retry_times = 10 > > > + try: > > > + if ":" in self.node: > > > + while retry_times: > > > + self.ip = self.node.split(":")[0] > > > + self.port = int(self.node.split(":")[1]) > > > + self.session = pxssh.pxssh(encoding="utf-8") > > > + try: > > > + self.session.login( > > > + self.ip, > > > + self.username, > > > + self.password, > > > + original_prompt="[$#>]", > > > + port=self.port, > > > + login_timeout=20, > > > + password_regex=r"(?i)(?:password:)|(?:passphrase for > > key)|(?i)(password for .+:)", > > > + ) > > > + except Exception as e: > > > + print(e) > > > + time.sleep(2) > > > + retry_times -= 1 > > > + print("retry %d times connecting..." % (10 - retry_times)) > > > + else: > > > + break > > > + else: > > > + raise Exception("connect to %s:%s failed" % (self.ip, self.port)) > > > + else: > > > + self.session = pxssh.pxssh(encoding="utf-8") > > > + self.session.login( > > > + self.node, > > > + self.username, > > > + self.password, > > > + original_prompt="[$#>]", > > > + password_regex=r"(?i)(?:password:)|(?:passphrase for > > key)|(?i)(password for .+:)", > > > + ) > > > + self.logger.info(f"Connection to {self.node} succeeded") > > > + self.send_expect("stty -echo", "#") > > > + self.send_expect("stty columns 1000", "#") > > This works only by chance and makes hacks in get_output_before() necessary. > > After some testing it seems that pxssh is matching AND chomping the > > session.PROMPT when session.prompt() is called. Given the UNIQUE_PROMPT, > > the root user prompt will be "[PEXPECT]#" so this > > send_expect() will chomp # and leave "[PEXPECT]" as part of the output. > > > > Given that the two above lines do not require any special output I think > > self.send_command() should be used here. > > Since we want to move away form using root, we'd need to address this, but this would be better left to the Fabric implementation and stay with the root requirement for now. It's more than that. This self.send_expect("...", "#") pattern will cause the hacks on removing the pexpect's prompt that are done below. Those hack are not necessary at all if pexpect is used properly. If this is not understood, then when switching to Fabric, someone might repeat them thinking they're on purpose. > > > > + except Exception as e: > > > + print(RED(str(e))) > > > + if getattr(self, "port", None): > > > + suggestion = ( > > > + "\nSuggession: Check if the firewall on [ %s ] " % self.ip > > > + + "is stopped\n" > > > + ) > > > + print(GREEN(suggestion)) > > > + > > > + raise SSHConnectionException(self.node) > > > + > > > + def send_expect_base(self, command: str, expected: str, timeout: float) -> > > str: > > > + self.clean_session() > > > + self.session.PROMPT = expected > > > + self.__sendline(command) > > > + self.__prompt(command, timeout) > > > + > > > + before = self.get_output_before() > > Prompt should be reverted to whatever it was before leaving this function. > > Seems reasonable. I guess it wasn't needed because of the requirement to use root. Adding this seems innocent enough. > > > > + return before > > > + > > > + def send_expect( > > > + self, command: str, expected: str, timeout: float = 15, verify: bool = False > > > + ) -> str | int: > > > + > > > + try: > > > + ret = self.send_expect_base(command, expected, timeout) > > > + if verify: > > > + ret_status = self.send_expect_base("echo $?", > > > + expected, timeout) > > "echo $?" will only print the return code. How is it supposed to match > > "expected"? If "expected" is a return code then the first command's output > > probably won't match. > > I think send_command() should be used here. > > Expected is the prompt to expect, "#" in most cases. Then it should not be called "expected" but "prompt", otherwise someone might use this API to actually match something in the ssh output (which is a totally valid use-case). Look at the docstring of dts.framework.node.Node.send_expect(): """ Send commands to node and return string before expected string. """ Not to mention that the "verify" parameter is not documented and will NOT work if "expected" is anything else than a prompt. I wonder how many DTS tests have to work around or utilize those hacks? There may be some test cases which work only because of them which is a wrong way to solve a problem but again - maybe there were other priorities. > > > > + if not int(ret_status): > > > + return ret > > The condition above seems like a C-ism used in python which again works by > > mistake. Return code 0 will convert to integer 0 which will be promoted to a > > boolean False. It would be more readable to change this block to: > > ri = int(ret_status) > > if ri != 0: > > # error prints > > return ri > > This is common in Python, but really only usable if you know the object type. In this case it's always integer and it's perfectly fine to use it this way (0 = False, anything else = 1), but I agree that the double negative doesn't help with readibility. In any case, this is a minor thing a I there's not much of a reason to change it if we're to replace it with Fabric. Agreed, this can wait. > > > > + else: > > > + self.logger.error("Command: %s failure!" % command) > > > + self.logger.error(ret) > > > + return int(ret_status) > > > + else: > > > + return ret > > > + except Exception as e: > > > + print( > > > + RED( > > > + "Exception happened in [%s] and output is [%s]" > > > + % (command, self.get_output_before()) > > > + ) > > > + ) > > > + raise e > > > + > > > + def send_command(self, command: str, timeout: float = 1) -> str: > > > + try: > > > + self.clean_session() > > > + self.__sendline(command) > > > + except Exception as e: > > > + raise e > > > + > > > + output = self.get_session_before(timeout=timeout) > > > + self.session.PROMPT = self.session.UNIQUE_PROMPT > > > + self.session.prompt(0.1) > > This is wrong: > > 1. self.get_session_before() will return output of the command but since > > it changed the expected (not real!) prompt to self.magic_prompt, that > > won't be matched so the output will contain the prompt set by pxssh > > (UNIQUE_PROMPT). > > 2. Then prompt is reset to UNIQUE_PROMPT but and prompt() is called but > > that will only clean up the pxssh buffer. If get_session_before() was > > not changing the session.PROMPT from UNIQUE_PROMPT to magic_prompt, > > the second prompt() call would be unnecessary. > > > + > > > + return output > > > + > > > + def clean_session(self) -> None: > > > + self.get_session_before(timeout=0.01) > > What if remote host is slow for any reason? We'll timeout here. It seems that > > such a small timeout value was used because clean_session() is used in every > > send_command() call. > > Come to think of it, why is this call necessary when we have self.__flush()? > > I think the timeout is deliberate. More below. > > > > + > > > + def get_session_before(self, timeout: float = 15) -> str: > > > + """ > > > + Get all output before timeout > > > + """ > > > + self.session.PROMPT = self.magic_prompt > > This line has no effect. Remote prompt was never set to self.magic_prompt. > > I think this is to ensure that we hit the timeout. Why would we intentionally want to hit a timeout? Unless I'm missing something, it'll only cause us to wait for "timeout" even if the command ended already. Also note the full logic of the "get_session_before()": 1. Trigger self.session.prompt() to get the session buffer updated with whatever output was between now and now+timeout seconds. 2. Get the output but since we were matching the self.magic_prompt, self.get_output_all() has to remove the real prompt ([PEXPECT]) if the command ended. 3. Flush the pexpect's ssh buffer so that next call won't read what has already been returned (that's the job of self.__flush()). So I don't think there is any reason to use this always-timeout logic. > > > > + try: > > > + self.session.prompt(timeout) > > > + except Exception as e: > > > + pass > > > + > > > + before = self.get_output_all() > > > + self.__flush() > > > + > > > + return before > > > + > > > + def __flush(self) -> None: > > > + """ > > > + Clear all session buffer > > > + """ > > > + self.session.buffer = "" > > > + self.session.before = "" > > > + > > > + def __prompt(self, command: str, timeout: float) -> None: > > > + if not self.session.prompt(timeout): > > > + raise TimeoutException(command, self.get_output_all()) > > > + from None > > > + > > > + def __sendline(self, command: str) -> None: > > > + if not self.isalive(): > > > + raise SSHSessionDeadException(self.node) > > > + if len(command) == 2 and command.startswith("^"): > > > + self.session.sendcontrol(command[1]) > > > + else: > > > + self.session.sendline(command) > > > + > > > + def get_output_before(self) -> str: > > The name is missleading. In pxssh terms "before" means all the lines before the > > matched expect()/prompt(). Here it returns the last line of the output. Perhaps > > get_last_output_line() is better? > > I thought so too, but this actually returns all lines except the last: > 'a.b.c.d'.rsplit('.', 1)[0] > 'a.b.c' > Oh, I've totally missed that! If that's the case then this function is a result of the always-timeout logic. Note that get_output_before() is only called from send_expect_base() (and indirectly by send_expect()) to get the output of the command without the prompt. So the reason why self.send_expect("echo 'foo'", "#") returns a proper output ('foo\n') is: 0. On login() time pexpect sets the remote prompt to "[PEXPECT]#". 1. send_expect() -> self.send_expect_base() sets the match prompt to #. This causes self.session.prompt() to update the self.session.buffer, match # and chomp it, leaving the ssh buffer as: foo [PEXPECT] 2. send_expect_base() calls self.get_output_before() which cuts the last line of the output. Note that the "if" at the end is necessary for commands that do not return any output. Otherwise the output would be "[PEXPECT]". Also note that if there are any control characters printed by the remote shell, then this will also not work because a straight string comparison is done, not a regex match. Contrary, if self.magic_prompt was not used and pexpect's original prompt was left undisturbed, then all the hacks above would not be needed and the code would be cleaner. > > > + if not self.isalive(): > > > + raise SSHSessionDeadException(self.node) > > > + before: list[str] = self.session.before.rsplit("\r\n", 1) > > > + if before[0] == "[PEXPECT]": > > > + before[0] = "" > > Unnecessary if prompt was handled in proper way as mentioned above. > > > + > > > + return before[0] > > > + > > > + def get_output_all(self) -> str: > > > + output: str = self.session.before > > > + output.replace("[PEXPECT]", "") > > Ditto. If session.PROMPT was restored properly, this function would not be > > necessary at all. > > > + return output > > > + > > > + def close(self, force: bool = False) -> None: > > > + if force is True: > > > + self.session.close() > > > + else: > > > + if self.isalive(): > > > + self.session.logout() > > > + > > > + def isalive(self) -> bool: > > > + return self.session.isalive() > > > diff --git a/dts/framework/utils.py b/dts/framework/utils.py new file > > > mode 100644 index 0000000000..db87349827 > > > --- /dev/null > > > +++ b/dts/framework/utils.py > > > @@ -0,0 +1,12 @@ > > > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2010-2014 > > > +Intel Corporation # Copyright(c) 2022 PANTHEON.tech s.r.o. > > > +# Copyright(c) 2022 University of New Hampshire # > > > + > > > +def RED(text: str) -> str: > > > + return f"\u001B[31;1m{str(text)}\u001B[0m" > > > + > > > + > > > +def GREEN(text: str) -> str: > > > + return f"\u001B[32;1m{str(text)}\u001B[0m" > > > -- > > > 2.30.2 > > > > > > > -- > > Best Regards, > > Stanislaw Kardach > -- Best Regards, Stanislaw Kardach