From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id A01954399D; Mon, 22 Jan 2024 20:31:43 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 93951406A2; Mon, 22 Jan 2024 20:31:43 +0100 (CET) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mails.dpdk.org (Postfix) with ESMTP id 5D4AC402C3 for ; Mon, 22 Jan 2024 20:31:41 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1705951900; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=PozeSREsJ5MX1HbSCdzRD7qWTRGpzH2XtKvjBz+E+WY=; b=PFUxqGgScBabRNpoi8OzaQg4mQPn+csRsZoZHhotX1R0/lGO5xlA/Px0vkhqR3+7UvWF2D 4Q3/WjlZUMe6uSvli3bk9bsRBbzdYmKum1Jc/iNYWGMIUGlxBwNKiXUYfzxWPy0MoPfPnr sJhhHmtGy2KtAuY91+8MxfEtNtqTtCg= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-119--72LuKjyMJaTXXQHqtPWHw-1; Mon, 22 Jan 2024 14:31:39 -0500 X-MC-Unique: -72LuKjyMJaTXXQHqtPWHw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id ED68F3C14940; Mon, 22 Jan 2024 19:31:38 +0000 (UTC) Received: from RHTPC1VM0NT (unknown [10.22.33.141]) by smtp.corp.redhat.com (Postfix) with ESMTPS id A94ADC2590D; Mon, 22 Jan 2024 19:31:38 +0000 (UTC) From: Aaron Conole To: Michael Santana Cc: ci@dpdk.org, Ilya Maximets , Jeremy Kerr Subject: Re: [PATCH 2/2] post_pw: Store submitted checks locally as well References: <20240122172635.3641078-1-aconole@redhat.com> <20240122172635.3641078-3-aconole@redhat.com> Date: Mon, 22 Jan 2024 14:31:38 -0500 In-Reply-To: (Michael Santana's message of "Mon, 22 Jan 2024 13:16:49 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.3 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.8 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: ci@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK CI discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ci-bounces@dpdk.org Michael Santana writes: > On Mon, Jan 22, 2024 at 12:26=E2=80=AFPM Aaron Conole wrote: >> >> Jeremy Kerr reports that our PW checks reporting submitted 43000 API cal= ls >> in just a single day. That is alarmingly unacceptable. We can store th= e >> URLs we've already submitted and then just skip over any additional >> processing at least on the PW side. >> >> This patch does two things to try and mitigate this issue: >> >> 1. Store each patch ID and URL in the series DB to show that we reported >> the check. This means we don't need to poll patchwork for check stat= us > Yeah, we should have done that from the start. I should have thought > about this sooner. >> >> 2. Store the last modified time of the reports mailing list. This means >> we only poll the mailing list when a new email has surely landed. >> >> Signed-off-by: Aaron Conole >> --- >> post_pw.sh | 35 ++++++++++++++++++++++++++++++++++- >> series_db_lib.sh | 25 +++++++++++++++++++++++++ >> 2 files changed, 59 insertions(+), 1 deletion(-) >> >> diff --git a/post_pw.sh b/post_pw.sh >> index fe2f41c..3e3a493 100755 >> --- a/post_pw.sh >> +++ b/post_pw.sh >> @@ -20,6 +20,7 @@ >> # License for the specific language governing permissions and limitatio= ns >> # under the License. >> >> +[ -f "$(dirname $0)/series_db_lib.sh" ] && source "$(dirname $0)/series= _db_lib.sh" || exit 1 >> [ -f "${HOME}/.mail_patchwork_sync.rc" ] && source "${HOME}/.mail_patch= work_sync.rc" >> >> # Patchwork instance to update with new reports from mailing list >> @@ -75,6 +76,13 @@ send_post() { >> if [ -z "$context" -o -z "$state" -o -z "$description" -o -z "$patc= h_id" ]; then >> echo "Skpping \"$link\" due to missing context, state, descript= ion," \ >> "or patch_id" 1>&2 >> + # Just don't want to even bother seeing these "bad" patches as = well. >> + add_check_scanned_url "$patch_id" "$target_url" >> + return 0 >> + fi >> + >> + if check_id_exists "$patch_id" "$target_url" ; then >> + echo "Skipping \"$link\" - already reported." 1>&2 >> return 0 >> fi >> >> @@ -84,6 +92,7 @@ send_post() { >> "$api_url")" >> if [ $? -ne 0 ]; then >> echo "Failed to get proper server response on link ${api_url}" = 1>&2 >> + # Don't store these as processed in case the server has a tempo= rary issue. >> return 0 >> fi >> >> @@ -95,6 +104,9 @@ send_post() { >> jq -e "[.[].target_url] | contains([\"$mail_url\"])" >/dev/null >> then >> echo "Report ${target_url} already pushed to patchwork. Skippin= g." 1>&2 >> + # Somehow this was not stored (for example, first time we apply= the tracking >> + # feature). Store it now. >> + add_check_scanned_url "$patch_id" "$target_url" >> return 0 >> fi >> >> @@ -114,12 +126,31 @@ send_post() { >> if [ $? -ne 0 ]; then >> echo -e "Failed to push retults based on report ${link} to the"= \ >> "patchwork instance ${pw_instance} using the following = REST"\ >> - "API Endpoint ${api_url} with the following data:\n$dat= a\n" >> + "API Endpoint ${api_url} with the following data:\n$dat= a\n" 1>&2 >> return 0 >> fi >> + >> + add_check_scanned_url "$patch_id" "$target_url" >> } >> >> +# Collect the date. NOTE: this needs some accomodate to catch the mont= h change-overs >> year_month=3D"$(date +"%Y-%B")" >> + >> +# Get the last modified time >> +report_last_mod=3D$(curl -A "pw-post" -sSf "${mail_archive}${year_month= }/thread.html" | grep Last-Modified) > please use grep -i. I doubt Last-Modified will ever change. But better > be on the safe side Okay. >> + >> +mailing_list_save_file=3D$(echo >> ".post_pw_${mail_archive}${year_month}" | sed -e "s@/@_@g" -e >> "s@:@_@g" -e "s,@,_,g") >> + >> +if [ -e "${HOME}/${mailing_list_save_file}" ]; then >> + last_read_date=3D$(cat "${HOME}/${mailing_list_save_file}") > wait.... what? Please correct me if I am misunderstanding this > > But I dont think that $last_read_date or $mailing_list_save_file ever > get populated. They are always empty strings/file. > > I think you are missing a last_read_date=3D"$report_last_mod" somewhere > in this code. It might not be in the place that I mentioned below, but > I am fairly certain you are missing it somewhere Good catch - my testing was with manually editing the files. >> + if [ "$last_read_date" =3D=3D "$report_last_mod" ]; then >> + echo "Last modified times match. Skipping list parsing." >> + exit 0 >> + fi > Maybe if we change this if to an else like this > > else > "$last_read_date"=3D"$report_last_mod" > fi Done. >> +else >> + touch "${HOME}/${mailing_list_save_file}" >> +fi >> + >> reports=3D"$(curl -A "pw-post" -sSf "${mail_archive}${year_month}/threa= d.html" | \ >> grep -i 'HREF=3D' | sed -e 's@[0-9]*
  • @\|@')" >> if [ $? -ne 0 ]; then >> @@ -132,3 +163,5 @@ echo "$reports" | while IFS=3D'|' read -r blank link= title; do >> send_post "${mail_archive}${year_month}/$link" >> fi >> done >> + >> +echo "$last_read_date" > "${HOME}/${mailing_list_save_file}" >> diff --git a/series_db_lib.sh b/series_db_lib.sh >> index c5f42e0..0635469 100644 >> --- a/series_db_lib.sh >> +++ b/series_db_lib.sh >> @@ -130,6 +130,17 @@ recheck_sync INTEGER >> EOF >> run_db_command "INSERT INTO series_schema_version(id) values (8= );" >> fi >> + >> + run_db_command "select * from series_schema_version;" | egrep '^9$'= > /dev/null 2>&1 >> + if [ $? -eq 1 ]; then >> + sqlite3 ${HOME}/.series-db <> +CREATE TABLE check_id_scanned ( >> +check_patch_id INTEGER, >> +check_url STRING >> +) >> +EOF >> + run_db_command "INSERT INTO series_schema_version(id) values (9= );" >> + fi >> } >> >> function series_db_exists() { >> @@ -468,3 +479,17 @@ function set_recheck_request_state() { >> >> echo "UPDATE recheck_requests set recheck_sync=3D$recheck_state whe= re patchwork_instance=3D\"$recheck_instance\" and patchwork_project=3D\"$re= check_project\" and recheck_requested_by=3D\"$recheck_requested_by\" and re= check_series=3D\"$recheck_series\";" | series_db_execute >> } >> + >> +function add_check_scanned_url() { >> + local patch_id=3D"$1" >> + local url=3D"$2" >> + >> + echo "INSERT into check_id_scanned (check_patch_id, check_url) valu= es (${patch_id}, \"$url\");" | series_db_execute >> +} >> + >> +function check_id_exists() { >> + local patch_id=3D"$1" >> + local url=3D"$2" >> + >> + echo "select * from check_id_scanned where check_patch_id=3D$patch_= id and check_url=3D\"$url\";" | series_db_execute | grep "$url" >/dev/null = 2>&1 >> +} >> -- >> 2.41.0 >>