From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Santana
Date: Mon, 22 Jan 2024 15:39:31 -0500
Subject: Re: [PATCH v2 2/2] post_pw: Store submitted checks locally as well
To: Aaron Conole
Cc: ci@dpdk.org, Ilya Maximets, Jeremy Kerr
References: <20240122193232.3734371-1-aconole@redhat.com> <20240122193232.3734371-3-aconole@redhat.com>
In-Reply-To: <20240122193232.3734371-3-aconole@redhat.com>
List-Id: DPDK CI discussions

On Mon, Jan 22, 2024 at 2:32 PM Aaron Conole wrote:
>
> Jeremy Kerr reports that our PW checks reporting submitted 43000 API calls
> in just a single day. That is alarmingly unacceptable.
> We can store the
> URLs we've already submitted and then just skip over any additional
> processing at least on the PW side.
>
> This patch does two things to try and mitigate this issue:
>
> 1. Store each patch ID and URL in the series DB to show that we reported
>    the check. This means we don't need to poll patchwork for check status
>
> 2. Store the last modified time of the reports mailing list. This means
>    we only poll the mailing list when a new email has surely landed.
>
> Signed-off-by: Aaron Conole
> ---
> v2: fixed up the Last-Modified grep and storage
>
>  post_pw.sh       | 38 +++++++++++++++++++++++++++++++++++++-
>  series_db_lib.sh | 25 +++++++++++++++++++++++++
>  2 files changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/post_pw.sh b/post_pw.sh
> index 9163ea1..a23bdc5 100755
> --- a/post_pw.sh
> +++ b/post_pw.sh
> @@ -20,6 +20,7 @@
>  # License for the specific language governing permissions and limitations
>  # under the License.
>
> +[ -f "$(dirname $0)/series_db_lib.sh" ] && source "$(dirname $0)/series_db_lib.sh" || exit 1
>  [ -f "${HOME}/.mail_patchwork_sync.rc" ] && source "${HOME}/.mail_patchwork_sync.rc"
>
>  # Patchwork instance to update with new reports from mailing list
> @@ -75,6 +76,13 @@ send_post() {
>      if [ -z "$context" -o -z "$state" -o -z "$description" -o -z "$patch_id" ]; then
>          echo "Skpping \"$link\" due to missing context, state, description," \
>               "or patch_id" 1>&2
> +        # Just don't want to even bother seeing these "bad" patches as well.
> +        add_check_scanned_url "$patch_id" "$target_url"
> +        return 0
> +    fi
> +
> +    if check_id_exists "$patch_id" "$target_url" ; then
> +        echo "Skipping \"$link\" - already reported." 1>&2
>          return 0
>      fi
>
> @@ -84,6 +92,7 @@ send_post() {
>                      "$api_url")"
>      if [ $? -ne 0 ]; then
>          echo "Failed to get proper server response on link ${api_url}" 1>&2
> +        # Don't store these as processed in case the server has a temporary issue.
>          return 0
>      fi
>
> @@ -95,6 +104,9 @@ send_post() {
>          jq -e "[.[].target_url] | contains([\"$mail_url\"])" >/dev/null
>      then
>          echo "Report ${target_url} already pushed to patchwork. Skipping." 1>&2
> +        # Somehow this was not stored (for example, first time we apply the tracking
> +        # feature). Store it now.
> +        add_check_scanned_url "$patch_id" "$target_url"
>          return 0
>      fi
>
> @@ -114,12 +126,34 @@ send_post() {
>      if [ $? -ne 0 ]; then
>          echo -e "Failed to push retults based on report ${link} to the"\
>                  "patchwork instance ${pw_instance} using the following REST"\
> -                "API Endpoint ${api_url} with the following data:\n$data\n"
> +                "API Endpoint ${api_url} with the following data:\n$data\n" 1>&2
>          return 0
>      fi
> +
> +    add_check_scanned_url "$patch_id" "$target_url"
>  }
>
> +# Collect the date. NOTE: this needs some accomodate to catch the month change-overs
>  year_month="$(date +"%Y-%B")"
> +
> +# Get the last modified time
> +report_last_mod=$(curl --head -A "(pw-ci) pw-post" -sSf "${mail_archive}${year_month}/thread.html" | grep -i Last-Modified)
> +
> +mailing_list_save_file=$(echo ".post_pw_${mail_archive}${year_month}" | sed -e "s@/@_@g" -e "s@:@_@g" -e "s,@,_,g")
> +
> +if [ -e "${HOME}/${mailing_list_save_file}" ]; then
> +    last_read_date=$(cat "${HOME}/${mailing_list_save_file}")
> +    if [ "$last_read_date" -a "$last_read_date" == "$report_last_mod" ]; then
> +        echo "Last modified times match. Skipping list parsing."
> +        exit 0
> +    else
> +        last_read_date="$report_last_mod"
> +    fi
> +else
> +    last_read_date="Failed curl."
> +    touch "${HOME}/${mailing_list_save_file}"

One last thing, sorry.

Instead of touch, could we populate the file with $report_last_mod? I'm
looking at it from the point of view of the first run, when the file does
not exist yet: the script would create it with the timestamp. On the next
run, if the timestamps are the same, the script exits, which is what we want.
If we keep it as written, on the second run the variable will be populated
with "Failed curl" and the script will run in full. That is not ideal when
the timestamps already match and the value just has not been written to the
file yet.

Alternatively, adding another

    last_read_date="$report_last_mod"

here might do the job.

Otherwise, everything looks good!

> +fi
> +
>  reports="$(curl -A "(pw-ci) pw-post" -sSf "${mail_archive}${year_month}/thread.html" | \
>           grep -i 'HREF=' | sed -e 's@[0-9]*
>  if [ $? -ne 0 ]; then
> @@ -132,3 +166,5 @@ echo "$reports" | while IFS='|' read -r blank link title; do
>          send_post "${mail_archive}${year_month}/$link"
>      fi
>  done
> +
> +echo "$last_read_date" > "${HOME}/${mailing_list_save_file}"
> diff --git a/series_db_lib.sh b/series_db_lib.sh
> index c5f42e0..0635469 100644
> --- a/series_db_lib.sh
> +++ b/series_db_lib.sh
> @@ -130,6 +130,17 @@ recheck_sync INTEGER
>  EOF
>          run_db_command "INSERT INTO series_schema_version(id) values (8);"
>      fi
> +
> +    run_db_command "select * from series_schema_version;" | egrep '^9$' > /dev/null 2>&1
> +    if [ $? -eq 1 ]; then
> +        sqlite3 ${HOME}/.series-db <<EOF
> +CREATE TABLE check_id_scanned (
> +check_patch_id INTEGER,
> +check_url STRING
> +)
> +EOF
> +        run_db_command "INSERT INTO series_schema_version(id) values (9);"
> +    fi
>  }
>
>  function series_db_exists() {
> @@ -468,3 +479,17 @@ function set_recheck_request_state() {
>
>      echo "UPDATE recheck_requests set recheck_sync=$recheck_state where patchwork_instance=\"$recheck_instance\" and patchwork_project=\"$recheck_project\" and recheck_requested_by=\"$recheck_requested_by\" and recheck_series=\"$recheck_series\";" | series_db_execute
>  }
> +
> +function add_check_scanned_url() {
> +    local patch_id="$1"
> +    local url="$2"
> +
> +    echo "INSERT into check_id_scanned (check_patch_id, check_url) values (${patch_id}, \"$url\");" | series_db_execute
> +}
> +
> +function check_id_exists() {
> +    local patch_id="$1"
> +    local url="$2"
> +
> +    echo "select * from check_id_scanned where check_patch_id=$patch_id and check_url=\"$url\";" | series_db_execute | grep "$url" >/dev/null 2>&1
> +}
> --
> 2.41.0
>
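
To make the first-run comment concrete, here is a rough sketch of the behaviour I have in mind. It is not a drop-in replacement for the hunk: the helper names needs_parsing and record_last_mod, the temporary file, and the sample header value are all made up for illustration, and there is no curl call so it can be run offline. The idea is simply that the real Last-Modified value gets stored on the very first run, so the second run can already skip.

```shell
#!/bin/bash
# Sketch only. needs_parsing and record_last_mod are hypothetical helpers,
# not functions from post_pw.sh or series_db_lib.sh.

# Return 0 when the list should be parsed, 1 when it can be skipped.
needs_parsing() {
    local save_file="$1"
    local current_last_mod="$2"
    local last_read_date=""

    # An empty header means the HEAD request failed; parse to be safe.
    [ -z "$current_last_mod" ] && return 0

    # Load the previously stored header, if there is one.
    [ -e "$save_file" ] && last_read_date="$(cat "$save_file")"

    # Same value as last time: nothing new has landed on the list.
    [ "$last_read_date" = "$current_last_mod" ] && return 1
    return 0
}

# Store the header after the list has been processed. On the first run this
# writes the real value rather than a placeholder.
record_last_mod() {
    local save_file="$1"
    local current_last_mod="$2"
    printf '%s\n' "$current_last_mod" > "$save_file"
}

# Demo with a temporary file and a made-up header value.
demo_file="$(mktemp)"
rm -f "$demo_file"    # simulate the very first run: no save file yet
demo_header="Last-Modified: Mon, 22 Jan 2024 20:39:48 GMT"

if needs_parsing "$demo_file" "$demo_header"; then first_run="parse"; else first_run="skip"; fi
record_last_mod "$demo_file" "$demo_header"
if needs_parsing "$demo_file" "$demo_header"; then second_run="parse"; else second_run="skip"; fi

echo "first run: $first_run, second run: $second_run"
rm -f "$demo_file"
```

With the patch as posted, the second run would compare against "Failed curl." and parse again; in this sketch it skips, because the stored value already matches.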