From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20240122172635.3641078-1-aconole@redhat.com> <20240122172635.3641078-3-aconole@redhat.com>
In-Reply-To: <20240122172635.3641078-3-aconole@redhat.com>
From: Michael Santana
Date: Mon, 22 Jan 2024 13:16:49 -0500
Subject: Re: [PATCH 2/2] post_pw: Store submitted checks locally as well
To: Aaron Conole
Cc: ci@dpdk.org, Ilya Maximets, Jeremy Kerr
List-Id: DPDK CI discussions

On Mon, Jan 22, 2024 at 12:26 PM Aaron Conole wrote:
>
> Jeremy Kerr reports that our PW checks reporting submitted 43000 API calls
> in just a single day. That is alarmingly unacceptable.
> We can store the URLs we've already submitted and then just skip over any
> additional processing at least on the PW side.
>
> This patch does two things to try and mitigate this issue:
>
> 1. Store each patch ID and URL in the series DB to show that we reported
>    the check. This means we don't need to poll patchwork for check status

Yeah, we should have done that from the start. I should have thought about
this sooner.

>
> 2. Store the last modified time of the reports mailing list. This means
>    we only poll the mailing list when a new email has surely landed.
>
> Signed-off-by: Aaron Conole
> ---
>  post_pw.sh       | 35 ++++++++++++++++++++++++++++++++++-
>  series_db_lib.sh | 25 +++++++++++++++++++++++++
>  2 files changed, 59 insertions(+), 1 deletion(-)
>
> diff --git a/post_pw.sh b/post_pw.sh
> index fe2f41c..3e3a493 100755
> --- a/post_pw.sh
> +++ b/post_pw.sh
> @@ -20,6 +20,7 @@
>  # License for the specific language governing permissions and limitations
>  # under the License.
>
> +[ -f "$(dirname $0)/series_db_lib.sh" ] && source "$(dirname $0)/series_db_lib.sh" || exit 1
>  [ -f "${HOME}/.mail_patchwork_sync.rc" ] && source "${HOME}/.mail_patchwork_sync.rc"
>
>  # Patchwork instance to update with new reports from mailing list
> @@ -75,6 +76,13 @@ send_post() {
>      if [ -z "$context" -o -z "$state" -o -z "$description" -o -z "$patch_id" ]; then
>          echo "Skpping \"$link\" due to missing context, state, description," \
>               "or patch_id" 1>&2
> +        # Just don't want to even bother seeing these "bad" patches as well.
> +        add_check_scanned_url "$patch_id" "$target_url"
> +        return 0
> +    fi
> +
> +    if check_id_exists "$patch_id" "$target_url" ; then
> +        echo "Skipping \"$link\" - already reported." 1>&2
>          return 0
>      fi
>
> @@ -84,6 +92,7 @@ send_post() {
>          "$api_url")"
>      if [ $? -ne 0 ]; then
>          echo "Failed to get proper server response on link ${api_url}" 1>&2
> +        # Don't store these as processed in case the server has a temporary issue.
>          return 0
>      fi
>
> @@ -95,6 +104,9 @@ send_post() {
>          jq -e "[.[].target_url] | contains([\"$mail_url\"])" >/dev/null
>      then
>          echo "Report ${target_url} already pushed to patchwork. Skipping." 1>&2
> +        # Somehow this was not stored (for example, first time we apply the tracking
> +        # feature). Store it now.
> +        add_check_scanned_url "$patch_id" "$target_url"
>          return 0
>      fi
>
> @@ -114,12 +126,31 @@ send_post() {
>      if [ $? -ne 0 ]; then
>          echo -e "Failed to push retults based on report ${link} to the"\
>                  "patchwork instance ${pw_instance} using the following REST"\
> -                "API Endpoint ${api_url} with the following data:\n$data\n"
> +                "API Endpoint ${api_url} with the following data:\n$data\n" 1>&2
>          return 0
>      fi
> +
> +    add_check_scanned_url "$patch_id" "$target_url"
>  }
>
> +# Collect the date. NOTE: this needs some accomodate to catch the month change-overs
>  year_month="$(date +"%Y-%B")"
> +
> +# Get the last modified time
> +report_last_mod=$(curl -A "pw-post" -sSf "${mail_archive}${year_month}/thread.html" | grep Last-Modified)

Please use grep -i. I doubt the capitalization of Last-Modified will ever
change, but better to be on the safe side.

> +
> +mailing_list_save_file=$(echo ".post_pw_${mail_archive}${year_month}" | sed -e "s@/@_@g" -e "s@:@_@g" -e "s,@,_,g")
> +
> +if [ -e "${HOME}/${mailing_list_save_file}" ]; then
> +    last_read_date=$(cat "${HOME}/${mailing_list_save_file}")

Wait... what? Please correct me if I am misunderstanding this, but I don't
think that $last_read_date or the ${mailing_list_save_file} file ever get
populated. They are always an empty string and an empty file.

I think you are missing a last_read_date="$report_last_mod" somewhere in
this code. It might not be in the place that I mentioned below, but I am
fairly certain you are missing it somewhere.

> +    if [ "$last_read_date" == "$report_last_mod" ]; then
> +        echo "Last modified times match. Skipping list parsing."
> +        exit 0
> +    fi

Maybe we could add an else branch to this if, like this:

    else
        last_read_date="$report_last_mod"
    fi

> +else
> +    touch "${HOME}/${mailing_list_save_file}"
> +fi
> +
>  reports="$(curl -A "pw-post" -sSf "${mail_archive}${year_month}/thread.html" | \
>               grep -i 'HREF=' | sed -e 's@[0-9]*
>  if [ $? -ne 0 ]; then
> @@ -132,3 +163,5 @@ echo "$reports" | while IFS='|' read -r blank link title; do
>          send_post "${mail_archive}${year_month}/$link"
>      fi
>  done
> +
> +echo "$last_read_date" > "${HOME}/${mailing_list_save_file}"
> diff --git a/series_db_lib.sh b/series_db_lib.sh
> index c5f42e0..0635469 100644
> --- a/series_db_lib.sh
> +++ b/series_db_lib.sh
> @@ -130,6 +130,17 @@ recheck_sync INTEGER
>  EOF
>          run_db_command "INSERT INTO series_schema_version(id) values (8);"
>      fi
> +
> +    run_db_command "select * from series_schema_version;" | egrep '^9$' > /dev/null 2>&1
> +    if [ $? -eq 1 ]; then
> +        sqlite3 ${HOME}/.series-db <<EOF
> +CREATE TABLE check_id_scanned (
> +check_patch_id INTEGER,
> +check_url STRING
> +)
> +EOF
> +        run_db_command "INSERT INTO series_schema_version(id) values (9);"
> +    fi
>  }
>
>  function series_db_exists() {
> @@ -468,3 +479,17 @@ function set_recheck_request_state() {
>
>      echo "UPDATE recheck_requests set recheck_sync=$recheck_state where patchwork_instance=\"$recheck_instance\" and patchwork_project=\"$recheck_project\" and recheck_requested_by=\"$recheck_requested_by\" and recheck_series=\"$recheck_series\";" | series_db_execute
>  }
> +
> +function add_check_scanned_url() {
> +    local patch_id="$1"
> +    local url="$2"
> +
> +    echo "INSERT into check_id_scanned (check_patch_id, check_url) values (${patch_id}, \"$url\");" | series_db_execute
> +}
> +
> +function check_id_exists() {
> +    local patch_id="$1"
> +    local url="$2"
> +
> +    echo "select * from check_id_scanned where check_patch_id=$patch_id and check_url=\"$url\";" | series_db_execute | grep "$url" >/dev/null 2>&1
> +}
> --
> 2.41.0
>
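For reference, here is a rough, untested sketch of how the Last-Modified check could fit together. It is not part of the patch, and the function names are hypothetical. curl writes only the response body to stdout unless it is given -I (or -D -), so the sketch expects the response headers on stdin, for example from curl -sSfI. It also matches the header name with grep -i, because HTTP/2 responses carry lower-case header names.

```shell
#!/bin/bash
# Rough sketch; function names are hypothetical, not part of post_pw.sh.

# Read HTTP response headers on stdin and print the Last-Modified value.
# grep -i matches the header name regardless of case; tr drops the
# trailing carriage return that ends each header line.
last_modified_from_headers() {
    grep -i '^last-modified:' | head -n 1 | cut -d' ' -f2- | tr -d '\r'
}

# Succeed (return 0) when STATE_FILE already holds a non-empty VALUE,
# i.e. the archive has not changed since the last recorded run.
list_unchanged() {
    local value="$1" state_file="$2"

    [ -n "$value" ] && [ -f "$state_file" ] &&
        [ "$(cat "$state_file")" = "$value" ]
}

# Record VALUE so the next run can compare against it.
save_last_modified() {
    local value="$1" state_file="$2"

    echo "$value" > "$state_file"
}
```

Possible wiring, assuming the variables from the patch:

    report_last_mod=$(curl -A "pw-post" -sSfI "${mail_archive}${year_month}/thread.html" | last_modified_from_headers)
    list_unchanged "$report_last_mod" "${HOME}/${mailing_list_save_file}" && exit 0
    # ... parse the list and call send_post ...
    save_last_modified "$report_last_mod" "${HOME}/${mailing_list_save_file}"

Saving the value only after the loop means an interrupted run parses the list again next time instead of skipping it.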
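Similarly, here is a rough sketch of the "remember what was already posted" idea from point 1 of the commit message. It uses a plain text file in place of the sqlite series DB, and the record format and function names are hypothetical. grep -F -x matches the pair literally and as a whole line, so a '.' in a URL is not treated as a wildcard and one URL cannot match as a substring of another.

```shell
#!/bin/bash
# Rough sketch: a plain-text stand-in for the check_id_scanned table.
# Function names are hypothetical, not the series_db_lib.sh helpers.

# Append one "patch_id|url" record for a check that has been handled.
mark_reported() {
    local patch_id="$1" url="$2" db_file="$3"

    printf '%s|%s\n' "$patch_id" "$url" >> "$db_file"
}

# Succeed if this exact (patch_id, url) pair was recorded earlier.
# -F matches literally (URLs contain '.', '?' and other regex
# metacharacters), -x requires a whole-line match, -q suppresses output.
already_reported() {
    local patch_id="$1" url="$2" db_file="$3"

    [ -f "$db_file" ] && grep -Fxq -- "${patch_id}|${url}" "$db_file"
}
```

In send_post this would be used the same way as the patch's helpers: already_reported ... && return 0 before querying patchwork, and mark_reported ... after a successful POST.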